DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation

#Triton #CUDA #LLM

https://hgpu.org/?p=30706


Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch refer…

hgpu.org
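For readers who haven't seen Triton's programming model, here is a minimal pure-Python sketch of the block-parallel pattern that PyTorch-to-Triton generators target. This is illustrative only: a real kernel would use `@triton.jit`, `tl.program_id`, and `tl.load`/`tl.store`, and the instances would run in parallel on the GPU.

```python
# Pure-Python sketch of Triton's block-parallel (SPMD) model: each
# "program instance" handles one BLOCK_SIZE-wide tile of the output.
BLOCK_SIZE = 4  # tile width; real kernels tune this per hardware

def vector_add(x, y):
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceil-div grid
    for pid in range(num_programs):                    # tl.program_id(axis=0)
        start = pid * BLOCK_SIZE
        # min(...) plays the role of Triton's mask on the ragged last tile
        for i in range(start, min(start + BLOCK_SIZE, n)):
            out[i] = x[i] + y[i]                       # tl.load + add + tl.store
    return out

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
# -> [11, 22, 33, 44, 55]
```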

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

#CUDA #Triton #Package

https://hgpu.org/?p=30703


Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agen…

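The excerpt doesn't show AutoKernel's actual search policy, but the skeleton of any iterative kernel-tuning loop looks roughly like this sketch: propose a candidate configuration, measure it, keep the best. Here `measure` is a hypothetical cost model standing in for compiling and timing a kernel variant on the GPU.

```python
# Toy sketch of an iterative kernel-tuning search loop. A real
# agent-driven system would propose candidates from profiler feedback
# rather than sweeping a fixed grid.

def measure(block_size, num_warps):
    # Hypothetical cost model: fastest at block_size=128, num_warps=4.
    return abs(block_size - 128) / 128 + abs(num_warps - 4) / 4

def tune(candidates):
    best_cfg, best_time = None, float("inf")
    for cfg in candidates:
        t = measure(*cfg)          # stands in for compile + benchmark
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

configs = [(bs, w) for bs in (32, 64, 128, 256) for w in (2, 4, 8)]
print(tune(configs))  # -> ((128, 4), 0.0)
```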

Clean up on aisle 7! Interesting idea - who will pay the bill?

US-based Portal Space Systems and Australian startup Paladin Space are combining forces to build and launch a scalable, commercial space-debris clean-up service.

Paladin supplies its Triton debris identification and capture system, while Portal provides its maneuverable Starburst spacecraft. Target launch: Q2 2027. https://www.inc.com/chloe-aiello/these-two-startups-are-teaming-up-to-prevent-a-pearl-harbor-moment-in-space/91318935

#Portal #Paladin #Triton #Space #SpaceJunk #LEO #Starburst #SpaceCraft #SpaceDebris

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

#Triton #ROCm #DeepLearning #Package

https://hgpu.org/?p=30696


Memory access errors remain one of the most pervasive bugs in GPU programming. Existing GPU sanitizers such as compute-sanitizer detect memory access errors by instrumenting every memory instructio…

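The core idea behind such sanitizers can be sketched in a few lines: route every load/store through a check against a table of live allocations, and turn out-of-bounds accesses into diagnostics instead of silent corruption. Real tools (compute-sanitizer, Triton-Sanitizer) do this at the instruction/IR level, not in Python; this is only an illustration of the mechanism.

```python
# Toy sketch of memory-access instrumentation: every access is checked
# against a table of live allocations before it is allowed through.

class Sanitizer:
    def __init__(self):
        self.allocs = {}     # base address -> size in bytes
        self.next = 0x1000   # toy bump allocator

    def malloc(self, size):
        base = self.next
        self.allocs[base] = size
        self.next += size
        return base

    def check(self, addr, tag):
        for base, size in self.allocs.items():
            if base <= addr < base + size:
                return  # in bounds: let the access through
        raise MemoryError(f"{tag}: out-of-bounds access at {hex(addr)}")

san = Sanitizer()
buf = san.malloc(16)
san.check(buf + 15, "load")        # fine: last valid byte
try:
    san.check(buf + 16, "store")   # one past the end: caught
except MemoryError as e:
    print(e)
```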

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

#CUDA #Triton #Benchmarking #Package

https://hgpu.org/?p=30694


As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity t…

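"Speed-of-light" scoring means comparing a kernel's measured time against the hardware's roofline limit rather than against a software baseline. A minimal sketch of that arithmetic, with illustrative peak numbers not tied to any specific GPU:

```python
# Sketch of a speed-of-light (SOL) score: a kernel is scored against the
# roofline-model hardware limit, not against a baseline implementation.
PEAK_FLOPS = 100e12   # 100 TFLOP/s (illustrative)
PEAK_BW = 2e12        # 2 TB/s (illustrative)

def roofline_limit(flops, bytes_moved):
    """Best achievable time (s): bound by compute or memory bandwidth,
    whichever is slower."""
    return max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

def sol_fraction(flops, bytes_moved, measured_time):
    """Fraction of the speed-of-light time achieved (1.0 = at the limit)."""
    return roofline_limit(flops, bytes_moved) / measured_time

# Memory-bound example: elementwise add over 2**28 fp32 elements
n = 2**28
flops = n                  # one add per element
bytes_moved = 3 * 4 * n    # read x, read y, write out
print(f"SOL fraction: {sol_fraction(flops, bytes_moved, 2.5e-3):.2f}")
# -> SOL fraction: 0.64
```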

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU

#Triton #NVIDIA #AMD #LLM

https://hgpu.org/?p=30678


Fine-tuning Large Language Models (LLMs) has become essential for domain adaptation, but its memory-intensive property exceeds the capabilities of most GPUs. To address this challenge and democrati…

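The "memory-intensive property" is easy to quantify with standard accounting (this is the usual back-of-envelope estimate, not the paper's method): mixed-precision full fine-tuning with Adam needs fp16 weights and gradients plus fp32 master weights and two fp32 optimizer moments, roughly 16 bytes per parameter before activations.

```python
# Back-of-envelope memory cost of full fine-tuning with Adam in
# mixed precision (~16 bytes/param, excluding activations).

def finetune_memory_gb(num_params):
    bytes_per_param = (
        2      # fp16 weights
        + 2    # fp16 gradients
        + 4    # fp32 master copy of weights
        + 4    # fp32 Adam first moment (m)
        + 4    # fp32 Adam second moment (v)
    )
    return num_params * bytes_per_param / 1e9

for billions in (1, 7, 70):
    print(f"{billions}B params -> ~{finetune_memory_gb(billions * 1e9):.0f} GB")
# -> 1B params -> ~16 GB
# -> 7B params -> ~112 GB
# -> 70B params -> ~1120 GB
```

Even a 7B model overflows an 80 GB accelerator before activations are counted, which is why offloading and heterogeneous co-design come into play.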

Learn the critical failure points when running LLM inference on Kubernetes, including resource constraints, operator compatibility, security, scalability, and monitoring best practices for production workloads.

#Kubernetes #LLMInference #Dynatrace #GPUResourceAllocation #ServiceMesh #NetworkPolicies #KEDA #TritonInferenceServer #Redis #Prometheus

https://dasroot.net/posts/2026/02/running-llm-inference-on-kubernetes-what-breaks-first/

Running LLM Inference on Kubernetes: What Breaks First

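Resource constraints top the failure list because LLM inference memory grows with concurrency, not just model size. A sketch of the KV-cache arithmetic that GPU resource requests ultimately have to budget for; the model dimensions below are illustrative (roughly 7B-scale), not taken from the article.

```python
# Per-request KV-cache memory for transformer inference: 2x (keys and
# values) per layer, per attention head, per token, in fp16.

def kv_cache_gb(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes / 1e9

per_request = kv_cache_gb(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
print(f"KV cache per 4k-token request: ~{per_request:.1f} GB")
# A pod serving N concurrent requests must budget roughly
# (weights + N * per_request) of GPU memory, or requests start failing.
```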

Github Awesome (@GithubAwesome)

AutoKernel is a tool that automates GPU profiling and kernel-optimization work, using autonomous agents inspired by Andrej Karpathy's autoresearch. Once a user specifies a PyTorch model, it automatically optimizes Triton kernels in the background, saving model developers much of the time otherwise spent manually watching and tweaking profiles.

https://x.com/GithubAwesome/status/2031933791342674364

#autokernel #pytorch #triton #gpuoptimization #autoresearch


Building AI models and tired of staring at GPU profilers? AutoKernel does it for you. Inspired by Karpathy's autoresearch, it brings autonomous AI agents to GPU kernel optimization. Point it at any PyTorch model, go to sleep, and wake up to optimized Triton kernels…
