CUDA-Agent is the first known RL-trained model to surpass advanced models such as Claude Opus-4.6 and Gemini 3 Pro on high-performance CUDA kernel generation. It achieves state-of-the-art results on KernelBench, consistently outperforming the torch.compile baseline across difficulty levels, with especially strong gains on the hardest cases. To support the LLM-based CUDA generation community, we have released our training data, an expert-designed SKILL.md, and our agent environment.
We released the training dataset CUDA-Agent-Ops-6K:
- Dataset URL: BytedTsinghua-SIA/CUDA-Agent-Ops-6K
- Scale: 6,000 training samples
- Construction pipeline:
  - Collect reference operators from `torch` and `transformers`
  - Use an LLM to compose multiple operators into fused tasks
  - Apply rule-based filtering to keep executable, deterministic, and non-trivial samples
- Filtering criteria:
  - Must execute correctly in both eager mode and `torch.compile`
  - Remove stochastic operators and degenerate outputs
  - Control the runtime range and remove samples highly similar to KernelBench tests to reduce contamination risk
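The filtering criteria above can be sketched as a small rule-based check. This is a hypothetical illustration, not the released pipeline: the names `keep_sample`, `run_eager`, `run_compiled`, and the runtime thresholds are all stand-ins, and the element-wise comparison stands in for `torch.allclose`.

```python
import math

# Illustrative thresholds for the "control runtime range" rule (assumed values).
MIN_RUNTIME_MS, MAX_RUNTIME_MS = 0.05, 50.0

def outputs_match(a, b, tol=1e-4):
    """Element-wise comparison standing in for torch.allclose."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )

def keep_sample(run_eager, run_compiled, runtime_ms):
    # 1) Must execute correctly in both eager mode and torch.compile.
    try:
        eager_out = run_eager()
        compiled_out = run_compiled()
    except Exception:
        return False
    if not outputs_match(eager_out, compiled_out):
        return False
    # 2) Reject stochastic operators: two eager runs must agree exactly.
    if not outputs_match(eager_out, run_eager(), tol=0.0):
        return False
    # 3) Reject degenerate outputs (all zeros, NaN, or inf).
    if all(x == 0 for x in eager_out) or any(not math.isfinite(x) for x in eager_out):
        return False
    # 4) Keep runtime within the controlled range.
    return MIN_RUNTIME_MS <= runtime_ms <= MAX_RUNTIME_MS

# A deterministic sample passes; a non-deterministic one is dropped.
det = lambda: [1.0, 2.0, 3.0]
_calls = [0]
def sto():
    _calls[0] += 1
    return [float(_calls[0])]   # changes on every call

print(keep_sample(det, det, runtime_ms=1.0))   # True
print(keep_sample(sto, sto, runtime_ms=1.0))   # False
```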
`agent_workdir` is a standardized example agent workspace for the full loop:
implement CUDA kernels -> compile -> verify correctness -> profile performance -> iterate.
Key files in this directory:
- `SKILL.md`: workflow constraints and optimization rules for agent execution
- `model.py`: original PyTorch baseline model
- `model_new.py`: optimized model using the custom CUDA extension
- `binding.cpp` / `binding_registry.h`: shared Python binding registration infrastructure
- `kernels/`: custom CUDA/C++ kernels and their bindings
- `utils/compile.py` + `utils/compile.sh`: extension build scripts
- `utils/verification.py`: correctness validation script
- `utils/profiling.py`: performance comparison against baseline and `torch.compile`
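The correctness check that `utils/verification.py` performs can be sketched as: run the baseline model and the optimized model on the same inputs and compare outputs within a tolerance. The class names, the `forward` interface, and the fused multiply-add workload below are illustrative stand-ins, not the released code.

```python
class BaselineModel:
    """Stand-in for model.py: a fused multiply-add in pure Python."""
    def forward(self, xs, scale=2.0, bias=1.0):
        return [scale * x + bias for x in xs]

class OptimizedModel:
    """Stand-in for model_new.py: same math, as a custom kernel would compute it."""
    def forward(self, xs, scale=2.0, bias=1.0):
        return [x * scale + bias for x in xs]

def verify(baseline, optimized, inputs, atol=1e-5):
    """Compare optimized outputs against the baseline reference within atol."""
    ref = baseline.forward(inputs)
    out = optimized.forward(inputs)
    assert len(ref) == len(out), "shape mismatch"
    worst = max(abs(r - o) for r, o in zip(ref, out))
    ok = worst <= atol
    print(f"max abs error = {worst:.2e} -> {'PASS' if ok else 'FAIL'}")
    return ok

verify(BaselineModel(), OptimizedModel(), [0.5 * i for i in range(8)])  # PASS
```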
Common commands (run inside `agent_workdir`):

```bash
bash utils/compile.sh           # build the CUDA extension
python3 -m utils.verification   # validate correctness against the baseline
python3 -m utils.profiling      # compare performance against baseline and torch.compile
```
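The implement -> compile -> verify -> profile -> iterate loop can be driven programmatically with these commands. The sketch below is hypothetical: `optimize_loop`, `max_iters`, and the `revise_kernels` callback are illustrative, standing in for the agent editing `kernels/` between attempts.

```python
import subprocess

# The three entry points from "Common commands" above.
COMMANDS = {
    "compile": ["bash", "utils/compile.sh"],
    "verify": ["python3", "-m", "utils.verification"],
    "profile": ["python3", "-m", "utils.profiling"],
}

def run_step(name, cwd="agent_workdir"):
    """Run one step and return (succeeded, combined log)."""
    proc = subprocess.run(COMMANDS[name], cwd=cwd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def optimize_loop(revise_kernels, max_iters=5, cwd="agent_workdir"):
    """Iterate compile -> verify, feeding failure logs back, then profile once green."""
    for attempt in range(1, max_iters + 1):
        for step in ("compile", "verify"):
            ok, log = run_step(step, cwd)
            if not ok:
                revise_kernels(step, log)   # agent edits kernels/ using the log
                break
        else:
            # Compile and correctness both passed; measure the speedup.
            ok, log = run_step("profile", cwd)
            return attempt, log
    return None, "gave up after max_iters attempts"
```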

