[RL] Changes to enable compilation for trainer #2568
Lucaskabela wants to merge 2 commits into pytorch:main
Conversation
if we go with pytorch varlen, do we still need to worry about this file? cc @wwwjn
This is post-rebase - yes, we do need these changes, for the following reasons:
- Moving the `torch.autograd.Function` out of the inner context - this is so compile can trace it
- Making `_flash_attn_varlen_fwd` a custom op - this is so torch can trace it; without the custom op, we don't know how shapes will propagate through it
if ur going with pytorch varlen i think u can directly call varlen_attn (which is already a custom op). pytorch's varlen calls the upstream flash attention impl instead of vllm's flash attention
Sorry, I misunderstood this question (I thought it was related to the deletion of the vllm_attention directory).
We will still need to move the autograd.Function out, but yes, we won't need the custom op in that case
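The "move the autograd.Function out" point can be illustrated with a minimal sketch (names here are hypothetical, not the PR's code): the `Function` subclass is defined once at module level rather than inside the calling function, which is what lets the compiler trace the call site.

```python
import torch

# Module-level definition: created once at import time, so torch.compile can
# trace calls to it. Defining this class inside the calling function on every
# invocation is what breaks tracing.
class ScaleFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, factor):
        ctx.factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_out):
        # Gradient w.r.t. x is grad_out * factor; no gradient for factor.
        return grad_out * ctx.factor, None

def scale(x, factor=3.0):
    return ScaleFn.apply(x, factor)

x = torch.ones(2, requires_grad=True)
y = scale(x).sum()
y.backward()
```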
Force-pushed: 8a62589 to be75f04
Force-pushed: be75f04 to 520d314
Force-pushed: b0e8401 to d032ea8
Force-pushed: d032ea8 to 76c42cc
Diff context:
`from vllm.v1.attention.backends.fa_utils import flash_attn_varlen_func`
…
`@torch.library.custom_op("rl::flash_attn_varlen_fwd", mutates_args=())`
let's not merge this, let's use pytorch varlen attention in #2364
cc @zhxchen17 to unblock
Diff context:
`logger = logging.getLogger(__name__)`
…
`def parallelize_qwen3(`
@wwwjn could you work with @sanketpurandare to land PP + DTensor so that we can remove this ad hoc function?
Diff context:
`return model`
…
`def apply_compile(model: nn.Module, compile_config: CompileConfig):`
Summary
In this PR, we enable naive, JIT-style `torch.compile` for the RL policy trainer. This is the first step towards speeding up the trainer model. The changes are:
- Compile per layer -> this is critical, as `torch.compile()` on the full model results in a weight-name change which breaks the weight transfer
- Add a `compile=TrainerCompileConfig(enable=True, backend="aot_eager")` config option
- Move the `torch.autograd.Function` (which can't be traced by the compiler) into a module-level FlashAttnVarlenFunction with a fake implementation, so AOT Autograd can trace through it with FakeTensors
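The config knob mentioned above might be shaped roughly like this sketch; the real `TrainerCompileConfig` lives in the trainer codebase and its fields beyond the two shown in the PR are assumptions.

```python
from dataclasses import dataclass

# Hypothetical reconstruction of the config mentioned in the summary; only
# the `enable` and `backend` fields appear in the PR text.
@dataclass
class TrainerCompileConfig:
    enable: bool = False
    backend: str = "aot_eager"

compile_config = TrainerCompileConfig(enable=True, backend="aot_eager")
```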
Test Plan
Results in the same losses as on main; the timing is now:
Main
Changes
So we save ~60s of end-to-end runtime via compilation, without affecting our logits/accuracy.