Update on "[rl] Add CI for numerics test against vllm native inference"

wwwjn · wwwjn · commit a82ab33ec240 · 2026-03-23T18:19:57.000-07:00
Test cases:
1. Integration tests:
   - single GPU, compile and cudagraph both disabled
   - multiple GPUs (with TP), compile and cudagraph both disabled
   - multiple GPUs, compile and cudagraph both enabled
   - These tests run on A10G (the default CI GPU type).
2. Numerics parity test: vLLM native model vs vLLM + TorchTitan wrapper.
    - test_weights_match:          max_diff <= 1e-5 (exact weight loading)
    - test_attention_module:       atol=1e-5 (TP=1)
    - test_end_to_end_logits:      atol=1e-3 (TP=1)
    - The numerics tests run only at TP=1, on the assumption that both torchtitan and vLLM keep their multi-GPU implementations on par with single-GPU. More numerics tests under parallelism can be added if needed.
    - This test runs on H100 and uses the FA3 kernel for attention.
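
For readers unfamiliar with absolute-tolerance parity checks, the sketch below shows roughly what the thresholds listed above mean. It is not the test code from this PR: `TOLERANCES`, `max_abs_diff`, and `check_parity` are hypothetical names, and plain Python lists stand in for the tensors the real tests would compare.

```python
import math

# Tolerances quoted in the test plan above. The dict itself is illustrative
# and not part of the PR.
TOLERANCES = {
    "test_weights_match": 1e-5,
    "test_attention_module": 1e-5,
    "test_end_to_end_logits": 1e-3,
}


def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference of two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    # A NaN would silently pass a plain `>` comparison, so reject it explicitly.
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


def check_parity(name, reference, candidate):
    """Raise AssertionError if the outputs differ by more than the tolerance."""
    atol = TOLERANCES[name]
    diff = max_abs_diff(reference, candidate)
    if math.isnan(diff) or diff > atol:
        raise AssertionError(f"{name}: max_diff={diff} exceeds atol={atol}")
    return diff
```

With these made-up values, `check_parity("test_end_to_end_logits", [0.10, -2.0], [0.1004, -2.0003])` passes, while the same inputs under `test_attention_module` would fail because the difference is larger than 1e-5.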

[ghstack-poisoned]
diff --git a/torchtitan/experiments/rl/tests/integration_tests.py b/torchtitan/experiments/rl/tests/integration_tests.py
@@ -36,7 +36,8 @@ def build_rl_test_list() -> list[OverrideDefinitions]:
                     "--config rl_grpo_qwen3_0_6b",
                     "--trainer.parallelism.tensor_parallel_degree 2",
                     "--generator.parallelism.tensor_parallel_degree 2",
-                    "--generator.max_model_len 2048",
+                    "--generator.num_samples_per_prompt 2",
+                    "--no_batch_invariant_mode",
                     "--generator.compile.backend none",
                     "--generator.compile.cudagraph_mode none",
                 ],
@@ -52,7 +53,8 @@ def build_rl_test_list() -> list[OverrideDefinitions]:
                     "--config rl_grpo_qwen3_0_6b",
                     "--trainer.parallelism.tensor_parallel_degree 2",
                     "--generator.parallelism.tensor_parallel_degree 2",
-                    "--generator.max_model_len 2048",
+                    "--generator.num_samples_per_prompt 2",
+                    "--no_batch_invariant_mode",
                 ],
             ],
             "RL GRPO TP=2 compile",