
Commit 233cc07

fix: force use of eager (disabled cuda graphs) due to convergence issues (#857)
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
1 parent: 0557402

File tree: 3 files changed (+9, -1 lines)


docs/model-quirks.md

Lines changed: 7 additions & 0 deletions

@@ -39,6 +39,13 @@ Whether model level support CP only depends on arguments passed to `torch.nn.fun
 - It's a known issue that context parallel can't be used together with sequence parallel.
   Refer to [here](https://github.com/NVIDIA-NeMo/RL/issues/659) for more details.
 
+## DeepScaleR Recipe Convergence Issues
+
+The DeepScaleR recipe (e.g., `examples/configs/grpo-deepscaler-1.5b-8K.yaml`) has been found to experience convergence issues when CUDA graphs are enabled in vLLM.
+
+**Special Handling:**
+- CUDA graphs must be disabled by setting `enforce_eager: True` in the vLLM configuration (https://github.com/NVIDIA-NeMo/RL/pull/857 forces eager execution by default).
+
 ## vLLM Async Rollout Timeout
 
 vLLM async generation has a configurable timeout for waiting for individual sample results. This is particularly important for longer sequences on large models.
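For context on the `enforce_eager` flag the new docs section refers to: it is the same option exposed by vLLM's offline `LLM` API, and setting it to `True` skips CUDA graph capture so the model always runs in eager PyTorch mode. Below is a minimal standalone sketch of that flag outside NeMo RL; the model name, prompt, and sampling settings are illustrative only, not taken from the recipe.

```python
# Minimal sketch of the vLLM flag that this commit toggles in the recipe configs.
# Model name, prompt, and sampling parameters are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # illustrative 1.5B model
    enforce_eager=True,          # disable CUDA graph capture; always run eager PyTorch
    gpu_memory_utilization=0.6,  # mirrors the 8K recipe's setting
    max_model_len=8192,          # stand-in for ${policy.max_total_sequence_length}
)

outputs = llm.generate(
    ["Prove that the sum of two even integers is even."],
    SamplingParams(temperature=1.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```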

examples/configs/grpo-deepscaler-1.5b-24K.yaml

Lines changed: 1 addition & 0 deletions

@@ -42,6 +42,7 @@ policy:
 tensor_parallel_size: 1
 pipeline_parallel_size: 1
 gpu_memory_utilization: 0.8
+enforce_eager: True
 max_model_len: ${policy.max_total_sequence_length}
 # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
 # For Gemma models, we need to use "auto" due to a vllm bug

examples/configs/grpo-deepscaler-1.5b-8K.yaml

Lines changed: 1 addition & 1 deletion

@@ -102,7 +102,7 @@ policy:
 pipeline_parallel_size: 1
 gpu_memory_utilization: 0.6
 max_model_len: ${policy.max_total_sequence_length}
-enforce_eager: False
+enforce_eager: True
 # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
 # For Gemma models, we need to use "auto" due to a vllm bug
 load_format: dummy
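The `${policy.max_total_sequence_length}` reference that appears in both config diffs is OmegaConf-style interpolation, so `max_model_len` automatically tracks the recipe's total sequence length. The sketch below shows how such a reference resolves; the nesting under `generation.vllm_cfg` and the 8192 value are assumptions for illustration, not copied verbatim from the recipe files.

```python
# Sketch of how a ${...} config reference resolves; the key layout here is an
# assumed illustration, not the exact structure of the recipe YAML.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
policy:
  max_total_sequence_length: 8192   # illustrative value
  generation:
    vllm_cfg:
      enforce_eager: true
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}
"""
)

resolved = OmegaConf.to_container(cfg, resolve=True)
print(resolved["policy"]["generation"]["vllm_cfg"]["max_model_len"])  # -> 8192
```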
