
Commit 233cc07

fix: force use of eager (disabled cuda graphs) due to convergence issues (#857)
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
1 parent: 0557402

File tree: 3 files changed (+9, -1 lines)


docs/model-quirks.md

Lines changed: 7 additions & 0 deletions

@@ -39,6 +39,13 @@ Whether model level support CP only depends on arguments passed to `torch.nn.fun
 - It's a known issue that context parallel can't be used together with sequence parallel.
   Refer to [here](https://github.com/NVIDIA-NeMo/RL/issues/659) for more details.
 
+## DeepScaleR Recipe Convergence Issues
+
+The DeepScaleR recipe (e.g., `examples/configs/grpo-deepscaler-1.5b-8K.yaml`) has been found to experience convergence issues when CUDA graphs are enabled in vLLM.
+
+**Special Handling:**
+- CUDA graphs must be disabled by setting `enforce_eager: True` in the vLLM configuration (https://github.com/NVIDIA-NeMo/RL/pull/857 forces eager execution by default).
+
 ## vLLM Async Rollout Timeout
 
 vLLM async generation has a configurable timeout for waiting for individual sample results. This is particularly important for longer sequences on large models.
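For context on the `enforce_eager` flag the new docs section refers to: it is the same option exposed by vLLM's offline `LLM` API, and setting it to `True` skips CUDA graph capture so the model always runs in eager PyTorch mode. Below is a minimal standalone sketch of that flag outside NeMo RL; the model name, prompt, and sampling settings are illustrative only, not taken from the recipe.

```python
# Minimal sketch of the vLLM flag that this commit toggles in the recipe configs.
# Model name, prompt, and sampling parameters are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # illustrative 1.5B model
    enforce_eager=True,          # disable CUDA graph capture; always run eager PyTorch
    gpu_memory_utilization=0.6,  # mirrors the 8K recipe's setting
    max_model_len=8192,          # stand-in for ${policy.max_total_sequence_length}
)

outputs = llm.generate(
    ["Prove that the sum of two even integers is even."],
    SamplingParams(temperature=1.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```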

examples/configs/grpo-deepscaler-1.5b-24K.yaml

Lines changed: 1 addition & 0 deletions

@@ -42,6 +42,7 @@ policy:
 tensor_parallel_size: 1
 pipeline_parallel_size: 1
 gpu_memory_utilization: 0.8
+enforce_eager: True
 max_model_len: ${policy.max_total_sequence_length}
 # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
 # For Gemma models, we need to use "auto" due to a vllm bug

examples/configs/grpo-deepscaler-1.5b-8K.yaml

Lines changed: 1 addition & 1 deletion

@@ -102,7 +102,7 @@ policy:
 pipeline_parallel_size: 1
 gpu_memory_utilization: 0.6
 max_model_len: ${policy.max_total_sequence_length}
-enforce_eager: False
+enforce_eager: True
 # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
 # For Gemma models, we need to use "auto" due to a vllm bug
 load_format: dummy
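The `${policy.max_total_sequence_length}` reference that appears in both config diffs is OmegaConf-style interpolation, so `max_model_len` automatically tracks the recipe's total sequence length. The sketch below shows how such a reference resolves; the nesting under `generation.vllm_cfg` and the 8192 value are assumptions for illustration, not copied verbatim from the recipe files.

```python
# Sketch of how a ${...} config reference resolves; the key layout here is an
# assumed illustration, not the exact structure of the recipe YAML.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
policy:
  max_total_sequence_length: 8192   # illustrative value
  generation:
    vllm_cfg:
      enforce_eager: true
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}
"""
)

resolved = OmegaConf.to_container(cfg, resolve=True)
print(resolved["policy"]["generation"]["vllm_cfg"]["max_model_len"])  # -> 8192
```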
