1 file changed (+5, -0 lines)
 - [GRPO Single Node](#grpo-single-node)
 - [GRPO Multi-node](#grpo-multi-node)
 - [GRPO Qwen2.5-32B](#grpo-qwen25-32b)
+- [GRPO Multi-Turn/Tool-Use](#grpo-multi-turn)
 - [Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft)
 - [SFT Single Node](#sft-single-node)
 - [SFT Multi-node](#sft-multi-node)
@@ -133,9 +134,11 @@ sbatch \
     --gres=gpu:8 \
     ray.sub
 ```
+The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docs/docker.md).
 
 #### GRPO Qwen2.5-32B
 
+This section outlines how to run GRPO for Qwen2.5-32B with a 16k sequence length.
 ```sh
 # Run from the root of NeMo RL repo
 NUM_ACTOR_NODES=16
@@ -158,6 +161,8 @@ sbatch \
     ray.sub
 ```
 
+#### GRPO Multi-Turn
+
 We also support multi-turn generation and training (tool use, games, etc.).
 Reference example for training to play a Sliding Puzzle Game:
 ```sh