
Commit de18808

fix: add missing multi-turn, container information in README (#369)
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
1 parent edfd362 · commit de18808

File tree

1 file changed: 5 additions, 0 deletions


README.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -8,6 +8,7 @@
 - [GRPO Single Node](#grpo-single-node)
 - [GRPO Multi-node](#grpo-multi-node)
 - [GRPO Qwen2.5-32B](#grpo-qwen25-32b)
+- [GRPO Multi-Turn/Tool-Use](#grpo-multi-turn)
 - [Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft)
 - [SFT Single Node](#sft-single-node)
 - [SFT Multi-node](#sft-multi-node)
@@ -133,9 +134,11 @@ sbatch \
     --gres=gpu:8 \
     ray.sub
 ```
+The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docs/docker.md).
 
 #### GRPO Qwen2.5-32B
 
+This section outlines how to run GRPO for Qwen2.5-32B with a 16k sequence length.
 ```sh
 # Run from the root of NeMo RL repo
 NUM_ACTOR_NODES=16
@@ -158,6 +161,8 @@ sbatch \
     ray.sub
 ```
 
+#### GRPO Multi-Turn
+
 We also support multi-turn generation and training (tool use, games, etc.).
 Reference example for training to play a Sliding Puzzle Game:
 ```sh
````
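For readers assembling the launch referenced by these hunks, the sketch below shows the general shape of the multi-node submission. It is a hedged reconstruction: only `CONTAINER`, `NUM_ACTOR_NODES`, `--gres=gpu:8`, and `ray.sub` appear in this diff; the `--nodes` flag and the example image tag are illustrative assumptions, not values taken from the README.

```sh
# Hedged sketch of the launch pattern referenced in the hunks above.
# CONTAINER, NUM_ACTOR_NODES, --gres=gpu:8, and ray.sub come from the diff;
# the image tag and the --nodes flag are assumptions added for illustration.
CONTAINER=nemo-rl:latest   # assumed tag for an image built per docs/docker.md
NUM_ACTOR_NODES=16         # value shown in the GRPO Qwen2.5-32B example

# sbatch exports the submitting shell's environment by default (--export=ALL),
# so CONTAINER and NUM_ACTOR_NODES are visible to ray.sub inside the job.
sbatch \
    --nodes=${NUM_ACTOR_NODES} \
    --gres=gpu:8 \
    ray.sub
```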

0 commit comments
