fix: add missing multi-turn, container information in README (#369)

terrykong · parthchadha · web-flow · commit de18808b4791 · 2025-05-13T18:05:04.000Z
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@
     - [GRPO Single Node](#grpo-single-node)
     - [GRPO Multi-node](#grpo-multi-node)
       - [GRPO Qwen2.5-32B](#grpo-qwen25-32b)
+    - [GRPO Multi-Turn/Tool-Use](#grpo-multi-turn)
   - [Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft)
     - [SFT Single Node](#sft-single-node)
     - [SFT Multi-node](#sft-multi-node)
@@ -133,9 +134,11 @@ sbatch \
     --gres=gpu:8 \
     ray.sub
 ```
+The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docs/docker.md).
 
 #### GRPO Qwen2.5-32B
 
+This section outlines how to run GRPO for Qwen2.5-32B with a 16k sequence length.
 ```sh
 # Run from the root of NeMo RL repo
 NUM_ACTOR_NODES=16
@@ -158,6 +161,8 @@ sbatch \
     ray.sub
 ```
 
+#### GRPO Multi-Turn
+
 We also support multi-turn generation and training (tool use, games, etc.).
 Reference example for training to play a Sliding Puzzle Game:
 ```sh