Commit 615aa98
Update docs/guides/rm.md
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>

Parent: a86b6c7

File tree: 1 file changed (+3, −3 lines)

docs/guides/rm.md — 3 additions & 3 deletions
````diff
@@ -9,13 +9,13 @@ The script, [examples/run_rm.py](../../examples/run_rm.py), is used to train a B
 Be sure to launch the job using `uv`. The command to launch a training job is as follows:
 
 ```bash
-uv run examples/run_rm.py --config examples/configs/rm.yaml
+uv run examples/run_rm.py
 
-# Can also add overrides on CLI, like changing the model
+# Can also add overrides on CLI, like changing the config or changing the model
 uv run examples/run_rm.py --config examples/configs/rm.yaml policy.model_name=Qwen/Qwen2.5-1.5B
 ```
 
-You must specify the YAML config. It shares the same base template as the SFT config but includes a new `reward_model_cfg` section with `enabled: true` to load the model as a Reward Model. You can find an example RM config file at [examples/configs/rm.yaml](../../examples/configs/rm.yaml).
+The default YAML config shares the same base template as the SFT config but includes a new `reward_model_cfg` section with `enabled: true` to load the model as a Reward Model. You can find an example RM config file at [examples/configs/rm.yaml](../../examples/configs/rm.yaml).
 
 **Reminder**: Set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). Make sure to log in using `huggingface-cli` if you're working with Llama models.
 
````
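The edited doc text says the RM config adds a `reward_model_cfg` section with `enabled: true` on top of the SFT base template. A minimal sketch of what that section might look like — only `enabled` is confirmed by the doc; the surrounding key and any other fields live in the real file at examples/configs/rm.yaml:

```yaml
# Sketch of the RM-specific config section described in the diff.
# Only `enabled: true` is confirmed; consult examples/configs/rm.yaml
# for the actual keys and their defaults.
reward_model_cfg:
  enabled: true  # load the model as a Reward Model instead of a plain SFT policy
```

Per the updated text, `--config` is optional (a default config is used when omitted), and any key such as `policy.model_name` can still be overridden with `key=value` pairs on the CLI.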
