Commit 615aa98
Update docs/guides/rm.md
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Julien Veron Vialard <50602890+jveronvialard@users.noreply.github.com>

Parent: a86b6c7

File tree: 1 file changed (+3, −3 lines)

docs/guides/rm.md — 3 additions & 3 deletions
````diff
@@ -9,13 +9,13 @@ The script, [examples/run_rm.py](../../examples/run_rm.py), is used to train a B
 Be sure to launch the job using `uv`. The command to launch a training job is as follows:
 
 ```bash
-uv run examples/run_rm.py --config examples/configs/rm.yaml
+uv run examples/run_rm.py
 
-# Can also add overrides on CLI, like changing the model
+# Can also add overrides on CLI, like changing the config or changing the model
 uv run examples/run_rm.py --config examples/configs/rm.yaml policy.model_name=Qwen/Qwen2.5-1.5B
 ```
 
-You must specify the YAML config. It shares the same base template as the SFT config but includes a new `reward_model_cfg` section with `enabled: true` to load the model as a Reward Model. You can find an example RM config file at [examples/configs/rm.yaml](../../examples/configs/rm.yaml).
+The default YAML config shares the same base template as the SFT config but includes a new `reward_model_cfg` section with `enabled: true` to load the model as a Reward Model. You can find an example RM config file at [examples/configs/rm.yaml](../../examples/configs/rm.yaml).
 
 **Reminder**: Set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). Make sure to log in using `huggingface-cli` if you're working with Llama models.
 
````
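The edited doc text says the RM config adds a `reward_model_cfg` section with `enabled: true` on top of the SFT base template. A minimal sketch of what that section might look like — only `enabled` is confirmed by the doc; the surrounding key and any other fields live in the real file at examples/configs/rm.yaml:

```yaml
# Sketch of the RM-specific config section described in the diff.
# Only `enabled: true` is confirmed; consult examples/configs/rm.yaml
# for the actual keys and their defaults.
reward_model_cfg:
  enabled: true  # load the model as a Reward Model instead of a plain SFT policy
```

Per the updated text, `--config` is optional (a default config is used when omitted), and any key such as `policy.model_name` can still be overridden with `key=value` pairs on the CLI.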
