2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -5,6 +5,8 @@
title: Installation
- local: quickstart
title: Quickstart
- local: migration_v0_to_v1
title: Migrating from v0 to v1
title: Getting started
- sections:
- local: dataset_formats
57 changes: 57 additions & 0 deletions docs/source/migration_v0_to_v1.md
@@ -0,0 +1,57 @@
# Migrating from TRL v0 to v1

This guide covers the breaking changes introduced in TRL v1 and how to update your code. Most structural changes (trainers moved to experimental, removed model classes, etc.) already shipped in v0.29, so if you are already on v0.29, the migration is minimal.
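
As a rough way to tell which parts of this guide apply, the sketch below compares a version string against v0.29. The helper names (`parse_major_minor`, `needs_structural_migration`, `installed_trl_version`) are hypothetical and not part of TRL; only the v0.29 cutoff comes from this guide.

```python
from __future__ import annotations

import re
from importlib import metadata


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.28.1' or '1.0.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_structural_migration(version: str) -> bool:
    """Return True if the version predates v0.29, where the structural changes shipped."""
    return parse_major_minor(version) < (0, 29)


def installed_trl_version() -> str | None:
    """Return the installed TRL version, or None if TRL is not installed."""
    try:
        return metadata.version("trl")
    except metadata.PackageNotFoundError:
        return None
```

If `needs_structural_migration(installed_trl_version())` is true for your environment, the "Already changed in v0.29" section applies to you as well as the changed defaults.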

## Changed defaults

| Config | Parameter | v0 default | v1 default | Action needed |
|---|---|---|---|---|
| `GRPOConfig` | `vllm_mode` | `"server"` | `"colocate"` | If you set `use_vllm=True` without specifying `vllm_mode`, vLLM now runs in the same process as the trainer instead of connecting to a separate server. Set `vllm_mode="server"` explicitly to keep the v0 behavior. |
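| `RLOOConfig` | `vllm_mode` | `"server"` | `"colocate"` | Same as for `GRPOConfig`. |
| `OnlineDPOConfig` (`trl.experimental.online_dpo`) | `vllm_mode` | `"server"` | `"colocate"` | Same as for `GRPOConfig`. |
| `GOLDConfig` (`trl.experimental.gold`) | `vllm_mode` | `"server"` | `"colocate"` | Same as for `GRPOConfig`. |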
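
One way to keep the v0 behavior is to pin `vllm_mode` wherever you build your config arguments. The sketch below is illustrative only: `pin_vllm_mode` and `build_grpo_config` are hypothetical helpers, and `"grpo-output"` is a placeholder output directory. The only TRL names used are `GRPOConfig`, `use_vllm`, and `vllm_mode`.

```python
from __future__ import annotations


def pin_vllm_mode(config_kwargs: dict, mode: str = "server") -> dict:
    """Return a copy of config_kwargs with vllm_mode set explicitly when vLLM is enabled.

    A vllm_mode that is already present in config_kwargs is left untouched.
    """
    if mode not in ("server", "colocate"):
        raise ValueError(f"vllm_mode must be 'server' or 'colocate', got {mode!r}")
    pinned = dict(config_kwargs)
    if pinned.get("use_vllm") and "vllm_mode" not in pinned:
        pinned["vllm_mode"] = mode
    return pinned


def build_grpo_config(**overrides):
    """Build a GRPOConfig that keeps server mode unless overridden (requires `trl`)."""
    # Imported lazily so that pin_vllm_mode can be used and tested without TRL installed.
    from trl import GRPOConfig

    kwargs = pin_vllm_mode({"output_dir": "grpo-output", "use_vllm": True, **overrides})
    return GRPOConfig(**kwargs)
```

Pinning the value explicitly makes the script behave the same on v0 and v1, whichever default is in effect.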

## Already changed in v0.29

The following changes were introduced in v0.29 and are **not new in v1**. They are listed here for completeness if you are migrating from an earlier version.

<details>
<summary>Trainers moved to experimental</summary>

Several trainers were moved from the stable API to `trl.experimental`. They are no longer importable from `trl` directly (except KTO, which still has a compatibility shim with a deprecation warning).

| Trainer | New import |
|---|---|
| PPO | `from trl.experimental.ppo import PPOTrainer, PPOConfig` |
| CPO | `from trl.experimental.cpo import CPOTrainer, CPOConfig` |
| BCO | `from trl.experimental.bco import BCOTrainer, BCOConfig` |
| ORPO | `from trl.experimental.orpo import ORPOTrainer, ORPOConfig` |
| XPO | `from trl.experimental.xpo import XPOTrainer, XPOConfig` |
| Online DPO | `from trl.experimental.online_dpo import OnlineDPOTrainer, OnlineDPOConfig` |
| GKD | `from trl.experimental.gkd import GKDTrainer, GKDConfig` |
| Nash-MD | `from trl.experimental.nash_md import NashMDTrainer, NashMDConfig` |
| PRM | `from trl.experimental.prm import PRMTrainer, PRMConfig` |
| KTO | `from trl.experimental.kto import KTOTrainer, KTOConfig` |

</details>
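
If your code has to run against TRL versions both before and after the move, you can try the new import path first and fall back to the old one. The following is a minimal sketch; `import_first` is a hypothetical helper, not a TRL function.

```python
from __future__ import annotations

import importlib
from typing import Any, Sequence


def import_first(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this location failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"Could not import {name!r}. Tried: " + "; ".join(errors))


# Example usage (requires `trl`): prefer the new location, fall back to the old one.
# PPOTrainer = import_first("PPOTrainer", ["trl.experimental.ppo", "trl"])
```

Once you no longer support versions older than v0.29, the plain imports from the table are simpler and preferable.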

<details>
<summary>Model classes removed from the stable API</summary>

| Class | New location |
|---|---|
| `AutoModelForCausalLMWithValueHead` | `trl.experimental.ppo` |
| `AutoModelForSeq2SeqLMWithValueHead` | `trl.experimental.ppo` |
| `PreTrainedModelWrapper` | `trl.experimental.ppo` |

</details>

<details>
<summary>Callbacks and utilities removed from the stable API</summary>

| What | New location |
|---|---|
| `WinRateCallback` | `trl.experimental.winrate_callback` |
| Judges | `trl.experimental.judges` |
| `peft_module_casting_to_bf16` | `trl.experimental.utils` |
| `FDivergenceType` enum | Removed. Use string values (`"reverse_kl"`, `"js_divergence"`, `"alpha_divergence"`) directly. |

</details>
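
For code that still passes enum members around, a small normalization step can convert them to the string values listed above. This is a sketch under assumptions: `LegacyFDivergenceType` is a local stand-in written for this illustration (it is not imported from TRL), and `to_divergence_string` is a hypothetical helper.

```python
from __future__ import annotations

from enum import Enum

# The three string values named in the table above.
ALLOWED_DIVERGENCES = ("reverse_kl", "js_divergence", "alpha_divergence")


class LegacyFDivergenceType(Enum):
    """Local stand-in for the removed enum, used only for this illustration."""

    REVERSE_KL = "reverse_kl"
    JS_DIVERGENCE = "js_divergence"
    ALPHA_DIVERGENCE = "alpha_divergence"


def to_divergence_string(value: Enum | str) -> str:
    """Convert an enum member or a string to one of the allowed string values."""
    if isinstance(value, Enum):
        value = value.value
    if value not in ALLOWED_DIVERGENCES:
        raise ValueError(f"Unknown divergence type: {value!r}")
    return value
```

Pass the returned string to whichever config field previously accepted the enum.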
2 changes: 1 addition & 1 deletion tests/experimental/test_online_dpo_trainer.py
@@ -351,7 +351,7 @@ def test_vllm_config_validation(self):

# Test default values
config = OnlineDPOConfig()
-assert config.vllm_mode == "server"
+assert config.vllm_mode == "colocate"
assert config.vllm_server_base_url is None
assert config.vllm_server_host == "0.0.0.0"
assert config.vllm_server_port == 8000
4 changes: 2 additions & 2 deletions trl/experimental/gold/gold_config.py
@@ -70,7 +70,7 @@ class GOLDConfig(SFTConfig):
Whether to skip EOS token for teacher in ULD loss computation.
use_vllm (`bool`, *optional*, defaults to `False`):
Whether to use vLLM for generating completions from the student model. Requires `vllm` to be installed.
-vllm_mode (`str`, *optional*, defaults to `"server"`):
+vllm_mode (`str`, *optional*, defaults to `"colocate"`):
Mode for student vLLM integration. Either `"server"` (connect to a running TRL vLLM server) or `"colocate"`
(run vLLM in the same process).
vllm_server_host (`str`, *optional*, defaults to `"0.0.0.0"`):
@@ -276,7 +276,7 @@ class GOLDConfig(SFTConfig):
metadata={"help": "Whether to use vLLM for generating completions. Requires `vllm` to be installed."},
)
vllm_mode: str = field(
-default="server",
+default="colocate",
metadata={
"help": 'Mode for vLLM integration. Either "server" (connect to a running TRL vLLM server) or "colocate" (run vLLM in the same process).'
},
4 changes: 2 additions & 2 deletions trl/experimental/online_dpo/online_dpo_config.py
@@ -101,7 +101,7 @@ class may differ from those in [`~transformers.TrainingArguments`].
Model implementation to use for vLLM. Must be one of `"transformers"` or `"vllm"`. `"transformers"`: Use
the `transformers` backend for model implementation. `"vllm"`: Use the `vllm` library for model
implementation.
-vllm_mode (`str`, *optional*, defaults to `"server"`):
+vllm_mode (`str`, *optional*, defaults to `"colocate"`):
Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `"server"` or
`"colocate"`.

@@ -303,7 +303,7 @@ class may differ from those in [`~transformers.TrainingArguments`].
},
)
vllm_mode: str = field(
-default="server",
+default="colocate",
metadata={
"help": "Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `'server'` or "
"`'colocate'`. `'server'`: The trainer will send generation requests to a separate vLLM server. Make sure "
4 changes: 2 additions & 2 deletions trl/trainer/grpo_config.py
@@ -115,7 +115,7 @@ class GRPOConfig(_BaseConfig):
use_vllm (`bool`, *optional*, defaults to `False`):
Whether to use vLLM for generating completions. If set to `True`, the trainer will use vLLM for generation
instead of the default model.generate(). Requires `vllm` to be installed.
-vllm_mode (`str`, *optional*, defaults to `"server"`):
+vllm_mode (`str`, *optional*, defaults to `"colocate"`):
Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `"server"` or
`"colocate"`.

@@ -486,7 +486,7 @@ class GRPOConfig(_BaseConfig):
},
)
vllm_mode: str = field(
-default="server",
+default="colocate",
metadata={
"help": "Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `'server'` or "
"`'colocate'`. `'server'`: The trainer will send generation requests to a separate vLLM server. Make sure "
4 changes: 2 additions & 2 deletions trl/trainer/rloo_config.py
@@ -110,7 +110,7 @@ class RLOOConfig(_BaseConfig):
use_vllm (`bool`, *optional*, defaults to `False`):
Whether to use vLLM for generating completions. If set to `True`, the trainer will use vLLM for generation
instead of the default model.generate(). Requires `vllm` to be installed.
-vllm_mode (`str`, *optional*, defaults to `"server"`):
+vllm_mode (`str`, *optional*, defaults to `"colocate"`):
Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `"server"` or
`"colocate"`.

@@ -369,7 +369,7 @@ class RLOOConfig(_BaseConfig):
},
)
vllm_mode: str = field(
-default="server",
+default="colocate",
metadata={
"help": "Mode to use for vLLM integration when `use_vllm` is set to `True`. Must be one of `'server'` or "
"`'colocate'`. `'server'`: The trainer will send generation requests to a separate vLLM server. Make sure "