This guide covers the breaking changes introduced in TRL v1 and how to update your code. Most structural changes (trainers moved to experimental, removed model classes, etc.) already shipped in v0.29 — if you're already on v0.29, this migration is minimal.
|`GRPOConfig`|`vllm_mode`|`"server"`|`"colocate"`| If you use `use_vllm=True` without specifying `vllm_mode`, vLLM will now run in the same process instead of connecting to a separate server. Set `vllm_mode="server"` explicitly if you rely on server mode. |
|`RLOOConfig`|`vllm_mode`|`"server"`|`"colocate"`| Same as above. |
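If you depend on the old default, the migration is one explicit parameter. A minimal sketch (the `output_dir` value is illustrative, not from this guide):

```python
from trl import GRPOConfig

# Pin the v0 behavior: connect to an external vLLM server rather than
# running vLLM in-process (the new "colocate" default).
training_args = GRPOConfig(
    output_dir="grpo-output",  # illustrative
    use_vllm=True,
    vllm_mode="server",
)
```

The same two parameters apply unchanged to `RLOOConfig`.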
## Renamed options

| Config | Parameter | v0 value | v1 value | Action needed |
| --- | --- | --- | --- | --- |
|`SFTConfig`|`packing`|`"bfd-requeue"`|`"bfd_split"`| Replace `packing="bfd-requeue"` with `packing="bfd_split"`. The old value will still be accepted for a few versions but will be removed in a future release. |
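As a before/after sketch of the rename (assuming the v1 `SFTConfig` accepts the string values exactly as shown in the table above; `output_dir` is illustrative):

```python
from trl import SFTConfig

# v0: SFTConfig(packing="bfd-requeue")
# v1:
training_args = SFTConfig(
    output_dir="sft-output",  # illustrative
    packing="bfd_split",
)
```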
## Migrating from an earlier version

Depending on which version you're migrating from, refer to the [release notes](https://github.com/huggingface/trl/releases) for v0.29 and earlier for version-specific changes.

`docs/source/grpo_trainer.md`:
We support two ways of using vLLM during training: **colocate mode** (the default) and **server mode**.

> [!TIP]
> By default, Truncated Importance Sampling is activated for vLLM generation to address the generation-training mismatch that occurs when using different frameworks. This can be turned off by setting `vllm_importance_sampling_correction=False`. For more information, see [Truncated Importance Sampling](paper_index#truncated-importance-sampling).

#### Option 1: Colocate mode

In this mode, vLLM runs inside the trainer process and shares GPU memory with the training model. This avoids launching a separate server and can improve GPU utilization, but may lead to memory contention on the training GPUs. This is the default mode.

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```

#### Option 2: Server mode

In this mode, vLLM runs in a separate process (on separate GPUs) and communicates with the trainer via HTTP. This is ideal if you have dedicated GPUs for inference.
```python
training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```

> [!WARNING]
> Make sure that the server is using different GPUs than the trainer, otherwise you may run into NCCL errors. You can specify the GPUs to use with the `CUDA_VISIBLE_DEVICES` environment variable.
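One way to keep the two sets of GPUs separate (the model name, GPU indices, and `train.py` script are illustrative):

```shell
# Serve the model on GPUs 0 and 1 in a dedicated process
CUDA_VISIBLE_DEVICES=0,1 trl vllm-serve --model Qwen/Qwen2.5-7B

# In another terminal, train on GPUs 2 and 3
CUDA_VISIBLE_DEVICES=2,3 accelerate launch train.py
```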
> [!TIP]
> Depending on the model size and the overall GPU memory requirements for training, you may need to adjust the `vllm_gpu_memory_utilization` parameter in [`GRPOConfig`] to avoid underutilization or out-of-memory errors.
>
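For instance, a colocate configuration that caps vLLM's memory share might look like this (the `0.3` value is illustrative; tune it for your model size and batch size):

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,                    # colocate mode by default
    vllm_gpu_memory_utilization=0.3,  # leave the rest of each GPU for training
)
```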

    training_args = GRPOConfig(
        per_device_train_batch_size=4,
        use_vllm=True,
        vllm_mode="server",
        vllm_server_host=args.vllm_server_host.replace("ip-", "").replace("-", "."),  # from ip-X-X-X-X to X.X.X.X
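The `replace` chain on the last line converts a SLURM-style node name into the dotted IP that vLLM expects. As a standalone sketch (the node name is illustrative):

```python
# SLURM on AWS reports node names like "ip-10-0-1-23"; vLLM needs "10.0.1.23".
hostname = "ip-10-0-1-23"  # illustrative node name
ip = hostname.replace("ip-", "").replace("-", ".")
print(ip)  # 10.0.1.23
```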

`docs/source/rloo_trainer.md`:
We support two ways of using vLLM during training: **colocate mode** (the default) and **server mode**.

#### Option 1: Colocate mode

In this mode, vLLM runs inside the trainer process and shares GPU memory with the training model. This avoids launching a separate server and can improve GPU utilization, but may lead to memory contention on the training GPUs. This is the default mode.

```python
from trl import RLOOConfig

training_args = RLOOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```

#### Option 2: Server mode

In this mode, vLLM runs in a separate process (on separate GPUs) and communicates with the trainer via HTTP. This is ideal if you have dedicated GPUs for inference.
```python
training_args = RLOOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```

> [!WARNING]
> Make sure that the server is using different GPUs than the trainer, otherwise you may run into NCCL errors. You can specify the GPUs to use with the `CUDA_VISIBLE_DEVICES` environment variable.
> [!TIP]
> Depending on the model size and the overall GPU memory requirements for training, you may need to adjust the `vllm_gpu_memory_utilization` parameter in [`RLOOConfig`] to avoid underutilization or out-of-memory errors.
>

    per_device_train_batch_size=4,
    bf16=True,
    use_vllm=True,
    vllm_mode="server",
    vllm_server_host=args.vllm_server_host.replace("ip-", "").replace("-", "."),  # from ip-X-X-X-X to X.X.X.X
TRL supports **two modes** for integrating vLLM during training: **colocate mode** (default) and **server mode**.

#### Colocate Mode

In **colocate mode**, vLLM runs inside the trainer process and shares GPU memory with the training model.
This avoids launching a separate server and can improve GPU utilization, but may lead to memory contention on the training GPUs. This is the default mode.

Example configuration:
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```
```python
from trl.experimental.online_dpo import OnlineDPOConfig

training_args = OnlineDPOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```
```python
from trl.experimental.nash_md import NashMDConfig

training_args = NashMDConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```
```python
from trl.experimental.xpo import XPOConfig

training_args = XPOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```
```python
from trl import RLOOConfig

training_args = RLOOConfig(
    ...,
    use_vllm=True,  # vllm_mode="colocate" by default
)
```

</hfoption>
</hfoptions>

#### Server Mode

In **server mode**, vLLM runs as a separate process on dedicated GPUs and communicates with the trainer via HTTP.
This setup is ideal if you have GPUs dedicated to inference.

Example configuration:
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```
```python
from trl.experimental.online_dpo import OnlineDPOConfig

training_args = OnlineDPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```
```python
from trl.experimental.nash_md import NashMDConfig

training_args = NashMDConfig(
    ...,
    use_vllm=True,
    vllm_mode="server",
)
```