
Commit 1ad3aca

Updated TRL integration docs (vllm-project#25684)
Signed-off-by: sergiopaniego <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Sergio Paniego Blanco <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
1 parent 8d0afa9 commit 1ad3aca

File tree: 2 files changed (+47, -6 lines)


docs/training/trl.md

Lines changed: 47 additions & 5 deletions
@@ -1,12 +1,54 @@
 # Transformers Reinforcement Learning
 
-Transformers Reinforcement Learning (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+[Transformers Reinforcement Learning](https://huggingface.co/docs/trl) (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
 
 Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
 
-See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+See the [vLLM integration guide](https://huggingface.co/docs/trl/main/en/vllm_integration) in the TRL documentation for more information.
+
+TRL currently supports the following online trainers with vLLM:
+
+- [GRPO](https://huggingface.co/docs/trl/main/en/grpo_trainer)
+- [Online DPO](https://huggingface.co/docs/trl/main/en/online_dpo_trainer)
+- [RLOO](https://huggingface.co/docs/trl/main/en/rloo_trainer)
+- [Nash-MD](https://huggingface.co/docs/trl/main/en/nash_md_trainer)
+- [XPO](https://huggingface.co/docs/trl/main/en/xpo_trainer)
+
+To enable vLLM in TRL, set the `use_vllm` flag in the trainer configuration to `True`.
+
+## Modes of Using vLLM During Training
+
+TRL supports **two modes** for integrating vLLM during training: **server mode** and **colocate mode**. You can control how vLLM operates during training with the `vllm_mode` parameter.
+
+### Server mode
+
+In **server mode**, vLLM runs as an independent process on dedicated GPUs and communicates with the trainer through HTTP requests. This configuration is ideal when you have separate GPUs for inference, as it isolates generation workloads from training, ensuring stable performance and easier scaling.
+
+```python
+from trl import GRPOConfig
+
+training_args = GRPOConfig(
+    ...,
+    use_vllm=True,
+    vllm_mode="server",  # default value, can be omitted
+)
+```
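[Editor's note: in server mode the trainer expects a vLLM server to already be running before training starts. A minimal launch sketch, assuming TRL's `trl vllm-serve` CLI from the TRL documentation (the command and model name are not part of this commit):]

```shell
# Start a standalone vLLM generation server on dedicated GPUs.
# Run this before launching the training script; the trainer will
# reach it over HTTP.
trl vllm-serve --model Qwen/Qwen2.5-7B-Instruct
```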
+
+### Colocate mode
+
+In **colocate mode**, vLLM runs inside the trainer process and shares GPU memory with the training model. This avoids launching a separate server and can improve GPU utilization, but may lead to memory contention on the training GPUs.
+
+```python
+from trl import GRPOConfig
+
+training_args = GRPOConfig(
+    ...,
+    use_vllm=True,
+    vllm_mode="colocate",
+)
+```
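[Editor's note: a colocate-mode config like the one above plugs into an otherwise ordinary TRL training script. A minimal sketch following the TRL GRPO quickstart, assuming `trl` and `datasets` are installed and a GPU is available; the dataset and the toy reward function are illustrative and not part of this commit:]

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward function: prefers shorter completions (illustrative only).
def reward_len(completions, **kwargs):
    return [-float(len(c)) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    use_vllm=True,
    vllm_mode="colocate",  # vLLM shares the training GPUs
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```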
+
+Some trainers also support **vLLM sleep mode**, which offloads parameters and caches to GPU RAM during training, helping reduce memory usage. Learn more in the [memory optimization docs](https://huggingface.co/docs/trl/main/en/reducing_memory_usage#vllm-sleep-mode).
 
 !!! info
-    For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
-    - [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
-    - [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+    For detailed configuration options and flags, refer to the documentation of the specific trainer you are using.
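[Editor's note: the sleep-mode mention in the added text pairs naturally with colocate mode. A config sketch of how it is typically enabled; the flag name `vllm_enable_sleep_mode` is an assumption based on recent TRL releases and does not appear in this commit, so check your TRL version's trainer docs:]

```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_mode="colocate",
    vllm_enable_sleep_mode=True,  # assumed flag name; offloads weights/caches
                                  # between generation steps to free GPU memory
)
```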

mkdocs.yaml

Lines changed: 0 additions & 1 deletion

@@ -102,7 +102,6 @@ plugins:
 - https://numpy.org/doc/stable/objects.inv
 - https://pytorch.org/docs/stable/objects.inv
 - https://psutil.readthedocs.io/en/stable/objects.inv
-- https://huggingface.co/docs/transformers/main/en/objects.inv
 
 markdown_extensions:
 - attr_list
