Commit f65fd72
[ray,rollout,trtllm] feat: Adding tensorrt_llm as new rollout engine (verl-project#4665)
## What does this PR do?
[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) recently
added a [Ray
orchestrator](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/ray_orchestrator)
and the essential features required for RL workflows. This PR introduces
TensorRT-LLM as a new rollout engine for VeRL.
VeRL currently supports several rollout modes:
- **Hybrid engine:** The training and rollout engines share the same
process group. VeRL uses the `WorkerDict` class to manage multiple
workers within a single process group. Communication between training
and rollout workers takes place within the same process, allowing them
to share the Torch GPU memory pool.
- **Colocated:** Different engines use the same set of GPUs but run in
separate process groups. Currently, this mode is used only by the reward
model.
- **Standalone:** Rollout engines use completely independent GPU
resources.
Unlike other rollout engines, TensorRT-LLM primarily targets the
*colocated* mode. However, instead of relying purely on standard
colocated mode, we introduced a mixed design combining aspects of the
hybrid engine and colocated mode. The design goals are:
- Clear resource separation through distinct process groups, offering
maximum flexibility between training and rollout processes.
- Hybrid workers that act as proxies to LLM servers.
- Fully RESTful rollout API support through `TRTLLMHttpServer`.
- A unified framework for both asynchronous and synchronous RL
workflows.
This PR aims to keep the integration as unintrusive as possible
to VeRL's infrastructure. Currently, it invokes
`RolloutReplica.init_hybrid_colocated()` only when the hybrid engine is
enabled and the rollout engine is set to TensorRT-LLM.
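
The gating condition described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `RolloutSettings` class, its field names, the `"trtllm"` engine string, and the `"default"` return value are all hypothetical stand-ins for whatever VeRL's config actually exposes.

```python
from dataclasses import dataclass


@dataclass
class RolloutSettings:
    """Hypothetical stand-in for the relevant VeRL config fields."""
    name: str            # rollout engine name; "trtllm" is an assumed spelling
    hybrid_engine: bool  # whether the hybrid engine is enabled


def select_init_path(settings: RolloutSettings) -> str:
    """Return the name of the replica-initialisation path that would be taken.

    The hybrid-colocated path is chosen only when BOTH conditions hold;
    every other combination falls through to the existing behaviour.
    """
    if settings.hybrid_engine and settings.name == "trtllm":
        return "init_hybrid_colocated"
    return "default"  # placeholder label for the unchanged code path
```

Keeping the check this narrow means other rollout engines, and TensorRT-LLM without the hybrid engine, should see no change in behaviour.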
## High Level Design
Please refer to
[workers/rollout/trtllm_rollout/trtllm_async_rollout.md](https://github.com/davidmlw/verl/pull/43/changes#diff-96bab8796296991333a973a5211166f45993b13d7c533732c83bcf23c5664f39)
for more details.
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'18px', 'edgeLabelBackground':'#eeeeee'}}}%%
flowchart TB
space1[" "]
style space1 fill:none,stroke:none
subgraph VERL["<b>VERL Training Pipeline</b>"]
subgraph Workers["<b>Training Workers</b>"]
Actor["<b>Actor Worker</b>"]
Critic["<b>Critic Worker</b>"]
RefModel["<b>Ref Model Worker</b>"]
end
Actor -->|<b>Weight Updates<br/>IPC</b>| Rollout["<b>TensorRT-LLM Rollout</b>"]
subgraph RayCluster["<b>Rollout Workers<br/>(Ray Cluster)</b>"]
space2[" "]
style space2 fill:none,stroke:none
subgraph AsyncRollout["<b>TRTLLMAsyncRollout<br/>(per DP rank)</b>"]
DPLeader["<b>• DP Leader coordination</b>"]
IPCMgmt["<b>• IPC handle management</b>"]
HTTPAdapter["<b>• HTTP adapter for server communication</b>"]
end
AsyncRollout -->|<b>HTTP/REST API</b>| HTTPServer
subgraph HTTPServer["<b>TRTLLMHttpServer<br/>(Ray Actor per Replica)</b>"]
OpenAI["<b>• OpenAI Server wrapper</b>"]
EngMgmt["<b>• AsyncLLM engine management</b>"]
MemMgmt["<b>• Memory management (resume/release)</b>"]
end
HTTPServer --> AsyncLLM
subgraph AsyncLLM["<b>TensorRT-LLM<br/>AsyncLLM Engine</b>"]
GPUWorkers["<b>• GPU workers (Tensor Parallel)</b>"]
KVCache["<b>• KV Cache management</b>"]
CUDAGraph["<b>• CUDA Graph optimization</b>"]
end
end
end
space1 ~~~ VERL
style VERL fill:#e1f5ff
style RayCluster fill:#fff4e6
style AsyncRollout fill:#f3e5f5
style HTTPServer fill:#e8f5e9
style AsyncLLM fill:#fce4ec
```
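
To make the `HTTP/REST API` edge in the diagram concrete, here is a rough sketch of how a client might build a request for an OpenAI-style completion endpoint. The `/v1/completions` path follows the OpenAI API convention; whether `TRTLLMHttpServer` exposes exactly this route is an assumption, and the base URL and model name used with the helper are placeholders. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-style completion.

    The route and payload keys follow the OpenAI completions convention;
    they are assumptions about the server, not taken from this PR.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen` once a server is actually listening at `base_url`.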
## Experiment results
Setup: a single node with 8 H100 GPUs in a Slurm environment.
1. FSDP/GRPO: Qwen2-7B (TP1 * 8 on 8 GPUs, launching cmd `bash
examples/grpo_trainer/run_qwen2-7b_math_trtllm.sh 1`)
* Convergence:
<img width="563" height="352" alt="image"
src="https://github.com/user-attachments/assets/5df943a7-e4ce-416f-8601-0655738bb33d"
/>
* Validation:
<img width="1155" height="344" alt="image"
src="https://github.com/user-attachments/assets/a1a203e1-a85e-46c9-a9ea-e9c0f3caf683"
/>
2. FSDP/GRPO: Qwen2-7B (TP4 * 2 on 8 GPUs, launching cmd `bash
examples/grpo_trainer/run_qwen2-7b_math_trtllm.sh 4`)
* Convergence:
<img width="553" height="354" alt="image"
src="https://github.com/user-attachments/assets/dedfe3e2-498e-4d77-80bb-f1cd5d916c21"
/>
* Validation:
<img width="1132" height="353" alt="image"
src="https://github.com/user-attachments/assets/fbf6ae33-3643-466a-94e7-7edd70f53b3c"
/>
3. Megatron/GRPO: Qwen2-7B (TP1 * 8 on 8 GPUs, launching cmd `bash
examples/grpo_trainer/run_qwen2-7b_math_megatron_trtllm.sh 1`)
* Convergence:
<img width="766" height="323" alt="image"
src="https://github.com/user-attachments/assets/6d9bc023-c5e7-466a-bf31-7ef9eda7b06d"
/>
* Validation:
<img width="1546" height="338" alt="image"
src="https://github.com/user-attachments/assets/ee6e263c-7779-4915-93dd-2f414370a9fc"
/>
4. Megatron/GRPO: Qwen2-7B (TP4 * 2 on 8 GPUs, launching cmd `bash
examples/grpo_trainer/run_qwen2-7b_math_megatron_trtllm.sh 4`)
* Convergence:
<img width="746" height="322" alt="image"
src="https://github.com/user-attachments/assets/7a21dc39-0467-4b85-a231-8e5994b76a8a"
/>
* Validation:
<img width="1552" height="334" alt="image"
src="https://github.com/user-attachments/assets/00a3307b-a0d6-4d13-8f30-85f903f0c946"
/>
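
The `TP<n> * <m>` labels above read as tensor-parallel size times number of rollout replicas. Assuming the script argument is the tensor-parallel size (as the first two runs suggest), the replica count follows from the GPU count. The helper below is an illustrative sketch with a hypothetical name, not part of the example scripts.

```python
def num_replicas(total_gpus: int, tp_size: int) -> int:
    """Number of rollout replicas when each replica spans `tp_size` GPUs."""
    if tp_size <= 0 or total_gpus % tp_size != 0:
        raise ValueError(
            f"tp_size={tp_size} must be positive and divide total_gpus={total_gpus}"
        )
    return total_gpus // tp_size


print(num_replicas(8, 1))  # 8 replicas of TP1
print(num_replicas(8, 4))  # 2 replicas of TP4
```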
## Special notes for using VeRL with TensorRT-LLM:
1. All RL APIs required by VeRL are implemented in [TensorRT-LLM
1.2.0rc6](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release?version=1.2.0rc6).
To install VeRL with TensorRT-LLM, run
`pip install -e ".[trtllm]" --extra-index-url https://pypi.nvidia.com/`.
2. The integration was verified primarily in a Slurm
environment.
3. The current design requires `export
RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1` and the following environment
settings before launching the Ray cluster. These are already
included in the example scripts and tests added by this PR; we will work
toward removing these requirements to improve the user experience in the near
future.
```bash
# Clean all Slurm / MPI / PMIx env vars to avoid PMIx mismatch errors
for v in $(env | awk -F= '/^(PMI|PMIX|MPI|OMPI|SLURM)_/{print $1}'); do
  unset "$v"
done
# Force UCX to use only eth0; otherwise, it will attempt to use all
# available devices and raise warnings if any issues occur.
export TRTLLM_UCX_INTERFACE=eth0
```
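
For launchers that start Ray from Python rather than a shell script, the same cleanup can be approximated by building a filtered environment dictionary. This is a sketch under that assumption; the `cleaned_env` helper is hypothetical and not part of this PR. The variable names and the `eth0` value are the ones given in the notes above.

```python
import os
import re

# Same prefixes as the awk pattern in the shell snippet.
_SCHEDULER_PREFIX = re.compile(r"^(PMI|PMIX|MPI|OMPI|SLURM)_")


def cleaned_env(env=None):
    """Return a copy of `env` (default: os.environ) prepared for launching Ray.

    Drops Slurm / MPI / PMIx variables and sets the two variables the
    current design requires. The input mapping is not modified.
    """
    source = dict(os.environ if env is None else env)
    cleaned = {k: v for k, v in source.items() if not _SCHEDULER_PREFIX.match(k)}
    cleaned["RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"] = "1"
    cleaned["TRTLLM_UCX_INTERFACE"] = "eth0"
    return cleaned
```

The result can be passed as the `env` argument of `subprocess.run` when starting the Ray cluster, which leaves the parent process's environment untouched.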
## Outstanding issues for this PR
1. Passing CI tests is still a work in progress.
## Upcoming work (in separate PRs)
1. Further performance optimization.
2. Multi-node testing and functionality will be delivered in the near
future.
3. The current PR focuses on, and was validated with, Qwen model variants.
We'll work on validation and optimization for MoE models as the next
step.
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: Jonas Yang <joyang@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>

1 parent 3b1c139 commit f65fd72
File tree
23 files changed (+1676, -18 lines)
- .github/workflows
- docker
- docs
- workers
- examples/grpo_trainer
- tests/special_sanity
- verl
- experimental/agent_loop
- single_controller/ray
- trainer
- config
- rollout
- ppo
- workers
- config
- rollout
- trtllm_rollout