Commit 9416dba

Revise PR link and usage for rfork blog (#280)
Signed-off-by: Anqi Shen <[email protected]>
1 parent ad885ad

File tree

1 file changed: +16 / -3 lines


blog/2025-12-10-rfork.md

@@ -68,7 +68,7 @@ While NCCL serves as Tensor R-Fork backend by leveraging GPU-Direct RDMA, it doe
 
 ### TransferEngine backend
 
-To achieve non-disturbing weight transfer, we introduce an alternative backend: <a href=https://github.com/sgl-project/sglang/pull/13125>TransferEngine</a>, which leverages GPU-Direct RDMA for efficient data movement[2]. TransferEngine (TE) is a lightweight RDMA-based transfer runtime that runs alongside each TPWorker on the source instance and exposes GPU-resident weight tensors to remote readers without invoking CUDA kernels on the source.
+To achieve non-disturbing weight transfer, we introduce an alternative backend: <a href=https://github.com/sgl-project/sglang/pull/14997>TransferEngine</a>, which leverages GPU-Direct RDMA for efficient data movement[2]. TransferEngine (TE) is a lightweight RDMA-based transfer runtime that runs alongside each TPWorker on the source instance and exposes GPU-resident weight tensors to remote readers without invoking CUDA kernels on the source.
 
 During source SGLang instance initialization:
 1. Each TPWorker (tensor parallel worker) spawns a TransferEngine instance.
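The initialization step quoted in the hunk above (each TPWorker spawning a TransferEngine instance that publishes its GPU-resident weights to remote readers) can be sketched in pure Python. Everything below (`WeightRegistry`, `TPWorker`, the address and size values) is a hypothetical stand-in for illustration, not the SGLang or Mooncake API:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class WeightRegistry:
    """Hypothetical stand-in for a TransferEngine registration table.

    Maps tensor name -> (device address, size in bytes), so a remote reader
    can copy the bytes without any source-side CUDA kernel running.
    """
    entries: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def register(self, name: str, addr: int, nbytes: int) -> None:
        self.entries[name] = (addr, nbytes)

@dataclass
class TPWorker:
    """Simplified tensor-parallel worker for this sketch."""
    rank: int
    weights: Dict[str, Tuple[int, int]]  # name -> (fake device addr, nbytes)
    engine: WeightRegistry = field(default_factory=WeightRegistry)

    def start_transfer_engine(self) -> None:
        # Step 1 from the blog text: spawn a per-worker engine and register
        # each GPU-resident weight tensor with it.
        for name, (addr, nbytes) in self.weights.items():
            self.engine.register(f"rank{self.rank}/{name}", addr, nbytes)

workers = [
    TPWorker(rank=r, weights={"qkv_proj": (4096 * (r + 1), 4096)})
    for r in range(2)
]
for w in workers:
    w.start_transfer_engine()

print([list(w.engine.entries) for w in workers])
# [['rank0/qkv_proj'], ['rank1/qkv_proj']]
```

The per-worker registry mirrors the per-TPWorker TE instances described in the diff: each rank publishes only its own shard, so readers pull shards in parallel.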
@@ -94,17 +94,30 @@ Detailed usage please refer to <a href=https://github.com/sgl-project/sglang/blo
 
 ### Use NCCL as backend
 
+seed instance:
+```shell
+python -m sglang.launch_server [args]
+```
+
+client instance:
 ```shell
 python -m sglang.launch_server [args] \
 --load-format remote_instance \
 --remote-instance-weight-loader-seed-instance-ip [seed_instance_ip] \
 --remote-instance-weight-loader-seed-instance-service-port [seed_instance_service_port] \
 --remote-instance-weight-loader-send-weights-group-ports [send_weights_nccl_group_ports_list] \
---remote-instance-weight-loader-backend nccl # optional, default is "nccl"
+--remote-instance-weight-loader-backend nccl
 ```
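As a sketch of what the NCCL-backend client invocation above amounts to, the argv can be assembled programmatically. The flag names are taken from the snippet; the `client_argv` helper and the comma-joined encoding of the port list are assumptions for illustration, not a documented SGLang interface:

```python
from typing import List

def client_argv(seed_ip: str, seed_port: int,
                group_ports: List[int], backend: str = "nccl") -> List[str]:
    # Flags as shown in the blog's client-instance snippet. The port-list
    # encoding ([send_weights_nccl_group_ports_list]) is assumed here to be
    # comma-separated; check the linked docs for the exact format.
    return [
        "python", "-m", "sglang.launch_server",
        "--load-format", "remote_instance",
        "--remote-instance-weight-loader-seed-instance-ip", seed_ip,
        "--remote-instance-weight-loader-seed-instance-service-port", str(seed_port),
        "--remote-instance-weight-loader-send-weights-group-ports",
        ",".join(map(str, group_ports)),
        "--remote-instance-weight-loader-backend", backend,
    ]

argv = client_argv("192.0.2.7", 31000, [29500, 29501])
print(argv[-1])  # nccl
```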
 
 ### Use TransferEngine as backend
 
+seed instance:
+```shell
+python -m sglang.launch_server [args] \
+--remote-instance-weight-loader-start-seed-via-transfer-engine
+```
+
+client instance:
 ```shell
 python -m sglang.launch_server [args] \
 --load-format remote_instance \
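The property that makes the TransferEngine backend non-disturbing, namely that the client copies registered weight regions while the seed executes no kernels, can be sketched as a one-sided read against a published registry. All names and values below are hypothetical; the real path is a GPU-Direct RDMA read issued entirely by the client:

```python
# Fake seed-side state: "device memory" plus the (addr, nbytes) registry
# that the seed published at startup. Purely illustrative values.
seed_memory = {0x1000: b"\x01" * 8}          # addr -> resident bytes
registry = {"rank0/qkv_proj": (0x1000, 8)}   # name -> (addr, nbytes)

def rdma_read(addr: int, nbytes: int) -> bytes:
    # One-sided read: only the reader is active; the seed's CPU and GPU
    # stay idle, which is what keeps serving on the seed undisturbed.
    return seed_memory[addr][:nbytes]

pulled = {name: rdma_read(addr, n) for name, (addr, n) in registry.items()}
print(len(pulled["rank0/qkv_proj"]))  # 8
```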
@@ -148,7 +161,7 @@ The practice of R-Fork opens up more imaginative possibilities: the key concept
 
 [0] Tensor R-Fork Documentation: <a href=https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/rfork.md>Documentation</a>
 [1] Tensor R-Fork with NCCL backend: <a href=https://github.com/sgl-project/sglang/pull/8215>PR#8215</a>
-[2] Tensor R-Fork with TransferEngine backend: <a href=https://github.com/sgl-project/sglang/pull/13125>PR#13125</a>
+[2] Tensor R-Fork with TransferEngine backend: <a href=https://github.com/sgl-project/sglang/pull/14997>PR#14997</a>
 [3] Concurrent weights loading from disk: <a href=https://github.com/sgl-project/sglang/pull/7943>PR#7943</a>
 [4] Tensor R-Fork Planner SGLang RFC: <a href=https://github.com/sgl-project/sglang/issues/12910>Issue#12910</a>
 [5] TransferEngine: <a href=https://kvcache-ai.github.io/Mooncake/design/transfer-engine.html>https://kvcache-ai.github.io/Mooncake/design/transfer-engine.html</a>,
