### TransferEngine backend
To achieve non-disturbing weight transfer, we introduce an alternative backend: <a href="https://github.com/sgl-project/sglang/pull/14997">TransferEngine</a>, which leverages GPU-Direct RDMA for efficient data movement[2]. TransferEngine (TE) is a lightweight RDMA-based transfer runtime that runs alongside each TPWorker on the source instance and exposes GPU-resident weight tensors to remote readers without invoking CUDA kernels on the source.
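To make the read-side behavior concrete, here is a minimal sketch of the idea: the source registers its weight buffers once, and remote readers pull byte ranges without triggering any compute on the source. All names here (`TransferEngine`, `register_weights`, `serve_read`) are hypothetical stand-ins for illustration, not the actual SGLang API, and plain byte buffers stand in for GPU-resident tensors that real RDMA would access in NIC hardware.

```python
# Illustrative sketch only: class and method names are hypothetical,
# not SGLang's real TransferEngine API. Byte buffers stand in for
# GPU-resident weight tensors accessed via GPU-Direct RDMA.

class TransferEngine:
    """Runs alongside a TPWorker on the source; exposes registered
    weight buffers to remote readers with no compute on the source."""

    def __init__(self):
        self._registry = {}  # tensor name -> backing buffer

    def register_weights(self, name, buf):
        # Real RDMA would pin the memory and return a remote key (rkey);
        # here we just record the buffer and hand back a token.
        self._registry[name] = buf
        return f"rkey:{name}"

    def serve_read(self, name, offset, length):
        # One-sided read: the source only returns the requested byte
        # range (real RDMA performs this in the NIC, not in a kernel).
        return bytes(self._registry[name][offset:offset + length])


# Source TPWorker registers a (mock) weight tensor once at startup.
engine = TransferEngine()
weights = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")
token = engine.register_weights("layer0.qkv", weights)

# A destination reader pulls a slice without any source-side kernel.
chunk = engine.serve_read("layer0.qkv", offset=2, length=4)
print(chunk)  # b'\x03\x04\x05\x06'
```

The point of the sketch is the asymmetry: registration happens once, and every subsequent read is driven entirely by the destination.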
During source SGLang instance initialization:
1. Each TPWorker (tensor parallel worker) spawns a TransferEngine instance.