Commit b575542

Revise rfork blog images to PNG format (#272)
Signed-off-by: Anqi Shen <[email protected]>
1 parent 4f897fa commit b575542

7 files changed, with 3 additions and 15 deletions

blog/2025-12-10-rfork.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 title: "Let Tensors Fly — Accelerating Large Model Weight Loading with R-Fork"
 author: "Ant Group DeepXPU Team, SGLang Team"
 date: "December 10, 2025"
-previewImg: /images/blog/rfork/preview.svg
+previewImg: /images/blog/rfork/preview.png
 ---
 
 ## TL;DR
@@ -45,7 +45,7 @@ From the data flow analysis, we observe that weight tensors are stored on each G
 To maximize the utilization of InfiniBand NIC's bandwidth, we design a per GPU-pair data transfer strategy: a local GPU directly transfers data to/from its paired remote GPU. This design effectively bypasses the PCIe bottleneck between GPU and CPU, enabling high-throughput communication without relying on CPU or host memory.
 The data flow of loading weights from remote SGLang instance is the following:
 
-<img src="/images/blog/rfork/design.svg" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%; image-orientation: none;"></img>
+<img src="/images/blog/rfork/design.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%; image-orientation: none;"></img>
 
 
 ## Implementation
@@ -117,7 +117,7 @@ python -m sglang.launch_server [args] \
 
 We evaluated the performance of launching a new SGLang instance equipped with eight NVIDIA H20 GPUs, while loading the DeepSeek-R1 model from different sources.
 
-<img src="/images/blog/rfork/performance.svg" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%; image-orientation: none;"></img>
+<img src="/images/blog/rfork/performance.png" style="display:block; margin-top: auto; margin-left: auto; margin-right: auto; margin-bottom: auto; width: 100%; image-orientation: none;"></img>
 
 Registering the memory region can be overlapped with other initialization phases to further optimize total boot-up time.
 

203 KB

public/images/blog/rfork/design.svg

Lines changed: 0 additions & 4 deletions
This file was deleted.
87.4 KB

public/images/blog/rfork/performance.svg

Lines changed: 0 additions & 4 deletions
This file was deleted.
1.49 MB

public/images/blog/rfork/preview.svg

Lines changed: 0 additions & 4 deletions
This file was deleted.
