Commit 3ad46ff

docs: update wideep documentation to precompile DeepGemm kernels beforehand (#4104)
Signed-off-by: Tushar Sharma <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 62e2865 commit 3ad46ff

2 files changed: +40 -0 lines changed
docs/backends/sglang/dsr1-wideep-gb200.md

Lines changed: 21 additions & 0 deletions
@@ -48,6 +48,8 @@ docker run \
 dynamo-wideep-gb200:latest
 ```

+In each container, you should be in the /sgl-workspace/dynamo/examples/backends/sglang directory.
+
 3. Run the ingress and prefill worker

 ```bash
@@ -104,6 +106,25 @@ python3 -m dynamo.sglang \

 On the other prefill nodes (this example has 2 total prefill nodes), run the same command but change `--node-rank` to 1

+> [!IMPORTANT]
+> If you encounter random CPU recv timeout issues during the warm-up phase in multi-GPU or multi-node setups, they are likely caused by DeepGEMM kernel compilation overhead.
+> To avoid these non-deterministic timeouts, it's strongly recommended to precompile the DeepGEMM kernels before launching the SGLang engine. This ensures all kernels are cached and ready, preventing long initialization delays or distributed timeout errors. To precompile and use cached kernels, please execute the following commands:
+
+```bash
+# 1. Precompile DeepGEMM kernels
+export SGLANG_DG_CACHE_DIR="/configs/dgcache/3p1dcache"
+python3 -m sglang.compile_deep_gemm <ServerArgs>
+
+# 2. Launch the engine with the same cache directory
+export SGLANG_DG_CACHE_DIR="/configs/dgcache/3p1dcache"
+python3 -m dynamo.frontend <ServerArgs>
+```
+
+> [!NOTE]
+> There's a known issue where the compile request may fail due to missing bootstrap information, but the kernels are still successfully cached.
+> Using a gradual warm-up phase and enabling caching for FlashInfer (similar to DeepGEMM) can further improve stability and reduce startup time.
+> See https://github.com/sgl-project/sglang/issues/9867#issuecomment-3336551174 for more details.
+
 4. Run the decode worker on the head decode node

 ```bash

docs/backends/sglang/dsr1-wideep-h100.md

Lines changed: 19 additions & 0 deletions
@@ -86,6 +86,25 @@ python3 -m dynamo.sglang \

 On the other prefill node (since this example has 4 total prefill nodes), run the same command but change `--node-rank` to 1,2, and 3

+> [!IMPORTANT]
+> If you encounter random CPU recv timeout issues during the warm-up phase in multi-GPU or multi-node setups, they are likely caused by DeepGEMM kernel compilation overhead.
+> To avoid these non-deterministic timeouts, it's strongly recommended to precompile the DeepGEMM kernels before launching the SGLang engine. This ensures all kernels are cached and ready, preventing long initialization delays or distributed timeout errors. To precompile and use cached kernels, please execute the following commands:
+
+```bash
+# 1. Precompile DeepGEMM kernels
+export SGLANG_DG_CACHE_DIR="/configs/dgcache/3p1dcache"
+python3 -m sglang.compile_deep_gemm <ServerArgs>
+
+# 2. Launch the engine with the same cache directory
+export SGLANG_DG_CACHE_DIR="/configs/dgcache/3p1dcache"
+python3 -m dynamo.frontend <ServerArgs>
+```
+
+> [!NOTE]
+> There's a known issue where the compile request may fail due to missing bootstrap information, but the kernels are still successfully cached.
+> Using a gradual warm-up phase and enabling caching for FlashInfer (similar to DeepGEMM) can further improve stability and reduce startup time.
+> See https://github.com/sgl-project/sglang/issues/9867#issuecomment-3336551174 for more details.
+
 4. Run the decode worker on the head decode node

 ```bash

0 commit comments
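
The precompile-then-launch flow documented in this commit could be wrapped so that precompilation runs only when the cache is cold. The following is a minimal sketch, not part of SGLang or Dynamo: `dg_cache_ready` and `ensure_dg_cache` are hypothetical helper names, and treating a non-empty `SGLANG_DG_CACHE_DIR` as "kernels are cached" is an assumption (a heuristic, not a guarantee that every kernel is present). The precompile command's exit status is deliberately ignored, because the note in the diff says the compile request may fail while the kernels are still cached.

```bash
# Hypothetical helper: succeeds when the cache directory exists and is non-empty.
# Uses its first argument, or falls back to SGLANG_DG_CACHE_DIR.
dg_cache_ready() {
    _dg_dir="${1:-${SGLANG_DG_CACHE_DIR:-}}"
    [ -n "$_dg_dir" ] && [ -d "$_dg_dir" ] && [ -n "$(ls -A "$_dg_dir" 2>/dev/null)" ]
}

# Hypothetical wrapper: run the given precompile command only on a cold cache.
# "$@" is the precompile command line, for example:
#   python3 -m sglang.compile_deep_gemm <ServerArgs>
ensure_dg_cache() {
    if [ -z "${SGLANG_DG_CACHE_DIR:-}" ]; then
        echo "SGLANG_DG_CACHE_DIR is not set" >&2
        return 2
    fi
    if dg_cache_ready; then
        echo "DeepGEMM cache is warm: ${SGLANG_DG_CACHE_DIR}"
        return 0
    fi
    mkdir -p "${SGLANG_DG_CACHE_DIR}"
    # Ignore the exit status: the compile request may fail even though
    # the kernels were written to the cache (see the NOTE in the diff).
    "$@" || true
    # Report success only if the cache now holds at least one entry.
    dg_cache_ready
}
```

A launch script would export `SGLANG_DG_CACHE_DIR`, call `ensure_dg_cache` with the precompile command from the diff, and start the engine with the same `<ServerArgs>` only if the call succeeds.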