Skip to content

Commit 330e649

Browse files
authored
chore: multinode dsr1 doc fix (ai-dynamo#1814)
1 parent 427d547 commit 330e649

File tree

1 file changed

+12
-24
lines changed

1 file changed

+12
-24
lines changed

examples/sglang/multinode-examples.md

Lines changed: 12 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,9 @@
44

55
SGLang allows you to deploy multi-node sized models by adding in the `dist-init-addr`, `nnodes`, and `node-rank` arguments. Below we demonstrate and example of deploying DeepSeek R1 for disaggregated serving across 4 nodes. This example requires 4 nodes of 8xH100 GPUs.
66

7-
**Step 1**: Start NATS/ETCD on your head node. Ensure you have the correct firewall rules to allow communication between the nodes as you will need the NATS/ETCD endpoints to be accessible by all other nodes.
7+
**Step 1**: Use the provided helper script to generate commands to start NATS/ETCD on your head prefill node. This script will also give you environment variables to export on each other node. You will need the IP addresses of your head prefill and head decode node to run this script.
88
```bash
9-
# node 1
10-
docker compose -f lib/runtime/docker-compose.yml up -d
9+
./utils/gen_env_vars.sh
1110
```
1211

1312
**Step 2**: Ensure that your configuration file has the required arguments. Here's an example configuration that runs prefill and the model in TP16:
@@ -22,91 +21,80 @@ python3 components/worker.py \
2221
--served-model-name deepseek-ai/DeepSeek-R1 \
2322
--tp 16 \
2423
--dp-size 16 \
25-
--dist-init-addr HEAD_PREFILL_NODE_IP:29500 \
24+
--dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
2625
--nnodes 2 \
2726
--node-rank 0 \
2827
--enable-dp-attention \
2928
--trust-remote-code \
3029
--skip-tokenizer-init \
3130
--disaggregation-mode prefill \
3231
--disaggregation-transfer-backend nixl \
33-
--mem-fraction-static 0.82 \
32+
--disaggregation-bootstrap-port 30001 \
33+
--mem-fraction-static 0.82
3434
```
3535

3636
Node 2: Run the remaining 8 shards of the prefill worker
3737
```bash
38-
# nats and etcd endpoints
39-
export NATS_SERVER="nats://<node-1-ip>"
40-
export ETCD_ENDPOINTS="<node-1-ip>:2379"
41-
42-
# worker
4338
python3 components/worker.py \
4439
--model-path /model/ \
4540
--served-model-name deepseek-ai/DeepSeek-R1 \
4641
--tp 16 \
4742
--dp-size 16 \
48-
--dist-init-addr HEAD_PREFILL_NODE_IP:29500 \
43+
--dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
4944
--nnodes 2 \
5045
--node-rank 1 \
5146
--enable-dp-attention \
5247
--trust-remote-code \
5348
--skip-tokenizer-init \
5449
--disaggregation-mode prefill \
5550
--disaggregation-transfer-backend nixl \
51+
--disaggregation-bootstrap-port 30001 \
5652
--mem-fraction-static 0.82
5753
```
5854

5955
Node 3: Run the first 8 shards of the decode worker
6056
```bash
61-
# nats and etcd endpoints
62-
export NATS_SERVER="nats://<node-1-ip>"
63-
export ETCD_ENDPOINTS="<node-1-ip>:2379"
64-
65-
# worker
6657
python3 components/decode_worker.py \
6758
--model-path /model/ \
6859
--served-model-name deepseek-ai/DeepSeek-R1 \
6960
--tp 16 \
7061
--dp-size 16 \
71-
--dist-init-addr HEAD_DECODE_NODE_IP:29500 \
62+
--dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
7263
--nnodes 2 \
7364
--node-rank 0 \
7465
--enable-dp-attention \
7566
--trust-remote-code \
7667
--skip-tokenizer-init \
7768
--disaggregation-mode decode \
7869
--disaggregation-transfer-backend nixl \
70+
--disaggregation-bootstrap-port 30001 \
7971
--mem-fraction-static 0.82
8072
```
8173

8274
Node 4: Run the remaining 8 shards of the decode worker
8375
```bash
84-
# nats and etcd endpoints
85-
export NATS_SERVER="nats://<node-1-ip>"
86-
export ETCD_ENDPOINTS="<node-1-ip>:2379"
87-
88-
# worker
8976
python3 components/decode_worker.py \
9077
--model-path /model/ \
9178
--served-model-name deepseek-ai/DeepSeek-R1 \
9279
--tp 16 \
9380
--dp-size 16 \
94-
--dist-init-addr HEAD_DECODE_NODE_IP:29500 \
81+
--dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
9582
--nnodes 2 \
9683
--node-rank 1 \
9784
--enable-dp-attention \
9885
--trust-remote-code \
9986
--skip-tokenizer-init \
10087
--disaggregation-mode decode \
10188
--disaggregation-transfer-backend nixl \
89+
--disaggregation-bootstrap-port 30001 \
10290
--mem-fraction-static 0.82
10391
```
10492

10593
**Step 3**: Run inference
10694
SGLang typically requires a warmup period to ensure the DeepGEMM kernels are loaded. We recommend running a few warmup requests and ensuring that the DeepGEMM kernels load in.
10795

10896
```bash
109-
curl <node-1-ip>:8000/v1/chat/completions \
97+
curl ${HEAD_PREFILL_NODE_IP}:8000/v1/chat/completions \
11098
-H "Content-Type: application/json" \
11199
-d '{
112100
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",

0 commit comments

Comments
 (0)