Merged
docs/source/user-guide/pd-disaggregation/1p1d.md: 4 changes (0 additions, 4 deletions)

````diff
@@ -13,14 +13,12 @@ For illustration purposes, let us take GPU as an example and assume the model us
 ### Run prefill server
 Prefiller Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7800 \
   --block-size 128 \
@@ -42,14 +40,12 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
 ### Run decode server
 Decoder Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7801 \
   --block-size 128 \
````
Expand Down
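Both removed lines concern determinism rather than correctness of the launch itself: `PYTHONHASHSEED` pins Python's per-process string-hash randomization so that separate prefiller and decoder processes agree on any identifier derived from the builtin `hash()`, and `--enforce-eager` disables graph-mode execution in favor of plain eager mode. Their removal suggests neither workaround is needed for KV transfer any longer, though the PR text shown here does not state the reason. A minimal sketch of what the seed guaranteed (the `"kv-block"` key is purely illustrative, not a real vLLM identifier):

```shell
# Python randomizes str hashes per interpreter process unless PYTHONHASHSEED
# is set. With the same seed, two independent runs agree on hash values;
# this cross-process agreement is what the removed export provided.
h1=$(PYTHONHASHSEED=123456 python3 -c 'print(hash("kv-block"))')
h2=$(PYTHONHASHSEED=123456 python3 -c 'print(hash("kv-block"))')
if [ "$h1" = "$h2" ]; then
  echo "seeded hashes match"
fi
```

Without the export, the two `python3` invocations would each draw a fresh random seed and almost always print different values.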
docs/source/user-guide/pd-disaggregation/npgd.md: 4 changes (0 additions, 4 deletions)

````diff
@@ -19,14 +19,12 @@ For illustration purposes, let us assume that the model used is Qwen2.5-7B-Instr
 ### Run prefill server
 Prefiller Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export ASCEND_RT_VISIBLE_DEVICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7800 \
   --block-size 128 \
@@ -49,14 +47,12 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
 ### Run decode server
 Decoder Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7801 \
   --block-size 128 \
````
docs/source/user-guide/pd-disaggregation/xpyd.md: 4 changes (0 additions, 4 deletions)

````diff
@@ -13,14 +13,12 @@ For illustration purposes, let us take GPU as an example and assume the model us
 ### Run prefill servers
 Prefiller1 Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=0
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7800 \
   --block-size 128 \
@@ -41,14 +39,12 @@ vllm serve /home/models/Qwen2.5-7B-Instruct \
 
 Prefiller2 Launch Command:
 ```bash
-export PYTHONHASHSEED=123456
 export CUDA_VISIBLE_DEVICES=1
 vllm serve /home/models/Qwen2.5-7B-Instruct \
   --max-model-len 20000 \
   --tensor-parallel-size 1 \
   --gpu_memory_utilization 0.87 \
   --trust-remote-code \
-  --enforce-eager \
   --no-enable-prefix-caching \
   --port 7801 \
   --block-size 128 \
````