add document for deepseek large EP #2339
Conversation
Code Review
This pull request adds documentation for deploying a large Expert Parallelism (EP) model using a distributed DP server. The documentation is comprehensive, but I've found a few issues in the code examples that could cause problems for users.
Specifically, a Python script example has incorrect data types for variables, which will lead to runtime errors. There's also a likely copy-paste error in a shell command example, and another command example is illustrative rather than a concrete, runnable command, which might be confusing. Finally, there's a syntax error in a markdown note block that will likely break rendering.
I've provided suggestions to fix these issues to improve the clarity and correctness of the documentation.
```python
dp_size = "all dp size for decode/prefill"
dp_size_local = "dp size for current node"
dp_rank_start = "dp beginning number for current node"
dp_ip = "master node ip"
dp_port = "port used to communication"
engine_port = "the beginning port for all dp group in current node"
```
The placeholder values for `dp_size`, `dp_size_local`, `dp_rank_start`, `dp_port`, and `engine_port` are strings, but they are used in arithmetic operations or as port numbers, which require integers. This will cause a `TypeError` when the script is run. The placeholder values should be integers, and it would be helpful to provide example integer values.
Suggested change:

```python
dp_size = 8              # Example: total number of DP workers for decode/prefill
dp_size_local = 2        # Example: number of DP workers on the current node
dp_rank_start = 0        # Example: starting DP rank for the current node
dp_ip = "192.168.1.100"  # Example: master node IP
dp_port = 12345          # Example: port used for communication
engine_port = 8000       # Example: the starting port for all DP groups on the current node
```
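As a minimal sketch (not from the PR) of why integer values are required: each local DP worker typically derives its rank and engine port arithmetically from these settings. The values below are assumed examples mirroring the review suggestion.

```python
# Assumed example values, mirroring the suggested change above.
dp_size = 8          # total number of DP workers for decode/prefill
dp_size_local = 2    # DP workers hosted on the current node
dp_rank_start = 0    # first DP rank on the current node
dp_port = 12345      # inter-node communication port
engine_port = 8000   # first engine port on the current node

# Each local worker derives its rank and port arithmetically; with the
# original string placeholders, both lines below would raise TypeError.
local_ranks = [dp_rank_start + i for i in range(dp_size_local)]
local_ports = [engine_port + i for i in range(dp_size_local)]
print(local_ranks)  # [0, 1]
print(local_ports)  # [8000, 8001]
```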
fixed
```shell
cd vllm-ascend/examples/disaggregate_prefill_v1/
bash gen_ranktable.sh --ips prefiller_node1_local_ip prefiller_node2_local_ip decoder_node1_local_ip decoder_node1_local_ip \
```
There appears to be a copy-paste error in the example command for `gen_ranktable.sh`. The IP `decoder_node1_local_ip` is repeated, which is likely incorrect; it should probably reference two different decoder nodes (e.g., `decoder_node1_local_ip` and `decoder_node2_local_ip`). This could lead to misconfiguration for users following this guide.
Suggested change:

```shell
bash gen_ranktable.sh --ips prefiller_node1_local_ip prefiller_node2_local_ip decoder_node1_local_ip decoder_node2_local_ip \
```
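A hypothetical sanity check (not part of the PR) that would catch this class of mistake before the IP list reaches `gen_ranktable.sh --ips` — the placeholder names below stand in for real addresses:

```python
# Assumed placeholder IP list, reproducing the copy-paste error flagged above.
ips = [
    "prefiller_node1_local_ip",
    "prefiller_node2_local_ip",
    "decoder_node1_local_ip",
    "decoder_node1_local_ip",  # copy-paste error: node1 repeated
]
# Collect any address that appears more than once.
dupes = sorted({ip for ip in ips if ips.count(ip) > 1})
print(dupes)  # ['decoder_node1_local_ip']
```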
fixed
```shell
python load_balance_proxy_server_example.py \
    --port "proxy port" \
    --host 0.0.0.0 \
    --prefiller-hosts \
    prefiller node1 local ip \
    prefiller node2 local ip \
    --prefiller-ports \
    engine_port engine_port \
    --decoder-hosts \
    decoder node1 local ip \
    decoder node1 local ip \
    decoder node2 local ip \
    decoder node2 local ip \
    --decoder-ports \
    engine_port ... engine_port + dp_size_local \
    engine_port ... engine_port + dp_size_local
```
The example for running `load_balance_proxy_server_example.py` is illustrative and contains placeholders that are not valid command-line arguments (e.g., `prefiller node1 local ip`, `engine_port ...`). This could be confusing for users. It would be more helpful to provide a complete, concrete example that they can adapt.
Suggested change:

```shell
python load_balance_proxy_server_example.py \
    --port 8000 \
    --host 0.0.0.0 \
    --prefiller-hosts \
    192.168.1.101 \
    192.168.1.102 \
    --prefiller-ports \
    9000 9001 \
    --decoder-hosts \
    192.168.1.103 \
    192.168.1.103 \
    192.168.1.104 \
    192.168.1.104 \
    --decoder-ports \
    9002 9003 9004 9005
```
fixed
```
:::{note}
Each node's local IP should be repeated the same number of times as its '**dp_size_local**'. Likewise, each node has the same number of ports as its '**dp_size_local**', and the ports increase sequentially starting from '**engine_port**'.
:::
```
The syntax for the note block is incorrect. It uses `::::{note}` and `::::`, which will likely cause rendering issues. The correct MyST syntax for a note directive is `:::{note}` and `:::`.
Suggested change:

```
:::{note}
Each node's local IP should be repeated the same number of times as its '**dp_size_local**'. Likewise, each node has the same number of ports as its '**dp_size_local**', and the ports increase sequentially starting from '**engine_port**'.
:::
```
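The note's rule can be sketched in a few lines of Python. This is not from the PR: the IPs and ports are the assumed values from the concrete proxy example above, and it assumes globally sequential port numbering across decoder nodes, as that example does.

```python
# Assumed values mirroring the concrete proxy example: two decoder nodes,
# each hosting dp_size_local DP workers.
decoder_nodes = ["192.168.1.103", "192.168.1.104"]
dp_size_local = 2
engine_port = 9002  # assumed first decoder engine port

# Each node's IP is repeated dp_size_local times, and the ports
# increase sequentially starting from engine_port.
decoder_hosts, decoder_ports = [], []
for n, ip in enumerate(decoder_nodes):
    for i in range(dp_size_local):
        decoder_hosts.append(ip)
        decoder_ports.append(engine_port + n * dp_size_local + i)

print(decoder_hosts)  # ['192.168.1.103', '192.168.1.103', '192.168.1.104', '192.168.1.104']
print(decoder_ports)  # [9002, 9003, 9004, 9005]
```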
Signed-off-by: hust17yixuan <[email protected]>
What this PR does / why we need it?
This PR presents a large-EP deployment solution based on vllm-ascend, using DeepSeek as an example. It outlines the end-to-end workflow for model deployment and serves as a reference for developers.
Does this PR introduce any user-facing change?
No
How was this patch tested?