add document for deepseek large EP #2339
Conversation
Code Review
This pull request adds documentation for deploying a large Expert Parallelism (EP) model using a distributed DP server. The documentation is comprehensive, but I've found a few issues in the code examples that could cause problems for users.
Specifically, a Python script example has incorrect data types for variables, which will lead to runtime errors. There's also a likely copy-paste error in a shell command example, and another command example is illustrative rather than a concrete, runnable command, which might be confusing. Finally, there's a syntax error in a markdown note block that will likely break rendering.
I've provided suggestions to fix these issues to improve the clarity and correctness of the documentation.
```python
dp_size = "all dp size for decode/prefill"
dp_size_local = "dp size for current node"
dp_rank_start = "dp beginning number for current node"
dp_ip = "master node ip"
dp_port = "port used to communication"
engine_port = "the beginning port for all dp group in current node"
```
The placeholder values for `dp_size`, `dp_size_local`, `dp_rank_start`, `dp_port`, and `engine_port` are strings, but they are used in arithmetic operations or as port numbers, which require integers. This will cause a `TypeError` when the script is run. The placeholder values should be integers, and it would be helpful to provide example integer values.
Suggested change:

```python
dp_size = 8              # Example: total number of DP workers for decode/prefill
dp_size_local = 2        # Example: number of DP workers on the current node
dp_rank_start = 0        # Example: starting DP rank for the current node
dp_ip = "192.168.1.100"  # Example: master node IP
dp_port = 12345          # Example: port used for communication
engine_port = 8000       # Example: the starting port for all DP groups on the current node
```
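As a minimal sketch (not from the PR) of why integer values are required: each local DP worker typically derives its rank and engine port arithmetically from these settings. The values below are assumed examples mirroring the review suggestion.

```python
# Assumed example values, mirroring the suggested change above.
dp_size = 8          # total number of DP workers for decode/prefill
dp_size_local = 2    # DP workers hosted on the current node
dp_rank_start = 0    # first DP rank on the current node
dp_port = 12345      # inter-node communication port
engine_port = 8000   # first engine port on the current node

# Each local worker derives its rank and port arithmetically; with the
# original string placeholders, both lines below would raise TypeError.
local_ranks = [dp_rank_start + i for i in range(dp_size_local)]
local_ports = [engine_port + i for i in range(dp_size_local)]
print(local_ranks)  # [0, 1]
print(local_ports)  # [8000, 8001]
```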
fixed
```shell
cd vllm-ascend/examples/disaggregate_prefill_v1/
bash gen_ranktable.sh --ips prefiller_node1_local_ip prefiller_node2_local_ip decoder_node1_local_ip decoder_node1_local_ip \
```
There appears to be a copy-paste error in the example command for `gen_ranktable.sh`. The IP `decoder_node1_local_ip` is repeated, which is likely incorrect; it should probably reference two different decoder nodes (e.g., `decoder_node1_local_ip` and `decoder_node2_local_ip`). This could lead to misconfiguration for users following this guide.
Suggested change:

```shell
bash gen_ranktable.sh --ips prefiller_node1_local_ip prefiller_node2_local_ip decoder_node1_local_ip decoder_node2_local_ip \
```
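A hypothetical sanity check (not part of the PR) that would catch this class of mistake before the IP list reaches `gen_ranktable.sh --ips` — the placeholder names below stand in for real addresses:

```python
# Assumed placeholder IP list, reproducing the copy-paste error flagged above.
ips = [
    "prefiller_node1_local_ip",
    "prefiller_node2_local_ip",
    "decoder_node1_local_ip",
    "decoder_node1_local_ip",  # copy-paste error: node1 repeated
]
# Collect any address that appears more than once.
dupes = sorted({ip for ip in ips if ips.count(ip) > 1})
print(dupes)  # ['decoder_node1_local_ip']
```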
fixed
```shell
python load_balance_proxy_server_example.py \
    --port "proxy port" \
    --host 0.0.0.0 \
    --prefiller-hosts \
    prefiller node1 local ip \
    prefiller node2 local ip \
    --prefiller-ports \
    engine_port engine_port \
    --decoder-hosts \
    decoder node1 local ip \
    decoder node1 local ip \
    decoder node2 local ip \
    decoder node2 local ip \
    --decoder-ports \
    engine_port ... engine_port + dp_size_local \
    engine_port ... engine_port + dp_size_local
```
The example for running `load_balance_proxy_server_example.py` is illustrative and contains placeholders that are not valid command-line arguments (e.g., `prefiller node1 local ip`, `engine_port ...`). This could be confusing for users. It would be more helpful to provide a complete, concrete example that they can adapt.
Suggested change:

```shell
python load_balance_proxy_server_example.py \
    --port 8000 \
    --host 0.0.0.0 \
    --prefiller-hosts \
    192.168.1.101 \
    192.168.1.102 \
    --prefiller-ports \
    9000 9001 \
    --decoder-hosts \
    192.168.1.103 \
    192.168.1.103 \
    192.168.1.104 \
    192.168.1.104 \
    --decoder-ports \
    9002 9003 9004 9005
```
fixed
```
:::{note}
Each node's local IP should be repeated the same number of times as its '**dp_size_local**'. Likewise, each node has the same number of ports as its '**dp_size_local**', and the ports increase sequentially starting from '**engine_port**'.
:::
```
The syntax for the note block is incorrect. It uses `::::{note}` and `::::`, which will likely cause rendering issues. The correct MyST syntax for a note directive is `:::{note}` and `:::`.
Suggested change:

```
:::{note}
Each node's local IP should be repeated the same number of times as its '**dp_size_local**'. Likewise, each node has the same number of ports as its '**dp_size_local**', and the ports increase sequentially starting from '**engine_port**'.
:::
```
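The note's rule can be sketched in a few lines of Python. This is not from the PR: the IPs and ports are the assumed values from the concrete proxy example above, and it assumes globally sequential port numbering across decoder nodes, as that example does.

```python
# Assumed values mirroring the concrete proxy example: two decoder nodes,
# each hosting dp_size_local DP workers.
decoder_nodes = ["192.168.1.103", "192.168.1.104"]
dp_size_local = 2
engine_port = 9002  # assumed first decoder engine port

# Each node's IP is repeated dp_size_local times, and the ports
# increase sequentially starting from engine_port.
decoder_hosts, decoder_ports = [], []
for n, ip in enumerate(decoder_nodes):
    for i in range(dp_size_local):
        decoder_hosts.append(ip)
        decoder_ports.append(engine_port + n * dp_size_local + i)

print(decoder_hosts)  # ['192.168.1.103', '192.168.1.103', '192.168.1.104', '192.168.1.104']
print(decoder_ports)  # [9002, 9003, 9004, 9005]
```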
Signed-off-by: hust17yixuan <[email protected]>
What this PR does / why we need it?
This PR presents a large-EP deployment solution based on vllm-ascend, using DeepSeek as an example. It outlines the end-to-end workflow for model deployment and serves as a reference for developers.
Does this PR introduce any user-facing change?
No
How was this patch tested?