[Doc] Refactor the DeepSeek-V3.1 tutorial. #4399
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request adds a comprehensive tutorial for deploying the DeepSeek-V3.1 model. While the document covers various deployment scenarios, I've found several critical errors in the provided code snippets and configurations, particularly for multi-node and prefill-decode disaggregation setups. These issues, including Python syntax errors, incorrect data parallel configurations, and inconsistent model naming, would likely prevent users from successfully following the instructions. My review provides specific corrections to address these critical problems and improve the tutorial's accuracy and usability.
def run_command(visiable_devices. dp_rank, vllm_engine_port):
    command = [
        "bash",
        "./run_dp_template.sh",
        visiable_devices,
        str(vllm_engine_port),
        str(dp_size),
        str(dp_rank),
        dp_address,
        dp_rpc_port,
        str(tp_size),
    ]
    subprocess.run(command, check=True)

if __name__ == "__main__":
    template_path = "./run_dp_template.sh"
    if not os.path.exists(template_path):
        print(f"Template file {template_path} does not exist.")
        sys.exit(1)

    processes = []
    num_cards = dp_size_local * tp_size
    for i in range(dp_size_local):
        dp_rank = dp_rank_start + i
        vllm_engine_port = vllm_start_port + i
        visiable_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
        process = multiprocessing.Process(target=run_command,
                                          args=(visiable_devices, dp_rank,
                                                vllm_engine_port))
This Python script has a syntax error and a recurring typo. The function definition on line 302 uses a period (.) instead of a comma (,). Additionally, the variable visiable_devices is misspelled throughout the script and should be visible_devices. These errors will prevent the script from running.
def run_command(visible_devices, dp_rank, vllm_engine_port):
    command = [
        "bash",
        "./run_dp_template.sh",
        visible_devices,
        str(vllm_engine_port),
        str(dp_size),
        str(dp_rank),
        dp_address,
        dp_rpc_port,
        str(tp_size),
    ]
    subprocess.run(command, check=True)

if __name__ == "__main__":
    template_path = "./run_dp_template.sh"
    if not os.path.exists(template_path):
        print(f"Template file {template_path} does not exist.")
        sys.exit(1)

    processes = []
    num_cards = dp_size_local * tp_size
    for i in range(dp_size_local):
        dp_rank = dp_rank_start + i
        vllm_engine_port = vllm_start_port + i
        visible_devices = ",".join(str(x) for x in range(i * tp_size, (i + 1) * tp_size))
        process = multiprocessing.Process(target=run_command,
                                          args=(visible_devices, dp_rank,
                                                vllm_engine_port))

# d0
python launch_dp_program.py --dp-size 32 --tp-size 1 --dp-size-local 16 --dp-rank-start 0 --dp-address 141.xx.xx.3 --dp-rpc-port 12321 --vllm-start-port 7100
# d1
python launch_dp_program.py --dp-size 32 --tp-size 1 --dp-size-local 16 --dp-rank-start 16 --dp-address 141.xx.xx.4 --dp-rpc-port 12321 --vllm-start-port 7100
The --dp-address for the second decode node (d1) is incorrect. In a distributed data-parallel setup, all worker nodes must point to the same master address. Here, it's set to its own IP (141.xx.xx.4), but it should point to the master node's IP, which is 141.xx.xx.3 as configured for d0.
Suggested change:
python launch_dp_program.py --dp-size 32 --tp-size 1 --dp-size-local 16 --dp-rank-start 16 --dp-address 141.xx.xx.3 --dp-rpc-port 12321 --vllm-start-port 7100
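For readers following along, the globals referenced inside the launcher script (dp_size, tp_size, dp_size_local, dp_rank_start, dp_address, dp_rpc_port, vllm_start_port) are expected to be populated from the command-line flags shown in these d0/d1 commands. Below is a minimal sketch of how such an argument parser might look; the flag names mirror the commands above, but the parser itself is an illustration and not the tutorial's actual code.

import argparse

parser = argparse.ArgumentParser(description="Launch one vLLM engine per local DP rank")
parser.add_argument("--dp-size", type=int, required=True)          # total data-parallel world size
parser.add_argument("--tp-size", type=int, required=True)          # tensor-parallel size per engine
parser.add_argument("--dp-size-local", type=int, required=True)    # DP ranks hosted on this node
parser.add_argument("--dp-rank-start", type=int, required=True)    # first DP rank owned by this node
parser.add_argument("--dp-address", type=str, required=True)       # master node IP, identical on every node
parser.add_argument("--dp-rpc-port", type=str, required=True)      # master RPC port, identical on every node
parser.add_argument("--vllm-start-port", type=int, required=True)  # first engine HTTP port on this node

args = parser.parse_args()
dp_size, tp_size = args.dp_size, args.tp_size
dp_size_local, dp_rank_start = args.dp_size_local, args.dp_rank_start
dp_address, dp_rpc_port = args.dp_address, args.dp_rpc_port
vllm_start_port = args.vllm_start_port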
--tensor-parallel-size 4 \
--quantization ascend \
--seed 1024 \
--served-model-name deepseek_v3 \
The served-model-name for Node 1 (deepseek_v3) is inconsistent with the name used for Node 0 (deepseek_v3.1 on line 159). All nodes in a multi-node deployment must use the exact same served-model-name to function correctly.
Suggested change:
--served-model-name deepseek_v3.1 \
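To make the consequence concrete: clients address the deployment by this name, so a request like the sketch below only succeeds if every node registered the model under the same name, here assumed to be deepseek_v3.1. The host and port are placeholders, not values taken from the tutorial.

import requests

# Placeholder endpoint; substitute the actual service address from the tutorial.
url = "http://127.0.0.1:8000/v1/chat/completions"

payload = {
    # Must match the --served-model-name passed to every node; with a mismatched
    # name the OpenAI-compatible server typically rejects the request as "model not found".
    "model": "deepseek_v3.1",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

response = requests.post(url, json=payload, timeout=60)
print(response.json())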
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?