AFD: update DeepSeek support by hsliuustc0106 · Pull Request #2 · Oliver-ss/vllm

hsliuustc0106 · 2025-09-30T04:09:37Z

Purpose

This PR corresponds to RFC vllm-project#22799 and is a follow-up to PR vllm-project#25162. It adds:

Support for online serving requests.
AFD for the DeepSeek V2 Lite model, plus a p2p connector for A2E/E2A communication.
Extended AFD connector metadata, so that AFD can work with different hardware (GPU, NPU, and more).

This PR is a collaboration with @chopper0126 and @CZRZ.

We plan to support the following features later:

Multi-stage execution with micro-batches and async scheduling (compute and communication).
Graph mode.
Offline serving requests in batches.
Multi-node support for the full DeepSeek-V3/R1 models on GPU/NPU.

Test Plan

At this stage, we used 4 RTX 3090 GPUs to test the feasibility of our implementation. The attention and FFN sides are each sharded across 2 GPUs.

# Attention side

vllm serve <your/model/path> --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "afd_role":"attention", "num_afd_stages":"1", "afd_extra_config":{"afd_size":"2A2F"}}'

# FFN side

vllm fserver <your/model/path> --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "num_afd_stages":"1", "afd_role":"ffn", "afd_extra_config":{"afd_size":"2A2F"}}'
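The two commands differ only in the subcommand and in `afd_role`, so the `--afd-config` JSON can be generated instead of written by hand, which avoids quoting mistakes. The following is a minimal sketch, assuming `afd_size` follows the pattern `<n>A<m>F` (attention count, then FFN count). That format is inferred from the `2A2F` value above and is not confirmed by this PR. `parse_afd_size` and `build_afd_config` are hypothetical helpers, not part of vLLM.

```python
import json
import re
import shlex


def parse_afd_size(afd_size: str) -> tuple[int, int]:
    """Split a string such as "2A2F" into (attention_count, ffn_count).

    Assumption: the format is "<n>A<m>F". This is inferred from the example
    value in the test plan, not taken from the vLLM source.
    """
    match = re.fullmatch(r"(\d+)A(\d+)F", afd_size)
    if match is None:
        raise ValueError(f"unexpected afd_size: {afd_size!r}")
    return int(match.group(1)), int(match.group(2))


def build_afd_config(role: str, afd_size: str = "2A2F") -> str:
    """Return the JSON string passed to --afd-config for one role."""
    if role not in ("attention", "ffn"):
        raise ValueError(f"unknown afd_role: {role!r}")
    parse_afd_size(afd_size)  # fail early on a malformed size string
    config = {
        "afd_connector": "p2pconnector",
        "afd_role": role,
        "num_afd_stages": "1",  # kept as a string, as in the commands above
        "afd_extra_config": {"afd_size": afd_size},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Print one shell-quoted command line per side.
    for subcommand, role in (("serve", "attention"), ("fserver", "ffn")):
        quoted = shlex.quote(build_afd_config(role))
        print(f"vllm {subcommand} <your/model/path> --afd-config {quoted}")
```

Generating both configurations from one function keeps the shared fields (`afd_connector`, `num_afd_stages`, `afd_size`) identical on the attention and FFN sides.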

Test Result

curl -v http://0.0.0.0:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d \
'{ "model": "/home/models/DeepSeek-V2-Lite",
"messages": [
          {"role": "user", "content": "1 3 5 7 9 "}
],
"temperature": 0.6,
"repetition_penalty": 1.0,
"top_p": 0.95,
"top_k": 40,
"max_tokens": 20,
"stream": false}'
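For scripted testing, the same request can be built in Python with only the standard library. This is a sketch under the assumption that the attention-side server listens on `http://0.0.0.0:8000`, as in the curl command. `build_request` and `send` are hypothetical helper names, and `send` only works while the server is running.

```python
import json
import urllib.request

# Same body as the curl command above.
PAYLOAD = {
    "model": "/home/models/DeepSeek-V2-Lite",
    "messages": [{"role": "user", "content": "1 3 5 7 9 "}],
    "temperature": 0.6,
    "repetition_penalty": 1.0,
    "top_p": 0.95,
    "top_k": 40,
    "max_tokens": 20,
    "stream": False,
}


def build_request(base_url: str = "http://0.0.0.0:8000") -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_request())` returns the decoded response as a dictionary.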

By sending a request to the model, we got:

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

github-actions · 2025-09-30T04:09:46Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, a small and essential subset of CI tests that quickly catches errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

hsliuustc0106 · 2025-09-30T04:22:04Z

@Oliver-ss could you spend some time reviewing the code? The follow-up features will be added in separate PRs.

jo-pillar · 2025-10-15T16:41:33Z

There seems to be a misunderstanding here. According to the paper, the number of attention instances is independent of the tensor-parallel (TP) size, so your configuration here should be 1A1F.

jo-pillar · 2025-10-15T16:45:25Z

vllm/distributed/afd_transfer/afd_connector/p2p_connector.py

+            group_name="afd",
+            timeout=timedelta(minutes=2),
+        )
+        ffn_ranks = [i for i in range(ffn_size, ffn_size + attn_size)]


This should be range(attn_size, ffn_size + attn_size).
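The difference only shows up when the two sides have different sizes, which is why the 2A2F test above does not expose it. Here is a small sketch comparing both expressions, assuming attention ranks occupy the first `attn_size` ranks of the group and FFN ranks follow them. That layout is implied by the suggested fix but not shown in the quoted diff, and the function names are hypothetical.

```python
def afd_rank_layout(attn_size: int, ffn_size: int) -> tuple[list[int], list[int]]:
    """Return (attention_ranks, ffn_ranks) for a combined AFD group.

    Assumption: attention ranks come first, FFN ranks follow.
    """
    attn_ranks = list(range(attn_size))
    ffn_ranks = list(range(attn_size, attn_size + ffn_size))
    return attn_ranks, ffn_ranks


def ffn_ranks_as_in_diff(attn_size: int, ffn_size: int) -> list[int]:
    """The expression from the quoted diff, kept for comparison."""
    return list(range(ffn_size, ffn_size + attn_size))


if __name__ == "__main__":
    for attn_size, ffn_size in ((2, 2), (1, 3), (3, 1)):
        _, suggested = afd_rank_layout(attn_size, ffn_size)
        from_diff = ffn_ranks_as_in_diff(attn_size, ffn_size)
        print(attn_size, ffn_size, suggested, from_diff)
```

With equal sizes the two expressions agree. With 1 attention rank and 3 FFN ranks, the expression from the diff yields only rank 3. With 3 attention ranks and 1 FFN rank, it yields ranks 1 to 3, which overlap the attention ranks.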

zh-lin1 · 2025-10-17T12:12:51Z

vllm/model_executor/models/deepseek_v2.py

        hidden_states, residual = self.post_attention_layernorm(
            hidden_states, residual)
+        # ---------ascend ffn need data
+        if forward_ctx.moe_comm_method_name is not None:


Hi, where is this moe_comm_method_name field defined for ForwardContext?
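The quoted line reads `forward_ctx.moe_comm_method_name` directly, so it raises `AttributeError` on any backend where the attribute is never set. Whether `ForwardContext` defines the field is exactly the open question here. One defensive option is a guarded lookup, sketched below with a stand-in object rather than the real `ForwardContext`. The helper name and the value `"example_method"` are placeholders for illustration only.

```python
from types import SimpleNamespace
from typing import Optional


def get_moe_comm_method_name(forward_ctx: object) -> Optional[str]:
    """Return the attribute if present, else None, without raising."""
    return getattr(forward_ctx, "moe_comm_method_name", None)


# Stand-ins for a context that lacks the field and one that sets it.
ctx_without_field = SimpleNamespace()
ctx_with_field = SimpleNamespace(moe_comm_method_name="example_method")

if get_moe_comm_method_name(ctx_without_field) is None:
    print("field not set: skip the backend-specific branch")
if get_moe_comm_method_name(ctx_with_field) is not None:
    print("field set: take the backend-specific branch")
```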

AFD: update DeepSeek support

a86f24b

Merge branch 'afd-step3' into afd-dev

331f1e6

Oliver-ss pushed a commit that referenced this pull request Oct 14, 2025

[Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-proj…

bb6d8c2

…ect#26445) Signed-off-by: Nick Hill <nhill@redhat.com>

chopper0126 mentioned this pull request Oct 15, 2025

[AFD]AFD implementation for dsv3 vllm-project/vllm-ascend#3447

Closed

7 tasks

jo-pillar reviewed Oct 15, 2025

zh-lin1 reviewed Oct 19, 2025

hsliuustc0106 wants to merge 2 commits into Oliver-ss:afd-step3 from JiusiServe:afd-dev


3 participants
