[Feature] adapt step3 model with AFD #18

jiangkuaixue123 merged 13 commits into afd-p2p-dbo-rebase2 from
Conversation
):
    logger.info(f"input_ids: {input_ids.shape}")
    if inputs_embeds:
    if input_ids is not None:
There's a bug here: `inputs_embeds` is a tensor and cannot be used directly as an `if` condition. I also added a check for `input_ids` while at it.
@jiangkuaixue123
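The bug the reviewer describes can be sketched without torch. In PyTorch, calling `bool()` on a tensor with more than one element raises a `RuntimeError`, so `if inputs_embeds:` crashes at runtime; the fix is an explicit `None` check. The `FakeTensor` class and `select_inputs` helper below are hypothetical stand-ins for illustration, not vLLM code:

```python
class FakeTensor:
    """Stand-in for a multi-element torch.Tensor: its truthiness is
    ambiguous and raises, mirroring the bug noted in the review."""

    def __bool__(self):
        raise RuntimeError(
            "Boolean value of Tensor with more than one element is ambiguous")


def select_inputs(input_ids, inputs_embeds):
    # Fixed pattern: compare against None explicitly instead of
    # relying on tensor truthiness (`if inputs_embeds:`).
    if inputs_embeds is not None:
        return inputs_embeds
    if input_ids is not None:
        return input_ids
    raise ValueError("either input_ids or inputs_embeds must be provided")
```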
@@ -229,7 +231,7 @@ def _execute_eager_mode(
else:
    # Single TP case
    rank_ffn_output = self.model.compute_ffn_output(
Likewise changed the parameter order here, unifying it so that `hidden_states` comes first and `layer_idx` second; some call sites previously had them reversed.
@jiangkuaixue123
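A minimal sketch of the convention this review settles on: `hidden_states` is always the first positional parameter and `layer_idx` the second, so every call site reads the same way. The function body here is a toy placeholder, not vLLM's real `compute_ffn_output`:

```python
def compute_ffn_output(hidden_states: list[float], layer_idx: int) -> list[float]:
    # Toy FFN: scale activations by a per-layer factor, purely to
    # illustrate the (hidden_states, layer_idx) argument order.
    return [h * (layer_idx + 1) for h in hidden_states]


# Every caller passes arguments in the same, unified order:
out = compute_ffn_output([1.0, 2.0], layer_idx=1)
```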
return hidden_states, residual

def compute_attn_output(
dsv2 also has a `compute_attn_output`, but this method doesn't appear to be used anywhere at all — should we delete it?
torch.tensor([num_tokens_per_ubatch] * self.config.parallel_config.data_parallel_size,
             device="cpu", dtype=torch.int32),
)
logger.info("jcz recv_metadata self.dp_metadata_list:{}".format(self.dp_metadata_list))
Looks like this was deleted by accident; it didn't seem to affect anything at runtime, but I'll restore it and verify again.
)
self._current_afd_connector_metadata.recv_handle_list = work_list
self._current_afd_connector_metadata.layer_idx = layer_idx
self._current_afd_connector_metadata.stage_idx = stage_idx

return hidden_states, residual

def compute_attn_output(
    positions: torch.Tensor,
    afd_metadata: AFDMetadata,
) -> tuple[torch.Tensor, torch.Tensor]:
    recv_handle = None
Now that that earlier change has been merged, this forward may need to be rewritten into the form we discussed on the last video call.
Purpose

Adapt the AFD feature to the step3 model.

Warning

This PR changes the parameter order of the `compute_ffn_output` method, which may be a breaking change.

Test Plan

Requires a mini step3 model that fits on a single H800 / H100; `--load_format dummy` can be helpful. Make sure your `CUDA_VISIBLE_DEVICES` is set properly. The commands are as follows; note the `afd_size` param.

attn dp 2, dbo enabled

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --data_parallel_size=2 --enable_expert_parallel --enforce_eager --enable-dbo --dbo-prefill-token-threshold 12 --dbo-decode-token-threshold 2 --afd-config '{"afd_connector":"p2pconnector", "afd_role": "attention", "afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2","afd_extra_config":{"afd_size":"2A2F"}}'
```

attn tp / ep 2, dbo enabled

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --enable-dbo --dbo-prefill-token-threshold 12 --dbo-decode-token-threshold 2 --afd-config '{"afd_connector":"p2pconnector", "afd_role": "attention", "afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2","afd_extra_config":{"afd_size":"2A2F"}}'
```

ffn dp 2

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --data_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "num_afd_stages":"2", "afd_role": "ffn", "afd_host":"127.0.0.1", "afd_port":"29500", "afd_extra_config":{"afd_size":"2A2F"}}'
```

ffn tp / ep 2

```shell
vllm serve /path/to/your/step3 --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "num_afd_stages":"2", "afd_role": "ffn", "afd_host":"127.0.0.1", "afd_port":"29500", "afd_extra_config":{"afd_size":"2A2F"}}'
```

Test Result
Essential Elements of an Effective PR Description Checklist

supported_models.md and examples for a new model.
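The `--afd-config` flag in the test commands above takes a JSON string. A quick stdlib-only way to sanity-check the string before launching (the values below are copied from the attention-role command in the Test Plan):

```python
import json

# --afd-config value from the attention-role command above.
raw = ('{"afd_connector":"p2pconnector", "afd_role": "attention", '
       '"afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2",'
       '"afd_extra_config":{"afd_size":"2A2F"}}')

# json.loads raises ValueError on malformed input, catching quoting
# mistakes before they reach vllm serve.
cfg = json.loads(raw)
print(json.dumps(cfg, indent=2))
```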