[Feature] adapt step3 model with AFD #18

jiangkuaixue123 merged 13 commits into afd-p2p-dbo-rebase2 from
Conversation
):
    logger.info(f"input_ids: {input_ids.shape}")
    if inputs_embeds:
    if input_ids is not None:
There's a bug here: `inputs_embeds` is a tensor and cannot be used directly as an `if` condition. I also added a check for `input_ids` while at it.
@jiangkuaixue123
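The bug the reviewer describes can be sketched without torch. In PyTorch, calling `bool()` on a tensor with more than one element raises a `RuntimeError`, so `if inputs_embeds:` crashes at runtime; the fix is an explicit `None` check. The `FakeTensor` class and `select_inputs` helper below are hypothetical stand-ins for illustration, not vLLM code:

```python
class FakeTensor:
    """Stand-in for a multi-element torch.Tensor: its truthiness is
    ambiguous and raises, mirroring the bug noted in the review."""

    def __bool__(self):
        raise RuntimeError(
            "Boolean value of Tensor with more than one element is ambiguous")


def select_inputs(input_ids, inputs_embeds):
    # Fixed pattern: compare against None explicitly instead of
    # relying on tensor truthiness (`if inputs_embeds:`).
    if inputs_embeds is not None:
        return inputs_embeds
    if input_ids is not None:
        return input_ids
    raise ValueError("either input_ids or inputs_embeds must be provided")
```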
@@ -229,7 +231,7 @@ def _execute_eager_mode(
else:
    # Single TP case
    rank_ffn_output = self.model.compute_ffn_output(
Likewise changed the parameter order here, unifying it so that `hidden_states` comes first and `layer_idx` second; some call sites previously had them reversed.
@jiangkuaixue123
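A minimal sketch of the convention this review settles on: `hidden_states` is always the first positional parameter and `layer_idx` the second, so every call site reads the same way. The function body here is a toy placeholder, not vLLM's real `compute_ffn_output`:

```python
def compute_ffn_output(hidden_states: list[float], layer_idx: int) -> list[float]:
    # Toy FFN: scale activations by a per-layer factor, purely to
    # illustrate the (hidden_states, layer_idx) argument order.
    return [h * (layer_idx + 1) for h in hidden_states]


# Every caller passes arguments in the same, unified order:
out = compute_ffn_output([1.0, 2.0], layer_idx=1)
```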
return hidden_states, residual

def compute_attn_output(
dsv2 also has a `compute_attn_output`, but this method doesn't appear to be used anywhere at all — should we delete it?
torch.tensor([num_tokens_per_ubatch] * self.config.parallel_config.data_parallel_size,
             device="cpu", dtype=torch.int32),
)
logger.info("jcz recv_metadata self.dp_metadata_list:{}".format(self.dp_metadata_list))
Looks like this was deleted by accident; it didn't seem to affect anything at runtime, but I'll restore it and verify again.
)
self._current_afd_connector_metadata.recv_handle_list = work_list
self._current_afd_connector_metadata.layer_idx = layer_idx
self._current_afd_connector_metadata.stage_idx = stage_idx

return hidden_states, residual

def compute_attn_output(
    positions: torch.Tensor,
    afd_metadata: AFDMetadata,
) -> tuple[torch.Tensor, torch.Tensor]:
    recv_handle = None
Now that that earlier change has been merged, this forward may need to be rewritten into the form we discussed on the last video call.
Purpose

Adapt the AFD feature to the step3 model.

Warning

This PR changes the parameter order of the `compute_ffn_output` method, which may be a breaking change.

Test Plan

Requires a mini step3 model that fits on a single H800 / H100; `--load_format dummy` can be helpful. Make sure your `CUDA_VISIBLE_DEVICES` is set properly. The commands are as follows; note the `afd_size` param.

attn dp 2, dbo enabled

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --data_parallel_size=2 --enable_expert_parallel --enforce_eager --enable-dbo --dbo-prefill-token-threshold 12 --dbo-decode-token-threshold 2 --afd-config '{"afd_connector":"p2pconnector", "afd_role": "attention", "afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2","afd_extra_config":{"afd_size":"2A2F"}}'
```

attn tp / ep 2, dbo enabled

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --enable-dbo --dbo-prefill-token-threshold 12 --dbo-decode-token-threshold 2 --afd-config '{"afd_connector":"p2pconnector", "afd_role": "attention", "afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2","afd_extra_config":{"afd_size":"2A2F"}}'
```

ffn dp 2

```shell
vllm serve /path/to/your/step3 --dtype bfloat16 --data_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "num_afd_stages":"2", "afd_role": "ffn", "afd_host":"127.0.0.1", "afd_port":"29500", "afd_extra_config":{"afd_size":"2A2F"}}'
```

ffn tp / ep 2

```shell
vllm serve /path/to/your/step3 --tensor_parallel_size=2 --enable_expert_parallel --enforce_eager --afd-config '{"afd_connector":"p2pconnector", "num_afd_stages":"2", "afd_role": "ffn", "afd_host":"127.0.0.1", "afd_port":"29500", "afd_extra_config":{"afd_size":"2A2F"}}'
```

Test Result
Essential Elements of an Effective PR Description Checklist

supported_models.md and examples for a new model.
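The `--afd-config` flag in the test commands above takes a JSON string. A quick stdlib-only way to sanity-check the string before launching (the values below are copied from the attention-role command in the Test Plan):

```python
import json

# --afd-config value from the attention-role command above.
raw = ('{"afd_connector":"p2pconnector", "afd_role": "attention", '
       '"afd_host":"127.0.0.1", "afd_port":"29500","num_afd_stages":"2",'
       '"afd_extra_config":{"afd_size":"2A2F"}}')

# json.loads raises ValueError on malformed input, catching quoting
# mistakes before they reach vllm serve.
cfg = json.loads(raw)
print(json.dumps(cfg, indent=2))
```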