[Refactor] Use forward mapping instead of reverse mapping in AscendModelSlimConfig #7596
Feng-xiaosuo wants to merge 17 commits into vllm-project:main
Conversation
…he quantization layer name Signed-off-by: Matrix_K <zhangke144@huawei.com>
Co-authored-by: Wang Kunpeng <1289706727@qq.com> Signed-off-by: Feng-xiaosuo <tengchang1@huawei.com>
Code Review
This pull request is a nice refactoring of AscendModelSlimConfig that simplifies the quantization key mapping logic. It correctly switches from a complex reverse mapping approach to a simpler forward mapping, which is applied once to transform the quantization description keys. This eliminates a significant amount of special-cased mapping code and makes the implementation cleaner and less error-prone.
I've found one potential issue in the new implementation of apply_vllm_mapper where calling it with different mappers could lead to incorrect behavior due to in-place modification of the configuration. I've suggested a change to make this more robust.
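The forward-mapping idea can be sketched with a minimal, hypothetical stand-in (this is not vLLM's actual `WeightsMapper` class; names and structure here are illustrative only): the mapper holds HF-to-vLLM prefix renames and applies them once to the quantization description dict.

```python
class PrefixMapper:
    """Toy stand-in for an hf->vllm weights mapper (illustrative only)."""

    def __init__(self, orig_to_new_prefix: dict[str, str]):
        self.orig_to_new_prefix = orig_to_new_prefix

    def apply_dict(self, d: dict) -> dict:
        """Return a new dict with any matching key prefix renamed."""
        out = {}
        for key, value in d.items():
            for orig, new in self.orig_to_new_prefix.items():
                if key.startswith(orig):
                    key = new + key[len(orig):]
                    break  # apply at most one prefix rename per key
            out[key] = value
        return out


# Hypothetical example: HF checkpoints use "transformer.", vLLM uses "model."
mapper = PrefixMapper({"transformer.": "model."})
quant_description = {"transformer.layers.0.attn.qkv": "W8A8"}
print(mapper.apply_dict(quant_description))
# {'model.layers.0.attn.qkv': 'W8A8'}
```

Because the transformation runs once at config load time, later lookups such as `quant_prefix_mapper()` can use the prefix directly, with no per-model reverse table.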
```python
if self._mapper_applied and self.hf_to_vllm_mapper is hf_to_vllm_mapper:
    return
```
The current logic for checking if the mapper has been applied is not fully robust. If apply_vllm_mapper is called a second time with a different mapper instance, the condition self.hf_to_vllm_mapper is hf_to_vllm_mapper will be false. The code would then proceed to apply the new mapping on the quant_description keys which have already been transformed, leading to incorrect behavior.
Since apply_dict modifies the quant_description state, re-applying a mapping is not safe. To prevent this potential bug, you should add a more robust check to error out if a different mapper is provided after the first application.
```diff
-if self._mapper_applied and self.hf_to_vllm_mapper is hf_to_vllm_mapper:
-    return
+if self._mapper_applied:
+    if self.hf_to_vllm_mapper is not hf_to_vllm_mapper:
+        raise RuntimeError(
+            "apply_vllm_mapper() has already been called with a different "
+            "mapper. Re-applying the mapping is not supported as it "
+            "modifies the quantization description in-place."
+        )
+    return
```
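As a sanity check, the guarded behavior described in this review comment can be sketched with a toy mapper and config class (all names here are hypothetical stand-ins, not the actual vLLM Ascend implementation): a repeated call with the same mapper is a no-op, while a different mapper raises instead of corrupting the already-transformed keys.

```python
class ToyMapper:
    """Stand-in for an hf->vllm mapper (real code would use vLLM's mapper)."""

    def apply_dict(self, d: dict) -> dict:
        # Rename HF-style "transformer." prefixes to vLLM-style "model."
        return {k.replace("transformer.", "model.", 1): v for k, v in d.items()}


class SlimConfig:
    """Sketch of a guarded apply_vllm_mapper(), per the suggestion above."""

    def __init__(self, quant_description: dict):
        self.quant_description = quant_description
        self.hf_to_vllm_mapper = None
        self._mapper_applied = False

    def apply_vllm_mapper(self, hf_to_vllm_mapper):
        if self._mapper_applied:
            if self.hf_to_vllm_mapper is not hf_to_vllm_mapper:
                raise RuntimeError(
                    "apply_vllm_mapper() already called with a different mapper")
            return  # same mapper: safe no-op, keys are already transformed
        self.hf_to_vllm_mapper = hf_to_vllm_mapper
        self.quant_description = hf_to_vllm_mapper.apply_dict(
            self.quant_description)
        self._mapper_applied = True


cfg = SlimConfig({"transformer.layers.0.mlp.gate": "W8A8"})
m = ToyMapper()
cfg.apply_vllm_mapper(m)
cfg.apply_vllm_mapper(m)  # second call with the same mapper: no-op
print(cfg.quant_description)
# {'model.layers.0.mlp.gate': 'W8A8'}
```

Calling `cfg.apply_vllm_mapper(ToyMapper())` afterwards would raise `RuntimeError`, surfacing the misuse instead of silently double-mapping the keys.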
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed from f01eab1 to 5199150
Signed-off-by: Matrix_K <zhangke144@huawei.com>
What this PR does / why we need it?
This PR refactors the `AscendModelSlimConfig` class to use forward mapping instead of reverse mapping for quantization config key transformation.

Changes:

- Updated `apply_vllm_mapper()` to directly apply `hf_to_vllm_mapper.apply_dict()`, transforming `quant_description` keys from HF format to vLLM format
- Simplified `quant_prefix_mapper()` to return the prefix directly (no longer needs mapping since keys are already in vLLM format)
- Removed the `QUANT_MODEL_PREFIX_MAPPINGS` dictionary (~50 lines) - no longer needed
- Removed the `get_prefix_mapping()` function - no longer needed
- Removed the `vllm_to_hf_mapper` attribute - no longer needed

Why this change is needed:
The previous implementation used reverse mapping (vLLM → HF), which had several issues:

- It required maintaining a `QUANT_MODEL_PREFIX_MAPPINGS` dict that duplicated information already available in vLLM's model-specific `WeightsMapper`

The new approach:

- Applies the model's `WeightsMapper` directly, once, in the forward (HF → vLLM) direction
- Follows how `compressed_tensors_config.py` handles the same scenario

Does this PR introduce any user-facing change?
No. This is an internal refactoring that does not change any user-facing API or behavior.
How was this patch tested?