-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[Bugfix] Fix ovis2.5 pre-quant fp8 checkpoint loading #26294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request fixes an issue with loading FP8 quantized checkpoints by making the layer skipping logic more flexible. It now handles partial module prefixes in modules_to_not_convert
. The changes in vllm/model_executor/layers/quantization/utils/quant_utils.py
correctly implement this by using startswith
for prefix matching. However, I've identified a potential edge case where an empty string in the ignored_layers
list could lead to all layers being incorrectly skipped from quantization. I've provided suggestions to make the implementation more robust against this scenario.
is_shard_skipped = shard_prefix in ignored_layers or any( | ||
shard_prefix.startswith(ignored_layer) | ||
for ignored_layer in ignored_layers | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential issue here if ignored_layers
contains an empty string. shard_prefix.startswith('')
is always True
, which would cause is_shard_skipped
to be True
for all layers, effectively disabling quantization for all fused layers. This could lead to silent failures and incorrect model behavior. It's safer to filter out empty strings from ignored_layers
before checking startswith
.
is_shard_skipped = shard_prefix in ignored_layers or any( | |
shard_prefix.startswith(ignored_layer) | |
for ignored_layer in ignored_layers | |
) | |
is_shard_skipped = shard_prefix in ignored_layers or any( | |
shard_prefix.startswith(ignored_layer) | |
for ignored_layer in ignored_layers if ignored_layer | |
) |
is_skipped = prefix in ignored_layers or any( | ||
prefix.startswith(ignored_layer) for ignored_layer in ignored_layers | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the change for fused layers, this logic is vulnerable to an empty string in ignored_layers
. If ignored_layers
contains ''
, prefix.startswith('')
will always be True
, causing all layers to be skipped from quantization. This can be prevented by ensuring ignored_layer
is not an empty string before calling startswith
.
is_skipped = prefix in ignored_layers or any( | |
prefix.startswith(ignored_layer) for ignored_layer in ignored_layers | |
) | |
is_skipped = prefix in ignored_layers or any( | |
prefix.startswith(ignored_layer) for ignored_layer in ignored_layers if ignored_layer | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
else: | ||
is_skipped = prefix in ignored_layers | ||
is_skipped = prefix in ignored_layers or any( | ||
prefix.startswith(ignored_layer) for ignored_layer in ignored_layers | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle partial module prefixes anywhere in layer path
The new startswith
checks still only match when the ignored layer string is at the very beginning of the module name. In the case described in the commit message (e.g. modules_to_not_convert
contains "lm_head"
but the actual prefix seen by quantization is "llm.lm_head"
), prefix.startswith("lm_head")
returns False
, so the layer is quantized even though it was supposed to be skipped. As a result ms‑swift fp8 checkpoints with truncated prefixes will still attempt to quantize modules like llm.lm_head
and fail to load correctly. The comparison needs to match ignored entries anywhere in the qualified name (e.g. with in
/endswith
) rather than only at the start.
Useful? React with 👍 / 👎.
@Isotr0py @mgoin @robertgshaw2-redhat What is the status of this PR |
Purpose
ms-swift
will havemodules_to_not_convert
field with only partial module prefix:is_layer_skipped
used by fp8 to fit this caseTest Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.