
Conversation

Isotr0py
Member

@Isotr0py Isotr0py commented Oct 6, 2025

Purpose

    "modules_to_not_convert": [
        "visual_tokenizer.vit",
        "vte",
        "visual_tokenizer.head",
        "lm_head"
    ],
  • This PR makes is_layer_skipped, which fp8 uses, fit this case (see the sketch below).
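
A minimal illustration of the mismatch (the fully-qualified layer name below is a made-up example, not taken from the PR):

modules_to_not_convert = [
    "visual_tokenizer.vit",
    "vte",
    "visual_tokenizer.head",
    "lm_head",
]
# Hypothetical fully-qualified layer name as seen by the quantization code.
layer_name = "visual_tokenizer.vit.blocks.0.attn.qkv"

# An exact membership check misses the partial prefix, so the layer would be quantized.
print(layer_name in modules_to_not_convert)  # False
# A prefix match, as this PR adds to is_layer_skipped, marks it as skipped.
print(any(layer_name.startswith(m) for m in modules_to_not_convert))  # True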

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <[email protected]>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue with loading FP8 quantized checkpoints by making the layer skipping logic more flexible. It now handles partial module prefixes in modules_to_not_convert. The changes in vllm/model_executor/layers/quantization/utils/quant_utils.py correctly implement this by using startswith for prefix matching. However, I've identified a potential edge case where an empty string in the ignored_layers list could lead to all layers being incorrectly skipped from quantization. I've provided suggestions to make the implementation more robust against this scenario.

Comment on lines +305 to +308
is_shard_skipped = shard_prefix in ignored_layers or any(
    shard_prefix.startswith(ignored_layer)
    for ignored_layer in ignored_layers
)

Severity: high

There's a potential issue here if ignored_layers contains an empty string. shard_prefix.startswith('') is always True, which would cause is_shard_skipped to be True for all layers, effectively disabling quantization for all fused layers. This could lead to silent failures and incorrect model behavior. It's safer to filter out empty strings from ignored_layers before checking startswith.

Suggested change
-is_shard_skipped = shard_prefix in ignored_layers or any(
-    shard_prefix.startswith(ignored_layer)
-    for ignored_layer in ignored_layers
-)
+is_shard_skipped = shard_prefix in ignored_layers or any(
+    shard_prefix.startswith(ignored_layer)
+    for ignored_layer in ignored_layers if ignored_layer
+)
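
For illustration, a tiny standalone sketch of the hazard and the fix (the layer name is a made-up example, not taken from vLLM):

ignored_layers = ["visual_tokenizer.vit", ""]  # note the accidental empty entry
shard_prefix = "llm.model.layers.0.mlp.gate_up_proj"  # hypothetical layer name

# startswith("") is always True, so every layer would be treated as skipped.
print(any(shard_prefix.startswith(layer) for layer in ignored_layers))           # True
# Filtering out empty entries, as suggested above, restores the intended check.
print(any(shard_prefix.startswith(layer) for layer in ignored_layers if layer))  # False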

Comment on lines +327 to +329
is_skipped = prefix in ignored_layers or any(
    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
)

Severity: high

Similar to the change for fused layers, this logic is vulnerable to an empty string in ignored_layers. If ignored_layers contains '', prefix.startswith('') will always be True, causing all layers to be skipped from quantization. This can be prevented by ensuring ignored_layer is not an empty string before calling startswith.

Suggested change
-is_skipped = prefix in ignored_layers or any(
-    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
-)
+is_skipped = prefix in ignored_layers or any(
+    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers if ignored_layer
+)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 326 to +329
 else:
-    is_skipped = prefix in ignored_layers
+    is_skipped = prefix in ignored_layers or any(
+        prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
+    )


P1: Handle partial module prefixes anywhere in layer path

The new startswith checks still only match when the ignored layer string is at the very beginning of the module name. In the case described in the commit message (e.g. modules_to_not_convert contains "lm_head" but the actual prefix seen by quantization is "llm.lm_head"), prefix.startswith("lm_head") returns False, so the layer is quantized even though it was supposed to be skipped. As a result ms‑swift fp8 checkpoints with truncated prefixes will still attempt to quantize modules like llm.lm_head and fail to load correctly. The comparison needs to match ignored entries anywhere in the qualified name (e.g. with in/endswith) rather than only at the start.
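
One possible shape for a boundary-aware match (a sketch of the suggestion above, not code from this PR; matches_ignored is a hypothetical helper and assumes dot-separated module names):

def matches_ignored(prefix: str, ignored_layer: str) -> bool:
    # Match the ignored entry against whole dot-separated components, so
    # "lm_head" matches "llm.lm_head" but "head" does not match "lm_head".
    if not ignored_layer:
        return False
    parts = prefix.split(".")
    ignored_parts = ignored_layer.split(".")
    n = len(ignored_parts)
    return any(parts[i:i + n] == ignored_parts for i in range(len(parts) - n + 1))

print(matches_ignored("llm.lm_head", "lm_head"))                                          # True
print(matches_ignored("visual_tokenizer.vit.blocks.0.attn.qkv", "visual_tokenizer.vit"))  # True
print(matches_ignored("model.lm_head", "head"))                                           # False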


@Dineshkumar-Anandan-ZS0367

@Isotr0py @mgoin @robertgshaw2-redhat

What is the status of this PR?

Development

Successfully merging this pull request may close these issues.

[Bug]: Quantization using swift for Ovis2.5 9B