
Conversation

Isotr0py
Member

@Isotr0py Isotr0py commented Oct 6, 2025

Purpose

    "modules_to_not_convert": [
        "visual_tokenizer.vit",
        "vte",
        "visual_tokenizer.head",
        "lm_head"
    ],
  • This PR makes is_layer_skipped, which fp8 uses, fit this case (see the sketch below).
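
A minimal illustration of the mismatch (the fully-qualified layer name below is a made-up example, not taken from the PR):

modules_to_not_convert = [
    "visual_tokenizer.vit",
    "vte",
    "visual_tokenizer.head",
    "lm_head",
]
# Hypothetical fully-qualified layer name as seen by the quantization code.
layer_name = "visual_tokenizer.vit.blocks.0.attn.qkv"

# An exact membership check misses the partial prefix, so the layer would be quantized.
print(layer_name in modules_to_not_convert)  # False
# A prefix match, as this PR adds to is_layer_skipped, marks it as skipped.
print(any(layer_name.startswith(m) for m in modules_to_not_convert))  # True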

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Isotr0py <[email protected]>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue with loading FP8 quantized checkpoints by making the layer skipping logic more flexible. It now handles partial module prefixes in modules_to_not_convert. The changes in vllm/model_executor/layers/quantization/utils/quant_utils.py correctly implement this by using startswith for prefix matching. However, I've identified a potential edge case where an empty string in the ignored_layers list could lead to all layers being incorrectly skipped from quantization. I've provided suggestions to make the implementation more robust against this scenario.

Comment on lines +305 to +308
is_shard_skipped = shard_prefix in ignored_layers or any(
    shard_prefix.startswith(ignored_layer)
    for ignored_layer in ignored_layers
)

Severity: high

There's a potential issue here if ignored_layers contains an empty string. shard_prefix.startswith('') is always True, which would cause is_shard_skipped to be True for all layers, effectively disabling quantization for all fused layers. This could lead to silent failures and incorrect model behavior. It's safer to filter out empty strings from ignored_layers before checking startswith.

Suggested change
-is_shard_skipped = shard_prefix in ignored_layers or any(
-    shard_prefix.startswith(ignored_layer)
-    for ignored_layer in ignored_layers
-)
+is_shard_skipped = shard_prefix in ignored_layers or any(
+    shard_prefix.startswith(ignored_layer)
+    for ignored_layer in ignored_layers if ignored_layer
+)
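
For illustration, a tiny standalone sketch of the hazard and the fix (the layer name is a made-up example, not taken from vLLM):

ignored_layers = ["visual_tokenizer.vit", ""]  # note the accidental empty entry
shard_prefix = "llm.model.layers.0.mlp.gate_up_proj"  # hypothetical layer name

# startswith("") is always True, so every layer would be treated as skipped.
print(any(shard_prefix.startswith(layer) for layer in ignored_layers))           # True
# Filtering out empty entries, as suggested above, restores the intended check.
print(any(shard_prefix.startswith(layer) for layer in ignored_layers if layer))  # False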

Comment on lines +327 to +329
is_skipped = prefix in ignored_layers or any(
    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
)

Severity: high

Similar to the change for fused layers, this logic is vulnerable to an empty string in ignored_layers. If ignored_layers contains '', prefix.startswith('') will always be True, causing all layers to be skipped from quantization. This can be prevented by ensuring ignored_layer is not an empty string before calling startswith.

Suggested change
-is_skipped = prefix in ignored_layers or any(
-    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
-)
+is_skipped = prefix in ignored_layers or any(
+    prefix.startswith(ignored_layer) for ignored_layer in ignored_layers if ignored_layer
+)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 326 to +329
 else:
-    is_skipped = prefix in ignored_layers
+    is_skipped = prefix in ignored_layers or any(
+        prefix.startswith(ignored_layer) for ignored_layer in ignored_layers
+    )


P1: Handle partial module prefixes anywhere in layer path

The new startswith checks still only match when the ignored layer string is at the very beginning of the module name. In the case described in the commit message (e.g. modules_to_not_convert contains "lm_head" but the actual prefix seen by quantization is "llm.lm_head"), prefix.startswith("lm_head") returns False, so the layer is quantized even though it was supposed to be skipped. As a result ms‑swift fp8 checkpoints with truncated prefixes will still attempt to quantize modules like llm.lm_head and fail to load correctly. The comparison needs to match ignored entries anywhere in the qualified name (e.g. with in/endswith) rather than only at the start.
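
One possible shape for a boundary-aware match (a sketch of the suggestion above, not code from this PR; matches_ignored is a hypothetical helper and assumes dot-separated module names):

def matches_ignored(prefix: str, ignored_layer: str) -> bool:
    # Match the ignored entry against whole dot-separated components, so
    # "lm_head" matches "llm.lm_head" but "head" does not match "lm_head".
    if not ignored_layer:
        return False
    parts = prefix.split(".")
    ignored_parts = ignored_layer.split(".")
    n = len(ignored_parts)
    return any(parts[i:i + n] == ignored_parts for i in range(len(parts) - n + 1))

print(matches_ignored("llm.lm_head", "lm_head"))                                          # True
print(matches_ignored("visual_tokenizer.vit.blocks.0.attn.qkv", "visual_tokenizer.vit"))  # True
print(matches_ignored("model.lm_head", "head"))                                           # False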


@Dineshkumar-Anandan-ZS0367

@Isotr0py @mgoin @robertgshaw2-redhat

What is the status of this PR?

Development

Successfully merging this pull request may close these issues.

[Bug]: Quantization using swift for Ovis2.5 9B