
Conversation

yueshen2016 (Contributor)

@yueshen2016 yueshen2016 commented Sep 22, 2025

Fix the unknown model_type bug for llava and the path dependency for vila

What does this PR do?

Type of change: Bug fix

Overview:
This PR fixes two NVBugs, https://nvbugs/5528642 and https://nvbugs/5528695: an unknown model_type when quantizing llava, and a path dependency issue for vila.

Usage

# Add a code snippet demonstrating how to use this

Testing

bash -e scripts/huggingface_example.sh \
  --model /models/llava-1.5-7b-hf \
  --quant int8_sq \
  --tp 1 --pp 1 \
  --trust_remote_code \
  --kv_cache_free_gpu_memory_fraction 0.5

CUDA_VISIBLE_DEVICES=0 bash -e scripts/huggingface_example.sh \
  --model /models/VILA1.5-3b \
  --quant w4a8_awq \
  --tp 1 --pp 1 \
  --trust_remote_code \
  --kv_cache_free_gpu_memory_fraction 0.5

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information


coderabbitai bot commented Sep 22, 2025

Walkthrough

The PR updates model type handling in examples/llm_ptq/hf_ptq.py by recalculating model_type from the moved submodel. It also modifies examples/vlm_ptq/scripts/huggingface_example.sh to detect "vila" models via a case-insensitive substring match on MODEL_NAME, adjusting dependency installation and repo cloning conditions.

Changes

LLM PTQ model type recalculation (examples/llm_ptq/hf_ptq.py):
Recomputes model_type using get_model_type(model) after switching to model.language_model, so that subsequent conditionals reference the updated submodel context (a rough sketch of this pattern follows below).

VLM script model detection update (examples/vlm_ptq/scripts/huggingface_example.sh):
Changes Vila detection from an exact MODEL_TYPE match to a case-insensitive substring search on MODEL_NAME; this affects the conditional block for dependency installation and Vila repo cloning.
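
For orientation, a self-contained sketch of the pattern summarized above; the toy classes and the get_model_type_sketch helper are illustrative stand-ins, not the modelopt implementation:

# Illustrative sketch only: toy classes mimic the HF names in a llava checkpoint,
# and the type helper is a stand-in for modelopt's get_model_type.
class LlamaForCausalLMToy:
    """Stands in for the language submodel, e.g. LlamaForCausalLM."""

class LlavaForConditionalGenerationToy:
    """Stands in for the multimodal container, e.g. LlavaForConditionalGeneration."""
    def __init__(self):
        self.language_model = LlamaForCausalLMToy()

def get_model_type_sketch(model) -> str:
    # Assumed behavior: derive a type key from the class name, else "unknown".
    name = type(model).__name__.lower()
    for key in ("llava", "llama"):
        if key in name:
            return key
    return "unknown"

model = LlavaForConditionalGenerationToy()
model_type = get_model_type_sketch(model)  # type of the full VLM container
model = model.language_model               # PTQ/export operate on the submodel
model_type = get_model_type_sketch(model)  # the fix: recompute on the submodel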

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Script as hf_ptq.py
  participant Model
  participant SubModel as model.language_model

  User->>Script: Run PTQ example
  Script->>Model: Load model
  Script->>SubModel: Access language_model
  Note right of Script: Recompute model_type = get_model_type(SubModel)
  Script->>Script: Branch on updated model_type
  Script-->>User: Proceed with PTQ steps based on submodel type
sequenceDiagram
  autonumber
  actor User
  participant SH as huggingface_example.sh
  participant Env as MODEL_NAME/MODEL_TYPE

  User->>SH: Execute script with model params
  SH->>Env: Read MODEL_NAME
  Note right of SH: Check if lower(MODEL_NAME) contains "vila"
  alt MODEL_NAME contains "vila"
    SH->>SH: Install Vila deps & clone repo
  else Not Vila
    SH->>SH: Skip Vila-specific steps
  end
  SH-->>User: Continue with remaining setup

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A twitch of whiskers, swift and bright,
I tweak the flow to set it right—
Model names I sniff with care,
“vila” scents are everywhere!
Submodels named, types aligned,
Hop-hop! The scripts are better timed.
Carrots queued, commits well-signed. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The title cites the two NVBug numbers and identifies the change as a VLM bug fix, which aligns with the PR objectives addressing model_type handling for llava and a path dependency for vila; it therefore reflects the main change in the changeset. It is concise and focused on the primary intent rather than listing files or unrelated details. While somewhat generic, it is clear enough for a teammate scanning history to understand this PR fixes VLM-related bugs.
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)

76-87: Broader VILA detection is good; fix clone location for HF repo IDs.

Substring check on MODEL_NAME works, but cloning to "$(dirname "$MODEL_PATH")/VILA" fails when MODEL_PATH is an HF repo id (no local parent dir). Prefer a safe base (ROOT_SAVE_PATH) when dirname doesn’t exist, avoid cwd side‑effects, and export PYTHONPATH so imports resolve.

Apply this diff:

-if [[ "${MODEL_NAME,,}" == *"vila"* ]]; then
-    # Install required dependency for VILA
-    pip install -r ../vlm_ptq/requirements-vila.txt
-    # Clone original VILA repo
-    if [ ! -d "$(dirname "$MODEL_PATH")/VILA" ]; then
-        echo "VILA repository is needed until it is added to HF model zoo. Cloning the repository parallel to $MODEL_PATH..."
-        git clone https://github.com/Efficient-Large-Model/VILA.git "$(dirname "$MODEL_PATH")/VILA" && \
-	cd "$(dirname "$MODEL_PATH")/VILA" && \
-	git checkout ec7fb2c264920bf004fd9fa37f1ec36ea0942db5 && \
-	cd "$script_dir/.."
-    fi
-fi
+if [[ "${MODEL_NAME,,}" == *"vila"* ]]; then
+    # Install required dependency for VILA
+    python3 -m pip install -r ../vlm_ptq/requirements-vila.txt
+    # Clone original VILA repo next to local model path if it exists, otherwise under ROOT_SAVE_PATH.
+    if [ -d "$(dirname "$MODEL_PATH")" ]; then
+        CLONE_BASE="$(dirname "$MODEL_PATH")"
+    else
+        CLONE_BASE="$ROOT_SAVE_PATH"
+    fi
+    VILA_DIR="$CLONE_BASE/VILA"
+    if [ ! -d "$VILA_DIR/.git" ]; then
+        echo "VILA repository is needed until it is added to HF model zoo. Cloning into $VILA_DIR ..."
+        git clone https://github.com/Efficient-Large-Model/VILA.git "$VILA_DIR"
+        ( cd "$VILA_DIR" && git checkout ec7fb2c264920bf004fd9fa37f1ec36ea0942db5 )
+    fi
+    export PYTHONPATH="$VILA_DIR:${PYTHONPATH:-}"
+fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0590b0 and 42bb023.

📒 Files selected for processing (2)
  • examples/llm_ptq/hf_ptq.py (1 hunks)
  • examples/vlm_ptq/scripts/huggingface_example.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/llm_ptq/hf_ptq.py (2)
modelopt/torch/export/model_utils.py (1)
  • get_model_type (66-71)
modelopt/onnx/llm_export_utils/export_utils.py (1)
  • get_model_type (47-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)

76-79: LGTM — case-insensitive substring on MODEL_NAME verified.
A test script confirms the pattern matches VILA1.5-3b, vila, and fooVILAbar, and does not match "other".
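
For reference, the same containment rule expressed as a small standalone check (Python purely for illustration; the script itself relies on bash's ${MODEL_NAME,,} lowercasing):

# Mirrors `[[ "${MODEL_NAME,,}" == *"vila"* ]]`: lower-case the name, look for "vila".
def is_vila(model_name: str) -> bool:
    return "vila" in model_name.lower()

assert is_vila("VILA1.5-3b")
assert is_vila("vila")
assert is_vila("fooVILAbar")
assert not is_vila("other")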

Comment on lines 318 to 321
model = model.language_model
model_type = get_model_type(model)

if args.sparsity_fmt != "dense":
Copy link


⚠️ Potential issue

Recompute quantized-state (and device) after swapping to submodel.

Great call to refresh model_type on the language submodel. However, gating later on model_is_already_quantized (computed before the swap) can now be wrong for VLMs whose container is quantized but language_model is not. Also refresh device in case the submodule is on a different device.

Apply this diff:

             model = model.language_model
-            model_type = get_model_type(model)
+            model_type = get_model_type(model)
+            # Keep subsequent logic consistent with the sub‑model we actually operate on.
+            model_is_already_quantized = is_quantized(model)
+            device = getattr(model, "device", device)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            model = model.language_model
-            model_type = get_model_type(model)
-            if args.sparsity_fmt != "dense":
+            model = model.language_model
+            model_type = get_model_type(model)
+            # Keep subsequent logic consistent with the sub-model we actually operate on.
+            model_is_already_quantized = is_quantized(model)
+            device = getattr(model, "device", device)
+            if args.sparsity_fmt != "dense":
🤖 Prompt for AI Agents
In examples/llm_ptq/hf_ptq.py around lines 318 to 321, after swapping to the
language submodel (model = model.language_model) and refreshing model_type, also
recompute model_is_already_quantized and device there so they reflect the
submodel state (the container may be quantized while language_model is not, or
the submodule may be on a different device). Update the code to move or
duplicate the logic that sets model_is_already_quantized and device to
immediately follow the swap and model_type refresh, ensuring subsequent gates
(e.g., the args.sparsity_fmt != "dense" branch) use the corrected values.


codecov bot commented Sep 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.83%. Comparing base (74061f5) to head (42bb023).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #355      +/-   ##
==========================================
- Coverage   73.84%   73.83%   -0.01%     
==========================================
  Files         172      172              
  Lines       17453    17453              
==========================================
- Hits        12888    12887       -1     
- Misses       4565     4566       +1     

☔ View full report in Codecov by Sentry.

mtq.quantize(child, disabled_quant_cfg, forward_loop=None)

model = model.language_model
model_type = get_model_type(model)
Collaborator


do we still need this model_type?

Contributor Author

@yueshen2016 yueshen2016 Sep 23, 2025


It is only needed for the int8_sq format, since the model is later exported to a tensorrt_llm checkpoint. Without this line, the model_type would be unknown, as the nvbug shows.
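
As an illustration of that gating, a hypothetical sketch; the function name, returned keys, and error message below are assumptions, not the actual modelopt export API:

# Hypothetical sketch of why model_type matters only on the int8_sq path,
# where a tensorrt_llm checkpoint is built for a specific architecture.
def export_checkpoint_sketch(model_type: str, quant_format: str) -> dict:
    if quant_format == "int8_sq":
        if model_type == "unknown":
            raise ValueError(
                "model_type is required to build a tensorrt_llm checkpoint; "
                "recompute it on model.language_model before exporting."
            )
        return {"architecture": model_type, "quantization": quant_format}
    # In this sketch, other formats do not depend on model_type.
    return {"quantization": quant_format}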

@yueshen2016 yueshen2016 merged commit 7d5f636 into main Sep 23, 2025
27 checks passed
@yueshen2016 yueshen2016 deleted the yue/vlm_bug_fix branch September 23, 2025 16:03
kevalmorabia97 pushed a commit that referenced this pull request Sep 25, 2025
yeyu-nvidia pushed a commit that referenced this pull request Oct 1, 2025