
Conversation

yueshen2016 (Contributor)

@yueshen2016 yueshen2016 commented Sep 22, 2025

Fix the unknown model_type bug for llava and the path dependency for vila

What does this PR do?

Type of change: Bug fix

Overview:
This PR fixes two NVBugs, https://nvbugs/5528642 and https://nvbugs/5528695: an unknown model_type when quantizing llava, and a path dependency issue for vila.

Usage

# Add a code snippet demonstrating how to use this

Testing

bash -e scripts/huggingface_example.sh \
  --model /models/llava-1.5-7b-hf \
  --quant int8_sq \
  --tp 1 --pp 1 \
  --trust_remote_code \
  --kv_cache_free_gpu_memory_fraction 0.5

CUDA_VISIBLE_DEVICES=0 bash -e scripts/huggingface_example.sh \
  --model /models/VILA1.5-3b \
  --quant w4a8_awq \
  --tp 1 --pp 1 \
  --trust_remote_code \
  --kv_cache_free_gpu_memory_fraction 0.5

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information


coderabbitai bot commented Sep 22, 2025

Walkthrough

The PR updates model type handling in examples/llm_ptq/hf_ptq.py by recalculating model_type from the moved submodel. It also modifies examples/vlm_ptq/scripts/huggingface_example.sh to detect "vila" models via a case-insensitive substring match on MODEL_NAME, adjusting dependency installation and repo cloning conditions.

Changes

LLM PTQ model type recalculation (examples/llm_ptq/hf_ptq.py):
Recomputes model_type using get_model_type(model) after switching to model.language_model, so that subsequent conditionals reference the updated submodel context (a rough sketch of this pattern follows below).

VLM script model detection update (examples/vlm_ptq/scripts/huggingface_example.sh):
Changes Vila detection from an exact MODEL_TYPE match to a case-insensitive substring search on MODEL_NAME; this affects the conditional block for dependency installation and Vila repo cloning.
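
For orientation, a self-contained sketch of the pattern summarized above; the toy classes and the get_model_type_sketch helper are illustrative stand-ins, not the modelopt implementation:

# Illustrative sketch only: toy classes mimic the HF names in a llava checkpoint,
# and the type helper is a stand-in for modelopt's get_model_type.
class LlamaForCausalLMToy:
    """Stands in for the language submodel, e.g. LlamaForCausalLM."""

class LlavaForConditionalGenerationToy:
    """Stands in for the multimodal container, e.g. LlavaForConditionalGeneration."""
    def __init__(self):
        self.language_model = LlamaForCausalLMToy()

def get_model_type_sketch(model) -> str:
    # Assumed behavior: derive a type key from the class name, else "unknown".
    name = type(model).__name__.lower()
    for key in ("llava", "llama"):
        if key in name:
            return key
    return "unknown"

model = LlavaForConditionalGenerationToy()
model_type = get_model_type_sketch(model)  # type of the full VLM container
model = model.language_model               # PTQ/export operate on the submodel
model_type = get_model_type_sketch(model)  # the fix: recompute on the submodel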

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Script as hf_ptq.py
  participant Model
  participant SubModel as model.language_model

  User->>Script: Run PTQ example
  Script->>Model: Load model
  Script->>SubModel: Access language_model
  Note right of Script: Recompute model_type = get_model_type(SubModel)
  Script->>Script: Branch on updated model_type
  Script-->>User: Proceed with PTQ steps based on submodel type
sequenceDiagram
  autonumber
  actor User
  participant SH as huggingface_example.sh
  participant Env as MODEL_NAME/MODEL_TYPE

  User->>SH: Execute script with model params
  SH->>Env: Read MODEL_NAME
  Note right of SH: Check if lower(MODEL_NAME) contains "vila"
  alt MODEL_NAME contains "vila"
    SH->>SH: Install Vila deps & clone repo
  else Not Vila
    SH->>SH: Skip Vila-specific steps
  end
  SH-->>User: Continue with remaining setup

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A twitch of whiskers, swift and bright,
I tweak the flow to set it right—
Model names I sniff with care,
“vila” scents are everywhere!
Submodels named, types aligned,
Hop-hop! The scripts are better timed.
Carrots queued, commits well-signed. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The title cites the two NVBug numbers and identifies the change as a VLM bug fix, which aligns with the PR objectives addressing model_type handling for llava and a path dependency for vila; it therefore reflects the main change in the changeset. It is concise and focused on the primary intent rather than listing files or unrelated details. While somewhat generic, it is clear enough for a teammate scanning history to understand this PR fixes VLM-related bugs.
  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)

76-87: Broader VILA detection is good; fix clone location for HF repo IDs.

Substring check on MODEL_NAME works, but cloning to "$(dirname "$MODEL_PATH")/VILA" fails when MODEL_PATH is an HF repo id (no local parent dir). Prefer a safe base (ROOT_SAVE_PATH) when dirname doesn’t exist, avoid cwd side‑effects, and export PYTHONPATH so imports resolve.

Apply this diff:

-if [[ "${MODEL_NAME,,}" == *"vila"* ]]; then
-    # Install required dependency for VILA
-    pip install -r ../vlm_ptq/requirements-vila.txt
-    # Clone original VILA repo
-    if [ ! -d "$(dirname "$MODEL_PATH")/VILA" ]; then
-        echo "VILA repository is needed until it is added to HF model zoo. Cloning the repository parallel to $MODEL_PATH..."
-        git clone https://github.com/Efficient-Large-Model/VILA.git "$(dirname "$MODEL_PATH")/VILA" && \
-	cd "$(dirname "$MODEL_PATH")/VILA" && \
-	git checkout ec7fb2c264920bf004fd9fa37f1ec36ea0942db5 && \
-	cd "$script_dir/.."
-    fi
-fi
+if [[ "${MODEL_NAME,,}" == *"vila"* ]]; then
+    # Install required dependency for VILA
+    python3 -m pip install -r ../vlm_ptq/requirements-vila.txt
+    # Clone original VILA repo next to local model path if it exists, otherwise under ROOT_SAVE_PATH.
+    if [ -d "$(dirname "$MODEL_PATH")" ]; then
+        CLONE_BASE="$(dirname "$MODEL_PATH")"
+    else
+        CLONE_BASE="$ROOT_SAVE_PATH"
+    fi
+    VILA_DIR="$CLONE_BASE/VILA"
+    if [ ! -d "$VILA_DIR/.git" ]; then
+        echo "VILA repository is needed until it is added to HF model zoo. Cloning into $VILA_DIR ..."
+        git clone https://github.com/Efficient-Large-Model/VILA.git "$VILA_DIR"
+        ( cd "$VILA_DIR" && git checkout ec7fb2c264920bf004fd9fa37f1ec36ea0942db5 )
+    fi
+    export PYTHONPATH="$VILA_DIR:${PYTHONPATH:-}"
+fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0590b0 and 42bb023.

📒 Files selected for processing (2)
  • examples/llm_ptq/hf_ptq.py (1 hunks)
  • examples/vlm_ptq/scripts/huggingface_example.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/llm_ptq/hf_ptq.py (2)
modelopt/torch/export/model_utils.py (1)
  • get_model_type (66-71)
modelopt/onnx/llm_export_utils/export_utils.py (1)
  • get_model_type (47-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)

76-79: LGTM — case-insensitive substring on MODEL_NAME verified.
A test script confirms the pattern matches VILA1.5-3b, vila, and fooVILAbar, and does not match "other".
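
For reference, the same containment rule expressed as a small standalone check (Python purely for illustration; the script itself relies on bash's ${MODEL_NAME,,} lowercasing):

# Mirrors `[[ "${MODEL_NAME,,}" == *"vila"* ]]`: lower-case the name, look for "vila".
def is_vila(model_name: str) -> bool:
    return "vila" in model_name.lower()

assert is_vila("VILA1.5-3b")
assert is_vila("vila")
assert is_vila("fooVILAbar")
assert not is_vila("other")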

Comment on lines 318 to 321
model = model.language_model
model_type = get_model_type(model)

if args.sparsity_fmt != "dense":
Copy link


⚠️ Potential issue

Recompute quantized-state (and device) after swapping to submodel.

Great call to refresh model_type on the language submodel. However, gating later on model_is_already_quantized (computed before the swap) can now be wrong for VLMs whose container is quantized but language_model is not. Also refresh device in case the submodule is on a different device.

Apply this diff:

             model = model.language_model
-            model_type = get_model_type(model)
+            model_type = get_model_type(model)
+            # Keep subsequent logic consistent with the sub‑model we actually operate on.
+            model_is_already_quantized = is_quantized(model)
+            device = getattr(model, "device", device)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            model = model.language_model
-            model_type = get_model_type(model)
-            if args.sparsity_fmt != "dense":
+            model = model.language_model
+            model_type = get_model_type(model)
+            # Keep subsequent logic consistent with the sub-model we actually operate on.
+            model_is_already_quantized = is_quantized(model)
+            device = getattr(model, "device", device)
+            if args.sparsity_fmt != "dense":
🤖 Prompt for AI Agents
In examples/llm_ptq/hf_ptq.py around lines 318 to 321, after swapping to the
language submodel (model = model.language_model) and refreshing model_type, also
recompute model_is_already_quantized and device there so they reflect the
submodel state (the container may be quantized while language_model is not, or
the submodule may be on a different device). Update the code to move or
duplicate the logic that sets model_is_already_quantized and device to
immediately follow the swap and model_type refresh, ensuring subsequent gates
(e.g., the args.sparsity_fmt != "dense" branch) use the corrected values.


codecov bot commented Sep 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.83%. Comparing base (74061f5) to head (42bb023).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #355      +/-   ##
==========================================
- Coverage   73.84%   73.83%   -0.01%     
==========================================
  Files         172      172              
  Lines       17453    17453              
==========================================
- Hits        12888    12887       -1     
- Misses       4565     4566       +1     

☔ View full report in Codecov by Sentry.

mtq.quantize(child, disabled_quant_cfg, forward_loop=None)

model = model.language_model
model_type = get_model_type(model)
Collaborator


do we still need this model_type?

Contributor Author

@yueshen2016 yueshen2016 Sep 23, 2025


It is only needed for the int8_sq format, since the model is later exported to a tensorrt_llm checkpoint. Without this line, the model_type would be unknown, as the nvbug shows.
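
As an illustration of that gating, a hypothetical sketch; the function name, returned keys, and error message below are assumptions, not the actual modelopt export API:

# Hypothetical sketch of why model_type matters only on the int8_sq path,
# where a tensorrt_llm checkpoint is built for a specific architecture.
def export_checkpoint_sketch(model_type: str, quant_format: str) -> dict:
    if quant_format == "int8_sq":
        if model_type == "unknown":
            raise ValueError(
                "model_type is required to build a tensorrt_llm checkpoint; "
                "recompute it on model.language_model before exporting."
            )
        return {"architecture": model_type, "quantization": quant_format}
    # In this sketch, other formats do not depend on model_type.
    return {"quantization": quant_format}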

@yueshen2016 yueshen2016 merged commit 7d5f636 into main Sep 23, 2025
27 checks passed
@yueshen2016 yueshen2016 deleted the yue/vlm_bug_fix branch September 23, 2025 16:03
kevalmorabia97 pushed a commit that referenced this pull request Sep 25, 2025
yeyu-nvidia pushed a commit that referenced this pull request Oct 1, 2025