[Bug fix 5528642] [Bug fix 5528695] VLM NVBug fix #355
```diff
@@ -316,6 +316,7 @@ def main(args):
            mtq.quantize(child, disabled_quant_cfg, forward_loop=None)

    model = model.language_model
    model_type = get_model_type(model)

    if args.sparsity_fmt != "dense":
```
Comment on lines 318 to 321:
Recompute quantized-state (and device) after swapping to the submodel. Great call to refresh model_type on the language submodel. However, gating later on model_is_already_quantized (computed before the swap) can now be wrong for VLMs whose container is quantized but language_model is not. Also refresh device in case the submodule is on a different device. Apply this diff:

```diff
     model = model.language_model
-    model_type = get_model_type(model)
+    model_type = get_model_type(model)
+    # Keep subsequent logic consistent with the sub-model we actually operate on.
+    model_is_already_quantized = is_quantized(model)
+    device = getattr(model, "device", device)
```
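To make the concern concrete, here is a minimal sketch of the failure mode, assuming the script later gates quantization on this flag; the names quant_cfg and calibrate_loop and the guard itself are illustrative, not the script's exact code:

```python
# Illustrative only: the flag is computed on the VLM container, then the script
# swaps to the language submodel, so the stale flag no longer describes the
# model that is actually being quantized.
model_is_already_quantized = is_quantized(model)  # checks the container (may be True)
model = model.language_model                      # swap to the submodel we operate on
model_type = get_model_type(model)
if not model_is_already_quantized:                # stale: reflects the container, not this submodel
    model = mtq.quantize(model, quant_cfg, forward_loop=calibrate_loop)  # may be skipped incorrectly
```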
```diff
    if args.batch_size == 0:
```
do we still need this model_type?
Only for the int8_sq format, since the model is later exported to a tensorrt_llm checkpoint. Without this line, the model_type will be unknown, as this nvbug shows.
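For context, a rough sketch of where model_type matters downstream; the helper name export_tensorrt_llm_checkpoint and the argument names used here are assumptions about the TensorRT-LLM export path, not code taken from this diff:

```python
# Hypothetical export step: the int8_sq path writes a TensorRT-LLM checkpoint,
# and the exporter needs the detected decoder type. If model_type is computed
# on the VLM container instead of the language model, it ends up "unknown".
if args.qformat == "int8_sq":                # illustrative flag name
    export_tensorrt_llm_checkpoint(
        model,                               # the language_model submodule selected above
        model_type,                          # e.g. "llama"; "unknown" is what the nvbug reports
        export_dir=args.export_path,         # illustrative argument name
    )
```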