
Conversation

@realAsma (Contributor) commented Sep 25, 2025

What does this PR do?

Type of change: Bug fix

Overview: ?

Usage

NA

Testing

docker: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
I can run the following:

python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-1.7B \
    --export_path test --qformat fp8 \
    --calib_size 16 --trust_remote_code --kv_cache_qformat fp8 --use_seq_device_map

Summary by CodeRabbit

  • Bug Fixes
  • Improved reliability of model initialization via auto-configuration across architectures by standardizing how memory constraints are handled. This prevents failures caused by incompatible memory parameters during setup, resulting in smoother loading and fewer edge-case initialization errors for non-VILA models. No configuration changes are required; existing workflows should work more consistently, with more predictable performance.

@realAsma realAsma requested a review from a team as a code owner September 25, 2025 16:47
@realAsma realAsma requested a review from sugunav14 September 25, 2025 16:47

coderabbitai bot commented Sep 25, 2025

Walkthrough

The get_model function in examples/llm_ptq/example_utils.py now always removes max_memory from model_kwargs2 before constructing the model from a config in the non-VILA path, instead of doing so only for DeciLMForCausalLM architectures.

Changes

Cohort / File(s) Summary
  • LLM PTQ example utils (examples/llm_ptq/example_utils.py): In get_model, unconditionally pop max_memory from model_kwargs2 prior to from_config in the non-VILA branch, removing the prior architecture-specific (DeciLM) check. (A sketch of the resulting flow follows below.)
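
For orientation, a minimal sketch of the resulting non-VILA flow. This is illustrative only: the helper name is hypothetical, the real get_model does more, and the meta-device initialization context comes from the discussion further down this thread.

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

def build_model_from_config(ckpt_path, model_kwargs2, trust_remote_code=False):
    # Hypothetical condensation of the non-VILA branch in get_model.
    hf_config = AutoConfig.from_pretrained(ckpt_path, trust_remote_code=trust_remote_code)
    # max_memory is a weight-dispatch argument (used with device_map when loading
    # pretrained weights); from_config does not take it, so it is now popped unconditionally.
    model_kwargs2.pop("max_memory", None)
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(hf_config, **model_kwargs2)
    return model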

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Utils as example_utils.get_model
  participant Cfg as AutoConfig
  participant Model as AutoModelForCausalLM

  rect rgb(240,248,255)
    note over Utils: New flow (non-VILA)
    Caller->>Utils: get_model(...)
    Utils->>Utils: pop max_memory from model_kwargs2
    Utils->>Cfg: from_pretrained(...) / build config
    Utils->>Model: from_config(cfg, **model_kwargs2)
    Model-->>Caller: model
  end
sequenceDiagram
  autonumber
  actor Caller
  participant Utils as example_utils.get_model
  participant Cfg as AutoConfig
  participant Model as AutoModelForCausalLM

  rect rgb(255,250,240)
    note over Utils: Previous flow (non-VILA)
    Caller->>Utils: get_model(...)
    alt arch is DeciLMForCausalLM
      Utils->>Utils: pop max_memory from model_kwargs2
    else other arch
      Utils->>Utils: keep max_memory in model_kwargs2
    end
    Utils->>Cfg: from_pretrained(...) / build config
    Utils->>Model: from_config(cfg, **model_kwargs2)
    Model-->>Caller: model
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nudge a flag from the model’s pack,
A gentle hop—no turning back.
Max memory slips, the config calls,
One true path through hidden halls.
Ears perked high, I ship with glee—
Clean little change, swift as a bunny. 🐇

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly and concisely describes the primary change, indicating that max_memory is being removed from the from_config initialization in hf_ptq as a bug fix. It directly reflects the core modification without extraneous information.


examples/llm_ptq/example_utils.py

-# DeciLMForCausalLM does not support max_memory argument
-if "architectures" in hf_config and "DeciLMForCausalLM" in hf_config.architectures:
-    model_kwargs2.pop("max_memory", None)
+model_kwargs2.pop("max_memory", None)
@cjluo-nv (Collaborator) commented Sep 25, 2025:

do you actually want to remove lines 166-171 as well?

@realAsma (Contributor, Author) replied:

Yeah sure, I also thought of that. But I do not know why we added use_seq_device_map in the first place. So should I remove the entire --use_seq_device_map option?

A contributor replied:

I recall that --use_seq_device_map was added by @sugunav14 for Nemotron models that do not distribute evenly across GPUs when initialized with the auto device_map.
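
For context, a rough sketch of what sequential placement with per-GPU memory caps typically looks like when loading pretrained weights. The checkpoint path and the "40GiB" budget below are placeholders; this is not the exact hf_ptq code.

import torch
from transformers import AutoModelForCausalLM

# Cap each GPU so layers fill devices one after another instead of being
# spread evenly by device_map="auto".
max_memory = {i: "40GiB" for i in range(torch.cuda.device_count())}

model = AutoModelForCausalLM.from_pretrained(
    "path/to/nemotron-checkpoint",  # placeholder model path
    device_map="sequential",
    max_memory=max_memory,
    torch_dtype="auto",
)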

@realAsma (Contributor, Author) replied:

@cjluo-nv What should we do here? The current fix works. If you are okay with the current fix, can you please approve it?

A collaborator replied:

max_memory is set at line 170. If we don't use it here, we probably don't need line 170. @sugunav14 could you chime in?


Yeah, it seems like with this change max_memory is never used.

  1. If we get rid of --use_seq_device_map, I think Llama-Nemotron Ultra will throw an OOM during export.
  2. If we only get rid of max_memory, I believe we will still hit an OOM when using --use_seq_device_map, because the model will load consuming the full GPU memory.

@realAsma (Contributor, Author) replied:

@cjluo-nv @sugunav14 Removing max_memory as in this PR will not cause an OOM. max_memory is only removed when loading with AutoModelForCausalLM.from_config under the init_empty_weights context. It looks like the fix in this PR is sufficient. Can you please approve it?
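
For reference, a minimal sketch of that path, assuming the standard accelerate init_empty_weights context: the config-based build places weights on the meta device, so per-device memory budgets are irrelevant at this point.

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen3-1.7B")
with init_empty_weights():
    # Weights live on the "meta" device; nothing is allocated on the GPUs here,
    # so dropping max_memory cannot cause an OOM at this stage.
    model = AutoModelForCausalLM.from_config(config)
print(next(model.parameters()).device)  # meta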

@realAsma (Contributor, Author) replied:

@sugunav14 The model initialized here uses AutoModel.from_config(). AutoModel.from_config no longer accepts extra kwargs such as max_memory (I tested with Llama3 and Qwen3).
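
A minimal repro sketch of that observation, assuming a recent transformers release where unknown kwargs are forwarded to the model constructor and rejected (typically as a TypeError):

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen3-1.7B")
try:
    # The extra kwarg reaches the model __init__, which does not accept it.
    AutoModelForCausalLM.from_config(config, max_memory={0: "16GiB"})
except TypeError as err:
    print("rejected:", err)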


codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (0178562) to head (d2fe46f).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #373   +/-   ##
=======================================
  Coverage   73.46%   73.46%           
=======================================
  Files         172      172           
  Lines       17640    17640           
=======================================
  Hits        12959    12959           
  Misses       4681     4681           

☔ View full report in Codecov by Sentry.

@realAsma changed the title from "[Bug fix] remove deom init in hf_ptq" to "[Bug fix] remove max_memory from from_config init in hf_ptq" on Sep 26, 2025
examples/llm_ptq/example_utils.py

-# DeciLMForCausalLM does not support max_memory argument
-if "architectures" in hf_config and "DeciLMForCausalLM" in hf_config.architectures:
-    model_kwargs2.pop("max_memory", None)
+model_kwargs2.pop("max_memory", None)
A collaborator commented:

How about we inspect the model's signature and keep only the kwargs it supports, to avoid such issues in the future?
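
A rough sketch of that idea, using a hypothetical helper that filters kwargs against the model constructor's signature (not part of this PR; in practice some kwargs such as torch_dtype are consumed by from_config itself and would need special-casing):

import inspect

def filter_supported_kwargs(model_cls, kwargs):
    # Keep only kwargs that model_cls.__init__ explicitly accepts.
    params = inspect.signature(model_cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # **kwargs present, so everything is accepted
    return {name: value for name, value in kwargs.items() if name in params}

# e.g. model_kwargs2 = filter_supported_kwargs(model_cls, model_kwargs2)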

@cjluo-nv (Collaborator) left a comment:

Discussed with @sugunav14 offline: we still need the lines above.

@sugunav14 sugunav14 merged commit d8ef6f8 into main Sep 30, 2025
35 of 38 checks passed
@sugunav14 sugunav14 deleted the asma/fix_nvbug_5532091 branch September 30, 2025 16:13
kevalmorabia97 pushed a commit that referenced this pull request Oct 1, 2025