[Bug fix] remove max_memory from from_config init in hf_ptq #373

realAsma · 2025-09-25T16:47:46Z

What does this PR do?

Type of change: ? Bug fix

Overview: ?

Usage

NA

Testing

docker: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
I can run the following:

python hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-1.7B \
    --export_path test --qformat fp8 \
    --calib_size 16 --trust_remote_code --kv_cache_qformat fp8 --use_seq_device_map

Summary by CodeRabbit

Bug Fixes
- Improved reliability of model initialization via auto-configuration across architectures by standardizing how memory constraints are handled. This prevents failures caused by incompatible memory parameters during setup, resulting in smoother loading and fewer edge-case initialization errors for non-VILA models. No configuration changes required; existing workflows should work more consistently and more predictable performance.

Signed-off-by: realAsma <[email protected]>

coderabbitai · 2025-09-25T16:47:54Z

Walkthrough

The get_model function in examples/llm_ptq/example_utils.py now always removes max_memory from model_kwargs2 before constructing the model from a config in the non-VILA path, instead of doing so only for DeciLMForCausalLM architectures.

Changes

Cohort / File(s)	Summary
LLM PTQ example utils `examples/llm_ptq/example_utils.py`	In get_model, unconditionally pop `max_memory` from `model_kwargs2` prior to `from_config` in the non-VILA branch, removing the prior architecture-specific (DeciLM) check.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Utils as example_utils.get_model
  participant Cfg as AutoConfig
  participant Model as AutoModelForCausalLM

  rect rgb(240,248,255)
    note over Utils: New flow (non-VILA)
    Caller->>Utils: get_model(...)
    Utils->>Utils: pop max_memory from model_kwargs2
    Utils->>Cfg: from_pretrained(...) / build config
    Utils->>Model: from_config(cfg, **model_kwargs2)
    Model-->>Caller: model
  end

sequenceDiagram
  autonumber
  actor Caller
  participant Utils as example_utils.get_model
  participant Cfg as AutoConfig
  participant Model as AutoModelForCausalLM

  rect rgb(255,250,240)
    note over Utils: Previous flow (non-VILA)
    Caller->>Utils: get_model(...)
    alt arch is DeciLMForCausalLM
      Utils->>Utils: pop max_memory from model_kwargs2
    else other arch
      Utils->>Utils: keep max_memory in model_kwargs2
    end
    Utils->>Cfg: from_pretrained(...) / build config
    Utils->>Model: from_config(cfg, **model_kwargs2)
    Model-->>Caller: model
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nudge a flag from the model’s pack,
A gentle hop—no turning back.
Max memory slips, the config calls,
One true path through hidden halls.
Ears perked high, I ship with glee—
Clean little change, swift as a bunny. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly and concisely describes the primary change in the pull request by indicating that max_memory is being removed from the from_config initialization in the hf_ptq codebase as a bug fix. It directly reflects the core modification without extraneous information.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch asma/fix_nvbug_5532091

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cjluo-nv · 2025-09-25T16:53:16Z

examples/llm_ptq/example_utils.py

-                # DeciLMForCausalLM does not support max_memory argument
-                if "architectures" in hf_config and "DeciLMForCausalLM" in hf_config.architectures:
-                    model_kwargs2.pop("max_memory", None)
+                model_kwargs2.pop("max_memory", None)


do you actually want to remove line 166-171 as well?

yeah sure I also thought of it. But I do not know why we added the use_seq_device_map in the first place. So should I remove the entire --use_seq_device_map?

I recalled that --use_seq_device_map was added by @sugunav14 for nemotron models that do not distribute evenly on GPUs when initializing with auto device_map.

@cjluo-nv What should we do here? The current fix works. If you are okay with the current fix, can you please approve it?

max_memory is set at line 170. If we don't use it here, we probably don't need line 170. @sugunav14 could you chime in?

yeah, seems like with this change max_memory is never used.

If we get rid of --use_seq_device_map I think the Llama-Nemotron Ultra will throw an OOM during export

If we only get rid of max memory I believe we will still hit OOM if we use --use_seq_device_map cause the model will load consuming the full GPU memory

@cjluo-nv @sugunav14 removing max_memory as in this PR will not cause an OOM. max_memory is removed when loading with AutoModelForCausalLM.from_config under init_empy_weights context. Looks like the fix in this PR is sufficient. Can you please approve this?

@sugunav14 The model initialized here uses AutoModel.from_config(). AutoModel.from_config no longer accepts extra kwargs such as max_memory (I tested with Llama3 and Qwen3).

codecov · 2025-09-25T17:00:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (0178562) to head (d2fe46f).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #373   +/-   ##
=======================================
  Coverage   73.46%   73.46%           
=======================================
  Files         172      172           
  Lines       17640    17640           
=======================================
  Hits        12959    12959           
  Misses       4681     4681

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kevalmorabia97 · 2025-09-26T17:09:40Z

examples/llm_ptq/example_utils.py

-                # DeciLMForCausalLM does not support max_memory argument
-                if "architectures" in hf_config and "DeciLMForCausalLM" in hf_config.architectures:
-                    model_kwargs2.pop("max_memory", None)
+                model_kwargs2.pop("max_memory", None)


How about we inspect model's signature and keep kwargs if they are supported to avoid such issues in the future?

cjluo-nv

Discussed with @sugunav14 offline we still need the lines above

Signed-off-by: realAsma <[email protected]>

[Bug fix] remove deom init in hf_ptq

d2fe46f

Signed-off-by: realAsma <[email protected]>

realAsma requested a review from a team as a code owner September 25, 2025 16:47

realAsma requested a review from sugunav14 September 25, 2025 16:47

realAsma requested review from cjluo-nv and meenchen September 25, 2025 16:47

cjluo-nv reviewed Sep 25, 2025

View reviewed changes

realAsma changed the title ~~[Bug fix] remove deom init in hf_ptq~~ [Bug fix] remove max_memory from from_config init in hf_ptq Sep 26, 2025

kevalmorabia97 reviewed Sep 26, 2025

View reviewed changes

realAsma requested review from cjluo-nv and kevalmorabia97 September 26, 2025 21:00

cjluo-nv approved these changes Sep 29, 2025

View reviewed changes

sugunav14 merged commit d8ef6f8 into main Sep 30, 2025
35 of 38 checks passed

sugunav14 deleted the asma/fix_nvbug_5532091 branch September 30, 2025 16:13

kevalmorabia97 pushed a commit that referenced this pull request Oct 1, 2025

[Bug fix] remove max_memory from from_config init in hf_ptq (#373)

71edf4d

Signed-off-by: realAsma <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug fix] remove max_memory from from_config init in hf_ptq #373

[Bug fix] remove max_memory from from_config init in hf_ptq #373

Uh oh!

realAsma commented Sep 25, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 25, 2025 •

edited

Loading

Uh oh!

cjluo-nv Sep 25, 2025 •

edited

Loading

Uh oh!

realAsma Sep 25, 2025

Uh oh!

meenchen Sep 25, 2025

Uh oh!

realAsma Sep 25, 2025

Uh oh!

cjluo-nv Sep 26, 2025

Uh oh!

sugunav14 Sep 26, 2025

Uh oh!

realAsma Sep 26, 2025

Uh oh!

realAsma Sep 29, 2025

Uh oh!

codecov bot commented Sep 25, 2025 •

edited

Loading

Uh oh!

kevalmorabia97 Sep 26, 2025

Uh oh!

cjluo-nv left a comment

Uh oh!

Uh oh!

Uh oh!

[Bug fix] remove max_memory from from_config init in hf_ptq #373

[Bug fix] remove max_memory from from_config init in hf_ptq #373

Uh oh!

Conversation

realAsma commented Sep 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

cjluo-nv Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

realAsma commented Sep 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 25, 2025 •

edited

Loading

cjluo-nv Sep 25, 2025 •

edited

Loading

codecov bot commented Sep 25, 2025 •

edited

Loading