Conversation

@cjluo-nv cjluo-nv commented Sep 24, 2025

What does this PR do?

Type of change: Refactor

Overview: ?

We no longer use TRT-LLM engines in our examples. All references to the keyword engine are changed to checkpoint.

Summary by CodeRabbit

  • Documentation

    • Expanded deprecation note for the TensorRT backend and updated README headings; generalized VLM quantization description to reference supported multi‑modal models.
    • Updated example commands and docs to use checkpoint_dir instead of engine_dir.
  • Chores

    • Renamed CLI flag engine_dir → checkpoint_dir across evaluation, PTQ, and helper scripts.
    • Added new quantization-related script options and clarified usage text and log messages.
  • Refactor

    • Removed a deprecated helper and a bundled VLM CLI-argument utility to simplify interfaces.

@cjluo-nv cjluo-nv requested review from a team as code owners September 24, 2025 22:25
@cjluo-nv cjluo-nv requested a review from Edwardf0t1 September 24, 2025 22:25
coderabbitai bot commented Sep 24, 2025

Walkthrough

Renames TensorRT‑LLM parameter/flag from engine_dir to checkpoint_dir across LLM eval and PTQ examples and scripts, updates docs and logs, adjusts tokenizer fallback to checkpoint_dir, removes two helper functions, expands CHANGELOG deprecation note, generalizes a VLM README paragraph, and adds quant-related flags to run_fastchat.sh.

Changes

Cohort / File(s) Summary of changes
Deprecation note
CHANGELOG.rst
Expanded TRT backend deprecation note to include that engine_dir is replaced with checkpoint_dir in examples/llm_ptq and examples/vlm_ptq; other bullets unchanged.
LLM eval docs & script
examples/llm_eval/README.md, examples/llm_eval/run_fastchat.sh
Replaced --engine_dir with --checkpoint_dir and updated headings; run_fastchat.sh also adds optional quant flags (--answer_file, --quant_cfg, --auto_quantize_bits, --calib_size, --calib_batch_size) and forwards them to the Python invocation.
LLM eval Python
examples/llm_eval/gen_model_answer.py, examples/llm_eval/lm_eval_tensorrt_llm.py, examples/llm_eval/mmlu.py
Renamed parameters, CLI options, variables, and call sites from engine_dir → checkpoint_dir; updated function signatures and argument parsing; model type inference now uses basename of checkpoint_dir; log message adjusted.
LLM PTQ flow
examples/llm_ptq/run_tensorrt_llm.py, examples/llm_ptq/scripts/huggingface_example.sh
Switched CLI flag and internal references from engine_dir → checkpoint_dir; tokenizer fallback now sources files from checkpoint_dir; updated example invocations and minor docstring wording.
Removed helpers
examples/llm_ptq/example_utils.py, examples/vlm_ptq/utils.py
Deleted get_mode_type_from_engine_dir(...) and add_common_args(parser) functions, removing those public utilities/CLI configuration helpers.
VLM PTQ docs
examples/vlm_ptq/README.md
Generalized description to reference “supported Hugging Face multi‑modal models” and removed explicit default parameter details.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hop through dirs where engines slept,
Now checkpoints guide the paths we've kept.
Flags reborn, the scripts align—
Quant carrots glint, a tidy line.
Old helpers burrowed, waved goodbye,
I nibble bugs and watch tests fly. 🥕

Pre-merge checks

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title correctly highlights the engine_dir to checkpoint_dir refactor but restricts it to PTQ examples even though the pull request also renames parameters and documentation across evaluation scripts and the CHANGELOG, making it only partially descriptive of the full changeset.


@cjluo-nv cjluo-nv changed the title from "Refactor engine_dir to checkpoint_dir in PTQ examples." to "[NVBUG: 5535437] Refactor engine_dir to checkpoint_dir in PTQ examples." Sep 24, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
examples/llm_eval/README.md (1)

164-167: Restore -e flag in MT-Bench command.

run_fastchat.sh still requires -e/--checkpoint_dir. The updated README example drops that flag, so anyone following the docs will hit the usage error and the workflow stops. Please fix the command.

-bash run_fastchat.sh -h <HF model folder or model card> <Quantized checkpoint dir>
+bash run_fastchat.sh -h <HF model folder or model card> -e <Quantized checkpoint dir>
examples/llm_eval/gen_model_answer.py (3)

139-156: Avoid NameError under Ray: pass args into get_model_answers

get_model_answers references args.* but args isn’t passed; under Ray workers this will raise NameError. Pass args explicitly.

Apply this diff:

         get_answers_func(
             model_path,
             model_id,
             questions[i : i + chunk_size],
             answer_file,
             max_new_token,
             num_choices,
             num_gpus_per_model,
             max_gpu_memory,
             dtype=dtype,
             revision=revision,
             top_p=top_p,
             temperature=temperature,
             checkpoint_dir=checkpoint_dir,
             nim_model=nim_model,
+            args=args,
         )
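
For readers unfamiliar with the failure mode, here is a minimal standalone sketch (hypothetical, not code from this PR) of why module state set in the driver's __main__ block is invisible to Ray workers:

# Ray workers re-import this module in a fresh process, so the
# module-level `args` stays None there -- the driver's __main__
# block never runs inside the worker.
import ray

args = None  # only the driver's __main__ block replaces this

@ray.remote
def worker():
    # In the worker process `args` is still None, so this raises
    # AttributeError (the analogue of the NameError flagged above).
    return args.quant_cfg

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--quant-cfg", default=None)
    args = parser.parse_args()

    ray.init()
    ray.get(worker.remote())  # fails inside the worker

Passing everything the function needs as explicit arguments, as the diff above does, sidesteps this entirely.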

164-179: Accept args in get_model_answers; stop relying on module-global args

Make args an explicit parameter to be Ray‑safe.

Apply this diff:

 @torch.inference_mode()
 def get_model_answers(
     model_path,
     model_id,
     questions,
     answer_file,
     max_new_token,
     num_choices,
     num_gpus_per_model,
     max_gpu_memory,
     dtype,
     revision,
     top_p=None,
     temperature=None,
-    checkpoint_dir=None,
-    nim_model=None,
+    checkpoint_dir=None,
+    nim_model=None,
+    args=None,
 ):

208-219: Guard optional quantization args to avoid crashes under Ray

Use getattr on args to avoid AttributeError/NameError if args is None or missing fields.

Apply this diff:

-        tokenizer = get_tokenizer(model_path, trust_remote_code=args.trust_remote_code)
-        if args.quant_cfg:
+        trust_remote_code = getattr(args, "trust_remote_code", False)
+        tokenizer = get_tokenizer(model_path, trust_remote_code=trust_remote_code)
+        if getattr(args, "quant_cfg", None):
             quantize_model(
                 model,
-                args.quant_cfg,
+                args.quant_cfg,
                 tokenizer,
-                args.calib_batch_size,
-                args.calib_size,
-                args.auto_quantize_bits,
+                getattr(args, "calib_batch_size", 4),
+                getattr(args, "calib_size", 512),
+                getattr(args, "auto_quantize_bits", None),
                 test_generated=False,
             )
examples/llm_eval/mmlu.py (1)

256-270: Fix Qwen pad/eos detection after checkpoint_dir rename.

When the checkpoint directory comes from scripts/huggingface_example.sh its basename looks like saved_models_<model>_...; your new parsing now sets model_type to "saved", so the Qwen-specific branch never runs. That regresses Qwen runs (pad/eos stay unset and generation breaks). Please restore the previous behavior by stripping the prefix and matching on lower-cased names (or reusing the old helper).

-        last_part = os.path.basename(kwargs["checkpoint_dir"])
-        model_type = last_part.split("_")[0]
+        last_part = os.path.basename(os.path.normpath(kwargs["checkpoint_dir"]))
+        if last_part.startswith("saved_models_"):
+            last_part = last_part[len("saved_models_") :]
+        model_type = last_part.split("_")[0].lower()
         # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
-        if model_type == "qwen":
+        if model_type.startswith("qwen"):
             tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
             tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
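
A quick standalone illustration of the regression (the directory name below is a made-up example following the saved_models_<model>_... convention from huggingface_example.sh):

import os

checkpoint_dir = "/work/saved_models_qwen2_int4_awq"  # hypothetical layout

# Parsing as written in the PR: the first token is "saved", so the
# Qwen-specific branch is never taken.
print(os.path.basename(checkpoint_dir).split("_")[0])  # -> "saved"

# With the suggested prefix stripping:
last = os.path.basename(os.path.normpath(checkpoint_dir))
if last.startswith("saved_models_"):
    last = last[len("saved_models_"):]
print(last.split("_")[0].lower())  # -> "qwen2", which startswith("qwen")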
🧹 Nitpick comments (2)
examples/llm_eval/gen_model_answer.py (1)

430-439: CLI polish: mutually exclusive flags and deprecation alias

To avoid user confusion, make --checkpoint-dir and --nim-model mutually exclusive, and optionally accept deprecated --engine-dir as an alias that maps to checkpoint_dir.

Apply this diff:

-    parser.add_argument(
-        "--checkpoint-dir",
-        type=str,
-        help="The path to the model checkpoint directory.",
-    )
-    parser.add_argument(
-        "--nim-model",
-        type=str,
-        help="The NIM model handle to use",
-    )
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument(
+        "--checkpoint-dir",
+        type=str,
+        help="Path to the model checkpoint directory (contains config.json).",
+    )
+    group.add_argument(
+        "--nim-model",
+        type=str,
+        help="The NIM model handle to use.",
+    )
+    # Deprecated alias for backward compatibility
+    parser.add_argument(
+        "--engine-dir",
+        dest="checkpoint_dir",
+        type=str,
+        help="DEPRECATED. Use --checkpoint-dir instead.",
+    )
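
As a usage note, the conflict handling comes for free from argparse; a tiny self-contained check (flag names mirror the suggestion; the prog name and model handle are illustrative):

import argparse

parser = argparse.ArgumentParser(prog="gen_model_answer.py")
group = parser.add_mutually_exclusive_group()
group.add_argument("--checkpoint-dir", type=str)
group.add_argument("--nim-model", type=str)

# Supplying both flags triggers argparse's built-in error and exits:
#   gen_model_answer.py: error: argument --nim-model: not allowed
#   with argument --checkpoint-dir
parser.parse_args(["--checkpoint-dir", "ckpt/", "--nim-model", "some/model"])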
examples/llm_ptq/run_tensorrt_llm.py (1)

31-31: Require --checkpoint_dir to fail fast on misconfiguration.

Without the flag, we cascade into harder-to-debug None-type errors when attempting to load the checkpoint. Marking it as required keeps the CLI experience sharp.

-    parser.add_argument("--checkpoint_dir", type=str)
+    parser.add_argument("--checkpoint_dir", type=str, required=True)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26c203a and 8845aba.

📒 Files selected for processing (11)
  • CHANGELOG.rst (1 hunks)
  • examples/llm_eval/README.md (3 hunks)
  • examples/llm_eval/gen_model_answer.py (6 hunks)
  • examples/llm_eval/lm_eval_tensorrt_llm.py (2 hunks)
  • examples/llm_eval/mmlu.py (2 hunks)
  • examples/llm_eval/run_fastchat.sh (5 hunks)
  • examples/llm_ptq/example_utils.py (0 hunks)
  • examples/llm_ptq/run_tensorrt_llm.py (4 hunks)
  • examples/llm_ptq/scripts/huggingface_example.sh (4 hunks)
  • examples/vlm_ptq/README.md (1 hunks)
  • examples/vlm_ptq/utils.py (0 hunks)
💤 Files with no reviewable changes (2)
  • examples/vlm_ptq/utils.py
  • examples/llm_ptq/example_utils.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T20:14:34.768Z
Learnt from: realAsma
PR: NVIDIA/TensorRT-Model-Optimizer#318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:191-191
Timestamp: 2025-09-16T20:14:34.768Z
Learning: The TensorRT-Model-Optimizer project only supports PyTorch >= 2.6, so using the `weights_only` parameter in torch.load calls is acceptable and doesn't require backward compatibility handling.

Applied to files:

  • CHANGELOG.rst
🧬 Code graph analysis (3)
examples/llm_ptq/run_tensorrt_llm.py (1)
modelopt/deploy/llm/generate.py (1)
  • LLM (53-291)
examples/llm_eval/gen_model_answer.py (2)
examples/llm_ptq/example_utils.py (1)
  • get_tokenizer (39-61)
modelopt/deploy/llm/generate.py (1)
  • LLM (53-291)
examples/llm_eval/lm_eval_tensorrt_llm.py (1)
modelopt/deploy/llm/generate.py (2)
  • LLM (53-291)
  • max_seq_len (154-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (16)
examples/llm_eval/gen_model_answer.py (3)

121-124: Param rename to checkpoint_dir looks good

Signature change is consistent with the repo-wide refactor.


262-281: Branching on checkpoint_dir for generation is correct

The fallback to model.generate vs. LLM.generate_tokens aligns with the refactor.


505-508: LGTM: call site updated to pass checkpoint_dir

Consistent with the new argument name.

CHANGELOG.rst (1)

11-11: Changelog entry reflects the checkpoint_dir refactor cleanly.

The note aligns with the code updates I’ve seen—thanks for keeping the release notes in sync.

examples/vlm_ptq/README.md (1)

59-59: README wording matches the newly generalized workflow.

Appreciate the neutral phrasing; it now reads correctly for any supported Hugging Face multi-modal checkpoint.

examples/llm_ptq/run_tensorrt_llm.py (2)

52-54: Tokenizer fallback matches the renamed artifact layout.

Good to see the comment updated alongside the directory rename; behavior stays consistent.


69-70: LLM instantiation now targets the checkpoint directory.

This is the key behavioral switch, and it lines up with the upstream LLM API surface.

examples/llm_ptq/scripts/huggingface_example.sh (4)

161-164: Enc-dec guidance mirrors the checkpoint export wording.

Nice touch keeping the messaging clear for users hitting that branch.


190-190: Run step now hands the checkpoint directory to the eval script.

Matches the refactor plan—no issues spotted.


232-232: lm_eval invocation tracks the new flag name.

checkpoint_dir is propagated into the harness exactly where it needs to be.


262-263: MMLU call is consistent with the renamed flag.

Thanks for catching this path as well; keeps post-quant eval frictionless.

examples/llm_eval/lm_eval_tensorrt_llm.py (1)

45-64: Constructor rename lines up with the shared LLM API.

The assertions and log message now reflect the checkpoint-based workflow; looks solid.

examples/llm_eval/run_fastchat.sh (4)

23-35: Usage text now points readers at the checkpoint directory.

The example command updates are clear and match the rest of the refactor.


44-61: CLI parsing adopts checkpoint_dir cleanly.

The rename is threaded through without introducing regressions.


99-124: Optional quant args are folded in neatly.

Good balance of flexibility without breaking the existing call sites.


142-149: Final command forwards the new flag to FastChat.

Output generation should continue to work with the checkpointed artifacts.

codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (26c203a) to head (dcf507e).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #365   +/-   ##
=======================================
  Coverage   73.46%   73.46%           
=======================================
  Files         172      172           
  Lines       17640    17640           
=======================================
  Hits        12959    12959           
  Misses       4681     4681           

☔ View full report in Codecov by Sentry.

Signed-off-by: Chenjie Luo <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
examples/llm_eval/gen_model_answer.py (1)

426-429: CLI rename accepted; consider a hidden alias for backward compatibility

Optional: add a suppressed --engine-dir alias that maps to checkpoint_dir to avoid breaking existing scripts.

 parser.add_argument(
-    "--checkpoint-dir",
+    "--checkpoint-dir",
     type=str,
     help="The path to the model checkpoint directory.",
 )
+parser.add_argument(
+    "--engine-dir",
+    dest="checkpoint_dir",
+    type=str,
+    help=argparse.SUPPRESS,
+)
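
For reference, argparse.SUPPRESS keeps the alias out of the --help listing while the flag is still accepted; a minimal sketch of the pattern (standalone, not the PR's actual parser):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint-dir", type=str,
                    help="The path to the model checkpoint directory.")
# Deprecated alias: writes to the same dest but is hidden from --help.
parser.add_argument("--engine-dir", dest="checkpoint_dir", type=str,
                    help=argparse.SUPPRESS)

args = parser.parse_args(["--engine-dir", "/tmp/ckpt"])
assert args.checkpoint_dir == "/tmp/ckpt"  # old flag maps to the new name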
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8845aba and dcf507e.

📒 Files selected for processing (1)
  • examples/llm_eval/gen_model_answer.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/llm_eval/gen_model_answer.py (2)
examples/llm_ptq/example_utils.py (1)
  • get_tokenizer (39-61)
modelopt/deploy/llm/generate.py (1)
  • LLM (53-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (3)
examples/llm_eval/gen_model_answer.py (3)

258-277: Branch on checkpoint_dir for TRT-LLM vs HF: OK

Logic is clear; uses generate_tokens in TRT-LLM path and decodes correctly.


501-504: Main call updated to checkpoint_dir: good

Callsite matches the renamed argument.


177-193: Fix NameError and Ray-safety; improve Qwen detection; pass trust_remote_code to LLM

get_model_answers references args but doesn’t receive it. This will break in Ray workers and is fragile even locally. Also, Qwen detection via split is brittle.

Apply:

 def get_model_answers(
@@
-    top_p=None,
-    temperature=None,
-    checkpoint_dir=None,
-    nim_model=None,
+    top_p=None,
+    temperature=None,
+    checkpoint_dir=None,
+    nim_model=None,
+    trust_remote_code=False,
+    quant_cfg=None,
+    calib_batch_size=4,
+    calib_size=512,
+    auto_quantize_bits=None,
 ):
@@
-    tokenizer = get_tokenizer(model_path, trust_remote_code=args.trust_remote_code)
+    tokenizer = get_tokenizer(model_path, trust_remote_code=trust_remote_code)
     if checkpoint_dir:
-        # get model type
-        last_part = os.path.basename(checkpoint_dir)
-        model_type = last_part.split("_")[0]
-        # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
-        if model_type == "qwen":
+        # Some models require pad/eos tokens (e.g., Qwen)
+        last_part = os.path.basename(checkpoint_dir)
+        if "qwen" in last_part.lower():
             tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
             tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
 
         assert LLM is not None, "tensorrt_llm APIs could not be imported."
-        model = LLM(checkpoint_dir, tokenizer=tokenizer)
+        model = LLM(checkpoint_dir, tokenizer=tokenizer, trust_remote_code=trust_remote_code)

Also update the quantization block (outside the diff range) to use the newly passed params instead of args:

# replace:
# if args.quant_cfg:
#     quantize_model(model, args.quant_cfg, tokenizer, args.calib_batch_size, args.calib_size, args.auto_quantize_bits, test_generated=False)

if quant_cfg:
    quantize_model(
        model,
        quant_cfg,
        tokenizer,
        calib_batch_size,
        calib_size,
        auto_quantize_bits,
        test_generated=False,
    )

Comment on lines +121 to 124
checkpoint_dir,
nim_model,
args,
):

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Rename to checkpoint_dir in run_eval: looks good

Parameter rename is consistent.

Run to confirm no stale uses remain:


🏁 Script executed:

#!/bin/bash
# Find lingering engine_dir references (flags, kwargs, vars)
rg -n -C2 -g '!**/venv/**' -g '!**/build/**' -g '!**/dist/**' -e '\bengine_dir\b' -e '--engine-dir'

Length of output: 148


🏁 Script executed:

#!/bin/bash
# Re-run search including all files and excluding venv, build, dist
rg -n -C2 -g '*' -g '!venv/**' -g '!build/**' -g '!dist/**' -e '\bengine_dir\b' -e '--engine-dir'

Length of output: 5278


Stale engine_dir references in Windows accuracy benchmark examples

  • examples/windows/accuracy_benchmark/README.md (lines 165, 177): replace --engine_dir with --checkpoint-dir
  • examples/windows/accuracy_benchmark/mmlu_benchmark.py (line 523): rename CLI arg and variable from engine_dir to checkpoint_dir
  • examples/windows/accuracy_benchmark/trtllm_utils.py (around line 317): update parser.add_argument("--engine_dir",…) and all helper functions to use checkpoint_dir instead
🤖 Prompt for AI Agents
In examples/llm_eval/gen_model_answer.py around lines 121-124, ensure any
remaining references to the old CLI/variable name engine_dir are replaced with
checkpoint_dir; across the Windows accuracy benchmark examples update the README
(lines 165,177) to use --checkpoint-dir, rename CLI args and variables in
examples/windows/accuracy_benchmark/mmlu_benchmark.py (line ~523) from
engine_dir to checkpoint_dir, and in
examples/windows/accuracy_benchmark/trtllm_utils.py update
parser.add_argument("--engine_dir", ...) to --checkpoint-dir and rename all
helper function parameters and internal variables from engine_dir to
checkpoint_dir so names and docs are consistent.

Comment on lines +153 to 155
checkpoint_dir=checkpoint_dir,
nim_model=nim_model,
)

🛠️ Refactor suggestion | 🟠 Major

Pass explicit flags to remote worker; avoid relying on globals inside Ray

When use_ray is true, get_model_answers runs out-of-process. Pass required options explicitly to avoid NameError on globals.

Apply:

             checkpoint_dir=checkpoint_dir,
+            trust_remote_code=args.trust_remote_code,
+            quant_cfg=args.quant_cfg,
+            calib_batch_size=args.calib_batch_size,
+            calib_size=args.calib_size,
+            auto_quantize_bits=args.auto_quantize_bits,
             nim_model=nim_model,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            checkpoint_dir=checkpoint_dir,
-            nim_model=nim_model,
-        )
+            checkpoint_dir=checkpoint_dir,
+            trust_remote_code=args.trust_remote_code,
+            quant_cfg=args.quant_cfg,
+            calib_batch_size=args.calib_batch_size,
+            calib_size=args.calib_size,
+            auto_quantize_bits=args.auto_quantize_bits,
+            nim_model=nim_model,
+        )
🤖 Prompt for AI Agents
In examples/llm_eval/gen_model_answer.py around lines 153-155, the call that
runs get_model_answers out-of-process with use_ray=True relies on module globals
and will raise NameError inside the remote worker; update the remote invocation
to pass all required options explicitly (e.g., flags/variables previously
referenced as globals such as
model/config/checkpoint_dir/nim_model/tokenizer/device/seed/other runtime
options) and update get_model_answers' remote-compatible signature to accept
those parameters so the worker receives everything it needs; ensure only
serializable types are passed and remove reliance on globals inside the function
body.

@cjluo-nv cjluo-nv merged commit 598b9ce into main Sep 25, 2025
27 checks passed
@cjluo-nv cjluo-nv deleted the chenjiel/remove_engines branch September 25, 2025 19:07
kevalmorabia97 pushed a commit that referenced this pull request Sep 25, 2025
yeyu-nvidia pushed a commit that referenced this pull request Oct 1, 2025
…s. (#365)

Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Ye Yu <[email protected]>