[NVBUG: 5535437] Refactor engine_dir to checkpoint_dir in PTQ examples. #365
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough

Renames the TensorRT-LLM parameter/flag from engine_dir to checkpoint_dir across the LLM eval and PTQ examples and scripts, updates docs and logs, adjusts the tokenizer fallback to checkpoint_dir, removes two helper functions, expands the CHANGELOG deprecation note, generalizes a VLM README paragraph, and adds quant-related flags to run_fastchat.sh.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
examples/llm_eval/README.md (1)

164-167: Restore the `-e` flag in the MT-Bench command.

`run_fastchat.sh` still requires `-e/--checkpoint_dir`. The updated README example drops that flag, so anyone following the docs will hit the usage error and the workflow stops. Please fix the command.

```diff
-bash run_fastchat.sh -h <HF model folder or model card> <Quantized checkpoint dir>
+bash run_fastchat.sh -h <HF model folder or model card> -e <Quantized checkpoint dir>
```

examples/llm_eval/gen_model_answer.py (3)
139-156: Avoid NameError under Ray: pass args into get_model_answers

get_model_answers references args.* but args isn't passed; under Ray workers this will raise NameError. Pass args explicitly.

Apply this diff:

```diff
         get_answers_func(
             model_path,
             model_id,
             questions[i : i + chunk_size],
             answer_file,
             max_new_token,
             num_choices,
             num_gpus_per_model,
             max_gpu_memory,
             dtype=dtype,
             revision=revision,
             top_p=top_p,
             temperature=temperature,
             checkpoint_dir=checkpoint_dir,
             nim_model=nim_model,
+            args=args,
         )
```
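As a companion to the suggested diff, here is a minimal, self-contained sketch of the Ray-safe calling convention this comment advocates. The names and values are placeholders, not code from this PR: every option the worker needs arrives as an explicit argument rather than being read from a module-level `args` global.

```python
# Hypothetical sketch of the Ray-safe pattern; names and values are placeholders.
from typing import Optional

import ray


@ray.remote
def get_answers_worker(
    model_path: str,
    checkpoint_dir: Optional[str],
    trust_remote_code: bool,
) -> dict:
    # Everything arrives as an argument, so the function behaves the same
    # in-process and inside a Ray worker that re-imports this module.
    return {
        "model_path": model_path,
        "checkpoint_dir": checkpoint_dir,
        "trust_remote_code": trust_remote_code,
    }


if __name__ == "__main__":
    ray.init()
    result = ray.get(
        get_answers_worker.remote(
            "my-org/my-model",            # placeholder HF model path
            "/tmp/quantized_checkpoint",  # placeholder checkpoint dir
            True,
        )
    )
    print(result)
    ray.shutdown()
```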
164-179: Accept args in get_model_answers; stop relying on module-global args

Make args an explicit parameter to be Ray-safe.

Apply this diff:

```diff
 @torch.inference_mode()
 def get_model_answers(
     model_path,
     model_id,
     questions,
     answer_file,
     max_new_token,
     num_choices,
     num_gpus_per_model,
     max_gpu_memory,
     dtype,
     revision,
     top_p=None,
     temperature=None,
-    checkpoint_dir=None,
-    nim_model=None,
+    checkpoint_dir=None,
+    nim_model=None,
+    args=None,
 ):
```
208-219: Guard optional quantization args to avoid crashes under Ray

Use getattr on args to avoid AttributeError/NameError if args is None or missing fields.

Apply this diff:

```diff
-    tokenizer = get_tokenizer(model_path, trust_remote_code=args.trust_remote_code)
-    if args.quant_cfg:
+    trust_remote_code = getattr(args, "trust_remote_code", False)
+    tokenizer = get_tokenizer(model_path, trust_remote_code=trust_remote_code)
+    if getattr(args, "quant_cfg", None):
         quantize_model(
             model,
-            args.quant_cfg,
+            args.quant_cfg,
             tokenizer,
-            args.calib_batch_size,
-            args.calib_size,
-            args.auto_quantize_bits,
+            getattr(args, "calib_batch_size", 4),
+            getattr(args, "calib_size", 512),
+            getattr(args, "auto_quantize_bits", None),
             test_generated=False,
         )
```

examples/llm_eval/mmlu.py (1)
256-270: Fix Qwen pad/eos detection after checkpoint_dir rename.

When the checkpoint directory comes from `scripts/huggingface_example.sh`, its basename looks like `saved_models_<model>_...`; your new parsing now sets `model_type` to `"saved"`, so the Qwen-specific branch never runs. That regresses Qwen runs (pad/eos stay unset and generation breaks). Please restore the previous behavior by stripping the prefix and matching on lower-cased names (or reusing the old helper).

```diff
-    last_part = os.path.basename(kwargs["checkpoint_dir"])
-    model_type = last_part.split("_")[0]
+    last_part = os.path.basename(os.path.normpath(kwargs["checkpoint_dir"]))
+    if last_part.startswith("saved_models_"):
+        last_part = last_part[len("saved_models_") :]
+    model_type = last_part.split("_")[0].lower()
     # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
-    if model_type == "qwen":
+    if model_type.startswith("qwen"):
         tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
         tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
```
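To make the regression concrete, here is a small standalone reproduction of the basename parsing described above; the paths are invented for illustration only.

```python
# Invented paths for illustration; shows why splitting the raw basename
# misclassifies checkpoints exported by huggingface_example.sh.
import os

exported = "/tmp/saved_models_qwen2_7b_fp8"   # layout produced by the export script
handmade = "/tmp/qwen2_7b_fp8"                # layout the new parsing assumes

print(os.path.basename(exported).split("_")[0])  # "saved"  -> Qwen branch is skipped
print(os.path.basename(handmade).split("_")[0])  # "qwen2"  -> only this layout works

# The suggested fix strips the known prefix and matches case-insensitively:
name = os.path.basename(os.path.normpath(exported))
if name.startswith("saved_models_"):
    name = name[len("saved_models_"):]
print(name.split("_")[0].lower().startswith("qwen"))  # True
```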
🧹 Nitpick comments (2)
examples/llm_eval/gen_model_answer.py (1)
430-439: CLI polish: mutually exclusive flags and deprecation alias

To avoid user confusion, make --checkpoint-dir and --nim-model mutually exclusive, and optionally accept the deprecated --engine-dir as an alias that maps to checkpoint_dir.

Apply this diff:

```diff
-    parser.add_argument(
-        "--checkpoint-dir",
-        type=str,
-        help="The path to the model checkpoint directory.",
-    )
-    parser.add_argument(
-        "--nim-model",
-        type=str,
-        help="The NIM model handle to use",
-    )
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument(
+        "--checkpoint-dir",
+        type=str,
+        help="Path to the model checkpoint directory (contains config.json).",
+    )
+    group.add_argument(
+        "--nim-model",
+        type=str,
+        help="The NIM model handle to use.",
+    )
+    # Deprecated alias for backward compatibility
+    parser.add_argument(
+        "--engine-dir",
+        dest="checkpoint_dir",
+        type=str,
+        help="DEPRECATED. Use --checkpoint-dir instead.",
+    )
```

examples/llm_ptq/run_tensorrt_llm.py (1)
31-31: Require `--checkpoint_dir` to fail fast on misconfiguration.

Without the flag, we cascade into harder-to-debug None-type errors when attempting to load the checkpoint. Marking it as required keeps the CLI experience sharp.

```diff
-    parser.add_argument("--checkpoint_dir", type=str)
+    parser.add_argument("--checkpoint_dir", type=str, required=True)
```
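For illustration, a tiny standalone argparse sketch (not the actual run_tensorrt_llm.py parser) showing the fail-fast behavior this suggestion buys:

```python
# Hypothetical sketch, not the real parser from run_tensorrt_llm.py.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--checkpoint_dir", type=str, required=True)

# Without required=True, omitting the flag leaves args.checkpoint_dir as None and
# the failure only surfaces later as a NoneType error while loading the checkpoint.
# With required=True, argparse exits up front with:
#   error: the following arguments are required: --checkpoint_dir
args = parser.parse_args(["--checkpoint_dir", "/tmp/quantized_checkpoint"])
print(args.checkpoint_dir)
```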
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- CHANGELOG.rst (1 hunks)
- examples/llm_eval/README.md (3 hunks)
- examples/llm_eval/gen_model_answer.py (6 hunks)
- examples/llm_eval/lm_eval_tensorrt_llm.py (2 hunks)
- examples/llm_eval/mmlu.py (2 hunks)
- examples/llm_eval/run_fastchat.sh (5 hunks)
- examples/llm_ptq/example_utils.py (0 hunks)
- examples/llm_ptq/run_tensorrt_llm.py (4 hunks)
- examples/llm_ptq/scripts/huggingface_example.sh (4 hunks)
- examples/vlm_ptq/README.md (1 hunks)
- examples/vlm_ptq/utils.py (0 hunks)
💤 Files with no reviewable changes (2)
- examples/vlm_ptq/utils.py
- examples/llm_ptq/example_utils.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T20:14:34.768Z
Learnt from: realAsma
PR: NVIDIA/TensorRT-Model-Optimizer#318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:191-191
Timestamp: 2025-09-16T20:14:34.768Z
Learning: The TensorRT-Model-Optimizer project only supports PyTorch >= 2.6, so using the `weights_only` parameter in torch.load calls is acceptable and doesn't require backward compatibility handling.
Applied to files:
CHANGELOG.rst
🧬 Code graph analysis (3)
examples/llm_ptq/run_tensorrt_llm.py (1)
- modelopt/deploy/llm/generate.py (1): LLM (53-291)

examples/llm_eval/gen_model_answer.py (2)
- examples/llm_ptq/example_utils.py (1): get_tokenizer (39-61)
- modelopt/deploy/llm/generate.py (1): LLM (53-291)

examples/llm_eval/lm_eval_tensorrt_llm.py (1)
- modelopt/deploy/llm/generate.py (2): LLM (53-291), max_seq_len (154-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (16)
examples/llm_eval/gen_model_answer.py (3)
121-124: Param rename to checkpoint_dir looks good

Signature change is consistent with the repo-wide refactor.

262-281: Branching on checkpoint_dir for generation is correct

The fallback to model.generate vs. LLM.generate_tokens aligns with the refactor.

505-508: LGTM: call site updated to pass checkpoint_dir

Consistent with the new argument name.
CHANGELOG.rst (1)
11-11: Changelog entry reflects the checkpoint_dir refactor cleanly.

The note aligns with the code updates I've seen; thanks for keeping the release notes in sync.

examples/vlm_ptq/README.md (1)

59-59: README wording matches the newly generalized workflow.

Appreciate the neutral phrasing; it now reads correctly for any supported Hugging Face multi-modal checkpoint.
examples/llm_ptq/run_tensorrt_llm.py (2)
52-54: Tokenizer fallback matches the renamed artifact layout.

Good to see the comment updated alongside the directory rename; behavior stays consistent.

69-70: LLM instantiation now targets the checkpoint directory.

This is the key behavioral switch, and it lines up with the upstream `LLM` API surface.

examples/llm_ptq/scripts/huggingface_example.sh (4)
161-164: Enc-dec guidance mirrors the checkpoint export wording.

Nice touch keeping the messaging clear for users hitting that branch.

190-190: Run step now hands the checkpoint directory to the eval script.

Matches the refactor plan; no issues spotted.

232-232: lm_eval invocation tracks the new flag name.

`checkpoint_dir` is propagated into the harness exactly where it needs to be.

262-263: MMLU call is consistent with the renamed flag.

Thanks for catching this path as well; keeps post-quant eval frictionless.
examples/llm_eval/lm_eval_tensorrt_llm.py (1)
45-64: Constructor rename lines up with the shared LLM API.

The assertions and log message now reflect the checkpoint-based workflow; looks solid.

examples/llm_eval/run_fastchat.sh (4)

23-35: Usage text now points readers at the checkpoint directory.

The example command updates are clear and match the rest of the refactor.

44-61: CLI parsing adopts `checkpoint_dir` cleanly.

The rename is threaded through without introducing regressions.

99-124: Optional quant args are folded in neatly.

Good balance of flexibility without breaking the existing call sites.

142-149: Final command forwards the new flag to FastChat.

Output generation should continue to work with the checkpointed artifacts.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #365   +/-   ##
=======================================
  Coverage   73.46%   73.46%
=======================================
  Files         172      172
  Lines       17640    17640
=======================================
  Hits        12959    12959
  Misses       4681     4681
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Chenjie Luo <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (1)
examples/llm_eval/gen_model_answer.py (1)
426-429: CLI rename accepted; consider a hidden alias for backward compatibility

Optional: add a suppressed --engine-dir alias that maps to checkpoint_dir to avoid breaking existing scripts.

```diff
 parser.add_argument(
-    "--checkpoint-dir",
+    "--checkpoint-dir",
     type=str,
     help="The path to the model checkpoint directory.",
 )
+parser.add_argument(
+    "--engine-dir",
+    dest="checkpoint_dir",
+    type=str,
+    help=argparse.SUPPRESS,
+)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/llm_eval/gen_model_answer.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/llm_eval/gen_model_answer.py (2)
- examples/llm_ptq/example_utils.py (1): get_tokenizer (39-61)
- modelopt/deploy/llm/generate.py (1): LLM (53-291)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (3)
examples/llm_eval/gen_model_answer.py (3)
258-277: Branch on checkpoint_dir for TRT-LLM vs HF: OK

Logic is clear; uses generate_tokens in the TRT-LLM path and decodes correctly.

501-504: Main call updated to checkpoint_dir: good

Callsite matches the renamed argument.

177-193: Fix NameError and Ray-safety; improve Qwen detection; pass trust_remote_code to LLM

get_model_answers references args but doesn't receive it. This will break in Ray workers and is fragile even locally. Also, Qwen detection via split is brittle.
Apply:
```diff
 def get_model_answers(
 @@
-    top_p=None,
-    temperature=None,
-    checkpoint_dir=None,
-    nim_model=None,
+    top_p=None,
+    temperature=None,
+    checkpoint_dir=None,
+    nim_model=None,
+    trust_remote_code=False,
+    quant_cfg=None,
+    calib_batch_size=4,
+    calib_size=512,
+    auto_quantize_bits=None,
 ):
 @@
-    tokenizer = get_tokenizer(model_path, trust_remote_code=args.trust_remote_code)
+    tokenizer = get_tokenizer(model_path, trust_remote_code=trust_remote_code)
     if checkpoint_dir:
-        # get model type
-        last_part = os.path.basename(checkpoint_dir)
-        model_type = last_part.split("_")[0]
-        # Some models require to set pad_token and eos_token based on external config (e.g., qwen)
-        if model_type == "qwen":
+        # Some models require pad/eos tokens (e.g., Qwen)
+        last_part = os.path.basename(checkpoint_dir)
+        if "qwen" in last_part.lower():
             tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643)
             tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)
         assert LLM is not None, "tensorrt_llm APIs could not be imported."
-        model = LLM(checkpoint_dir, tokenizer=tokenizer)
+        model = LLM(checkpoint_dir, tokenizer=tokenizer, trust_remote_code=trust_remote_code)
```

Also update the quantization block (outside the diff range) to use the newly passed params instead of args:
```python
# replace:
# if args.quant_cfg:
#     quantize_model(model, args.quant_cfg, tokenizer, args.calib_batch_size, args.calib_size, args.auto_quantize_bits, test_generated=False)
if quant_cfg:
    quantize_model(
        model,
        quant_cfg,
        tokenizer,
        calib_batch_size,
        calib_size,
        auto_quantize_bits,
        test_generated=False,
    )
```
```python
    checkpoint_dir,
    nim_model,
    args,
):
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Rename to checkpoint_dir in run_eval: looks good
Parameter rename is consistent.
Run to confirm no stale uses remain:
🏁 Script executed:
```bash
#!/bin/bash
# Find lingering engine_dir references (flags, kwargs, vars)
rg -n -C2 -g '!**/venv/**' -g '!**/build/**' -g '!**/dist/**' -e '\bengine_dir\b' -e '--engine-dir'
```
Length of output: 148
🏁 Script executed:
```bash
#!/bin/bash
# Re-run search including all files and excluding venv, build, dist
rg -n -C2 -g '*' -g '!venv/**' -g '!build/**' -g '!dist/**' -e '\bengine_dir\b' -e '--engine-dir'
```
Length of output: 5278
Stale engine_dir references in Windows accuracy benchmark examples
- examples/windows/accuracy_benchmark/README.md (lines 165, 177): replace `--engine_dir` with `--checkpoint-dir`
- examples/windows/accuracy_benchmark/mmlu_benchmark.py (line 523): rename the CLI arg and variable from `engine_dir` to `checkpoint_dir`
- examples/windows/accuracy_benchmark/trtllm_utils.py (around line 317): update `parser.add_argument("--engine_dir", …)` and all helper functions to use `checkpoint_dir` instead; a sketch of the rename pattern follows the AI prompt below
🤖 Prompt for AI Agents
In examples/llm_eval/gen_model_answer.py around lines 121-124, ensure any
remaining references to the old CLI/variable name engine_dir are replaced with
checkpoint_dir; across the Windows accuracy benchmark examples update the README
(lines 165,177) to use --checkpoint-dir, rename CLI args and variables in
examples/windows/accuracy_benchmark/mmlu_benchmark.py (line ~523) from
engine_dir to checkpoint_dir, and in
examples/windows/accuracy_benchmark/trtllm_utils.py update
parser.add_argument("--engine_dir", ...) to --checkpoint-dir and rename all
helper function parameters and internal variables from engine_dir to
checkpoint_dir so names and docs are consistent.
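As referenced in the list above, here is a minimal sketch of the rename pattern the prompt describes, assuming an argparse-based CLI. The real flags and helpers in trtllm_utils.py may differ; the old spelling is kept only as a hidden alias so existing command lines keep working.

```python
# Hypothetical sketch; not the actual trtllm_utils.py code.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Accuracy benchmark CLI (sketch)")
    parser.add_argument(
        "--checkpoint_dir",
        type=str,
        help="Path to the quantized model checkpoint directory.",
    )
    # Deprecated alias: both flags write to args.checkpoint_dir.
    parser.add_argument(
        "--engine_dir",
        dest="checkpoint_dir",
        type=str,
        help=argparse.SUPPRESS,
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--engine_dir", "/tmp/quantized_checkpoint"])
    print(args.checkpoint_dir)  # -> /tmp/quantized_checkpoint
```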
```python
            checkpoint_dir=checkpoint_dir,
            nim_model=nim_model,
        )
```
🛠️ Refactor suggestion | 🟠 Major
Pass explicit flags to remote worker; avoid relying on globals inside Ray
When use_ray is true, get_model_answers runs out-of-process. Pass required options explicitly to avoid NameError on globals.
Apply:
```diff
             checkpoint_dir=checkpoint_dir,
+            trust_remote_code=args.trust_remote_code,
+            quant_cfg=args.quant_cfg,
+            calib_batch_size=args.calib_batch_size,
+            calib_size=args.calib_size,
+            auto_quantize_bits=args.auto_quantize_bits,
             nim_model=nim_model,
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
            checkpoint_dir=checkpoint_dir,
            trust_remote_code=args.trust_remote_code,
            quant_cfg=args.quant_cfg,
            calib_batch_size=args.calib_batch_size,
            calib_size=args.calib_size,
            auto_quantize_bits=args.auto_quantize_bits,
            nim_model=nim_model,
        )
```
🤖 Prompt for AI Agents
In examples/llm_eval/gen_model_answer.py around lines 153-155, the call that
runs get_model_answers out-of-process with use_ray=True relies on module globals
and will raise NameError inside the remote worker; update the remote invocation
to pass all required options explicitly (e.g., flags/variables previously
referenced as globals such as
model/config/checkpoint_dir/nim_model/tokenizer/device/seed/other runtime
options) and update get_model_answers' remote-compatible signature to accept
those parameters so the worker receives everything it needs; ensure only
serializable types are passed and remove reliance on globals inside the function
body.
…s. (#365) Signed-off-by: Chenjie Luo <[email protected]> Signed-off-by: Chenjie Luo <[email protected]>
…s. (#365) Signed-off-by: Chenjie Luo <[email protected]> Signed-off-by: Chenjie Luo <[email protected]> Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
Type of change: Refactor
Overview: ?
We no longer use TRT-LLM engines in our examples. All references to the keyword `engine` are changed to `checkpoint`.
Summary by CodeRabbit

- Documentation
- Chores
- Refactor