
Conversation

@ajrasane ajrasane (Contributor) commented Sep 10, 2025

What does this PR do?

Type of change: Bug fix

Overview:

  • Add an option to specify the engine path to the evaluation API, DeviceModel, and TRT Client.
  • Update the onnx_ptq test with engine paths to avoid an engine file I/O error.

Testing

  • test_onnx_ptq passes
bash test_onnx_ptq.sh <path_to_imagenet> <path_to_onnx_models_folder>

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • New Features

    • The evaluation CLI now requires an --engine_path to locate/read/write TensorRT engines; internally, engine output still defaults to the prior locations when no path is supplied.
    • Engine build supports directing the produced engine to a user-specified path and forwards that path through the runtime.
  • Tests

    • Example test script enhanced with flexible paths, dynamic quant modes, parallel quantization/evaluation, per-mode engine selection, aggregation, and optional cleanup.
  • Documentation

    • CLI help updated to document the engine path option.
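As a usage sketch, the new flag slots in next to the existing evaluation arguments used by the test script later in this thread (the paths and model name here are placeholders, not values from the PR):

python evaluate.py \
    --onnx_path=build/resnet50/fp16/model.onnx \
    --engine_path=build/resnet50/fp16/model.engine \
    --model_name=resnet50 \
    --engine_precision=fp16 \
    --results_path=build/resnet50/fp16/results.csv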


coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds a required --engine_path CLI option to the ONNX PTQ evaluator, threads engine_path through the TRT client into the TensorRT engine builder (which now accepts an optional engine_path for the initial engine output), and updates the tests to pass per-mode engine paths and adjust the quantization/evaluation flows.

Changes

  • CLI: ONNX PTQ evaluate (examples/onnx_ptq/evaluate.py)
    Adds a required --engine_path CLI arg, builds compilation_args = {"engine_path": ...}, and calls ir_to_compiled(onnx_bytes, compilation_args).
  • TensorRT runtime plumbing (modelopt/torch/_deploy/_runtime/trt_client.py, modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py)
    The TRT client forwards engine_path from compilation_args to build_engine; the build_engine signature gains an optional engine_path parameter used for the initial engine output.
  • Tests: example script (tests/examples/test_onnx_ptq.sh)
    Refactors the test script to support per-mode builds/evaluation, adds --no-clean handling plus calibration and GPU capability logic, writes quantized models per mode, sets per-mode engine_path and eval_model_path, invokes evaluate.py with --engine_path=$engine_path, aggregates results, and optionally cleans build artifacts.
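For orientation, a minimal, self-contained sketch of this plumbing; the function names follow the file summaries above, but the bodies are stand-ins rather than the real ModelOpt/TensorRT implementations:

import argparse

def build_engine(onnx_bytes: bytes, engine_path: str | None = None):
    # Stand-in for the TensorRT engine builder: honor engine_path when given,
    # otherwise fall back to a default output location, per the walkthrough.
    target = engine_path or "build/model.engine"
    print(f"would build engine from {len(onnx_bytes)} ONNX bytes -> {target}")
    return b"engine-bytes", "trtexec log"

def ir_to_compiled(onnx_bytes: bytes, compilation_args: dict | None = None):
    # Stand-in for the TRT client: forward engine_path through to the builder.
    compilation_args = compilation_args or {}
    engine_bytes, _log = build_engine(
        onnx_bytes, engine_path=compilation_args.get("engine_path")
    )
    return engine_bytes

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--engine_path", required=True,
                        help="Path to the TensorRT engine")
    args = parser.parse_args()
    ir_to_compiled(b"fake-onnx-bytes", {"engine_path": args.engine_path})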

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant EvalScript as evaluate.py
  participant TRTClient as TRTLocalClient
  participant EngineBuilder as TensorRT Engine Builder

  User->>EvalScript: Run with --engine_path=/path/to/model.engine
  EvalScript->>TRTClient: ir_to_compiled(onnx_bytes, {engine_path})
  TRTClient->>EngineBuilder: build_engine(onnx_bytes, engine_path=...)
  rect rgba(200, 230, 255, 0.25)
    note right of EngineBuilder: Resolve initial engine write location\n(use provided engine_path or default)
    EngineBuilder-->>EngineBuilder: Build engine (TRT)
    EngineBuilder-->>EngineBuilder: Read, hash, move to final_output_dir
  end
  EngineBuilder-->>TRTClient: (engine_bytes | None, logs)
  TRTClient-->>EvalScript: compiled_engine / error
  EvalScript-->>User: Evaluation results / logs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibbled paths through bytes so clean,
A carrot-marked engine, crisp and lean.
I pass the map where engines sleep,
Hashes snug, their secrets keep.
Hop, compile—then tests all gleam. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title "Specify engine path for TRT evaluation" is concise and accurately captures the primary change in the PR: adding an engine_path option and plumbing it through the ONNX PTQ evaluation CLI, TRT client, and engine builder so TensorRT engine files can be written to a specified location, matching the raw_summary and PR objectives.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py (1)

234-241: Keep user-provided engine_path intact; copy hashed artifact into output_dir instead of moving it

Moving a user-specified engine file surprises callers and can break downstream code that expects that exact path to continue to exist. Preserve the file at engine_path and write the hashed copy into final_output_dir. Also capture whether engine_path was provided before the variable is shadowed.

@@
-    with TemporaryDirectory() as tmp_dir:
-        onnx_path, engine_path, calib_cache_path, timing_cache_path, final_output_dir = (
-            _setup_files_and_paths(Path(tmp_dir), engine_path)
-        )
+    with TemporaryDirectory() as tmp_dir:
+        user_provided_engine_path = engine_path is not None
+        onnx_path, engine_path, calib_cache_path, timing_cache_path, final_output_dir = (
+            _setup_files_and_paths(Path(tmp_dir), engine_path)
+        )
@@
-            shutil.move(engine_path, engine_path_with_hash)
+            # If caller provided a custom engine_path, keep it intact and also persist a hashed copy.
+            if user_provided_engine_path and engine_path.resolve() != engine_path_with_hash.resolve():
+                shutil.copy2(engine_path, engine_path_with_hash)
+            else:
+                shutil.move(engine_path, engine_path_with_hash)
             logging.info("Engine saved to %s", engine_path_with_hash)

Also applies to: 223-227

modelopt/torch/_deploy/_runtime/trt_client.py (1)

86-95: Guard compilation_args=None and fail fast on build failure

The current code raises an AttributeError when compilation_args is None; callers also receive None on build failure despite the declared bytes return type.

         self.node_names = get_node_names_from_bytes(onnx_model_file_bytes)
-        engine_bytes, _ = build_engine(
+        compilation_args = compilation_args or {}
+        engine_bytes, _ = build_engine(
             onnx_bytes,
             dynamic_shapes=compilation_args.get("dynamic_shapes"),  # type: ignore[union-attr]
             plugin_config=compilation_args.get("plugin_config"),  # type: ignore[union-attr]
             engine_path=compilation_args.get("engine_path"),  # type: ignore[union-attr]
             trt_mode=self.deployment["precision"],
             verbose=(self.deployment.get("verbose", "false").lower() == "true"),
         )
-        self.engine_bytes = engine_bytes
+        if engine_bytes is None:
+            raise RuntimeError("TensorRT engine build failed; see trtexec logs for details.")
+        self.engine_bytes = engine_bytes
         return engine_bytes
🧹 Nitpick comments (3)
modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py (1)

137-147: Clarify engine_path semantics in the docstring

Doc says “Path to save the TensorRT engine,” but the code (and suggested change) preserves that path and also writes a hashed copy into output_dir. Make this explicit.

-        engine_path: Path to save the TensorRT engine.
+        engine_path: Optional path where trtexec will initially write the engine (via --saveEngine).
+            The final, hashed artifact is stored under `output_dir` (or the default artifact dir)
+            as `<hash>-{model_name}.engine`. When a custom `engine_path` is provided, that file
+            is preserved and a hashed copy is saved in the artifact dir.
modelopt/torch/_deploy/_runtime/trt_client.py (1)

77-79: Doc: reference builder’s engine_path behavior

Make it clear that this argument is forwarded to the builder, and point readers to the builder docs for final file placement.

examples/onnx_ptq/evaluate.py (1)

38-43: Clarify that --engine_path is an output path used during build

Avoid confusion with a “load existing engine” path.

-    parser.add_argument(
+    parser.add_argument(
         "--engine_path",
         type=str,
         required=True,
-        help="Path to the TensorRT engine",
+        help="Output path where trtexec should write the engine (will also be copied to the artifact dir)",
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 669ae05 and 37c6250.

📒 Files selected for processing (4)
  • examples/onnx_ptq/evaluate.py (2 hunks)
  • modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py (3 hunks)
  • modelopt/torch/_deploy/_runtime/trt_client.py (2 hunks)
  • tests/examples/test_onnx_ptq.sh (2 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
tests/examples/test_onnx_ptq.sh

[error] 177-177: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).

(SC2199)


[warning] 177-177: Remove quotes from right-hand side of =~ to match as a regex rather than literally.

(SC2076)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
examples/onnx_ptq/evaluate.py (1)

92-93: Confirmed ir_to_compiled supports a second argument—no changes needed.

tests/examples/test_onnx_ptq.sh (2)

132-134: LGTM: calibration_eps placement

Passing --calibration_eps=cuda:0 works with CUDA_VISIBLE_DEVICES scoping.


165-174: Engine/ONNX path selection per mode looks consistent

Mode-specific engine_path and eval_model_path wiring aligns with the new CLI.

Also applies to: 180-188

Comment on lines 165 to 200
if [ "$quant_mode" == "fp16" ]; then
eval_model_path=$model_dir/fp16/model.onnx
engine_path=$model_dir/fp16/model.engine
elif [ "$quant_mode" == "int8_iq" ]; then
eval_model_path=$model_dir/fp16/model.onnx
engine_path=$model_dir/int8_iq/model.engine
else
eval_model_path=$model_dir/$quant_mode/model.quant.onnx
engine_path=$model_dir/$quant_mode/model.quant.engine
fi
🛠️ Refactor suggestion

Fix membership test to satisfy ShellCheck (SC2199, SC2076) and be robust

Use glob matching with ${array[*]}.

-        if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then
+        # Using glob match on the space-joined array avoids SC2199/SC2076 pitfalls
+        if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then

Also applies to: 177-177

🤖 Prompt for AI Agents
In tests/examples/test_onnx_ptq.sh around lines 165-174 (and also line 177), the
script uses direct string equality checks which trigger ShellCheck
SC2199/SC2076; replace those with a proper membership test using glob matching
on the joined array (e.g., [[ " ${array[*]} " == *" $quant_mode "* ]]). Define
the arrays of allowed modes where appropriate, then change the if/elif
conditions to use the [[ " ${array[*]} " == *" $quant_mode "* ]] pattern so
membership checks are robust and ShellCheck-compliant, keeping the same branches
for fp16, int8_iq and the fallback.
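For anyone applying the fix by hand, a tiny self-contained demonstration of the recommended pattern (the model names here are made up):

#!/usr/bin/env bash
# Space-pad both the joined array and the needle so partial names cannot match.
latency_models=("resnet50" "vit_base_patch16_224" "efficientnet_b0")
model_name="vit_base_patch16_224"

if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then
    echo "$model_name is in latency_models"
else
    echo "$model_name is not in latency_models"
fi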


codecov bot commented Sep 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (bbb2304) to head (959a173).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #313   +/-   ##
=======================================
  Coverage   73.82%   73.82%           
=======================================
  Files         172      172           
  Lines       17438    17439    +1     
=======================================
+ Hits        12874    12875    +1     
  Misses       4564     4564           

☔ View full report in Codecov by Sentry.
@i-riyad i-riyad (Contributor) left a comment

LGTM

@ajrasane ajrasane enabled auto-merge (squash) September 18, 2025 13:24
@ajrasane ajrasane disabled auto-merge September 18, 2025 14:03
@ajrasane ajrasane force-pushed the ajrasane/onnx_test_fix branch from e98c325 to 959a173 on September 18, 2025 14:56
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/examples/test_onnx_ptq.sh (1)

32-34: Guard against missing/invalid GPU info (prevents divide-by-zero and numeric test errors).

Handle empty/N/A outputs and zero GPUs before using modulo or numeric comparisons.

Apply this diff:

-nvidia_gpu_count=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | head -n 1)
-cuda_capability=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d .)
+nvidia_gpu_count=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits 2>/dev/null | head -n 1)
+cuda_capability_raw=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
+if [[ -z "$nvidia_gpu_count" || "$nvidia_gpu_count" -eq 0 ]]; then
+  echo "ERROR: No NVIDIA GPUs detected (nvidia-smi count: ${nvidia_gpu_count:-unset})." >&2
+  exit 1
+fi
+if [[ -z "$cuda_capability_raw" || "$cuda_capability_raw" == "N/A" ]]; then
+  cuda_capability=0
+else
+  cuda_capability=${cuda_capability_raw/./}  # e.g., 8.9 -> 89
+fi
♻️ Duplicate comments (1)
tests/examples/test_onnx_ptq.sh (1)

203-214: Fix array membership test (ShellCheck SC2199/SC2076).

Use a glob match on the space-joined array; the current quoted-regex form is brittle and flagged by ShellCheck.

Apply this diff:

-        if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then
+        # Use glob match on joined array to test membership (SC2199/SC2076)
+        if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then
             CUDA_VISIBLE_DEVICES=$gpu_id python evaluate.py \
                 --onnx_path=$eval_model_path \
-                --engine_path=$engine_path \
+                --engine_path=$engine_path \
                 --model_name="${timm_model_name[$model_name]}" \
                 --engine_precision=$precision \
                 --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv &
🧹 Nitpick comments (7)
tests/examples/test_onnx_ptq.sh (7)

38-58: Harden CLI argument parsing (shift inside for-loop is ineffective).

Switch to a while/case parser so shifts take effect and unknown flags are surfaced.

Apply this diff:

-# Parse arguments
-clean_mode=true
-imagenet_path=""
-models_folder=""
-
-for arg in "$@"; do
-    case $arg in
-        --no-clean)
-            clean_mode=false
-            shift
-            ;;
-        *)
-            if [ -z "$imagenet_path" ]; then
-                imagenet_path="$arg"
-            elif [ -z "$models_folder" ]; then
-                models_folder="$arg"
-            fi
-            shift
-            ;;
-    esac
-done
+# Parse arguments
+clean_mode=true
+imagenet_path=""
+models_folder=""
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --no-clean)
+            clean_mode=false
+            shift
+            ;;
+        -*)
+            echo "Unknown option: $1" >&2
+            exit 2
+            ;;
+        *)
+            if [[ -z "$imagenet_path" ]]; then
+                imagenet_path="$1"
+            elif [[ -z "$models_folder" ]]; then
+                models_folder="$1"
+            else
+                echo "Unexpected positional arg: $1" >&2
+                exit 2
+            fi
+            shift
+            ;;
+    esac
+done

81-86: Make model discovery robust when no .onnx files exist.

Enable nullglob to avoid iterating a literal pattern and fail fast with a clear message.

Apply this diff:

-declare -a model_paths
-for model in "$models_folder"/*.onnx; do
-    model_paths+=("$model")
-done
+shopt -s nullglob
+declare -a model_paths=("$models_folder"/*.onnx)
+shopt -u nullglob
+if [[ ${#model_paths[@]} -eq 0 ]]; then
+    echo "ERROR: No .onnx models found in: $models_folder" >&2
+    exit 1
+fi

64-66: Validate input directories early with helpful errors.

Catch misconfigured imagenet/models paths before doing work.

Apply this diff:

 imagenet_path=${imagenet_path:-/data/imagenet/}
 models_folder=${models_folder:-/models/onnx}
+if [[ ! -d "$imagenet_path" ]]; then
+  echo "ERROR: imagenet_path not found: $imagenet_path" >&2
+  exit 1
+fi
+if [[ ! -d "$models_folder" ]]; then
+  echo "ERROR: models_folder not found: $models_folder" >&2
+  exit 1
+fi

36-36: Quote paths and add a trap to always unwind directory stack.

Avoid word-splitting and ensure popd even on early exits.

Apply this diff:

-pushd $public_example_dir
+pushd "$public_example_dir"
+# Always return to original dir, even on errors
+trap 'popd >/dev/null' EXIT

Optionally remove the explicit popd calls at Lines 246 and 250 since the trap handles it.


117-127: Quote path expansions to handle spaces safely.

Low-risk now, but quoting prevents subtle bugs.

Apply this diff:

-    model_dir=build/$model_name
+    model_dir="build/$model_name"
@@
-        mkdir -p $model_dir/$mode
+        mkdir -p "$model_dir/$mode"
@@
-    cp $model_path $model_dir/fp16/model.onnx
+    cp "$model_path" "$model_dir/fp16/model.onnx"

151-157: Quote CLI args and ensure consistent style.

Keep style consistent (either use --arg=value everywhere or split args uniformly) and quote paths.

Apply this diff:

-        CUDA_VISIBLE_DEVICES=$gpu_id python -m modelopt.onnx.quantization \
-            --onnx_path=$model_dir/fp16/model.onnx \
-            --quantize_mode=$quant_mode \
-            --calibration_data=$calib_data_path \
-            --output_path=$model_dir/$quant_mode/model.quant.onnx \
-            --calibration_eps=cuda:0 &
+        CUDA_VISIBLE_DEVICES=$gpu_id python -m modelopt.onnx.quantization \
+            --onnx_path="$model_dir/fp16/model.onnx" \
+            --quantize_mode="$quant_mode" \
+            --calibration_data="$calib_data_path" \
+            --output_path="$model_dir/$quant_mode/model.quant.onnx" \
+            --calibration_eps="cuda:0" &

Apply similar quoting for evaluate.py invocations.


240-248: Optional: gate cleanup symmetry via flag only (keep initial artifacts when troubleshooting).

The current flow always runs rm -rf build at the start; --no-clean only preserves end-of-run artifacts. To keep prior runs for diffing, consider skipping the initial rm when --no-clean is set.

Example tweak:

-rm -rf build
-mkdir -p build
+if [[ "$clean_mode" == true ]]; then
+  rm -rf build
+fi
+mkdir -p build
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e98c325 and 959a173.

📒 Files selected for processing (4)
  • examples/onnx_ptq/evaluate.py (2 hunks)
  • modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py (3 hunks)
  • modelopt/torch/_deploy/_runtime/trt_client.py (2 hunks)
  • tests/examples/test_onnx_ptq.sh (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • modelopt/torch/_deploy/_runtime/trt_client.py
  • modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py
  • examples/onnx_ptq/evaluate.py
🧰 Additional context used
🪛 Shellcheck (0.10.0)
tests/examples/test_onnx_ptq.sh

[error] 203-203: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).

(SC2199)


[warning] 203-203: Remove quotes from right-hand side of =~ to match as a regex rather than literally.

(SC2076)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
tests/examples/test_onnx_ptq.sh (1)

188-200: Confirmed — evaluate.py accepts the engine_precision values used by the test.

examples/onnx_ptq/evaluate.py defines --engine_precision as type=str (default "stronglyTyped") and sets deployment["precision"] = args.engine_precision, so "fp16", "best" (int8_iq), and "stronglyTyped" are accepted as-is.
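A minimal sketch of the arrangement that check describes (shape only, not the verbatim source):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--engine_precision", type=str, default="stronglyTyped")
args = parser.parse_args(["--engine_precision", "fp16"])

# Free-form string: "fp16", "best" (int8_iq), and "stronglyTyped" all pass through.
deployment = {"precision": args.engine_precision}
print(deployment)  # {'precision': 'fp16'}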

@ajrasane ajrasane merged commit 27a6821 into main Sep 18, 2025
22 checks passed
@ajrasane ajrasane deleted the ajrasane/onnx_test_fix branch September 18, 2025 20:09