Update onnx ptq test to be single threaded and make it faster #415
Conversation
Signed-off-by: ajrasane <[email protected]>
Walkthrough: Sequentializes the ONNX PTQ example and evaluation; adds a timing cache path that is threaded through evaluation and engine building.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Script as tests/examples/test_onnx_ptq.sh
participant Quant as quantize CLI
participant Eval as evaluate CLI
Note over Script,Quant: For each model -> quant_mode runs strictly sequentially
loop For each quant_mode (sequential)
Script->>Quant: run quantize(model, quant_mode)
Quant-->>Script: quant artifacts
end
Note over Script,Eval: Evaluation gated by eval_mode
loop For each quant_mode (sequential, if eval_mode)
Script->>Eval: run evaluate(model, quant_mode, --timing_cache_path)
Eval-->>Script: metrics
end
Script->>Script: aggregate_results.py (only if eval_mode)
sequenceDiagram
autonumber
participant Eval as evaluate.py
participant TRTClient as trt_client.ir_to_compiled
participant Builder as engine_builder.build_engine
participant Trtexec as trtexec (external)
Eval->>TRTClient: ir_to_compiled(..., timing_cache_path=path)
TRTClient->>Builder: build_engine(..., timing_cache_path=path)
Builder->>Trtexec: run trtexec --timing-cache=path
Trtexec-->>Builder: engine artifact
Builder-->>TRTClient: compiled engine
TRTClient-->>Eval: compiled result
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/examples/test_onnx_ptq.sh (2)
147-153: Quote paths and make calibration device configurable. The foreground quantization change looks good. Quote path/vars to avoid word splitting and allow overriding the calibration device without editing the script.
Apply:
- python -m modelopt.onnx.quantization \
-     --onnx_path=$model_dir/fp16/model.onnx \
-     --quantize_mode=$quant_mode \
-     --calibration_data=$calib_data_path \
-     --output_path=$model_dir/$quant_mode/model.quant.onnx \
-     --calibration_eps=cuda:0
+ python -m modelopt.onnx.quantization \
+     --onnx_path="$model_dir/fp16/model.onnx" \
+     --quantize_mode="$quant_mode" \
+     --calibration_data="$calib_data_path" \
+     --output_path="$model_dir/$quant_mode/model.quant.onnx" \
+     --calibration_eps="${CALIB_DEVICE:-cuda:0}"
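If the `${CALIB_DEVICE:-cuda:0}` default from the suggestion above were adopted, the calibration device could be overridden per run without editing the script. A minimal sketch of such an invocation (hypothetical usage; the script currently does not read this variable):

```bash
# Override the calibration execution provider for one run of the PTQ test script.
CALIB_DEVICE=cuda:1 bash tests/examples/test_onnx_ptq.sh
```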
32-32: Remove unused variable. nvidia_gpu_count is no longer used after making the script single-threaded. Safe to delete.
-nvidia_gpu_count=$(nvidia-smi --query-gpu=count --format=csv,noheader,nounits | head -n 1)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/examples/test_onnx_ptq.sh (3 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/examples/test_onnx_ptq.sh
[error] 181-181: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).
(SC2199)
[warning] 181-181: Remove quotes from right-hand side of =~ to match as a regex rather than literally.
(SC2076)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
tests/examples/test_onnx_ptq.sh
Outdated
| echo "Starting evaluation of $model_name for mode: $quant_mode" | ||
| if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then | ||
| CUDA_VISIBLE_DEVICES=$gpu_id python evaluate.py \ | ||
| python evaluate.py \ | ||
| --onnx_path=$eval_model_path \ | ||
| --engine_path=$engine_path \ | ||
| --model_name="${timm_model_name[$model_name]}" \ | ||
| --engine_precision=$precision \ | ||
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv & | ||
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv | ||
| else | ||
| CUDA_VISIBLE_DEVICES=$gpu_id python evaluate.py \ | ||
| python evaluate.py \ | ||
| --onnx_path=$eval_model_path \ | ||
| --engine_path=$engine_path \ | ||
| --imagenet_path=$imagenet_path \ | ||
| --eval_data_size=$calib_size \ | ||
| --batch_size $batch_size \ | ||
| --model_name="${timm_model_name[$model_name]}" \ | ||
| --engine_precision=$precision \ | ||
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv & | ||
| fi | ||
| pids+=($!) | ||
| done | ||
|
|
||
| # Wait for all evaluation processes to complete for this model | ||
| error_occurred=false | ||
| for pid in "${pids[@]}"; do | ||
| if ! wait $pid; then | ||
| echo "ERROR: Evaluation process (PID: $pid) failed" | ||
| error_occurred=true | ||
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv | ||
| fi |
Fix array membership test (shellcheck) and quote all args
- Use glob match with ${array[*]} for membership; satisfies SC2199/SC2076 and avoids regex pitfalls.
- Quote all path/vars in evaluate.py calls to prevent word splitting.
Apply:
- echo "Starting evaluation of $model_name for mode: $quant_mode"
- if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then
- python evaluate.py \
- --onnx_path=$eval_model_path \
- --engine_path=$engine_path \
- --model_name="${timm_model_name[$model_name]}" \
- --engine_precision=$precision \
- --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv
+ echo "Starting evaluation of $model_name for mode: $quant_mode"
+ if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then
+ python evaluate.py \
+ --onnx_path="$eval_model_path" \
+ --engine_path="$engine_path" \
+ --model_name="${timm_model_name[$model_name]}" \
+ --engine_precision="$precision" \
+ --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv"
else
- python evaluate.py \
- --onnx_path=$eval_model_path \
- --engine_path=$engine_path \
- --imagenet_path=$imagenet_path \
- --eval_data_size=$calib_size \
- --batch_size $batch_size \
- --model_name="${timm_model_name[$model_name]}" \
- --engine_precision=$precision \
- --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv
+ python evaluate.py \
+ --onnx_path="$eval_model_path" \
+ --engine_path="$engine_path" \
+ --imagenet_path="$imagenet_path" \
+ --eval_data_size="$calib_size" \
+ --batch_size "$batch_size" \
+ --model_name="${timm_model_name[$model_name]}" \
+ --engine_precision="$precision" \
+            --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv"

Based on Shellcheck hints.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "Starting evaluation of $model_name for mode: $quant_mode" | |
| if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then | |
| CUDA_VISIBLE_DEVICES=$gpu_id python evaluate.py \ | |
| python evaluate.py \ | |
| --onnx_path=$eval_model_path \ | |
| --engine_path=$engine_path \ | |
| --model_name="${timm_model_name[$model_name]}" \ | |
| --engine_precision=$precision \ | |
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv & | |
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv | |
| else | |
| CUDA_VISIBLE_DEVICES=$gpu_id python evaluate.py \ | |
| python evaluate.py \ | |
| --onnx_path=$eval_model_path \ | |
| --engine_path=$engine_path \ | |
| --imagenet_path=$imagenet_path \ | |
| --eval_data_size=$calib_size \ | |
| --batch_size $batch_size \ | |
| --model_name="${timm_model_name[$model_name]}" \ | |
| --engine_precision=$precision \ | |
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv & | |
| fi | |
| pids+=($!) | |
| done | |
| # Wait for all evaluation processes to complete for this model | |
| error_occurred=false | |
| for pid in "${pids[@]}"; do | |
| if ! wait $pid; then | |
| echo "ERROR: Evaluation process (PID: $pid) failed" | |
| error_occurred=true | |
| --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv | |
| fi | |
| echo "Starting evaluation of $model_name for mode: $quant_mode" | |
| if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then | |
| python evaluate.py \ | |
| --onnx_path="$eval_model_path" \ | |
| --engine_path="$engine_path" \ | |
| --model_name="${timm_model_name[$model_name]}" \ | |
| --engine_precision="$precision" \ | |
| --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv" | |
| else | |
| python evaluate.py \ | |
| --onnx_path="$eval_model_path" \ | |
| --engine_path="$engine_path" \ | |
| --imagenet_path="$imagenet_path" \ | |
| --eval_data_size="$calib_size" \ | |
| --batch_size "$batch_size" \ | |
| --model_name="${timm_model_name[$model_name]}" \ | |
| --engine_precision="$precision" \ | |
| --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv" | |
| fi |
🧰 Tools
🪛 Shellcheck (0.11.0)
[error] 181-181: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).
(SC2199)
[warning] 181-181: Remove quotes from right-hand side of =~ to match as a regex rather than literally.
(SC2076)
🤖 Prompt for AI Agents
In tests/examples/test_onnx_ptq.sh around lines 180-198, the array membership
test and unquoted arguments are unsafe; replace the current if-condition with a
glob-style check using the array expansion e.g. [[ " ${latency_models[*]} " ==
*" $model_name "* ]] to satisfy ShellCheck (SC2199/SC2076), and quote all
variables passed to python evaluate.py (e.g. "$eval_model_path", "$engine_path",
"$imagenet_path", "$calib_size", "$batch_size",
"${timm_model_name[$model_name]}", "$precision",
"$model_dir/$quant_mode/${model_name}_${quant_mode}.csv") so none are word-split
or subject to globbing.
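As a side note on the recommended membership test, here is a minimal, standalone sketch of the idiom; the array contents and model name are made up for illustration and are not taken from the script:

```bash
#!/usr/bin/env bash
latency_models=("vit_base_patch16_224" "convnext_base")
model_name="vit_base_patch16_224"

# Glob match against the space-joined array; avoids the SC2199/SC2076 regex pitfalls.
if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then
    echo "latency-only evaluation for $model_name"
else
    echo "full accuracy evaluation for $model_name"
fi
```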
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #415      +/-   ##
==========================================
- Coverage   73.79%   73.37%   -0.42%
==========================================
  Files         171      180       +9
  Lines       17591    17937     +346
==========================================
+ Hits        12981    13162     +181
- Misses       4610     4775     +165

☔ View full report in Codecov by Sentry.
| echo "Starting quantization of $model_name for mode: $quant_mode on GPU $gpu_id" | ||
| CUDA_VISIBLE_DEVICES=$gpu_id python -m modelopt.onnx.quantization \ | ||
| echo "Starting quantization of $model_name for mode: $quant_mode" | ||
| python -m modelopt.onnx.quantization \ |
Does this support multi-gpu calibration to use all available GPUs instead of cuda:0?
Yes, we should be able to control which GPU gets used via CUDA_VISIBLE_DEVICES. However, for now I have disabled GPU parallelism in the test until I figure out the root cause.
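For illustration of the point above, pinning one quantization run to a single device would look roughly like this; the paths and quantize mode are placeholders, and only flags already used by this PR's script appear:

```bash
# Expose only physical GPU 1 to the process; inside the process it shows up as cuda:0.
CUDA_VISIBLE_DEVICES=1 python -m modelopt.onnx.quantization \
    --onnx_path=build/model/fp16/model.onnx \
    --quantize_mode=int8 \
    --calibration_data=build/calib_data.npy \
    --output_path=build/model/int8/model.quant.onnx \
    --calibration_eps=cuda:0
```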
Signed-off-by: ajrasane <[email protected]>
Force-pushed 67305e4 to 0951c2c (Compare)
Signed-off-by: ajrasane <[email protected]>
Tests passing in 13 mins: https://gitlab-master.nvidia.com/omniml/modelopt/-/jobs/218842251
tests/examples/test_onnx_ptq.sh
Outdated
    --model_name="${timm_model_name[$model_name]}" \
    --engine_precision=$precision \
    --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv \
    --timing_cache_path=build/timing.cache
Is this build dir available across multiple CI/CD pipelines? Is there a way to use persistent storage in CI/CD?
Currently not. But we can use $model_dir/build, which points to /home/scratch.omniml_data_1/models_ci/onnx/build. Test models are also read from /home/scratch.omniml_data_1/models_ci/onnx ($model_dir).
I think that would be the better thing to do, otherwise using this flag has no benefit for this script.
@kevalmorabia97 are you referring to models_folder?
model_dir will be build/model_name, so it won't be correct to save the timing cache in the individual model_dir, as the other builds won't be accessing it.
I think the build directory is created under modelopt/examples/onnx_ptq. It is cleared at the end of each test unless we pass the --no-clean option. This is shared by the individual models.
@i-riyad, the CI/CD will run on a different machine every time. Won't it be better to create the timing cache fresh for every CI run? Once it is built for the first model, it will be reused by the successive models in their builds.
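A rough sketch of the shared-cache idea being discussed; the TIMING_CACHE_DIR variable and its default are assumptions for illustration, while the evaluate.py flags and script variables are the ones used in this PR:

```bash
# One timing cache shared by every engine build in a single CI run, so tactic
# timings recorded for the first model's engine are reused by the later ones.
timing_cache_path="${TIMING_CACHE_DIR:-build}/timing.cache"
mkdir -p "$(dirname "$timing_cache_path")"

python evaluate.py \
    --onnx_path="$eval_model_path" \
    --engine_path="$engine_path" \
    --model_name="${timm_model_name[$model_name]}" \
    --engine_precision="$precision" \
    --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv" \
    --timing_cache_path="$timing_cache_path"
```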
Signed-off-by: ajrasane <[email protected]>
Force-pushed ad47d8a to 7be261c (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/examples/test_onnx_ptq.sh (2)
192-192: Fix array membership test and use glob-style pattern matching. ShellCheck (SC2199/SC2076) flags the regex-based array concatenation and the quoted right-hand-side pattern. Use a glob-style match with ${latency_models[*]} for safer, simpler membership testing.
Apply this diff:
- if [[ " ${latency_models[@]} " =~ " $model_name " ]]; then
+ if [[ " ${latency_models[*]} " == *" $model_name "* ]]; then
194-210: Quote all shell variables in python invocations. Unquoted variables in evaluate.py calls risk word-splitting and globbing if paths or values contain spaces. Also, line 206 has a syntax error: --batch_size $batch_size is missing the = separator (should be --batch_size=$batch_size or --batch_size "$batch_size").
Apply this diff to quote all arguments and fix the batch_size syntax:
  python evaluate.py \
-     --onnx_path=$eval_model_path \
-     --engine_path=$engine_path \
+     --onnx_path="$eval_model_path" \
+     --engine_path="$engine_path" \
      --model_name="${timm_model_name[$model_name]}" \
-     --engine_precision=$precision \
-     --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv \
-     --timing_cache_path=$timing_cache_path
+     --engine_precision="$precision" \
+     --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv" \
+     --timing_cache_path="$timing_cache_path"
  else
      python evaluate.py \
-         --onnx_path=$eval_model_path \
-         --engine_path=$engine_path \
-         --imagenet_path=$imagenet_path \
-         --eval_data_size=$eval_size \
-         --batch_size $batch_size \
+         --onnx_path="$eval_model_path" \
+         --engine_path="$engine_path" \
+         --imagenet_path="$imagenet_path" \
+         --eval_data_size="$eval_size" \
+         --batch_size="$batch_size" \
          --model_name="${timm_model_name[$model_name]}" \
-         --engine_precision=$precision \
-         --results_path=$model_dir/$quant_mode/${model_name}_${quant_mode}.csv \
-         --timing_cache_path=$timing_cache_path
+         --engine_precision="$precision" \
+         --results_path="$model_dir/$quant_mode/${model_name}_${quant_mode}.csv" \
+         --timing_cache_path="$timing_cache_path"
🧹 Nitpick comments (2)
tests/examples/test_onnx_ptq.sh (2)
157-161: Quote variables in quantization command for consistency and safety. While less critical (paths are locally controlled), unquoted variables in the quantization invocation should be quoted to follow shell best practices and prevent potential issues if paths contain spaces or special characters.
Apply this diff:
  python -m modelopt.onnx.quantization \
-     --onnx_path=$model_dir/fp16/model.onnx \
+     --onnx_path="$model_dir/fp16/model.onnx" \
      --quantize_mode=$quant_mode \
-     --calibration_data=$calib_data_path \
-     --output_path=$model_dir/$quant_mode/model.quant.onnx \
+     --calibration_data="$calib_data_path" \
+     --output_path="$model_dir/$quant_mode/model.quant.onnx" \
      --calibration_eps=cuda
142-142: Quote variables in the image_prep.py invocation. For consistency with shell best practices, quote the variables in the image_prep.py call.
Apply this diff:
-python image_prep.py --output_path=$calib_data_path --calibration_data_size=$calib_size
+python image_prep.py --output_path="$calib_data_path" --calibration_data_size="$calib_size"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/examples/test_onnx_ptq.sh (4 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/examples/test_onnx_ptq.sh
[error] 192-192: Arrays implicitly concatenate in [[ ]]. Use a loop (or explicit * instead of @).
(SC2199)
[warning] 192-192: Remove quotes from right-hand side of =~ to match as a regex rather than literally.
(SC2076)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: wait-checks / wait
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (2)
tests/examples/test_onnx_ptq.sh (2)
170-218: Evaluation block structure and timing cache integration are sound. The eval_mode gating and timing_cache_path propagation are correctly implemented. The sequential evaluation design aligns well with the PR objective to make the test single-threaded and efficient.
229-230: Wall-time formatting is correct. The HH:MM:SS formatting with printf is properly implemented and clearly communicates total runtime.
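For context on the wall-time comment, a minimal sketch of HH:MM:SS formatting with printf; the variable names are illustrative rather than the script's actual ones:

```bash
start_time=$(date +%s)
# ... quantization and evaluation run here ...
elapsed=$(( $(date +%s) - start_time ))
# Format elapsed seconds as zero-padded hours:minutes:seconds.
printf "Total wall time: %02d:%02d:%02d\n" \
    "$((elapsed / 3600))" "$(((elapsed % 3600) / 60))" "$((elapsed % 60))"
```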
Signed-off-by: ajrasane <[email protected]>
What does this PR do?
Type of change: Bug fix
Overview:
Updated test_onnx_ptq.sh to be single threaded.

Testing
Before your PR is "Ready for review"
Summary by CodeRabbit
Tests
New Features
Bug Fixes / Improvements