Commit 3a9a3dc

Merge branch 'main' of github.com:NVIDIA/TensorRT-Model-Optimizer into chenjiel/deprecate_trtllm_build_2
Signed-off-by: Chenjie Luo <[email protected]>
2 parents 62f10a0 + 8d0e40f commit 3a9a3dc

69 files changed, +2312 -642 lines changed

.github/CODEOWNERS

Lines changed: 21 additions & 21 deletions
@@ -32,24 +32,24 @@ modelopt/torch/utils @NVIDIA/modelopt-torch-utils-codeowners
 # Examples
 /docker @NVIDIA/modelopt-docker-codeowners
 /README.md @NVIDIA/modelopt-examples-codeowners
-examples @NVIDIA/modelopt-examples-codeowners
-examples/chained_optimizations @NVIDIA/modelopt-torch-nas-prune-codeowners
-examples/cnn_qat @NVIDIA/modelopt-examples-cnn_qat-codeowners
-examples/deepseek @NVIDIA/modelopt-deploy-codeowners
-examples/diffusers @NVIDIA/modelopt-examples-diffusers-codeowners
-examples/gpt-oss @NVIDIA/modelopt-examples-gpt-oss-codeowners
-examples/llm_autodeploy @NVIDIA/modelopt-deploy-codeowners
-examples/llm_distill @NVIDIA/modelopt-torch-distill-codeowners
-examples/llm_eval @NVIDIA/modelopt-examples-llm_ptq-codeowners
-examples/llm_ptq @NVIDIA/modelopt-examples-llm_ptq-codeowners
-examples/llm_qat @NVIDIA/modelopt-examples-llm_qat-codeowners
-examples/llm_sparsity @NVIDIA/modelopt-torch-sparsity-codeowners
-examples/megatron-lm @NVIDIA/modelopt-examples-megatron-codeowners
-examples/model_hub @NVIDIA/modelopt-examples-model_hub-codeowners
-examples/nemo_run @NVIDIA/modelopt-examples-megatron-codeowners
-examples/onnx_ptq @NVIDIA/modelopt-onnx-codeowners
-examples/pruning @NVIDIA/modelopt-torch-nas-prune-codeowners
-examples/speculative_decoding @NVIDIA/modelopt-torch-speculative-codeowners
-examples/vlm_eval @NVIDIA/modelopt-examples-vlm-codeowners
-examples/vlm_ptq @NVIDIA/modelopt-examples-vlm-codeowners
-examples/windows @NVIDIA/modelopt-windows-codeowners
+/examples @NVIDIA/modelopt-examples-codeowners
+/examples/chained_optimizations @NVIDIA/modelopt-torch-nas-prune-codeowners
+/examples/cnn_qat @NVIDIA/modelopt-examples-cnn_qat-codeowners
+/examples/deepseek @NVIDIA/modelopt-deploy-codeowners
+/examples/diffusers @NVIDIA/modelopt-examples-diffusers-codeowners
+/examples/gpt-oss @NVIDIA/modelopt-examples-gpt-oss-codeowners
+/examples/llm_autodeploy @NVIDIA/modelopt-deploy-codeowners
+/examples/llm_distill @NVIDIA/modelopt-torch-distill-codeowners
+/examples/llm_eval @NVIDIA/modelopt-examples-llm_ptq-codeowners
+/examples/llm_ptq @NVIDIA/modelopt-examples-llm_ptq-codeowners
+/examples/llm_qat @NVIDIA/modelopt-examples-llm_qat-codeowners
+/examples/llm_sparsity @NVIDIA/modelopt-torch-sparsity-codeowners
+/examples/megatron-lm @NVIDIA/modelopt-examples-megatron-codeowners
+/examples/model_hub @NVIDIA/modelopt-examples-model_hub-codeowners
+/examples/nemo_run @NVIDIA/modelopt-examples-megatron-codeowners
+/examples/onnx_ptq @NVIDIA/modelopt-onnx-codeowners
+/examples/pruning @NVIDIA/modelopt-torch-nas-prune-codeowners
+/examples/speculative_decoding @NVIDIA/modelopt-torch-speculative-codeowners
+/examples/vlm_eval @NVIDIA/modelopt-examples-vlm-codeowners
+/examples/vlm_ptq @NVIDIA/modelopt-examples-vlm-codeowners
+/examples/windows @NVIDIA/modelopt-windows-codeowners
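
Note on the leading slash added in this hunk: CODEOWNERS patterns follow gitignore-style rules, where a pattern containing no slash can match a directory of that name at any depth, while a leading (or inner) slash anchors the pattern to the repository root. The sketch below is a simplified illustration of that one rule, assuming literal directory names with no globs. It is not GitHub's matcher; `matches` and the `tests/examples/...` path are hypothetical.

```python
def matches(pattern: str, path: str) -> bool:
    """Simplified gitignore-style directory match (literal names only, no globs)."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    # A slash at the start or in the middle anchors the pattern to the repo root.
    anchored = "/" in pattern.rstrip("/")
    if anchored:
        starts = [0]
    else:
        # Unanchored patterns may match at any depth.
        starts = range(len(path_parts) - len(pattern_parts) + 1)
    return any(
        path_parts[i : i + len(pattern_parts)] == pattern_parts for i in starts
    )


# Hypothetical paths, used only to contrast the two spellings.
nested = "tests/examples/test_foo.py"
top_level = "examples/llm_ptq/hf_ptq.py"

print(matches("examples", nested), matches("/examples", nested))
print(matches("examples", top_level), matches("/examples", top_level))
```

Under this simplified rule, only the bare `examples` entry changes meaning. The `examples/<subdir>` entries already contain an inner slash, so the added leading slash there mainly makes the anchoring explicit.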

.github/workflows/unit_tests.yml

Lines changed: 0 additions & 7 deletions
@@ -4,13 +4,6 @@ name: Unit tests
 on:
   pull_request:
     branches: [main, release/*]
-    paths:
-      - ".github/workflows/unit_tests.yml"
-      - "modelopt/**"
-      - "tests/unit/**"
-      - "pyproject.toml"
-      - "setup.py"
-      - "tox.ini"
   push:
     branches: [main, release/*]
     paths:

CHANGELOG.rst

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,7 @@ Model Optimizer Changelog (Linux)
 ^^^^^^^^^^^^^^^^^
 
 **Deprecations**
+- Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strongly typing. Use ``engine_precision`` instead.
 
 - TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
 - ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
@@ -13,6 +14,7 @@ Model Optimizer Changelog (Linux)
 **Bug Fixes**
 
 **New Features**
+- ``high_precision_dtype`` default to fp16 in ONNX quantization, i.e. quantized output model weights are now FP16 by default.
 
 0.35 (2025-09-04)
 ^^^^^^^^^^^^^^^^^
@@ -25,6 +27,7 @@ Model Optimizer Changelog (Linux)
 **Bug Fixes**
 
 - Fix attention head ranking logic for pruning Megatron Core GPT models.
+- Upgrade TensorRT-LLM dependency to 1.1.0rc2.
 
 **New Features**
 

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6
+FROM nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
 
 ARG PIP_EXTRA_INDEX_URL="https://pypi.nvidia.com"
 ENV PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL \

examples/diffusers/quantization/diffusion_trt.py

Lines changed: 5 additions & 1 deletion
@@ -105,7 +105,11 @@ def main():
 
     image_name = args.save_image_as if args.save_image_as else f"{args.model}.png"
 
-    pipe = PipelineManager.create_pipeline_from(MODEL_ID[args.model], dtype_map[args.model_dtype])
+    pipe = PipelineManager.create_pipeline_from(
+        MODEL_ID[args.model],
+        dtype_map[args.model_dtype],
+        override_model_path=args.override_model_path,
+    )
 
     # Save the backbone of the pipeline and move it to the GPU
     add_embedding = None

examples/diffusers/quantization/quantize.py

Lines changed: 6 additions & 2 deletions
@@ -309,7 +309,9 @@ def __init__(self, config: ModelConfig, logger: logging.Logger):
 
     @staticmethod
     def create_pipeline_from(
-        model_type: ModelType, torch_dtype: torch.dtype = torch.bfloat16
+        model_type: ModelType,
+        torch_dtype: torch.dtype = torch.bfloat16,
+        override_model_path: str | None = None,
     ) -> DiffusionPipeline:
         """
         Create and return an appropriate pipeline based on configuration.
@@ -321,7 +323,9 @@ def create_pipeline_from(
             ValueError: If model type is unsupported
         """
         try:
-            model_id = MODEL_REGISTRY[model_type]
+            model_id = (
+                MODEL_REGISTRY[model_type] if override_model_path is None else override_model_path
+            )
             if model_type == ModelType.SD3_MEDIUM:
                 pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch_dtype)
             elif model_type in [ModelType.FLUX_DEV, ModelType.FLUX_SCHNELL]:
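
Note on the `override_model_path` fallback: the change swaps only the identifier passed to `from_pretrained`, while `model_type` still selects the pipeline class. A minimal standalone sketch of that selection logic follows. The `ModelType` members, the registry values, and the helper name `resolve_model_id` are hypothetical stand-ins, not the contents of `quantize.py`.

```python
from enum import Enum
from typing import Optional


class ModelType(str, Enum):
    # Hypothetical stand-in for the enum used in quantize.py.
    SD3_MEDIUM = "sd3-medium"
    FLUX_DEV = "flux-dev"


# Placeholder identifiers; the real registry entries are not shown in this diff.
MODEL_REGISTRY = {
    ModelType.SD3_MEDIUM: "example-org/sd3-medium",
    ModelType.FLUX_DEV: "example-org/flux-dev",
}


def resolve_model_id(
    model_type: ModelType, override_model_path: Optional[str] = None
) -> str:
    """Return the registry ID unless an override path is supplied."""
    return (
        MODEL_REGISTRY[model_type]
        if override_model_path is None
        else override_model_path
    )


default_id = resolve_model_id(ModelType.SD3_MEDIUM)
local_id = resolve_model_id(ModelType.SD3_MEDIUM, "/models/sd3-local")
print(default_id, local_id)
```

Because the pipeline class is still chosen from `model_type`, an override path presumably needs to point at a checkpoint compatible with that pipeline. The diff itself does not validate this.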

examples/diffusers/quantization/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 cuda-python
+diffusers<=0.34.0
 nvtx
 onnx_graphsurgeon
 opencv-python>=4.8.1.78,<4.12.0.88

examples/llm_ptq/hf_ptq.py

Lines changed: 1 addition & 1 deletion
@@ -722,7 +722,7 @@ def output_decode(generated_ids, input_shape):
     )
     parser.add_argument(
         "--verbose",
-        help="Print verbose output (e.g. quantization summary). Disable by --no_verbose.",
+        help="Print verbose output (e.g. quantization summary). Disable by --no-verbose.",
         default=True,
         action=argparse.BooleanOptionalAction,
     )
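
Note on the help-text fix: `argparse.BooleanOptionalAction` (Python 3.9+) derives the negative flag by inserting `no-` after the leading dashes, so the flag that disables `--verbose` is `--no-verbose`. The standalone sketch below reuses the argument definition from the hunk on a fresh parser. Only the surrounding parser setup is assumed.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--verbose",
    help="Print verbose output (e.g. quantization summary). Disable by --no-verbose.",
    default=True,
    action=argparse.BooleanOptionalAction,
)

# With no flag the default applies; --no-verbose flips it off.
default_args = parser.parse_args([])
enabled_args = parser.parse_args(["--verbose"])
disabled_args = parser.parse_args(["--no-verbose"])

print(default_args.verbose, enabled_args.verbose, disabled_args.verbose)
```

The old spelling `--no_verbose` is not registered by this action, so argparse would reject it as an unrecognized argument.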
