Product Rename: TensorRT Model Optimizer to Model Optimizer (#583)
- [x] Product Rename: TensorRT Model Optimizer to Model Optimizer (OMNIML-3033)
- [x] Mention in Latest News section with date on the date of merging this PR (12/08)
Signed-off-by: Keval Morabia <[email protected]>
.github/ISSUE_TEMPLATE/1_bug_report.md (2 additions & 2 deletions)

@@ -6,7 +6,7 @@ labels: bug
 assignees: ''
 ---

-**Before submitting an issue, please make sure it hasn't been already addressed by searching through the [existing and past issues](https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues?q=is%3Aissue).**
+**Before submitting an issue, please make sure it hasn't been already addressed by searching through the [existing and past issues](https://github.com/NVIDIA/Model-Optimizer/issues?q=is%3Aissue).**

 ## Describe the bug
 <!-- Description of what the bug is, its impact (blocker, should have, nice to have) and any stack traces or error messages. -->
@@ -30,7 +30,7 @@ If you are unsure about whom to tag, you can leave it blank, and we will make su

 ## System information

-<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->
+<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->

 - Container used (if applicable): ?
 - OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ? <!-- If Windows, please add the `windows` label to the issue. -->
.github/ISSUE_TEMPLATE/3_question.md (2 additions & 2 deletions)

@@ -6,7 +6,7 @@ labels: question
 assignees: ''
 ---

-Make sure you already checked the [examples](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples) and [documentation](https://nvidia.github.io/TensorRT-Model-Optimizer/) before submitting an issue.
+Make sure you already checked the [examples](https://github.com/NVIDIA/Model-Optimizer/tree/main/examples) and [documentation](https://nvidia.github.io/Model-Optimizer/) before submitting an issue.

 ## How would you like to use ModelOpt
@@ -23,7 +23,7 @@ If you are unsure about whom to tag, you can leave it blank, and we will make su

 ## System information

-<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->
+<!-- Run this script to automatically collect system information: https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/ISSUE_TEMPLATE/get_system_info.py -->

 - Container used (if applicable): ?
 - OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ? <!-- If Windows, please add the `windows` label to the issue. -->
.github/PULL_REQUEST_TEMPLATE.md (2 additions & 2 deletions)

@@ -17,11 +17,11 @@
 ## Before your PR is "*Ready for review*"
 <!-- If you haven't finished some of the above items you can still open `Draft` PR. -->

--**Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
+-**Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
 -**Is this change backward compatible?**: Yes/No <!--- If No, explain why. -->
 -**Did you write any new necessary tests?**: Yes/No
 -**Did you add or update any necessary documentation?**: Yes/No
--**Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
+-**Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
CHANGELOG-Windows.rst (8 additions & 9 deletions)

@@ -1,34 +1,33 @@
-===================================
-Model Optimizer Changelog (Windows)
-===================================
+NVIDIA Model Optimizer Changelog (Windows)
+==========================================

 0.33 (2025-07-21)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- TensorRT Model Optimizer for Windows now supports `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution-provider.
+- Model Optimizer for Windows now supports `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution-provider.

 0.27 (2025-04-30)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- New LLM models like DeepSeek etc. are supported with ONNX INT4 AWQ quantization on Windows. Refer `Windows Support Matrix <https://nvidia.github.io/TensorRT-Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
-- TensorRT Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. Check `example scripts <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ for getting started with quantizing these models.
+- New LLM models like DeepSeek etc. are supported with ONNX INT4 AWQ quantization on Windows. Refer `Windows Support Matrix <https://nvidia.github.io/Model-Optimizer/guides/0_support_matrix.html>`_ for details about supported features and models.
+- Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. Check `example scripts <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq>`_ for getting started with quantizing these models.

 0.19 (2024-11-18)
 ^^^^^^^^^^^^^^^^^

 **New Features**

-- This is the first official release of TensorRT Model Optimizer for Windows
+- This is the first official release of Model Optimizer for Windows
 - **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
-- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_
+- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_
-- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
+- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML (DML).
 - **Published quantized ONNX models collection:** Published quantized ONNX models at HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus>`_.
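Note on the 0.19 entry above: the ``quantize_int4`` API it references lives in ``modelopt.onnx.quantization``. The snippet below is only an illustrative sketch of such a call; the keyword names (``calibration_method``, ``calibration_data_reader``), whether a path or a loaded model is accepted, and the file names are assumptions, not the documented signature.

```python
# Illustrative sketch only -- argument names and input form are assumptions.
import onnx
from modelopt.onnx.quantization import quantize_int4  # alias of modelopt.onnx.quantization.int4.quantize

quantized_model = quantize_int4(
    "model.onnx",                    # assumed: an ONNX model path (or ModelProto) to quantize
    calibration_method="awq_clip",   # assumed AWQ-style calibration, per the INT4 AWQ entry above
    calibration_data_reader=None,    # supply a real calibration data reader for meaningful scales
)
onnx.save(quantized_model, "model.int4.onnx")
```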
CHANGELOG.rst (20 additions & 19 deletions)

@@ -1,7 +1,7 @@
-Model Optimizer Changelog (Linux)
-=================================
+NVIDIA Model Optimizer Changelog (Linux)
+========================================

-0.40 (2025-12-11)
+0.40 (2025-12-15)
 ^^^^^^^^^^^^^^^^^

 **Bug Fixes**
@@ -12,21 +12,22 @@ Model Optimizer Changelog (Linux)
 **New Features**

 - Add MoE (e.g. Qwen3-30B-A3B, gpt-oss-20b) pruning support for ``num_moe_experts``, ``moe_ffn_hidden_size`` and ``moe_shared_expert_intermediate_size`` parameters in Minitron pruning (``mcore_minitron``).
-- Add ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
+- Add ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
 - Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
-- Add KL Divergence loss based auto_quantize method. See `auto_quantize API docs <https://nvidia.github.io/TensorRT-Model-Optimizer/reference/generated/modelopt.torch.quantization.model_quant.html#modelopt.torch.quantization.model_quant.auto_quantize>`_ for more details.
+- Add KL Divergence loss based auto_quantize method. See `auto_quantize API docs <https://nvidia.github.io/Model-Optimizer/reference/generated/modelopt.torch.quantization.model_quant.html#modelopt.torch.quantization.model_quant.auto_quantize>`_ for more details.
 - Add support for saving and resuming auto_quantize search state. This speeds up the auto_quantize process by skipping the score estimation step if the search state is provided.
 - Add flag ``trt_plugins_precision`` in ONNX autocast to indicate custom ops precision. This is similar to the flag already existing in the quantization workflow.
 - Add support for PyTorch Geometric quantization.
 - Add per tensor and per channel MSE calibrator support.
-- Added support for PTQ/QAT checkpoint export and loading for running fakequant evaluation in vLLM. See ``examples/vllm_serve/README.md#load-qatptq-model-and-serve-in-vllm-wip`` for more details.
+- Added support for PTQ/QAT checkpoint export and loading for running fakequant evaluation in vLLM. See `examples/vllm_serve/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/vllm_serve#load-qatptq-model-and-serve-in-vllm-wip>`_ for more details.

 **Documentation**

 - Deprecate ``examples/megatron-lm`` in favor of more detailed documentation in `Megatron-LM/examples/post_training/modelopt <https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt>`_.

 **Misc**

+- NVIDIA TensorRT Model Optimizer is now officially rebranded as NVIDIA Model Optimizer. GitHub will automatically redirect the old repository path (``NVIDIA/TensorRT-Model-Optimizer``) to the new one (``NVIDIA/Model-Optimizer``). Documentation URL is also changed to `nvidia.github.io/Model-Optimizer <https://nvidia.github.io/Model-Optimizer>`_.
 - Bump TensorRT-LLM docker to 1.2.0rc4.
 - Bump minimum recommended transformers version to 4.53.
 - Replace ONNX simplification package from ``onnxsim`` to ``onnxslim``.
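As background for the quantization entries in the hunk above (the KL Divergence based ``auto_quantize`` score and KV-cache quantization both build on the standard calibration flow), here is a minimal, generic ``mtq.quantize`` sketch. The toy model, stand-in calibration data, and the choice of ``FP8_DEFAULT_CFG`` are placeholders, not code from this PR.

```python
# Minimal PTQ calibration sketch; model, data, and config choice are placeholders.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))
calib_batches = [torch.randn(4, 64) for _ in range(8)]  # stand-in calibration data

def forward_loop(m):
    # Run representative batches so the inserted quantizers can calibrate their ranges.
    for batch in calib_batches:
        m(batch)

# Insert quantizers and calibrate with one of the predefined configs.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# auto_quantize (per-layer format search, now with an optional KL Divergence based score)
# wraps this same calibrate-and-score loop; see the auto_quantize API docs linked above.
```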
@@ -36,7 +37,7 @@ Model Optimizer Changelog (Linux)
 **Deprecations**

-- Deprecated ``modelopt.torch._deploy.utils.get_onnx_bytes`` API. Please use ``modelopt.torch._deploy.utils.get_onnx_bytes_and_metadata`` instead to access the ONNX model bytes with external data. see `examples/onnx_ptq/download_example_onnx.py <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/onnx_ptq/download_example_onnx.py>`_ for example usage.
+- Deprecated ``modelopt.torch._deploy.utils.get_onnx_bytes`` API. Please use ``modelopt.torch._deploy.utils.get_onnx_bytes_and_metadata`` instead to access the ONNX model bytes with external data. see `examples/onnx_ptq/download_example_onnx.py <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/onnx_ptq/download_example_onnx.py>`_ for example usage.

 **New Features**
@@ -46,7 +47,7 @@ Model Optimizer Changelog (Linux)
 - Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` (gated dataset accessed using ``HF_TOKEN`` environment variable) if no dataset is specified.
 - Allow specifying ``calib_seq`` in ``examples/llm_ptq`` to set the maximum sequence length for calibration.
 - Add support for MCore MoE PTQ/QAT/QAD.
-- Add support for multi-node PTQ and export with FSDP2 in ``examples/llm_ptq/multinode_ptq.py``. See `examples/llm_ptq/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq#multi-node-post-training-quantization-with-fsdp2>`_ for more details.
+- Add support for multi-node PTQ and export with FSDP2 in ``examples/llm_ptq/multinode_ptq.py``. See `examples/llm_ptq/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#multi-node-post-training-quantization-with-fsdp2>`_ for more details.
 - Add support for Nemotron Nano VL v1 & v2 models in FP8/NVFP4 PTQ workflow.
 - Add flags ``nodes_to_include`` and ``op_types_to_include`` in AutoCast to force-include nodes in low precision, even if they would otherwise be excluded by other rules.
 - Add support for ``torch.compile`` and benchmarking in ``examples/diffusers/quantization/diffusion_trt.py``.
@@ -57,15 +58,15 @@ Model Optimizer Changelog (Linux)
 **Documentation**

-- Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
-- Added example for exporting QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details
+- Add general guidelines for Minitron pruning and distillation. See `examples/pruning/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/pruning#pruning-guidelines>`_ for more details.
+- Added example for exporting QLoRA checkpoint for vLLM deployment. Refer to `examples/llm_qat/README.md <https://github.com/NVIDIA/Model-Optimizer/blob/79ef31bc7269ba4da0cfab446da5b64509cbfcef/examples/llm_qat/README.md#qlora-deployment>`_ for more details

 0.37 (2025-10-08)
 ^^^^^^^^^^^^^^^^^

 **Deprecations**

-- Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM or TensorRT docker image directly or refer to the `installation guide <https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html>`_ for more details.
+- Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM or TensorRT docker image directly or refer to the `installation guide <https://nvidia.github.io/Model-Optimizer/getting_started/2_installation.html>`_ for more details.
 - Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strongly typing. Use ``engine_precision`` instead.
 - Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. ``engine_dir`` is replaced with ``checkpoint_dir`` in ``examples/llm_ptq`` and ``examples/vlm_ptq``. For performance evaluation, please use ``trtllm-bench`` directly.
 - ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
@@ -232,16 +233,16 @@ Model Optimizer Changelog (Linux)
 - Disabled saving modelopt state in unified hf export APIs by default, i.e., added ``save_modelopt_state`` flag in ``export_hf_checkpoint`` API and by default set to False.
 - Add FP8 and NVFP4 real quantization support with LLM QLoRA example.
 - The :class:`modelopt.deploy.llm.LLM` now support use the :class:`tensorrt_llm._torch.LLM` backend for the quantized HuggingFace checkpoints.
-- Add `NVFP4 PTQ example for DeepSeek-R1 <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/deepseek>`_.
-- Add end-to-end `AutoDeploy example for AutoQuant LLM models <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_autodeploy>`_.
+- Add `NVFP4 PTQ example for DeepSeek-R1 <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/deepseek>`_.
+- Add end-to-end `AutoDeploy example for AutoQuant LLM models <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_autodeploy>`_.

 0.23 (2025-01-29)
 ^^^^^^^^^^^^^^^^^

 **Backward Breaking Changes**

 - Support TensorRT-LLM to 0.17. Examples (e.g. benchmark task in llm_ptq) may not be fully compatible with TensorRT-LLM 0.15.
-- Nvidia TensorRT Model Optimizer has changed its LICENSE from NVIDIA Proprietary (library wheel) and MIT (examples) to Apache 2.0 in this first full OSS release.
+- Nvidia Model Optimizer has changed its LICENSE from NVIDIA Proprietary (library wheel) and MIT (examples) to Apache 2.0 in this first full OSS release.
 - Deprecate Python 3.8, Torch 2.0, and Cuda 11.x support.
 - ONNX Runtime dependency upgraded to 1.20 which no longer supports Python 3.9.
 - In the Huggingface examples, the ``trust_remote_code`` is by default set to false and require users to explicitly turning it on with ``--trust_remote_code`` flag.
@@ -289,7 +290,7 @@ Model Optimizer Changelog (Linux)
 **Backward Breaking Changes**

 - Deprecated the summarize task in the ``llm_ptq`` example.
-- Deprecated the ``type`` flag in the `huggingface_example.sh <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq/scripts/huggingface_example.sh>`_
+- Deprecated the ``type`` flag in the `huggingface_example.sh <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq/scripts/huggingface_example.sh>`_
 - Deprecated Python plugin support in ONNX.
 - Support TensorRT-LLM 0.13. Examples not compatible with TensorRT-LLM 0.12.
 - :meth:`mtq.auto_quantize <modelopt.torch.quantization.model_quant.auto_quantize>` API has been updated. The API now
@@ -326,7 +327,7 @@ Model Optimizer Changelog (Linux)
 - New APIs and examples: :mod:`modelopt.torch.prune` for pruning Conv, Linear, and Attention heads for
 - New API: :mod:`modelopt.torch.distill` for knowledge distillation, along with guides and example.
-- New Example: `HF BERT Prune, Distill & Quantize <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/chained_optimizations>`_
+- New Example: `HF BERT Prune, Distill & Quantize <https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/chained_optimizations>`_
   showcasing how to chain pruning, distillation, and quantization to achieve the best performance on a given model.
 - Added INT8/FP8 DQ-only support for ONNX model.
 - New API: :mod:`modelopt.torch.speculative` for end-to-end support of Medusa models.
@@ -389,13 +390,13 @@ Model Optimizer Changelog (Linux)
 **Backward Breaking Changes**

-- `PTQ examples <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq>`_ have been
+- `PTQ examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq>`_ have been
   upgraded to use TensorRT-LLM 0.10.

 **New Features**

 - Adding TensorRT-LLM checkpoint export support for Medusa decoding (official ``MedusaModel`` and Megatron Core ``GPTModel``).
-- Enable support for mixtral, recurrentgemma, starcoder, qwen in `PTQ examples <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/llm_ptq>`_.
+- Enable support for mixtral, recurrentgemma, starcoder, qwen in `PTQ examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq>`_.
 - Adding TensorRT-LLM checkpoint export and engine building support for sparse models.
 - Import scales from TensorRT calibration cache and use them for quantization.
 - (Experimental) Enable low GPU memory FP8 calibration for the Hugging Face models when the original model size does not fit into the GPU memory.
@@ -409,7 +410,7 @@ Model Optimizer Changelog (Linux)
 **Backward Breaking Changes**

 - [!!!] The package was renamed from ``ammo`` to ``modelopt``. The new full product
-  name is *Nvidia TensorRT Model Optimizer*. PLEASE CHANGE ALL YOUR REFERENCES FROM ``ammo`` to
+  name is *Nvidia Model Optimizer*. PLEASE CHANGE ALL YOUR REFERENCES FROM ``ammo`` to
   ``modelopt`` including any paths and links!
 - Default installation ``pip install nvidia-modelopt`` will now only install minimal core
   dependencies. Following optional dependencies are available depending on the features that are
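The final hunk above asks users to change every ``ammo`` reference to ``modelopt``, including paths and links. A small illustrative sketch of what that means for imports; the old ``ammo`` module path is shown from memory and only as an indication, and the ``mto.save`` call is left commented out as an indicative example.

```python
# Old (pre-rename) imports looked roughly like this and must be replaced:
#   import ammo.torch.quantization as atq   # indicative old path, no longer available
# New namespace after the ammo -> modelopt rename:
import modelopt.torch.quantization as mtq
import modelopt.torch.opt as mto

print(mtq.__name__)  # "modelopt.torch.quantization"
# Indicative example of the renamed APIs in use (left commented out):
#   mto.save(model, "modelopt_state.pth")
```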