CHANGELOG-Windows.rst (8 additions, 0 deletions)
@@ -2,6 +2,14 @@
 Model Optimizer Changelog (Windows)
 ===================================

+0.33 (2025-07-21)
+^^^^^^^^^^^^^^^^^
+
+**New Features**
+
+- TensorRT Model Optimizer for Windows now supports the `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution provider.
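As a usage note (not part of the diff): a minimal sketch of targeting this execution provider through the ONNX Runtime Python API. The model path is a placeholder, and the provider string is assumed from the linked docs; verify the exact registered name for your build with ``onnxruntime.get_available_providers()``::

    import onnxruntime as ort

    # Placeholder model path; replace with a real ONNX model.
    # "NvTensorRTRTXExecutionProvider" is assumed from the linked docs;
    # confirm the exact string via ort.get_available_providers().
    session = ort.InferenceSession(
        "model.onnx",
        providers=["NvTensorRTRTXExecutionProvider", "CPUExecutionProvider"],
    )
    print(session.get_providers())  # shows which providers were actually applied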
CHANGELOG.rst (4 additions, 1 deletion)
@@ -8,13 +8,15 @@ Model Optimizer Changelog (Linux)

 **Deprecations**

-- Deprecate ``torch<2.5`` support.
+- Deprecate ``torch<2.6`` support.

 **New Features**

 - (Experimental) Add quantization support for custom TensorRT op in ONNX models.
 - Add support for Minifinetuning (MFT; https://arxiv.org/abs/2506.15702) self-corrective distillation, which enables training on small datasets with severely mitigated catastrophic forgetting.
 - Add tree decoding support for Megatron Eagle models.
+- For most VLMs, quantization is now explicitly disabled on the vision part, and those modules are added to ``excluded_modules`` during HF export.
+- Add support for ``hidden_size`` and ``num_layers`` pruning for Megatron Core Mamba models in ``mcore_gpt_minitron`` mode.

 0.33 (2025-07-14)
 ^^^^^^^^^^^^^^^^^
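To illustrate the exclusion pattern behind the VLM entry above, here is a minimal, hypothetical sketch using ``modelopt.torch.quantization``: quantization is switched off for vision submodules by a module-name wildcard before calibration. The ``"*vision*"`` pattern, ``model``, and ``calib_loader`` are placeholders; the exact pattern depends on the VLM architecture::

    import copy

    import modelopt.torch.quantization as mtq

    # Deep-copy the stock recipe before editing it; the *_CFG objects are
    # shared module-level dicts.
    quant_cfg = copy.deepcopy(mtq.INT8_DEFAULT_CFG)

    # Hypothetical wildcard: skip quantization for vision-tower submodules.
    quant_cfg["quant_cfg"]["*vision*"] = {"enable": False}

    def forward_loop(m):
        # Run a handful of calibration batches; calib_loader is user-provided.
        for batch in calib_loader:
            m(**batch)

    model = mtq.quantize(model, quant_cfg, forward_loop)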
@@ -36,6 +38,7 @@ Model Optimizer Changelog (Linux)
 - ModelOpt now supports quantization of tensor-parallel sharded Huggingface transformer models. This requires ``transformers>=4.52.0``.
 - Support quantization of FSDP2 wrapped models and add FSDP2 support in the ``llm_qat`` example.
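And a rough sketch of the FSDP2 flow mentioned in the last entry, assuming a transformer whose blocks live in ``model.layers`` and a process group already initialized (e.g., via ``torchrun``); ``forward_loop`` is the same kind of user-provided calibration loop as above, and the sharding order shown (blocks first, then root) is the typical FSDP2 idiom rather than anything mandated by the changelog::

    import modelopt.torch.quantization as mtq
    from torch.distributed.fsdp import fully_shard  # FSDP2 API (torch>=2.6)

    # Shard each transformer block, then the root module.
    for block in model.layers:
        fully_shard(block)
    fully_shard(model)

    # Quantize the FSDP2-wrapped model; FP8_DEFAULT_CFG is one of the
    # stock ModelOpt recipes.
    model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)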