CHANGELOG.rst (+14 −1)

@@ -1,6 +1,19 @@
 Model Optimizer Changelog (Linux)
 =================================
 
+0.35 (2025-08-xx)
+^^^^^^^^^^^^^^^^^
+
+**Backward Breaking Changes**
+
+**Deprecations**
+
+**New Features**
+
+- (Experimental) Add quantization support for custom TensorRT ops in ONNX models.
+- Add support for Minifinetuning (MFT; https://arxiv.org/abs/2506.15702) self-corrective distillation, which enables training on small datasets while significantly mitigating catastrophic forgetting.
+- Add tree decoding support for Megatron Eagle models.
+
 0.33 (2025-07-14)
 ^^^^^^^^^^^^^^^^^
 
@@ -20,7 +33,7 @@ Model Optimizer Changelog (Linux)
 - Add per node calibration support in ONNX quantization.
 - ModelOpt now supports quantization of tensor-parallel sharded Huggingface transformer models. This requires ``transformers>=4.52.0``.
 - Support quantization of FSDP2 wrapped models and add FSDP2 support in the ``llm_qat`` example.

docs/source/getting_started/windows/_installation_with_olive.rst (+9 −3)

@@ -24,8 +24,9 @@ Setup Steps for Olive with ModelOpt-Windows
 $ pip install onnxruntime-genai-directml>=0.4.0
 $ pip install onnxruntime-directml==1.20.0
 
+- The above onnxruntime and onnxruntime-genai packages enable the Olive workflow with the DirectML Execution Provider (EP). To use other EPs, install the corresponding packages.
 
-Additionally, ensure that dependencies for TensorRT Model Optimizer - Windows are met as mentioned in the :ref:`Install-Page-Standalone-Windows`.
+- Additionally, ensure that dependencies for TensorRT Model Optimizer - Windows are met as mentioned in the :ref:`Install-Page-Standalone-Windows`.
 
 **2. Configure Olive for TensorRT Model Optimizer – Windows**
 
@@ -36,7 +37,11 @@ Setup Steps for Olive with ModelOpt-Windows
 
 - **Add Other Passes:** Add additional passes to the Olive configuration file as needed for the desired Olive workflow of your input model. [Refer to the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example.]
 
-**4. Run the Optimization**
+**4. Install other dependencies**
+
+- Install any other requirements needed by the Olive scripts and config.
+
+**5. Run the Optimization**
 
 - **Execute Optimization:** To start the optimization process, run the following commands:
@@ -56,4 +61,5 @@ Setup Steps for Olive with ModelOpt-Windows
 
 **Note**:
 
-#. Currently, the TensorRT-Model Optimizer - Windows only supports Onnx Runtime GenAI based models in the Olive workflow.
+#. Currently, TensorRT Model Optimizer - Windows supports only ONNX Runtime GenAI based LLM models in the Olive workflow.
+#. To try out different LLMs and EPs in the Olive workflow of ModelOpt-Windows, refer to the details provided in the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example.
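
For orientation, here is a minimal sketch of driving such an Olive workflow from Python rather than from the CLI. It is an assumption-laden illustration: the ``NVModelOptQuantization`` pass type follows the phi3 Olive example linked above, while the model type, model path, and config keys are placeholders that may differ across Olive versions; verify them against your installed Olive before use.

.. code-block:: python

    # Hedged sketch: run an Olive workflow that applies the ModelOpt
    # quantization pass. Keys mirror the phi3 Olive example; check the exact
    # schema against your Olive version.
    from olive.workflows import run as olive_run

    config = {
        "input_model": {
            "type": "HfModel",  # placeholder model type for illustration
            "model_path": "microsoft/Phi-3-mini-4k-instruct",
        },
        "passes": {
            "modelopt_quant": {
                # Pass name as used in the phi3 Olive example.
                "type": "NVModelOptQuantization",
            },
        },
        "output_dir": "olive_output",
    }

    olive_run(config)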

docs/source/guides/4_distillation.rst (+19 −0)

@@ -62,6 +62,10 @@ Example usage:
 meta model. Thus, the same callable must be available in the namespace when restoring via
 the :meth:`mto.restore <modelopt.torch.opt.conversion.restore>` utility.
 
+.. tip::
+    When training the student on a small corpus of ground-truth data, consider using :class:`MFTLoss <modelopt.torch.distill.MFTLoss>` to perform Minifinetuning in lieu of the standard
+    :class:`LogitsDistillationLoss <modelopt.torch.distill.losses.LogitsDistillationLoss>`. This allows the student to learn from the teacher's distribution while adapting to the new data, improving specialization on the new data without overwriting the teacher's general knowledge.
+
 .. note::
     As the model is not of the same class anymore, calling ``type()`` on the model after conversion
     will not work as expected.
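
To make the tip above concrete, here is a hedged sketch of swapping the distillation criterion, following the ``kd_loss`` conversion pattern from the example usage earlier in this guide. The ``MFTLoss`` name is taken from the tip in this diff (the Minifinetuning section below refers to it as ``MFTDistillationLoss``), and the constructor arguments and config keys are assumptions to verify against the ModelOpt API reference.

.. code-block:: python

    # Hedged sketch: configure distillation with an MFT criterion in place of
    # plain logits distillation. Placeholder modules stand in for real models.
    import torch.nn as nn

    import modelopt.torch.distill as mtd

    teacher_model = nn.Linear(32, 32)  # placeholder teacher for illustration
    student_model = nn.Linear(32, 32)  # placeholder student for illustration

    kd_config = {
        "teacher_model": teacher_model,
        # Assumed drop-in replacement for mtd.LogitsDistillationLoss(); the
        # Minifinetuning section below mentions a threshold parameter that
        # controls the correct/incorrect token separation.
        "criterion": mtd.MFTLoss(),
    }
    student = mtd.convert(student_model, mode=[("kd_loss", kd_config)])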
@@ -124,6 +128,9 @@ maps or logits) which the teacher has already mastered. This can serve multiple
 **C.** Module replacement: One can replace a single module within a model with a more efficient one
 and use distillation on its original outputs to effectively re-integrate it into the whole model.
 
+**D.** Minimal modification without catastrophic forgetting: A variant of distillation, called Minifinetuning,
+allows for training a model on even small datasets without losing the original model's knowledge.
+
 Student
 ^^^^^^^
 
@@ -192,3 +199,15 @@ ground truth labels may be.
 
 
 .. _1: https://arxiv.org/abs/1803.03635
+
+Minifinetuning
+^^^^^^^^^^^^^^
+
+Minifinetuning is a technique that allows for training a model on even small datasets without losing the original
+model's knowledge. This is achieved by algorithmically modifying the teacher's distribution depending on its
+performance on the new dataset. The goal is to ensure that the separation between the correct and incorrect argmax
+tokens is large enough, which can be controlled by a threshold parameter. ModelOpt provides a pre-defined loss function
+for this purpose, called :class:`MFTDistillationLoss <modelopt.torch.distill.losses.MFTDistillationLoss>`, which can
+be used in place of the standard :class:`LogitsDistillationLoss <modelopt.torch.distill.losses.LogitsDistillationLoss>`.
+More information about the technique can be found in the original paper:
+`Minifinetuning: Low-Data Generation Domain Adaptation through Corrective Self-Distillation <https://arxiv.org/abs/2506.15702>`_.
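
To make the thresholded correction concrete, the following is a self-contained sketch of the kind of teacher-distribution modification described above. The helper name and the exact correction rule are illustrative only, not the ModelOpt implementation behind :class:`MFTDistillationLoss <modelopt.torch.distill.losses.MFTDistillationLoss>`.

.. code-block:: python

    # Hedged sketch of an MFT-style corrective distribution: where the
    # teacher's argmax disagrees with the ground-truth token, boost the
    # ground-truth probability until it leads the teacher's (wrong) argmax
    # by at least `threshold`.
    import torch
    import torch.nn.functional as F


    def mft_corrected_teacher(teacher_logits, target_ids, threshold=0.1):
        probs = F.softmax(teacher_logits, dim=-1)    # (..., vocab_size)
        wrong = probs.argmax(dim=-1) != target_ids   # positions to correct

        target_p = probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
        desired = (probs.max(dim=-1).values + threshold).clamp(max=1.0)
        # Shrink the remaining mass so the corrected distribution sums to one.
        scale = (1.0 - desired) / (1.0 - target_p).clamp_min(1e-8)

        corrected = (probs * scale.unsqueeze(-1)).scatter(
            -1, target_ids.unsqueeze(-1), desired.unsqueeze(-1)
        )
        return torch.where(wrong.unsqueeze(-1), corrected, probs)


    # Toy usage: the teacher's argmax (token 2) disagrees with ground truth
    # (token 0), so the distribution at that position gets corrected.
    logits = torch.tensor([[0.5, 0.1, 2.0, 0.2]])
    targets = torch.tensor([0])
    print(mft_corrected_teacher(logits, targets))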