.github/PULL_REQUEST_TEMPLATE.md (0 additions, 2 deletions)
@@ -14,7 +14,6 @@
 ## Testing
 <!-- Mention how have you tested your change if applicable. -->
 
-
 ## Before your PR is "*Ready for review*"
 <!-- If you haven't finished some of the above items you can still open `Draft` PR. -->
 
@@ -24,6 +23,5 @@
 -**Did you add or update any necessary documentation?**: Yes/No
 -**Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. -->
CHANGELOG.rst (4 additions, 4 deletions)
@@ -18,7 +18,7 @@ Model Optimizer Changelog (Linux)
 - Add support for QAT with HuggingFace + DeepSpeed. See ``examples/gpt_oss`` for an example.
 - Add support for QAT with LoRA. The LoRA adapters can be folded into the base model after QAT and deployed just like a regular PTQ model. See ``examples/gpt_oss`` for an example.
 - ModelOpt provides convenient trainers such as :class:`QATTrainer`, :class:`QADTrainer`, :class:`KDTrainer`, :class:`QATSFTTrainer` which inherits from Huggingface trainers.
-  ModelOpt trainers can be used as drop in replacement of the correspoding Huggingface trainer. See usage examples in ``examples/gpt_oss``, ``examples/llm_qat`` or ``examples/llm_distill``.
+  ModelOpt trainers can be used as drop in replacement of the corresponding Huggingface trainer. See usage examples in ``examples/gpt_oss``, ``examples/llm_qat`` or ``examples/llm_distill``.
 - (Experimental) Add quantization support for custom TensorRT op in ONNX models.
 - Add support for Minifinetuning (MFT; https://arxiv.org/abs/2506.15702) self-corrective distillation, which enables training on small datasets with severely mitigated catastrophic forgetting.
 - Add tree decoding support for Megatron Eagle models.
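The trainer entry in the hunk above is the one usage-facing change here: `QATTrainer` and related classes are described as drop-in replacements for the corresponding Hugging Face trainers. The sketch below only illustrates that pattern; the changelog names the classes but not their module, so the import path and the `quant_cfg` keyword are assumptions for illustration and `examples/llm_qat` is the authority on the real interface.

```python
# Sketch of the "drop-in replacement" usage described in the entry above.
# ASSUMPTIONS: the QATTrainer import path and the `quant_cfg` keyword are
# illustrative placeholders; see examples/llm_qat for the real interface.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.plugins.transformers_trainer import QATTrainer  # ASSUMED path

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tiny stand-in dataset so the sketch is self-contained.
train_dataset = [tokenizer(t, truncation=True, max_length=32) for t in ["QAT smoke test."] * 8]

# Same constructor shape as transformers.Trainer, plus a quantization config.
trainer = QATTrainer(
    model=model,
    args=TrainingArguments(output_dir="qat_out", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    quant_cfg=mtq.FP8_DEFAULT_CFG,  # ASSUMED keyword: the quantization format to train against
)
trainer.train()  # quantizers are handled by the trainer; the rest is ordinary HF training
```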
@@ -55,8 +55,8 @@ Model Optimizer Changelog (Linux)
 
 - NeMo and Megatron-LM distributed checkpoint (``torch-dist``) stored with legacy version can no longer be loaded. The remedy is to load the legacy distributed checkpoint with 0.29 and store a ``torch`` checkpoint and resume with 0.31 to convert to a new format. The following changes only apply to storing and resuming distributed checkpoint.
   - ``quantizer_state`` of :class:`TensorQuantizer <modelopt.torch.quantization.nn.modules.TensorQuantizer>` is now stored in ``extra_state`` of :class:`QuantModule <modelopt.torch.quantization.nn.module.QuantModule>` where it used to be stored in the sharded ``modelopt_state``.
-  - The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now retored. Some dtype and shape are previously changed to make all decoder layers to have homogeneous structure in the checkpoint.
-  - Togather with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogenous format.
+  - The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now restored. Some dtype and shape are previously changed to make all decoder layers to have homogeneous structure in the checkpoint.
+  - Together with megatron.core-0.13, quantized model will store and resume distributed checkpoint in a heterogenous format.
 - auto_quantize API now accepts a list of quantization config dicts as the list of quantization choices.
   - This API previously accepts a list of strings of quantization format names. It was therefore limited to only pre-defined quantization formats unless through some hacks.
   - With this change, now user can easily use their own custom quantization formats for auto_quantize.
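The `auto_quantize` entry above is an API change: the search space is now given as quantization config dicts rather than format-name strings, which is what makes custom formats possible. Below is a rough sketch of the new call shape; the keyword arguments (`constraints`, `quantization_formats`, `data_loader`, `forward_step`, `loss_func`, `num_calib_steps`, `num_score_steps`) reflect my reading of the documented API and may differ between versions.

```python
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()

# A minimal calibration loader; input_ids double as labels for the LM loss.
enc = tokenizer(["auto_quantize calibration sample"] * 4, return_tensors="pt", padding=True)
calib_loader = DataLoader([{"input_ids": enc.input_ids, "labels": enc.input_ids}], batch_size=None)

model, search_state = mtq.auto_quantize(
    model,
    constraints={"effective_bits": 4.8},
    # Previously a list of strings such as ["FP8_DEFAULT_CFG", "INT4_AWQ_CFG"];
    # now a list of config dicts, so a hand-written custom config dict fits in the same list.
    quantization_formats=[mtq.FP8_DEFAULT_CFG, mtq.INT4_AWQ_CFG],
    data_loader=calib_loader,
    forward_step=lambda m, batch: m(**{k: v.cuda() for k, v in batch.items()}),
    loss_func=lambda output, batch: output.loss,
    num_calib_steps=len(calib_loader),
    num_score_steps=len(calib_loader),
)
```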
@@ -146,7 +146,7 @@ Model Optimizer Changelog (Linux)
146
146
**New Features**
147
147
148
148
- Support fast hadamard transform in :class:`TensorQuantizer <modelopt.torch.quantization.nn.modules.TensorQuantizer>`.
149
-
It can be used for rotation based quantization methods, e.g. QuaRot. Users need to install the package `fast_hadamard_transfrom<https://github.com/Dao-AILab/fast-hadamard-transform>`_ to use this feature.
149
+
It can be used for rotation based quantization methods, e.g. QuaRot. Users need to install the package `fast_hadamard_transform<https://github.com/Dao-AILab/fast-hadamard-transform>`_ to use this feature.
150
150
- Add affine quantization support for the KV cache, resolving the low accuracy issue in models such as Qwen2.5 and Phi-3/3.5.
151
151
- Add FSDP2 support. FSDP2 can now be used for QAT.
152
152
- Add `LiveCodeBench <https://livecodebench.github.io/>`_ and `Simple Evals <https://github.com/openai/simple-evals>`_ to the ``llm_eval`` examples.
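The typo fix in the hunk above corrects the name of the external dependency, `fast_hadamard_transform` (the Dao-AILab package). For context, here is a small sketch of the rotation that QuaRot-style methods rely on, using that package directly; how the transform is enabled inside `TensorQuantizer` is ModelOpt configuration the changelog does not spell out, so it is not shown here.

```python
# pip install fast_hadamard_transform   (CUDA package from Dao-AILab, as named above)
import torch
from fast_hadamard_transform import hadamard_transform

# Rotation-based methods such as QuaRot multiply activations/weights by a Hadamard
# matrix so outliers are spread across channels before quantization. The transform
# is orthogonal, so applying it twice with a 1/sqrt(n) scale recovers the input.
x = torch.randn(4, 4096, device="cuda", dtype=torch.float16)

scale = 1.0 / (x.shape[-1] ** 0.5)          # makes the transform orthonormal
x_rot = hadamard_transform(x, scale=scale)  # rotated tensor, friendlier to quantize
x_back = hadamard_transform(x_rot, scale=scale)

print(torch.allclose(x, x_back, atol=1e-2))  # True, up to fp16 round-off
```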