Commit 4dbe045

Update changelog
Signed-off-by: gcunhase <[email protected]>
1 parent 51845f4 commit 4dbe045

File tree: 1 file changed (+2 lines, -0 lines)

CHANGELOG.rst

Lines changed: 2 additions & 0 deletions
@@ -7,12 +7,14 @@ Model Optimizer Changelog (Linux)
 **Bug Fixes**
 
 - Fix a bug in FastNAS pruning (computer vision models) where the model parameters were sorted twice, messing up the ordering.
+- Fix Q/DQ/Cast node placements in 'FP32 required' tensors in custom ops in the ONNX quantization workflow.
 
 **New Features**
 
 - Add MoE (e.g. Qwen3-30B-A3B) pruning support for the ``num_moe_experts``, ``moe_ffn_hidden_size`` and ``moe_shared_expert_intermediate_size`` parameters in Minitron pruning (``mcore_minitron``).
 - Add a ``specdec_bench`` example to benchmark speculative decoding performance. See `examples/specdec_bench/README.md <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/specdec_bench#speculative-decoding-benchmark>`_ for more details.
 - Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
+- Add a ``trt_plugins_precision`` flag in ONNX autocast to indicate custom op precisions, similar to the flag already available in the quantization workflow.
 
 0.39 (2025-11-11)
 ^^^^^^^^^^^^^^^^^
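
As a rough illustration of what consuming such a per-op precision flag could look like: the sketch below parses ``OpType:precision`` specs into a lookup table. It assumes the colon-separated value format of the quantization workflow's existing flag; the helper name and the set of accepted precisions are hypothetical, not actual Model Optimizer code.

```python
def parse_trt_plugins_precision(specs):
    """Parse hypothetical 'OpType:precision' specs into a dict.

    Assumes the colon-separated format of the existing quantization
    flag; this helper is illustrative, not Model Optimizer API.
    """
    allowed = {"fp32", "fp16", "int8", "fp8"}  # assumed precision set
    precisions = {}
    for spec in specs:
        op_type, _, precision = spec.partition(":")
        precision = precision.lower()
        if not op_type or precision not in allowed:
            raise ValueError(f"Malformed precision spec: {spec!r}")
        precisions[op_type] = precision
    return precisions

# Example: pin two (made-up) custom ops to different precisions.
print(parse_trt_plugins_precision(["MyPlugin:fp16", "OtherPlugin:fp32"]))
```

A mapping like this lets an autocast pass skip down-conversion (or insert the right Cast nodes) around each custom op according to its declared precision.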
