ModelOpt 0.29.0 Release
Backward Breaking Changes
- Refactor `SequentialQuantizer` to improve its implementation and maintainability while preserving its functionality.
Deprecations
- Deprecate `torch<2.4` support.
New Features
- Upgrade LLM examples to use TensorRT-LLM 0.18.
- Add new model support in the `llm_ptq` example: Gemma-3 and Llama-Nemotron.
- Add INT8 real quantization support.
- Add an FP8 GEMM per-tensor quantization kernel for real quantization. After PTQ, you can leverage the `mtq.compress` (`modelopt.torch.quantization.compress`) API to accelerate evaluation of quantized models (see the first sketch below).
- Use the shapes of the PyTorch parameters and buffers of `TensorQuantizer` (`modelopt.torch.quantization.nn.modules.TensorQuantizer`) to initialize them during restore. This makes restoring quantized models more robust (see the save/restore sketch below).
- Support adding new custom quantization calibration algorithms. Refer to `mtq.calibrate` (`modelopt.torch.quantization.model_quant.calibrate`) or the custom calibration algorithm documentation for more details (see the calibration sketch below).
- Add EAGLE3 (`LlamaForCausalLMEagle3`) training and unified ModelOpt checkpoint export support for Megatron-LM.
- Add support for the `--override_shapes` flag in ONNX quantization: `--calibration_shapes` is reserved for the input shapes used during the calibration process, while `--override_shapes` overrides the input shapes of the model with static shapes (see the ONNX quantization sketch below).
- Add support for UNet ONNX quantization.
- Enable the `concat_elimination` pass by default to improve the performance of quantized ONNX models.
- Enable the redundant Cast elimination pass by default in `moq.quantize` (`modelopt.onnx.quantization.quantize`).
- Add a new attribute `parallel_state` to `DynamicModule` (`modelopt.torch.opt.dynamic.DynamicModule`) to support distributed parallelism such as data parallelism and tensor parallelism.
- Add MXFP8 and NVFP4 quantized ONNX export support.
- Add a new example of torch quantization to ONNX export at MXFP8 and NVFP4 precision (see the final sketch below).
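
A minimal PTQ-then-compress sketch for the `mtq.compress` workflow mentioned above. The toy model, random calibration data, and the choice of `mtq.FP8_DEFAULT_CFG` are illustrative assumptions, not part of this release:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Toy model and random calibration data (illustrative only).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
calib_data = [torch.randn(16, 64, device="cuda") for _ in range(8)]

def forward_loop(m):
    # Run a few batches through the model to collect calibration statistics.
    with torch.no_grad():
        for x in calib_data:
            m(x)

# Post-training quantization with a built-in per-tensor FP8 config.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Compress the quantized weights so evaluation runs on real-quantized storage.
mtq.compress(model)
```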
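For the more robust restore path, the usual ModelOpt save/restore entry points are `mto.save` and `mto.restore`. A rough sketch, continuing from the quantized `model` above:

```python
import torch.nn as nn
import modelopt.torch.opt as mto

# Save the ModelOpt state of the quantized model.
mto.save(model, "quantized_model.pth")

# Later, rebuild the same architecture and restore the quantized state;
# TensorQuantizer parameters and buffers are re-initialized from their
# saved shapes, which is what makes this restore path more robust.
fresh = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
fresh = mto.restore(fresh, "quantized_model.pth")
```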
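Calibration can also be invoked directly through `mtq.calibrate`. The sketch below only shows selecting a built-in algorithm by name ("max" is an illustrative choice); the registration hook for fully custom algorithms is described in the custom calibration documentation:

```python
import modelopt.torch.quantization as mtq

# Re-calibrate the quantized model with a named algorithm; custom
# algorithms plug into this same entry point per the custom calibration doc.
model = mtq.calibrate(model, algorithm="max", forward_loop=forward_loop)
```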
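For the new ONNX shape flags, here is a sketch using the Python entry point behind the CLI. The file names, the keyword arguments mirroring the `--calibration_shapes`/`--override_shapes` flags, and the `name:dim0xdim1x...` shape syntax are assumptions based on the flag descriptions above:

```python
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",
    # Input shapes used only during the calibration process (assumed format).
    calibration_shapes="input:8x3x224x224",
    # Static shapes to override the model's inputs with (assumed format).
    override_shapes="input:1x3x224x224",
    output_path="model.quant.onnx",
)
```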
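Finally, a rough sketch of the torch-quantization-to-ONNX flow at NVFP4 precision. `mtq.NVFP4_DEFAULT_CFG` is a built-in config (MXFP8 works analogously via `mtq.MXFP8_DEFAULT_CFG`); whether the new example uses the standard `torch.onnx.export` path is an assumption here:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Toy model and random calibration data (illustrative only).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()
calib_data = [torch.randn(16, 64, device="cuda") for _ in range(8)]

def forward_loop(m):
    with torch.no_grad():
        for x in calib_data:
            m(x)

# Quantize with a built-in NVFP4 config.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Export the quantized model to ONNX (assumed exporter path).
torch.onnx.export(model, torch.randn(1, 64, device="cuda"), "model.nvfp4.onnx")
```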