ModelOpt 0.29.0 Release

Released by @kevalmorabia97 on 09 May

Backward Breaking Changes

  • Refactor SequentialQuantizer to improve its implementation and maintainability while preserving its functionality.

Deprecations

  • Deprecate torch<2.4 support.

New Features

  • Upgrade LLM examples to use TensorRT-LLM 0.18.
  • Add new model support in the llm_ptq example: Gemma-3, Llama-Nemotron.
  • Add INT8 real quantization support.
  • Add an FP8 GEMM per-tensor quantization kernel for real quantization. After PTQ, you can leverage the mtq.compress (modelopt.torch.quantization.compress) API to accelerate evaluation of quantized models (see the first sketch after this list).
  • Use the shapes of the PyTorch parameters and buffers of TensorQuantizer (modelopt.torch.quantization.nn.modules.TensorQuantizer) to initialize them during restore. This makes restoring quantized models more robust.
  • Support adding new custom quantization calibration algorithms. Refer to mtq.calibrate (modelopt.torch.quantization.model_quant.calibrate) or the custom calibration algorithm documentation for more details (see the second sketch after this list).
  • Add EAGLE3 (LlamaForCausalLMEagle3) training and unified ModelOpt checkpoint export support for Megatron-LM.
  • Add support for the --override_shapes flag in ONNX quantization.
    • --calibration_shapes is reserved for the input shapes used during the calibration process.
    • --override_shapes is used to override the input shapes of the model with static shapes.
  • Add support for UNet ONNX quantization.
  • Enable the concat_elimination pass by default to improve the performance of quantized ONNX models.
  • Enable the redundant Cast elimination pass by default in moq.quantize (modelopt.onnx.quantization.quantize).
  • Add a new parallel_state attribute to DynamicModule (modelopt.torch.opt.dynamic.DynamicModule) to support distributed parallelism such as data parallelism and tensor parallelism.
  • Add MXFP8 and NVFP4 quantized ONNX export support.
  • Add a new example of PyTorch quantization to ONNX export with MXFP8 and NVFP4 precision (see the third sketch after this list).
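
A minimal sketch of the FP8 PTQ plus compress flow mentioned above, assuming a Hugging Face causal LM, a hypothetical calib_dataloader, and that mtq.FP8_DEFAULT_CFG matches your ModelOpt version:

```python
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

# Placeholder model and calibration data; swap in your own.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B").cuda()

def forward_loop(model):
    # Run a few calibration batches through the model.
    for batch in calib_dataloader:  # hypothetical, yields GPU input tensors
        model(batch)

# Post-training quantization with the default FP8 per-tensor configuration.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

# Compress weights into real low-precision storage so evaluation can use the
# FP8 GEMM kernel instead of simulated (fake) quantization.
mtq.compress(model)
```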
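
A minimal sketch of explicit calibration through mtq.calibrate, assuming the (model, algorithm, forward_loop) signature and the built-in "max" algorithm; custom algorithms plug into the same entry point as described in the custom calibration algorithm docs:

```python
import modelopt.torch.quantization as mtq

def forward_loop(model):
    for batch in calib_dataloader:  # hypothetical calibration data
        model(batch)

# Insert quantizers and calibrate in one step with a default config...
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# ...or re-run calibration explicitly with a chosen (or custom) algorithm.
model = mtq.calibrate(model, algorithm="max", forward_loop=forward_loop)
```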
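
A minimal sketch of the new PyTorch-quantization-to-ONNX flow for NVFP4, assuming mtq.NVFP4_DEFAULT_CFG and a plain torch.onnx.export call; the shipped example may use a dedicated export helper and additional options:

```python
import torch
import modelopt.torch.quantization as mtq

model = MyModel().eval().cuda()                      # hypothetical model
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")

def forward_loop(model):
    model(dummy_input)                               # placeholder calibration pass

# Quantize to NVFP4 (use mtq.MXFP8_DEFAULT_CFG for MXFP8 instead).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Export; the quantization nodes are carried into the ONNX graph.
torch.onnx.export(model, dummy_input, "model_nvfp4.onnx", opset_version=17)
```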