
ModelOpt 0.33.0 Release


Released by @kevalmorabia97 on 14 Jul 18:18

Backward Breaking Changes

  • PyTorch dependencies for modelopt.torch features are no longer optional; pip install nvidia-modelopt is now the same as pip install nvidia-modelopt[torch].

New Features

  • Upgrade TensorRT-LLM dependency to 0.20.
  • Add a new CNN QAT example demonstrating how to use ModelOpt for quantization-aware training (QAT); see the QAT sketch after this list.
  • Add support for ONNX models with custom TensorRT ops in Autocast.
  • Add quantization-aware distillation (QAD) support in the llm_qat example.
  • Add support for BF16 in ONNX quantization.
  • Add per-node calibration support in ONNX quantization.
  • ModelOpt now supports quantization of tensor-parallel sharded Hugging Face transformers models. This requires transformers>=4.52.0; see the tensor-parallel sketch after this list.
  • Support quantization of FSDP2-wrapped models and add FSDP2 support in the llm_qat example; see the FSDP2 sketch after this list.
  • Add NeMo 2 Simplified Flow examples for quantization-aware training/distillation (QAT/QAD), speculative decoding, and pruning & distillation.
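
As a rough illustration of the workflow the new CNN QAT example demonstrates, the sketch below quantizes a small torchvision CNN with modelopt.torch.quantization and then fine-tunes it. The model choice, calibration loop, data, and training schedule are placeholder assumptions, not the example's actual code.

```python
# Minimal QAT sketch (illustrative assumptions; not the example's actual code).
import torch
import torch.nn.functional as F
import torchvision
import modelopt.torch.quantization as mtq

model = torchvision.models.resnet18(num_classes=10)  # assumed toy model

# Calibration loop: run a few forward passes so quantizer ranges are collected.
def forward_loop(model):
    model.eval()
    with torch.no_grad():
        for _ in range(8):  # assumed number of calibration batches
            model(torch.randn(32, 3, 224, 224))  # stand-in for real data

# Insert fake-quantization ops and calibrate; INT8_DEFAULT_CFG is one of
# ModelOpt's predefined quantization configs.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# QAT is then ordinary fine-tuning of the quantized model.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
model.train()
for _ in range(2):  # assumed short fine-tuning schedule
    inputs = torch.randn(32, 3, 224, 224)
    labels = torch.randint(0, 10, (32,))
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```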
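For the tensor-parallel Hugging Face support, a minimal sketch, assuming a placeholder checkpoint and toy calibration prompts: load the model sharded via the tp_plan="auto" path available in recent transformers releases, then quantize it with the usual ModelOpt call.

```python
# Sketch: quantizing a tensor-parallel sharded HF model. Launch with torchrun,
# one process per GPU. Checkpoint, config, and prompts are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq

model_id = "meta-llama/Llama-3.1-8B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    tp_plan="auto",               # tensor-parallel sharding across local GPUs
    torch_dtype=torch.bfloat16,
)

def forward_loop(model):
    # A handful of toy calibration prompts; real calibration uses a dataset.
    batch = tokenizer(["Hello world"] * 4, return_tensors="pt").to(model.device)
    with torch.no_grad():
        model(**batch)

# FP8_DEFAULT_CFG is one of ModelOpt's predefined quantization configs.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```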
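Similarly, a hedged sketch of quantizing an FSDP2-wrapped model, assuming the public fully_shard API (PyTorch 2.6+) and a toy module; the actual llm_qat integration differs in the details.

```python
# Sketch: quantizing an FSDP2-wrapped model. Launch with torchrun; assumes
# the public fully_shard API (PyTorch 2.6+). All details are placeholders.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard
import modelopt.torch.quantization as mtq

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
).cuda()
fully_shard(model)  # FSDP2-style composable sharding of the whole module

def forward_loop(model):
    with torch.no_grad():
        for _ in range(4):  # assumed calibration batches
            model(torch.randn(8, 1024, device="cuda"))

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```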