ModelOpt 0.33.0 Release
Backward Breaking Changes
- PyTorch dependencies for `modelopt.torch` features are no longer optional, and `pip install nvidia-modelopt` is now the same as `pip install nvidia-modelopt[torch]`.
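With the extras merged, a plain install already brings in the PyTorch stack. A minimal sanity check (assuming a fresh environment with ModelOpt 0.33.0 installed):

```python
# After `pip install nvidia-modelopt` (no [torch] extra needed in 0.33.0),
# the PyTorch-based modules import directly.
import modelopt.torch.quantization as mtq
import modelopt.torch.opt as mto

# Both modules resolve without installing any optional extra.
print(mtq.__name__, mto.__name__)
```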
New Features
- Upgrade TensorRT-LLM dependency to 0.20.
- Add a new CNN QAT example demonstrating quantization-aware training with ModelOpt (see the sketch after this list).
- Add support for ONNX models with custom TensorRT ops in Autocast (sketched after this list).
- Add quantization-aware distillation (QAD) support in the `llm_qat` example (see the sketch after this list).
- Add support for BF16 in ONNX quantization (see the ONNX sketch after this list).
- Add per-node calibration support in ONNX quantization (also covered in the ONNX sketch below).
- ModelOpt now supports quantization of tensor-parallel sharded Hugging Face transformer models; this requires `transformers>=4.52.0` (see the sketch after this list).
- Support quantization of FSDP2-wrapped models and add FSDP2 support in the `llm_qat` example.
- Add NeMo 2 Simplified Flow examples for quantization-aware training/distillation (QAT/QAD), speculative decoding, and pruning & distillation.
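For the CNN QAT example item, a rough sketch of the QAT workflow with `mtq.quantize`; the ResNet model, dummy calibration data, and INT8 config choice are placeholders, not the shipped example's actual contents:

```python
import torch
import modelopt.torch.quantization as mtq
from torchvision.models import resnet18

model = resnet18()  # placeholder CNN; the shipped example may use a different model

# Dummy calibration batches; a real run would use a slice of the training set.
calib_loader = [(torch.randn(8, 3, 224, 224), None) for _ in range(4)]

def forward_loop(model):
    # Run calibration data through the model so quantizer ranges are collected.
    for images, _ in calib_loader:
        model(images)

# Insert fake-quantization ops and calibrate, using one of ModelOpt's bundled configs.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# QAT is then the usual fine-tuning loop on the fake-quantized model.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
```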
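For the Autocast item, a hedged sketch: `convert_to_mixed_precision` is Autocast's Python entry point, but the `trt_plugins` argument name below is an assumption for how the custom-op plugin library is passed, so verify against the 0.33.0 API reference:

```python
import onnx
from modelopt.onnx.autocast import convert_to_mixed_precision

# Convert an FP32 ONNX model to mixed precision. The model is assumed to
# contain nodes implemented by a custom TensorRT plugin; `trt_plugins`
# (an assumed parameter name) points Autocast at that plugin library.
converted = convert_to_mixed_precision(
    onnx_path="model_with_custom_ops.onnx",  # hypothetical input model
    trt_plugins=["./libcustom_trt_ops.so"],  # hypothetical plugin path
)
onnx.save(converted, "model_mixed_precision.onnx")
```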
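For the QAD item, a sketch combining a quantized student with ModelOpt's distillation mode. The `kd_loss` convert pattern and config schema are assumptions based on the `modelopt.torch.distill` docs, and the tiny models are placeholders (in QAD the student would be the output of `mtq.quantize`):

```python
import torch
import torch.nn as nn
import modelopt.torch.distill as mtd

# Tiny stand-ins for a fake-quantized student and its full-precision teacher.
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))

# Assumed config schema: the wrapper runs the teacher alongside the student
# and scores their output logits against each other.
kd_config = {
    "teacher_model": teacher,
    "criterion": mtd.LogitsDistillationLoss(),
}
distill_model = mtd.convert(student, mode=[("kd_loss", kd_config)])

# Training step: forward the student (the teacher runs internally), then add
# the distillation term to the task loss.
logits = distill_model(torch.randn(4, 16))
kd_loss = distill_model.compute_kd_loss()
```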
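For the BF16 and per-node calibration items, a sketch of the Python ONNX PTQ entry point. The `quantize` function in `modelopt.onnx.quantization` is documented, but the `high_precision_dtype` and `calibrate_per_node` argument names are assumptions tied to those two items; check the 0.33.0 reference:

```python
import numpy as np
from modelopt.onnx.quantization import quantize

# Random calibration tensor keyed by the model's input name ("input" is assumed).
calib_data = {"input": np.random.rand(8, 3, 224, 224).astype(np.float32)}

quantize(
    onnx_path="model.onnx",          # hypothetical input model
    quantize_mode="int8",
    calibration_data=calib_data,
    high_precision_dtype="bf16",     # assumption: keep non-quantized ops in BF16
    calibrate_per_node=True,         # assumption: enable per-node calibration
    output_path="model.quant.onnx",
)
```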
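For the tensor-parallel item, a sketch that leans on transformers' `tp_plan="auto"` loading path (available in `transformers>=4.52.0`) and assumes, per the release note, that `mtq.quantize` handles the sharded modules directly; run it under `torchrun` with one process per TP rank. The checkpoint name is a placeholder:

```python
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# transformers>=4.52: shard the checkpoint tensor-parallel across the
# ranks this script was launched on (e.g. torchrun --nproc-per-node=2).
model = AutoModelForCausalLM.from_pretrained(
    model_id, tp_plan="auto", torch_dtype=torch.bfloat16
)

def forward_loop(model):
    # One short calibration batch; a real run would loop over a dataset.
    batch = tokenizer("Hello from ModelOpt", return_tensors="pt").to(model.device)
    model(**batch)

# Per the release note, quantization now works on the TP-sharded model as-is.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```

The FSDP2 item follows the same shape: wrap the model first, then call `mtq.quantize` on the wrapped module.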