ModelOpt 0.35.0 Release
Deprecations
- Deprecate `torch<2.6` support.
- Deprecate NeMo 1.0 model support.
Bug Fixes
- Fix attention head ranking logic for pruning Megatron Core GPT models.
New Features
- ModelOpt now supports PTQ and QAT for GPT-OSS models. See `examples/gpt_oss` for an end-to-end PTQ/QAT example (a minimal PTQ sketch follows this list).
- Add support for QAT with HuggingFace + DeepSpeed. See `examples/gpt_oss` for an example.
- Add support for QAT with LoRA. The LoRA adapters can be folded into the base model after QAT and deployed just like a regular PTQ model. See `examples/gpt_oss` for an example.
- ModelOpt provides convenient trainers such as `QATTrainer`, `QADTrainer`, `KDTrainer`, and `QATSFTTrainer`, which inherit from the corresponding Hugging Face trainers and can be used as drop-in replacements. See usage examples in `examples/gpt_oss`, `examples/llm_qat`, or `examples/llm_distill` (a drop-in usage sketch follows this list).
- (Experimental) Add quantization support for custom TensorRT ops in ONNX models.
- Add support for Minifinetuning (MFT; https://arxiv.org/abs/2506.15702) self-corrective distillation, which enables training on small datasets while severely mitigating catastrophic forgetting.
- Add tree decoding support for Megatron Eagle models.
- For most VLMs, quantization is now explicitly disabled on the vision components, which are added to `excluded_modules` during HF export.
- Add support for `mamba_num_heads`, `mamba_head_dim`, `hidden_size`, and `num_layers` pruning for Megatron Core Mamba or hybrid Transformer-Mamba models in `mcore_minitron` (previously `mcore_gpt_minitron`) mode (a pruning sketch follows this list).
- Add an example of QAT/QAD training with [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main). See `examples/llm_qat/llama_factory` for more details.
- Upgrade TensorRT-LLM dependency to 1.0.0rc6.
- Add unified HuggingFace model export support for quantized NVFP4 GPT-OSS models.
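
For reference, a minimal PTQ sketch with the ModelOpt quantization API is shown below. The checkpoint name, calibration prompts, and the `NVFP4_DEFAULT_CFG` choice are illustrative assumptions; the scripts in `examples/gpt_oss` are the supported end-to-end recipe.

```python
# Minimal PTQ sketch (illustrative only -- see examples/gpt_oss for the
# supported recipe). Checkpoint name and config choice are assumptions.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # placeholder checkpoint for illustration
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(m):
    # Push a small calibration set through the model so ModelOpt can collect
    # activation statistics for the inserted quantizers.
    for prompt in ["Hello, world!", "A short calibration sample."]:
        inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
        m(**inputs)

# NVFP4_DEFAULT_CFG is one of ModelOpt's built-in configs; the GPT-OSS example
# may use a different (e.g. MXFP4 weight-only) recipe.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```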
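The new trainers are intended as drop-in replacements for their Hugging Face counterparts. The hedged sketch below shows that usage pattern; the import path and the keyword used to pass the quantization config are assumptions, and `model`/`train_dataset`/`quant_cfg` are placeholders, so treat `examples/llm_qat` as the authoritative reference.

```python
# Hedged drop-in sketch: QATTrainer is used like transformers.Trainer.
# The import path and the `quant_args` keyword below are assumptions; check
# examples/llm_qat for the actual interface.
from transformers import TrainingArguments

from modelopt.torch.quantization.plugins.transformers_trainer import QATTrainer  # assumed path

args = TrainingArguments(output_dir="qat_out", num_train_epochs=1, learning_rate=1e-5)

trainer = QATTrainer(
    model=model,                  # placeholder: a Hugging Face causal LM
    args=args,
    train_dataset=train_dataset,  # placeholder: a tokenized dataset
    quant_args=quant_cfg,         # assumed keyword for the ModelOpt quantization config
)
trainer.train()  # same call as the stock Trainer
```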
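The expanded Minitron pruning support is driven through the ModelOpt pruning API. The outline below is a rough sketch under assumed constraint keys (taken from the parameter names in the bullet above); the actual `mcore_minitron` interface on a Megatron Core Mamba or hybrid model may differ.

```python
# Rough Minitron pruning outline (assumptions: the constraint keys mirror the
# parameter names listed above, and calibration is supplied via a forward_loop;
# the real mcore_minitron interface may differ).
import modelopt.torch.prune as mtp

export_config = {
    "hidden_size": 3072,
    "num_layers": 44,
    "mamba_num_heads": 64,  # Mamba-specific axes newly supported in 0.35.0
    "mamba_head_dim": 48,
}

# `model` is a Megatron Core Mamba or hybrid Transformer-Mamba model and
# `forward_loop` runs calibration data through it (both placeholders).
mtp.prune(
    model,
    mode="mcore_minitron",
    constraints={"export_config": export_config},
    dummy_input=None,  # assumed unused when a forward_loop is provided
    config={"forward_loop": forward_loop},
)
```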