
ModelOpt 0.37.0 Release


Released by @kevalmorabia97 on 08 Oct 16:43

Deprecations

  • Deprecated ModelOpt's custom Docker images. Use the PyTorch, TensorRT-LLM, or TensorRT Docker images directly, or refer to the installation guide for more details.
  • Deprecated the quantize_mode argument in examples/onnx_ptq/evaluate.py in favor of strong typing; use engine_precision instead (see the sketch after this list).
  • Deprecated TRT-LLM's TRT backend in examples/llm_ptq and examples/vlm_ptq. Support for the build and benchmark tasks is removed and replaced with quant, and engine_dir is replaced with checkpoint_dir in both examples. For performance evaluation, please use trtllm-bench directly.
  • The --export_fmt flag in examples/llm_ptq is removed. By default, we export to the unified Hugging Face checkpoint format.
  • Deprecated examples/vlm_eval, as it depends on TRT-LLM's deprecated TRT backend.
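
As a reference for the evaluate.py migration above, here is a minimal sketch of the renamed flag. The precision value "fp16" and the omission of the script's other required arguments (model path, dataset, etc.) are assumptions for illustration; consult the example's README for the full invocation.

```python
# Hypothetical invocation of the onnx_ptq evaluation script showing the flag
# rename; "fp16" and the omitted arguments are illustrative assumptions.
import subprocess

# Before (deprecated):
#   python examples/onnx_ptq/evaluate.py --quantize_mode fp16 ...
# After:
subprocess.run(
    ["python", "examples/onnx_ptq/evaluate.py", "--engine_precision", "fp16"],
    check=True,  # raise if the script exits non-zero
)
```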

New Features

  • high_precision_dtype now defaults to fp16 in ONNX quantization, i.e., the non-quantized weights in the output model are FP16 by default (see the sketch after this list).
  • Upgraded TensorRT-LLM dependency to 1.1.0rc2.
  • Added support for quantized HF checkpoint export of Phi-4-multimodal and Qwen2.5-VL in examples/vlm_ptq.
  • Added support for storing and restoring Minitron pruning activations and scores, so a model can be re-pruned without running the forward loop again.
  • Added Minitron pruning example for the Megatron-LM framework. See examples/megatron-lm for more details.
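
Related to the new high_precision_dtype default above, a minimal ONNX PTQ sketch. Everything besides the high_precision_dtype behavior (the file names, the quantize_mode value, and the exact parameter list of modelopt.onnx.quantization.quantize) is an assumption; verify against the installed version's API docs.

```python
# Minimal ONNX PTQ sketch; file names and parameter values are assumptions.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",          # input model (assumed path)
    quantize_mode="int8",            # quantization mode (assumed value)
    output_path="model.quant.onnx",  # quantized output model (assumed path)
    # high_precision_dtype defaults to "fp16" as of this release, so
    # non-quantized weights are exported as FP16; pass "fp32" to keep them FP32.
)
```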