Deprecations
- Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM, or TensorRT docker image directly or refer to the installation guide for more details.
- Deprecated `quantize_mode` argument in `examples/onnx_ptq/evaluate.py` to support strong typing. Use `engine_precision` instead (see the sketch after this list).
- Deprecated TRT-LLM's TRT backend in `examples/llm_ptq` and `examples/vlm_ptq`. Support for the `build` and `benchmark` tasks is removed and replaced with `quant`, and `engine_dir` is replaced with `checkpoint_dir` in `examples/llm_ptq` and `examples/vlm_ptq`. For performance evaluation, please use `trtllm-bench` directly.
- The `--export_fmt` flag in `examples/llm_ptq` is removed. By default, we export to the unified Hugging Face checkpoint format.
- Deprecated `examples/vlm_eval` as it depends on the deprecated TRT-LLM's TRT backend.
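
For reference, a minimal sketch of the `evaluate.py` migration. Only the flag rename is confirmed by this release; the other flags and values shown are illustrative assumptions:

```bash
# Before (deprecated): the evaluation script took --quantize_mode.
#   python examples/onnx_ptq/evaluate.py --onnx_path model.quant.onnx --quantize_mode fp8
# After: pass --engine_precision instead (remaining flags are assumptions).
python examples/onnx_ptq/evaluate.py --onnx_path model.quant.onnx --engine_precision fp8
```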
New Features
- `high_precision_dtype` defaults to fp16 in ONNX quantization, i.e., quantized output model weights are now FP16 by default (see the sketch after this list).
- Upgraded TensorRT-LLM dependency to 1.1.0rc2.
- Support for Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in `examples/vlm_ptq`.
- Support storing and restoring Minitron pruning activations and scores for re-pruning without running the forward loop again.
- Added Minitron pruning example for the Megatron-LM framework. See `examples/megatron-lm` for more details.
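
As a quick illustration of the new ONNX default, a hedged sketch using ModelOpt's `python -m modelopt.onnx.quantization` entry point. Treat the flag spellings below as assumptions; the explicit `--high_precision_dtype=fp16` is now redundant since it matches the new default:

```bash
# Sketch only: quantize an ONNX model. As of this release, unquantized
# weights are kept in FP16 by default, so the last flag is shown for clarity.
python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --quantize_mode=int8 \
    --output_path=model.quant.onnx \
    --high_precision_dtype=fp16
```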