New Features
- Model Optimizer for Windows now supports the NvTensorRtRtx execution provider.
New Features
- New LLMs such as DeepSeek are now supported with ONNX INT4 AWQ quantization on Windows. Refer to the Windows Support Matrix for details about supported features and models.
- Model Optimizer for Windows now supports ONNX INT8 and FP8 (W8A8) quantization of SAM2 and Whisper models. See the example scripts to get started with quantizing these models.
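In a W8A8 scheme, both weights and activations are stored as 8-bit values with a floating-point scale. As a conceptual illustration only (not Model Optimizer's actual API or calibration), a minimal symmetric per-tensor INT8 quantizer in NumPy might look like:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: q = round(x / scale)."""
    # map the largest magnitude to the positive INT8 limit (127)
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the quantized tensor."""
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# quantization error is bounded by half a quantization step
assert np.max(np.abs(x - x_hat)) <= scale / 2 + 1e-6
```

Real toolchains additionally calibrate activation ranges on sample data and insert QuantizeLinear/DequantizeLinear nodes into the ONNX graph; this sketch only shows the core rounding-and-scaling step.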
New Features
- This is the first official release of Model Optimizer for Windows.
- ONNX INT4 Quantization: :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` supports INT4 quantization of ONNX models for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
- LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the example.
- DirectML Deployment Guide: Added a DirectML (DML) deployment guide. See :ref:`DirectML_Deployment`.
- MMLU Benchmark for Accuracy Evaluations: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).
- Published quantized ONNX models collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.
* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity. These are currently unverified on Windows.
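INT4 quantization typically assigns one scale per small block of weights rather than one per tensor, which preserves accuracy at 4 bits. The sketch below is a self-contained NumPy illustration of that blockwise idea, not the :meth:`quantize_int4` implementation; the real API also applies AWQ calibration, and the `block_size` value here is an illustrative assumption:

```python
import numpy as np

def quantize_int4_blockwise(w: np.ndarray, block_size: int = 32):
    """Blockwise symmetric INT4 quantization: each block shares one scale."""
    assert w.size % block_size == 0
    blocks = w.reshape(-1, block_size)
    # per-block scale maps the largest magnitude to the INT4 positive limit (7)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from blocks and per-block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, scales = quantize_int4_blockwise(w)
w_hat = dequantize(q, scales)
# all quantized values fit in a signed 4-bit range
assert q.min() >= -8 and q.max() <= 7
```

Smaller blocks give finer-grained scales (better accuracy, more overhead); AWQ-style methods additionally rescale salient weight channels using activation statistics before rounding.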