This folder contains examples of Olive recipes for Phi-4-mini-instruct optimization.

The Olive recipe `microsoft-Phi-4-mini-instruct_nvmo_ptq_mixed_precision_awq_lite.json` produces an INT4 + INT8 mixed-precision quantized model using NVIDIA's TensorRT Model Optimizer toolkit with the AWQ algorithm.
## Install Olive with NVIDIA TensorRT Model Optimizer toolkit

- Run the following command to install Olive with TensorRT Model Optimizer:

  ```bash
  pip install olive-ai[nvmo]
  ```
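After installing, a quick sanity check can confirm the `olive-ai` package is importable. This is a minimal sketch using only a plain import; it prints a diagnostic instead of raising if the package is missing:

```python
# Sanity check (sketch): confirm the olive-ai package is importable and
# report its version; fall back to a diagnostic message if it is missing.
try:
    import olive
    olive_ok = True
    olive_version = getattr(olive, "__version__", "unknown")
except ImportError:
    olive_ok = False
    olive_version = None

print("olive importable:", olive_ok, "version:", olive_version)
```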
- If TensorRT Model Optimizer needs to be installed from a local wheel, follow these steps instead:

  ```bash
  pip install olive-ai
  pip install <modelopt-wheel>[onnx]
  ```
- Make sure that TensorRT Model Optimizer is installed correctly:

  ```bash
  python -c "from modelopt.onnx.quantization.int4 import quantize as quantize_int4"
  ```

- Refer to the TensorRT Model Optimizer documentation for detailed installation instructions and dependency setup.
## Install suitable onnxruntime and onnxruntime-genai packages

- Install the onnxruntime and onnxruntime-genai packages that have NvTensorRTRTXExecutionProvider support. Refer to the documentation for the NvTensorRtRtx execution provider to set up its dependencies and requirements.
- Note that, by default, TensorRT Model Optimizer comes with onnxruntime-directml, and the onnxruntime-genai-cuda package comes with onnxruntime-gpu. So, in order to use an onnxruntime package with NvTensorRTRTXExecutionProvider support, you might need to uninstall the other installed onnxruntime packages.
- Make sure that, at the end, only one onnxruntime package is installed. Use a command like the following to validate the onnxruntime installation:

  ```bash
  python -c "import onnxruntime as ort; print(ort.get_available_providers())"
  ```
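To double-check that only a single onnxruntime package remains, the installed distributions can be listed directly. This is a minimal sketch using only the Python standard library; it enumerates installed packages whose names start with `onnxruntime`:

```python
# List installed distributions whose name starts with "onnxruntime".
# After removing conflicting packages, exactly one entry should remain.
import importlib.metadata

ort_packages = sorted(
    dist.metadata["Name"]
    for dist in importlib.metadata.distributions()
    if (dist.metadata["Name"] or "").lower().startswith("onnxruntime")
)
print("installed onnxruntime packages:", ort_packages)
```

If more than one name is printed (e.g. both onnxruntime-directml and onnxruntime-gpu), uninstall the extras before proceeding.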
## Install additional requirements

- Install the packages listed in the requirements text file:

  ```bash
  pip install -r requirements-nvmo.txt
  ```
## Run the recipe

```bash
olive run --config microsoft-Phi-4-mini-instruct_nvmo_ptq_mixed_precision_awq_lite.json
```

The Olive recipe `microsoft-Phi-4-mini-instruct_nvmo_ptq_mixed_precision_awq_lite.json` has two passes: (a) ModelBuilder and (b) NVModelOptQuantization. The ModelBuilder pass generates the FP16 model for NvTensorRTRTXExecutionProvider (aka NvTensorRtRtx EP). Subsequently, the NVModelOptQuantization pass performs INT4 + INT8 mixed-precision quantization using the AWQ algorithm with the AWQ Lite calibration method to produce the optimized model.
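For orientation, the two passes described above correspond to a `passes` section in the recipe JSON roughly like the following. This is a hedged sketch only: the pass type names come from the description above, but the pass keys and option names here are illustrative, not copied from the actual recipe file:

```json
{
  "passes": {
    "builder": {
      "type": "ModelBuilder",
      "precision": "fp16"
    },
    "quantization": {
      "type": "NVModelOptQuantization",
      "algorithm": "awq",
      "precision": "int4"
    }
  }
}
```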
In case of any issues related to quantization using the TensorRT Model Optimizer toolkit, refer to its FAQs for potential help or suggestions.