This repository demonstrates the optimization of the Microsoft Phi-3.5 Mini Instruct model using post-training quantization (PTQ) techniques.
Quantization is resource-intensive and requires GPU acceleration. In an x64 Python environment, install the required packages:
```shell
pip install -r requirements.txt

# Disable CUDA extension build
# Linux
export BUILD_CUDA_EXT=0
# Windows
# set BUILD_CUDA_EXT=0

# Install GPTQModel from source
pip install --no-build-isolation git+https://github.com/CodeLinaro/GPTQModel.git@rel_4.2.5
```

Model compilation using the QNN Execution Provider requires a Python environment with onnxruntime-qnn installed. In a separate Python environment, install the required packages:
```shell
# Install Olive
pip install olive-ai==0.11.0

# Install ONNX Runtime QNN
pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple "onnxruntime-qnn==1.23.2" --no-deps
```

Replace /path/to/qnn/env/bin in config.json with the path to the directory containing your QNN environment's Python executable. This path can be found by running the following command in that environment:
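As an illustration only, a separate Python environment is usually referenced from an Olive config as a system entry; the exact keys in this repo's config.json may differ from this hypothetical fragment:

```json
{
  "systems": {
    "qnn_system": {
      "type": "PythonEnvironment",
      "python_environment_path": "/path/to/qnn/env/bin"
    }
  }
}
```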
```shell
# Linux
command -v python
# Windows
# where python
```

This command prints the path to the Python executable. Set the executable's parent directory as the /path/to/qnn/env/bin value in the config file.
Activate the Quantization Python Environment and run the workflow:

```shell
olive run --config x2_elite_config.json
```

Olive runs the AOT compilation step in the AOT Compilation Python Environment specified in the config file via a subprocess. All other steps run natively in the Quantization Python Environment.
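The cross-environment mechanism can be sketched with the stdlib subprocess module alone; this is a stand-in for illustration, not Olive's actual invocation, and it reuses the current interpreter where Olive would use the AOT environment's one:

```python
import subprocess
import sys

# Stand-in for delegating one workflow step to another environment's
# interpreter via a subprocess (here we reuse our own interpreter).
other_env_python = sys.executable  # in Olive, this would be the AOT env's python
result = subprocess.run(
    [other_env_python, "-c", "print('aot step ran')"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```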
✅ Optimized model saved in: models/phi3_5-qnn/
⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step.
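The resume behavior can be pictured as step-level caching. The following is a hypothetical sketch of that idea, not Olive's implementation; cache_state.json and run_step are invented names:

```python
import json
from pathlib import Path

state_file = Path("cache_state.json")  # hypothetical completion-marker file
state = json.loads(state_file.read_text()) if state_file.exists() else {}

def run_step(name, fn):
    """Run fn once; on a rerun, skip steps already marked done."""
    if state.get(name) == "done":
        print(f"skipping {name} (cached)")
        return
    fn()
    state[name] = "done"
    state_file.write_text(json.dumps(state))

run_step("quantize", lambda: print("quantize: running"))
run_step("context_binary", lambda: print("context_binary: running"))
```

Rerunning the script skips both steps, mirroring how a failed run resumes from the last completed step.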
Running QNN-GPU configs requires features and fixes that are not available in the released Olive version 0.9.3. To ensure compatibility, install Olive directly from source at the required commit:

```shell
pip install git+https://github.com/microsoft/Olive.git@da24463e14ed976503dc5871572b285bc5ddc4b2
```

If you previously installed Olive via PyPI or pinned it to version 0.9.3, uninstall it first, then install from the commit above:

```shell
pip uninstall olive-ai
```

Replace /path/to/qnn/env/bin in config_gpu.json with the path to the directory containing your QNN environment's Python executable.
Activate the Quantization Python Environment and run the workflow:

```shell
olive run --config config_gpu.json
```

✅ Optimized model saved in: models/phi3.5-mini-instruct/
Replace /path/to/model/ in config_gpu_ctxbin.json with the output path generated by the step above.

Activate the AOT Python Environment and run the workflow:

```shell
olive run --config config_gpu_ctxbin.json
```

✅ Optimized model saved in: models/phi3.5-mini-instruct/
⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step.