This folder contains a sample use case of Olive to optimize a Phi-4-mini-instruct model using OpenVINO tools. The sample covers the following targets:
- Intel® GPU: Phi 4 Mini Instruct Dynamic Shape Model
- Intel® NPU: Phi 4 Mini Instruct Dynamic Shape Model
This workflow performs quantization with Optimum Intel® and executes the following optimization pipeline:
- HuggingFace Model -> Quantized OpenVINO model -> Quantized encapsulated ONNX OpenVINO IR model
The flow in the following config file executes the above workflow, producing a dynamic shape model.
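The overall shape of such an Olive config can be sketched as below; the model path and pass entries are illustrative placeholders, not the exact contents of phi4_ov_config.json:

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "microsoft/Phi-4-mini-instruct"
  },
  "passes": {
    "quantize": { "type": "<openvino_quantization_pass>" },
    "encapsulate": { "type": "<openvino_encapsulation_pass>" }
  },
  "output_dir": "models"
}
```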
Install the necessary Python packages:

```shell
python -m pip install olive-ai[openvino]
```

The optimization techniques to run are specified in the relevant config JSON file.
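As a small illustration of how the config file drives the run, the hypothetical helper below (standard library only, not part of Olive) reads a config JSON and lists the optimization passes it declares:

```python
import json
import tempfile

def list_passes(config_path):
    """Return the names of the optimization passes declared in an Olive config."""
    with open(config_path) as f:
        config = json.load(f)
    # Olive configs declare their optimization steps under a "passes" object
    return list(config.get("passes", {}))

# quick self-check against a minimal config written to a temp file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"passes": {"quantize": {}, "encapsulate": {}}}, f)
print(list_passes(f.name))  # ['quantize', 'encapsulate']
```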
Optimize the model using the following command:
```shell
olive run --config <config_file.json>
```

Example:

```shell
olive run --config phi4_ov_config.json
```

Or run simply with Python code:
```python
from olive import run
workflow_output = run("<config_file.json>")
```

After running the above command, the model candidates and corresponding configs will be saved in the output directory.
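Once the run finishes, a sketch like the following can enumerate the generated candidates; the output layout (each candidate folder holding an ONNX model plus its config) is an assumption based on the description above, and the helper is hypothetical:

```python
from pathlib import Path

def list_model_candidates(output_dir):
    """Return paths of the ONNX model files Olive wrote under output_dir."""
    # each candidate folder is assumed to hold a model plus the config
    # that produced it; here we only collect the ONNX files themselves
    return sorted(str(p) for p in Path(output_dir).rglob("*.onnx"))
```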
To run ONNX OpenVINO IR encapsulated GenAI models, please set up the latest ONNX Runtime GenAI with ONNX Runtime OpenVINO EP support.
The sample chat app, model-chat.py, can be found in the onnxruntime-genai GitHub repository.
Once everything is set up, the sample command to run is:

```shell
python model-chat.py -e follow_config -v -g -m models/<model_folder>/model/
```

Example:

```shell
python model-chat.py -e follow_config -v -g -m models/Phi-4-mini-instruct/model/
```