This repository demonstrates the optimization of the Qwen2.5-Coder-7B-Instruct model using post-training quantization (PTQ) techniques.
- OpenVINO for Intel® GPU/NPU
- This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate`, and `OpenVINOEncapsulation`
- ModelBuilder for NVIDIA TensorRT for RTX GPUs
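The OpenVINO passes named above are typically wired together in a workflow configuration. The fragment below is a minimal sketch of what such a pass pipeline could look like in an Olive-style JSON config; the pass names come from this README, but the surrounding keys (`passes`, the step labels) are illustrative assumptions, not taken from this repository:

```json
{
  "passes": {
    "optimum_convert": { "type": "OpenVINOOptimumConversion" },
    "io_update": { "type": "OpenVINOIoUpdate" },
    "encapsulate": { "type": "OpenVINOEncapsulation" }
  }
}
```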
This workflow performs quantization with Optimum Intel®, running the following optimization pipeline:
- HuggingFace Model -> Quantized OpenVINO model -> Quantized encapsulated ONNX OpenVINO IR model
To try the optimized model, execute the provided inference_sample.ipynb notebook.