Qwen2.5-Coder-7B-Instruct Model Optimization

This repository demonstrates the optimization of the Qwen2.5-Coder-7B-Instruct model using post-training quantization (PTQ) techniques.

  • OpenVINO for Intel® GPU/NPU
    • This flow uses OpenVINO-specific passes such as OpenVINOOptimumConversion, OpenVINOIoUpdate, and OpenVINOEncapsulation
  • ModelBuilder for NVIDIA TensorRT for RTX GPUs
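The pass names above follow the Olive pass-configuration convention, where each pass is declared by its type in the run config. The fragment below is a minimal, illustrative sketch of how the Intel® pipeline could be listed; the pass keys (`optimum_convert`, `io_update`, `encapsulate`) are arbitrary labels chosen for this example, not names from this repository:

```json
{
  "passes": {
    "optimum_convert": { "type": "OpenVINOOptimumConversion" },
    "io_update": { "type": "OpenVINOIoUpdate" },
    "encapsulate": { "type": "OpenVINOEncapsulation" }
  }
}
```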

Intel® Workflows

This workflow performs quantization with Optimum Intel® and runs the following optimization pipeline:

  • HuggingFace model -> quantized OpenVINO model -> quantized OpenVINO IR model encapsulated in ONNX
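As a sketch, the first step of this pipeline (exporting the HuggingFace model to a weight-quantized OpenVINO model) can be reproduced with the Optimum Intel command-line exporter; the output directory name below is an illustrative choice, not one mandated by this repository:

```shell
# Export the model to OpenVINO IR with int4 weight-only PTQ.
# Downloads the 7B checkpoint, so this requires substantial disk space.
optimum-cli export openvino \
  --model Qwen/Qwen2.5-Coder-7B-Instruct \
  --weight-format int4 \
  qwen2.5-coder-7b-instruct-ov-int4
```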

Inference

Run Console-Based Chat Interface

Execute the provided inference_sample.ipynb notebook.
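Assuming Jupyter is installed in the active environment, the notebook can be opened from the repository root as follows:

```shell
# Open the console-based chat interface notebook (assumes Jupyter is installed).
jupyter notebook inference_sample.ipynb
```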