Commit 9e016d8: Initial changes for Gemma3 support for QNN

google-gemma/qnn/README.md
# Gemma-3-4B Model Optimization

This repository demonstrates how to optimize the [Google Gemma-3-4B](https://huggingface.co/google/gemma-3-4b-it) model using **post-training quantization (PTQ)** techniques for QNN (Qualcomm Neural Network) execution. The optimization process builds on the environment described in the [PTQ tutorial for Phi-3.5](https://github.com/CodeLinaro/olive-recipes/blob/main/microsoft-Phi-3.5-mini-instruct/aitk/README.md).

## File Overview

This example contains the following key files:

- **`env_setup.sh`** - Automated environment setup script (Linux only)
- **`gemma3-4b-text-qnn-config.json`** - Olive configuration for optimizing the text component
- **`gemma3-4b-vision-qnn-config.json`** - Olive configuration for optimizing the vision component
- **`user_script.py`** - Dataset handling and preprocessing utilities
- **`custom_gemma3_4b_it_vision.py`** - Vision model loader for the optimization pipeline

## Prerequisites

### System Requirements

- **Operating System**: Linux (automated setup script is Linux-only)
- **Python**: 3.10
- **Package Manager**: [uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods)
- **Storage**: ~13GB for the COCO train2017 dataset (downloaded automatically)
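Because the dataset download is large, it can help to confirm the free space up front. A minimal sketch, assuming a POSIX `df`; the `require_space` helper and the 15 GiB threshold (headroom over the ~13GB dataset) are illustrative, not part of this repo:

```bash
# Hypothetical guard: refuse to proceed without enough free space for the
# ~13GB COCO train2017 download. Not part of env_setup.sh.
require_space() {
  local need_kb=$((15 * 1024 * 1024))   # 15 GiB in KiB, with headroom
  local avail_kb
  avail_kb=$(df -Pk "${1:-.}" | awk 'NR==2 {print $4}')
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "need ~15GiB free for COCO train2017, have $((avail_kb / 1024 / 1024))GiB" >&2
    return 1
  fi
}
```

For example, `require_space . && source env_setup.sh` aborts before the download starts rather than partway through it.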

### Dependencies Installed by Setup Script

The `env_setup.sh` script installs the following components:

- setuptools (for building Olive from source)
- Olive requirements and dependencies
- AutoGPTQ (from source)
- GPTQModel (specific commit: `558449bed3ef2653c36041650d30da6bbbca440d`)
- onnxruntime-qnn (pre-release version)

## Setup Instructions

### Automated Setup (Recommended)

```bash
source env_setup.sh
```

### Manual Setup (Alternative)

If you prefer to set up manually or need to troubleshoot:

1. Install setuptools:
```bash
uv pip install setuptools
```

2. Install requirements:
```bash
uv pip install -r ../requirements.txt
uv pip install -r ../../../requirements.txt
```

3. Install AutoGPTQ from source:
```bash
export BUILD_CUDA_EXT=0
uv pip install --no-build-isolation git+https://github.com/PanQiWei/AutoGPTQ.git
```

4. Install GPTQModel with the Gemma3 fix:
```bash
uv pip install --no-build-isolation git+https://github.com/ModelCloud/GPTQModel.git@558449bed3ef2653c36041650d30da6bbbca440d
```

5. Install onnxruntime-qnn:
```bash
uv pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
uv pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps
```

> **Important:** The setup pins GPTQModel to a specific commit (`558449bed3ef2653c36041650d30da6bbbca440d`) to address a [memory leak issue](https://github.com/ModelCloud/GPTQModel/commit/558449bed3ef2653c36041650d30da6bbbca440d) with Gemma3 models.
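Once the packages are installed, a quick sanity check can confirm that onnxruntime is importable and that the QNN execution provider registered. This is a hedged sketch, not part of the repo; the `check_qnn_provider` name is made up for illustration:

```bash
# Hypothetical post-install check: print onnxruntime's available execution
# providers and exit non-zero if QNNExecutionProvider is not among them.
check_qnn_provider() {
  python - <<'PY'
import onnxruntime as ort
providers = ort.get_available_providers()
print(providers)
raise SystemExit(0 if "QNNExecutionProvider" in providers else 1)
PY
}
```

For example, `check_qnn_provider || echo "QNN provider missing; re-check step 5"`.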

## Optimization Process

Gemma-3-4B is a multi-modal model composed of vision and text components, so the strategy is to optimize each component separately through Olive before configuring them to work together at the onnxruntime-genai stage.

### Configuration Differences

**Text Configuration (`gemma3-4b-text-qnn-config.json`)**:
- Uses the HuggingFace model directly (`google/gemma-3-4b-it`)
- Applies a comprehensive optimization pipeline: QuaRot → GptqModel → ModelBuilder → Quantization
- Outputs to: `models/gemma-3-4b-it-text/`

**Vision Configuration (`gemma3-4b-vision-qnn-config.json`)**:
- Uses a custom PyTorch model loader (`custom_gemma3_4b_it_vision.py`)
- Simpler pipeline: ONNX Conversion → Graph Surgery → Quantization
- Outputs to: `models/gemma-3-4b-it-vision/`
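To make the pipeline structure concrete, an Olive config generally declares an input model plus an ordered `passes` map. The fragment below is a rough sketch only: the key layout follows Olive's usual config shape, the pass names echo the text pipeline above, and none of it is copied from the actual `gemma3-4b-text-qnn-config.json`:

```json
{
  "input_model": { "type": "HfModel", "model_path": "google/gemma-3-4b-it" },
  "passes": {
    "quarot":  { "type": "QuaRot" },
    "gptq":    { "type": "GptqModel" },
    "builder": { "type": "ModelBuilder" },
    "quant":   { "type": "OnnxStaticQuantization" }
  },
  "output_dir": "models/gemma-3-4b-it-text"
}
```

The vision config follows the same layout but points `input_model` at the custom loader and swaps the first passes for ONNX conversion and graph surgery.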

### Running Optimization

Execute the following commands to separately produce optimized binaries for each component:

```bash
olive run --config gemma3-4b-text-qnn-config.json
```

```bash
olive run --config gemma3-4b-vision-qnn-config.json
```
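Since the two runs are independent, a small wrapper can run them back to back and stop at the first failure. A sketch; the `run_both` helper is illustrative, not part of this repo:

```bash
# Run both Olive optimizations in sequence, stopping on the first failure
# so a broken text pass doesn't silently proceed to the vision pass.
run_both() {
  local cfg
  for cfg in gemma3-4b-text-qnn-config.json gemma3-4b-vision-qnn-config.json; do
    echo "==> olive run --config $cfg"
    olive run --config "$cfg" || { echo "failed: $cfg" >&2; return 1; }
  done
}
```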

## Expected Outputs

After successful optimization, you will find:

- **Text model outputs**: `models/gemma-3-4b-it-text/`
- **Vision model outputs**: `models/gemma-3-4b-it-vision/`
- **Cache directory**: `cache/` (intermediate files and downloaded datasets)
- **Dataset**: `.cache/train2017/` (COCO train2017 images, ~13GB)

Both configurations use `"no_artifacts": true`, meaning only the final optimized models are retained.
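A quick post-run check can confirm both output directories exist and are non-empty before wiring the models into onnxruntime-genai. The `check_outputs` helper below is hypothetical, not part of the repo:

```bash
# Hypothetical helper: verify each expected output directory exists and
# contains at least one file.
check_outputs() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "ok: $dir"
    else
      echo "empty or missing: $dir" >&2
      return 1
    fi
  done
}
```

For example: `check_outputs models/gemma-3-4b-it-text models/gemma-3-4b-it-vision`.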

## Troubleshooting

### Common Issues

**Insufficient Storage**: The COCO train2017 dataset requires ~13GB of storage and is downloaded automatically to `.cache/train2017/`.

**Memory Requirements**: The optimization process, particularly for the text model with its comprehensive pipeline, requires substantial memory.

**QNN Provider**: Ensure the QNNExecutionProvider is properly installed and configured in your environment.

**Platform Limitation**: The current setup script is designed for Linux only. Windows/macOS users will need to adapt the manual setup steps.

**Dataset Download**: If the COCO dataset download fails, check your internet connection and available storage. The script uses `wget`, which must be available on your system.
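Several of these issues can be caught before the ~13GB download even starts by checking for the required command-line tools. A sketch; `check_tools` is an illustrative helper, not part of `env_setup.sh`:

```bash
# Report any required command-line tools that are missing from PATH,
# returning non-zero if anything is absent.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; missing=1; }
  done
  return "$missing"
}
```

For example, run `check_tools wget uv` before sourcing the setup script.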
