This tool quantizes Circle models using the GGML library.

- `gcc` installed
- `flatc` (FlatBuffers compiler) must be available
  - Set `FLATC_PATH` if `flatc` is not in your PATH or standard build locations

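
The lookup order described above can be sketched roughly as follows. This is only an illustration, not the package's actual code: the helper name `find_flatc` is hypothetical, and the "standard build locations" mentioned above are not covered because they are not listed here.

```python
import os
import shutil


def find_flatc(env=None):
    """Return a path to flatc, or None if it cannot be found.

    Hypothetical helper: prefers the FLATC_PATH override, then falls
    back to searching PATH. Build-tree locations are not searched.
    """
    env = os.environ if env is None else env
    override = env.get("FLATC_PATH")
    if override:
        # An explicit override wins over anything on PATH.
        return override
    # shutil.which returns None when the executable is not found.
    return shutil.which("flatc", path=env.get("PATH"))


if __name__ == "__main__":
    print(find_flatc() or "flatc not found; consider setting FLATC_PATH")
```

Passing the environment as a parameter keeps the helper easy to exercise without modifying the real process environment.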
The tool is structured as a Python package, `oquantize`, located in `tools/oquantize`.
It includes a C extension that must be compiled, and the same build step generates `circle.py` from the Circle schema.

    cd tools/oquantize
    python3 setup.py

This compiles `libggml_quant.so` from the GGML source files and generates `circle.py`.

To quantize a Circle model, run the `oquantize` package from the `tools` directory:

    cd tools
    # Usage: python -m oquantize <quant_type> <input_circle> <output_circle>
    python3 -m oquantize q4_0 prefill.circle prefill.q4.circle
    python3 -m oquantize q4_0 decode.circle decode.q4.circle

| File | Original Size | Quantized Size | Reduction |
|---|---|---|---|
| prefill.circle | 18M | 2.7M | ~85% |
| decode.circle | 18M | 2.7M | ~85% |

(Note: the significant size reduction is due to FP32 -> Q4_0 quantization.)
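
As a rough sanity check on these numbers, the arithmetic can be sketched as below. It assumes the standard GGML Q4_0 block layout (32 weights stored as one 2-byte FP16 scale plus 16 bytes of packed 4-bit values); the function names are illustrative and not part of `oquantize`.

```python
QK4_0 = 32                       # weights per Q4_0 block (GGML layout, assumed)
BLOCK_BYTES = 2 + QK4_0 // 2     # FP16 scale + 16 bytes of packed nibbles


def q4_0_bytes(n_weights):
    """Bytes needed to store n_weights in Q4_0 (must be a multiple of 32)."""
    assert n_weights % QK4_0 == 0
    return n_weights // QK4_0 * BLOCK_BYTES


def reduction_vs_fp32(n_weights):
    """Fractional size reduction relative to 4-byte FP32 storage."""
    return 1.0 - q4_0_bytes(n_weights) / (4 * n_weights)


if __name__ == "__main__":
    n = 4096
    print(f"bits per weight: {q4_0_bytes(n) * 8 / n}")
    print(f"reduction vs FP32: {reduction_vs_fp32(n):.1%}")
```

Under that assumption each weight costs 4.5 bits instead of 32, which gives a per-tensor reduction of roughly 86%. Tensors that are left unquantized and other file metadata would plausibly explain why the whole-file figure is slightly lower.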

- Package Structure: `tools/oquantize/`
- C Extension: `libggml_quant.so` compiled from `ggml-quants.c`, `ggml-aarch64.c`, and `ggml.c`
- Quantization: Row-wise `GGML_Q4_0` quantization for `GATHER` (input 0) and `FULLY_CONNECTED` (input 1) weights
- Schema: `circle.py` generated from `runtime/libs/circle-schema/circle_schema.fbs` using `flatc --python --gen-object-api --gen-onefile`
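
To make "row-wise Q4_0" concrete, here is a minimal pure-Python sketch of the block scheme. It mirrors the GGML reference algorithm as commonly described (the scale is derived from the largest-magnitude value in each 32-weight block) but is not the tool's code: the real work is done by `libggml_quant.so`, and the function names below are illustrative assumptions. Python floats are doubles, so results may differ from the C implementation in the last bits.

```python
import struct

QK4_0 = 32  # weights per block


def quantize_block_q4_0(block):
    """Pack 32 floats into 18 bytes: FP16 scale followed by 16 nibble pairs."""
    assert len(block) == QK4_0
    vmax = max(block, key=abs)           # largest magnitude, sign preserved
    d = vmax / -8.0                      # scale; maps vmax to quant level 0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    qs = bytearray(QK4_0 // 2)
    for j in range(QK4_0 // 2):
        lo = min(15, int(block[j] * inv_d + 8.5))
        hi = min(15, int(block[j + QK4_0 // 2] * inv_d + 8.5))
        qs[j] = lo | (hi << 4)           # low nibble: first half of the block
    return struct.pack("<e", d) + bytes(qs)


def dequantize_block_q4_0(blob):
    """Inverse of quantize_block_q4_0 (lossy)."""
    d = struct.unpack("<e", blob[:2])[0]
    qs = blob[2:]
    out = [0.0] * QK4_0
    for j, byte in enumerate(qs):
        out[j] = ((byte & 0x0F) - 8) * d
        out[j + QK4_0 // 2] = ((byte >> 4) - 8) * d
    return out


def quantize_row_q4_0(row):
    """Quantize one weight row block by block; length must be a multiple of 32."""
    assert len(row) % QK4_0 == 0
    return b"".join(
        quantize_block_q4_0(row[i:i + QK4_0]) for i in range(0, len(row), QK4_0)
    )
```

Each row is quantized independently of the others, so a weight matrix is handled by calling `quantize_row_q4_0` once per row.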