@DaniAffCH DaniAffCH commented Jul 7, 2024

This PR introduces a Python tool for block quantizing ONNX models. The quantized models adhere to the ONNX standard, verified using onnx.checker.check_model(self.model, full_check=True).

Additionally, these block-quantized models are compatible with the QuantizeLinear and DequantizeLinear layers in OpenCV, as introduced in opencv/opencv#25644, allowing them to be executed within the OpenCV DNN engine.

The tool currently performs asymmetric, weight-only quantization on convolutional layers, with a configurable quantization block size.
The quantization is applied along axis 1, flattening convolutional weights $[C_{out}, C_{in}, K_w, K_h] \rightarrow [C_{out}, C_{in} \times K_w \times K_h]$.
Future enhancements could extend the tool to other layer types and quantization settings, making it more customizable and general.
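
As a rough illustration of the flattening and per-block asymmetric quantization described above, here is a minimal NumPy sketch. It is not the tool's code: the function names `block_quantize` and `block_dequantize` are hypothetical, and the uint8 target type, the widening of each block's range to include zero, and the edge-padding of a partial last block are all assumptions made for this example.

```python
import numpy as np


def block_quantize(weights, block_size=16):
    """Asymmetric uint8 quantization in blocks along axis 1 (illustrative only)."""
    c_out = weights.shape[0]
    # Flatten [C_out, C_in, K_w, K_h] -> [C_out, C_in * K_w * K_h].
    flat = weights.reshape(c_out, -1).astype(np.float32)
    pad = (-flat.shape[1]) % block_size
    # Assumption: pad a partial last block by repeating the edge value,
    # so the padding does not change that block's min/max.
    flat = np.pad(flat, ((0, 0), (0, pad)), mode="edge")
    blocks = flat.reshape(c_out, -1, block_size)  # [C_out, n_blocks, block_size]

    # Assumption: widen each block's range to include 0 so that the
    # zero point always fits in uint8.
    w_min = np.minimum(blocks.min(axis=2, keepdims=True), 0.0)
    w_max = np.maximum(blocks.max(axis=2, keepdims=True), 0.0)
    scale = (w_max - w_min) / 255.0
    scale[scale == 0] = 1.0  # all-zero block: any non-zero scale works
    zero_point = np.clip(np.round(-w_min / scale), 0, 255)
    q = np.clip(np.round(blocks / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale.astype(np.float32), zero_point.astype(np.uint8)


def block_dequantize(q, scale, zero_point, shape):
    """Invert block_quantize and restore the original weight shape."""
    blocks = (q.astype(np.float32) - zero_point.astype(np.float32)) * scale
    flat = blocks.reshape(shape[0], -1)
    n = int(np.prod(shape[1:]))  # drop the padded tail, if any
    return flat[:, :n].reshape(shape)


# Example: a 3x3 convolution with 3 input and 4 output channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
q, scale, zp = block_quantize(w, block_size=16)
w_hat = block_dequantize(q, scale, zp, w.shape)
```

Each block carries its own scale and zero point, so a smaller block size tracks the local weight range more closely at the cost of storing more quantization parameters.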

The tool also provides a quantization summary, reporting the overall quantization mean squared error and the initial and final model size.
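
The two quantities in such a summary can be sketched as follows. This is only an illustration under stated assumptions: `quantization_mse` and `quantized_size_bytes` are hypothetical helper names, not functions of the tool, and the size estimate assumes one byte per quantized weight plus one float32 scale and one uint8 zero point per block. The sketch also does not settle which tensors the tool compares when it computes the error; the helper simply takes two arrays.

```python
import numpy as np


def quantization_mse(reference, quantized):
    """Mean squared error between two arrays of the same shape."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(quantized, dtype=np.float64)
    return float(np.mean(diff ** 2))


def quantized_size_bytes(c_out, row_len, block_size, scale_bytes=4, zero_point_bytes=1):
    """Estimated storage for one block-quantized weight tensor.

    Assumes uint8 weights (1 byte each) and, per block, one scale and one
    zero point. Blocks run along the flattened axis 1, so every output
    channel has ceil(row_len / block_size) blocks.
    """
    blocks_per_row = -(-row_len // block_size)  # ceiling division
    n_blocks = c_out * blocks_per_row
    return c_out * row_len + n_blocks * (scale_bytes + zero_point_bytes)


# Example: 16 output channels, 3 input channels, 3x3 kernel, block size 16.
c_out, row_len = 16, 3 * 3 * 3
float32_bytes = c_out * row_len * 4
int8_bytes = quantized_size_bytes(c_out, row_len, block_size=16)
print(f"float32: {float32_bytes} B, block-quantized: {int8_bytes} B")
```

Under these assumptions the per-block parameters add 5 bytes per block, which is why very small block sizes reduce the size savings.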


Testing

With a block size of 16 and normalized input images, the mean squared quantization error was on the order of $10^{-2}$ to $10^{-3}$.

Furthermore, a qualitative assessment was conducted by block-quantizing some models from this repository, running them, and comparing the results with those of the original and int8 models.

The findings indicate that, with even block size, the block quantized model maintains performance levels equivalent to those of the original model while achieving a reduction in model size.

Here is an example applied to the YuNet face detection model:

[GIF: anim-opt]

Onnxruntime and DNN comparison

The resulting models have been tested using both onnxruntime and OpenCV DNN, both of which produced identical outputs for the same input data.
Because onnxruntime only recently added support for blockwise quantization inference and this functionality is not yet included in the latest release, the only way to test it is to build onnxruntime from source:

./build.sh --config=Release --build_shared_lib --disable_ml_ops --build_wheel --enable_reduced_operator_type_support --skip_tests --parallel

Then install the resulting wheel with pip:

pip install -U build/Linux/Release/dist/onnxruntime-1.19.0-cp39-cp39-linux_x86_64.whl

To test the resulting networks with OpenCV DNN, you have to build the pull request opencv/opencv#25644.

@fengyuentau fengyuentau self-assigned this Jul 16, 2024
@fengyuentau fengyuentau added feature New feature or request quantization Anything related to model quantization labels Jul 16, 2024
@fengyuentau fengyuentau added this to the 4.10.0 milestone Jul 16, 2024

@fengyuentau fengyuentau left a comment

LGTM👍 I propose to merge this one after opencv/opencv#25644.

@fengyuentau fengyuentau merged commit ac83ef3 into opencv:main Jul 31, 2024