Arm backend: Make per-channel quantization default (#11873)
Support for per-channel quantization was recently added to the Arm
backend. This patch changes the default setting to use per-channel
quantization for weights in convolutional and linear layers, instead of
per-tensor quantization, which was the previous default.
The reason for this change is that per-channel quantization offers
better numerical accuracy for models containing convolutional and/or
fully connected layers: each output channel gets its own scale, so
channels with small weight magnitudes are not forced onto a single
tensor-wide scale dominated by the largest channel. Unless there is an
explicit limitation in the use case that prevents per-channel
quantization, it is generally preferred; the sketch below illustrates
the difference.
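As a rough illustration (using plain PyTorch quantization primitives, not the Arm backend flow; the toy tensor and its channel ranges are made up for the example), the snippet below contrasts the two granularities on a conv-style weight tensor:

```python
import torch

# Toy conv weight: 4 output channels with very different value ranges.
weight = torch.randn(4, 3, 3, 3) * torch.tensor([0.01, 0.1, 1.0, 10.0]).view(4, 1, 1, 1)

# Per-tensor: one scale for the whole tensor, dominated by the largest channel.
scale = float(weight.abs().max() / 127.0)
q_per_tensor = torch.quantize_per_tensor(weight, scale, 0, torch.qint8)

# Per-channel: one scale per output channel (axis 0), so small-magnitude
# channels keep much finer resolution.
scales = weight.abs().amax(dim=(1, 2, 3)) / 127.0
zero_points = torch.zeros(4, dtype=torch.int64)
q_per_channel = torch.quantize_per_channel(weight, scales, zero_points, 0, torch.qint8)

# Reconstruction error is typically far smaller with per-channel scales.
print("per-tensor MSE: ", (q_per_tensor.dequantize() - weight).pow(2).mean().item())
print("per-channel MSE:", (q_per_channel.dequantize() - weight).pow(2).mean().item())
```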
Quantization granularity can still be set manually using
`get_symmetric_quantization_config(is_per_channel=False)`; this patch
only changes the default. A sketch of how that option fits into the
quantization flow follows.
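For context, here is a minimal sketch of opting back into per-tensor quantization in the PT2E flow. Only `get_symmetric_quantization_config(is_per_channel=...)` is confirmed by this patch; the `ArmQuantizer` class name, its import path, the `set_global` call, and the export/calibration steps are assumptions based on the backend's quantizer API and may differ from the exact code at this commit:

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Assumed import path; the quantizer class may be named differently
# (e.g. a TOSA- or Ethos-U-specific variant) in other versions.
from executorch.backends.arm.quantizer.arm_quantizer import (
    ArmQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

quantizer = ArmQuantizer()
# The default is now per-channel; pass is_per_channel=False to restore the
# previous per-tensor behaviour.
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=False))

exported = torch.export.export_for_training(model, example_inputs).module()
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibrate with representative inputs
quantized = convert_pt2e(prepared)
```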
Unit and model tests are affected by this change. Error tolerances for
those tests have not been changed, as model outputs are compared against
a reference that uses the exact same quantization strategy. That is, if
a model output is altered by this patch, the reference it is compared
against would also be altered accordingly.
To verify the impact of this change in terms of top-1 and top-5
accuracy, a manual test was run on MobileNetV2. The results show a
noticeable improvement:
- Per-tensor quantization Top-1 / Top-5 accuracy: 66.45% / 87.50%
- Per-channel quantization Top-1 / Top-5 accuracy: 70.85% / 89.50%
Signed-off-by: Martin Lindström <[email protected]>
Signed-off-by: Oscar Andersson <[email protected]>
Co-authored-by: Martin Lindström <[email protected]>
Co-authored-by: Sebastian Larsson <[email protected]>