Commit 371cc37
Update on "Add 16A8W quantization configuration utility for ARM backend"
This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch ARM backend, following the feedback from D79746479.
## Key Changes
**1. New Quantization Configuration Function**
- Add `get_16a8w_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py` (see the sketch after this list)
- Provides 16-bit activations with HistogramObserver (better precision than 8A8W)
- Maintains 8-bit weights with MinMaxObserver/PerChannelMinMaxObserver (memory efficient)
- **Supported by TOSA through the [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)**
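
As an illustration only, here is a minimal sketch of what a helper along these lines could look like. It assumes the `QuantizationSpec` API from `torch.ao.quantization.quantizer` and a `QuantizationConfig` container mirroring the ARM quantizer's existing `get_symmetric_quantization_config()`; the exact observer arguments, ranges, and field names in the committed diff may differ.

```python
# Hedged sketch of a 16A8W config helper; not the committed implementation.
import torch
from torch.ao.quantization.observer import (
    HistogramObserver,
    MinMaxObserver,
    PerChannelMinMaxObserver,
)
from torch.ao.quantization.quantizer import QuantizationSpec

# Assumed: the ARM backend's QuantizationConfig dataclass with
# (input_activation, output_activation, weight, bias) fields.
from executorch.backends.arm.quantizer.quantization_config import QuantizationConfig


def get_16a8w_quantization_config(is_per_channel: bool = True) -> QuantizationConfig:
    # 16-bit signed activations, observed with HistogramObserver.
    act_spec = QuantizationSpec(
        dtype=torch.int16,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
        quant_min=-32768,
        quant_max=32767,
        qscheme=torch.per_tensor_symmetric,
        is_dynamic=False,
    )

    # 8-bit signed weights, per-channel where supported to keep memory
    # usage low while giving each output channel its own scale.
    weight_observer = PerChannelMinMaxObserver if is_per_channel else MinMaxObserver
    weight_qscheme = (
        torch.per_channel_symmetric if is_per_channel else torch.per_tensor_symmetric
    )
    weight_spec = QuantizationSpec(
        dtype=torch.int8,
        observer_or_fake_quant_ctr=weight_observer.with_args(eps=2**-12),
        quant_min=-127,
        quant_max=127,
        qscheme=weight_qscheme,
        ch_axis=0,
        is_dynamic=False,
    )

    # Bias quantization is left to the backend's defaults in this sketch.
    return QuantizationConfig(act_spec, act_spec, weight_spec, None)
```

Per-channel weight quantization is the likely default here, as in the existing symmetric config: it keeps the int8 weight footprint while letting each output channel use its own scale.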
## Benefits
- **Better Precision**: 16-bit activations provide higher precision than 8-bit, which is particularly useful for carrying precision across time steps in recurrent neural networks.
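
For context, below is a hypothetical end-to-end wiring of such a config through the PT2E prepare/convert flow. The quantizer instance, its `set_global()` method, and the calibration step are assumptions based on how the existing symmetric config is typically used with the ARM backend, not taken from this diff.

```python
# Hypothetical usage sketch; class names and the set_global() API are assumed.
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# The helper is added in arm_quantizer.py per this diff.
from executorch.backends.arm.quantizer.arm_quantizer import (
    get_16a8w_quantization_config,
)


def quantize_16a8w(model: torch.nn.Module, example_inputs: tuple, quantizer):
    # `quantizer` is an ARM backend quantizer instance; constructing it
    # (e.g. with a TOSA compile spec) is out of scope for this sketch.
    quantizer.set_global(get_16a8w_quantization_config())

    exported = torch.export.export(model, example_inputs).module()
    prepared = prepare_pt2e(exported, quantizer)
    prepared(*example_inputs)  # calibration pass to populate observers
    return convert_pt2e(prepared)
```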
exported-using-ghexport
bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks
Differential Revision: [D81550512](https://our.internmc.facebook.com/intern/diff/D81550512/)
[ghstack-poisoned]