Title says it all!
This PR adds implementations for int8 linear layers. Convolution is implemented in a later step, computing convolution as matrix multiplication via the im2col procedure.
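To illustrate the im2col idea mentioned above, here is a minimal pure-Python sketch (not the shader code from this PR). It assumes a single-channel 2D input, stride 1, no padding, and no kernel flip; the function names `im2col` and `conv2d_via_im2col` are hypothetical and chosen only for this example.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (list of rows) into one flat row."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patches.append(
                [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            )
    return patches


def conv2d_via_im2col(image, kernel):
    """Compute a 2D convolution as a matrix-vector product over the patch matrix."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    # Each output element is the dot product of one patch row with the kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]
```

Once the patches are laid out as rows, the convolution reduces to the same dot-product loop a linear layer uses, which is why the linear kernels can be reused for convolution.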
For both linear and convolution, two versions are implemented:
1. `q8ta_q8csw` variant, which quantizes the input tensor and then performs integer accumulation via the int8 dot product extension.
2. `q8csw` variant, which dequantizes the weight tensor in-shader and performs floating point accumulation.
The second variant provides an alternative path for executing quantized models when the target GPU does not support the int8 dot product extension.
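The difference between the two variants can be sketched in plain Python. This is only an illustration under stated assumptions, not the shader implementation: it assumes the input uses a single per-tensor scale and zero point and the weights use one symmetric scale per output channel (a reading of the op names, not something stated in this PR). The function names and parameters are hypothetical.

```python
def quantize_per_tensor(x, scale, zero_point):
    """Quantize floats to int8 with one scale and zero point (assumed scheme)."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in x]


def linear_q8ta_q8csw(x, w_q, w_scales, in_scale, in_zp):
    """Variant 1: quantize the input, accumulate in integers, rescale at the end."""
    x_q = quantize_per_tensor(x, in_scale, in_zp)
    out = []
    for row, w_scale in zip(w_q, w_scales):
        # Integer accumulation; a GPU would use the int8 dot product extension here.
        acc = sum((xq - in_zp) * wq for xq, wq in zip(x_q, row))
        out.append(acc * in_scale * w_scale)
    return out


def linear_q8csw(x, w_q, w_scales):
    """Variant 2: dequantize each weight on the fly, accumulate in floating point."""
    out = []
    for row, w_scale in zip(w_q, w_scales):
        out.append(sum(xv * (wq * w_scale) for xv, wq in zip(x, row)))
    return out
```

When the input values are exactly representable at the chosen input scale, both paths produce the same result; otherwise the first variant also carries the input quantization error.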
These new ops are tested via the custom op testing + benchmarking framework introduced in the previous diff.
Differential Revision: [D81323424](https://our.internmc.facebook.com/intern/diff/D81323424/)
[ghstack-poisoned]