Update on "[ET-VK] Add kInt8x4 dtype and GPUMemoryLayouts for packed quantized tensors"
## Motivation
Lay the foundations for being able to execute statically quantized CNNs with ET-VK. Unlike with dynamic quantization, static quantization allows the output of quantized operators to stay in integer representation and be fed directly to the next quantized operator.
## Context
Typically, int8 quantized tensors can be represented by simply having the tensor use the int8 data type. While this is possible in ET-VK, in practice quantized operators expect int8 quantized tensors to be packed, with sixteen 8-bit values in each `ivec4`, so that quantized int8 tensors load and store with a granularity of 16 elements.
The reason for this is twofold:
* Support for shader int8 / storage buffer int8 extension is not guaranteed, meaning some devices do not allow using int8 types in shaders
* We have found that load/store from storage buffers/textures that use int8 data types sometimes results in worse memory load performance, due to vectorized load/store instructions not being used.
Therefore, in ET-VK we need a way to mark that a quantized tensor should:
1. Use int32 as the underlying data type for the storage buffer/texture
2. Account for the block-packing that may be used
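To make the packing concrete, here is a minimal host-side sketch of storing four int8 values in one int32, so that four such words (one `ivec4`) hold 16 elements. The helper names `pack_int8x4`, `unpack_int8`, and `packed_int32_count` are hypothetical and not part of ET-VK. The lowest-byte-first lane order is an assumption as well, since the actual element order is defined by the memory layouts in this change.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an ET-VK API): pack four int8 values into one
// int32. Assumption: lane 0 occupies the lowest byte.
inline int32_t pack_int8x4(const std::array<int8_t, 4>& v) {
  uint32_t packed = 0;
  for (int lane = 0; lane < 4; ++lane) {
    // Go through uint8_t first so negative values do not sign-extend
    // into the neighbouring lanes.
    packed |= static_cast<uint32_t>(static_cast<uint8_t>(v[lane]))
        << (8 * lane);
  }
  return static_cast<int32_t>(packed);
}

// Hypothetical helper: extract one int8 lane (0..3) from a packed int32.
inline int8_t unpack_int8(int32_t packed, int lane) {
  const uint32_t bits = static_cast<uint32_t>(packed) >> (8 * lane);
  return static_cast<int8_t>(bits & 0xFFu);
}

// Number of int32 words needed to store `numel` int8 values when storage
// is padded to whole blocks of 16 elements (one ivec4 = 4 int32 words).
inline std::size_t packed_int32_count(std::size_t numel) {
  const std::size_t blocks = (numel + 15) / 16;
  return blocks * 4;
}
```

Under these assumptions, a tensor with 17 int8 elements occupies two 16-element blocks, or 8 int32 words, which is the kind of padding the block-packing in point 2 has to account for.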
## Changes
First, introduce the `Int8x4` dtype that can be used for packed int8 tensors. This dtype is functionally the same as `Int`, but denotes that each int32 actually contains 4 packed 8-bit values.
Second, introduce new memory layouts: `kPackedInt8_4W4C` and `kPackedInt8_4H4W`. The former will be used for convolution, while the latter will be used for matrix multiplication. See the inline comments for more details about these memory layouts.
Then, update `QuantizedConvolution.cpp` and `QuantizedLinear.cpp` to use the new data type and memory layouts for the packed int8 input tensor.
Differential Revision: [D82542336](https://our.internmc.facebook.com/intern/diff/D82542336/)
[ghstack-poisoned]
"text": "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.",
+ }
+ ],
+ },
+ {
+ "role": "user",
+ "content": [
+ {"type": "image", "url": image_url},
+ {
+ "type": "text",
+ "text": "What are the things I should be cautious about when I visit here?",