Update on "[ET-VK][ez] Add support for buffer backed qparams in int4 linear + add checks for physical limits when allocating"

SS-JIA · SS-JIA · commit 8794595b6f9a · 2025-04-09T11:44:35.000-07:00
## Context

Currently, the groupwise quantized int4 linear op implementation forces the scales and zero tensor to be a `Texture3D`. However, in some cases, e.g. transformer models that have a logit linear layer, the image extents required may exceed the maximum image extents available on the device.

## Changes

* Add support for the scales and zero tensor being a `Buffer` instead of a `Texture3D`
* When allocating buffers or images for tensors, check that the requested resource fits within the physical device limits

Differential Revision: [D72662176](https://our.internmc.facebook.com/intern/diff/D72662176/)

[ghstack-poisoned]
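The limit checks described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `DeviceLimits`, `ImageExtents`, `image_fits`, and `buffer_fits` are hypothetical names, and the two limit fields only mirror `maxImageDimension3D` and `maxStorageBufferRange` from `VkPhysicalDeviceLimits`. Which limits the runtime actually compares against is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the subset of VkPhysicalDeviceLimits used here.
// In a real runtime these values would come from vkGetPhysicalDeviceProperties.
struct DeviceLimits {
  uint32_t max_image_dimension_3d;    // cf. maxImageDimension3D
  uint32_t max_storage_buffer_range;  // cf. maxStorageBufferRange, in bytes
};

// Hypothetical extents of a 3D image backing a tensor.
struct ImageExtents {
  uint32_t width;
  uint32_t height;
  uint32_t depth;
};

// Returns true if every axis of the requested image is within the
// per-axis 3D image limit of the device.
bool image_fits(const DeviceLimits& limits, const ImageExtents& extents) {
  const uint32_t max_dim = limits.max_image_dimension_3d;
  return extents.width <= max_dim && extents.height <= max_dim &&
      extents.depth <= max_dim;
}

// Returns true if a storage buffer of `nbytes` bytes is within the
// assumed storage buffer range limit of the device.
bool buffer_fits(const DeviceLimits& limits, uint64_t nbytes) {
  return nbytes <= limits.max_storage_buffer_range;
}
```

With limits of this shape, a tensor whose longest axis exceeds the image limit would fail `image_fits` while a buffer of the same byte size may still pass `buffer_fits`, which is the situation that motivates allowing `Buffer` storage for the scales and zero tensor.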
diff --git a/backends/vulkan/runtime/graph/ops/glsl/pack_int4_linear_weight_transposed_interleaved.glsl b/backends/vulkan/runtime/graph/ops/glsl/pack_int4_linear_weight_transposed_interleaved.glsl
@@ -109,8 +109,8 @@ void main() {
       in_vals[r][0] = get_first(in_val_packed);
       in_vals[r][1] = get_second(in_val_packed);
     } else {
-      in_vals[r][0] = uint8_t(254);
-      in_vals[r][1] = uint8_t(254);
+      in_vals[r][0] = uint8_t(0);
+      in_vals[r][1] = uint8_t(0);
     }
   }
 
