Update base for Update on "[ET-VK][AOT] Enable exporting Q8 Quantized Linear + Convolution"

ssjia · ssjia · commit 066f34b11423 · 2025-08-29T17:33:17.000-07:00
As title. Introduce fusion patterns to enable fusing quantized convolution and linear graph patterns into a custom op.

## Changes

Introduce the concept of using custom pattern detection functions to detect graph patterns rather than solely relying on SubgraphMatcher. The issue with SubgraphMatcher is that a large number of graph patterns may need to be exported to obtain variants for different combinations of decompositions/quantization workflows. Having a custom detection function improves maintainability.

Implement detection + replacement functions for quantized linear and quantized conv2d.

Differential Revision: [D81323425](https://our.internmc.facebook.com/intern/diff/D81323425/)

[ghstack-poisoned]
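
For illustration only, the idea of a custom detection function described in the commit message can be sketched as below. This is not the code from this change: `Node`, `QuantizedLinearMatch`, `detect_quantized_linear`, and the op names are hypothetical stand-ins for a graph IR, chosen so the sketch runs without any framework installed. It shows how one function can accept several decomposition variants (here, `linear` and `mm` over a transposed weight) that would otherwise each need their own exported pattern.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal graph node: an op name plus its input nodes."""
    op: str
    inputs: List["Node"] = field(default_factory=list)


@dataclass
class QuantizedLinearMatch:
    """Hypothetical record of the nodes a replacement function would need."""
    anchor: Node
    input_node: Node
    weight: Node
    weight_scales: Node
    bias: Optional[Node]


def detect_quantized_linear(node: Node) -> Optional[QuantizedLinearMatch]:
    """Return a match if `node` is a linear op fed by a dequantized weight."""
    if node.op == "linear" and len(node.inputs) >= 2:
        # Variant 1: linear(x, dequant(w, scales)[, bias])
        x, w = node.inputs[0], node.inputs[1]
        bias = node.inputs[2] if len(node.inputs) > 2 else None
    elif node.op == "mm" and len(node.inputs) == 2:
        # Variant 2: mm(x, transpose(dequant(w, scales)))
        x, w = node.inputs
        bias = None
        if w.op != "transpose" or not w.inputs:
            return None
        w = w.inputs[0]
    else:
        return None

    # Both variants must bottom out in a per-channel dequantize node.
    if w.op != "dequantize_per_channel" or len(w.inputs) < 2:
        return None

    q_weight, scales = w.inputs[0], w.inputs[1]
    return QuantizedLinearMatch(node, x, q_weight, scales, bias)
```

In this sketch, supporting another decomposition means adding one more branch to the function rather than exporting and maintaining another full subgraph pattern.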
diff --git a/backends/vulkan/runtime/graph/ops/impl/QuantizedConvolution.cpp b/backends/vulkan/runtime/graph/ops/impl/QuantizedConvolution.cpp
@@ -478,6 +478,8 @@ void conv2d_q8csw_linear_tiled_impl(
   const ValueRef padding = args.at(idx++);
   const ValueRef dilation = args.at(idx++);
   const ValueRef groups = args.at(idx++);
+  const ValueRef orig_OC = args.at(idx++);
+  (void)orig_OC;
   const ValueRef output = args.at(idx++);
 
   const ValueRef packed_weight = prepack_q8_linear_weight(graph, weight);
@@ -552,6 +554,8 @@ void conv2d_q8ta_q8csw_linear_tiled_impl(
   const ValueRef padding = args.at(idx++);
   const ValueRef dilation = args.at(idx++);
   const ValueRef groups = args.at(idx++);
+  const ValueRef orig_OC = args.at(idx++);
+  (void)orig_OC;
   const ValueRef output = args.at(idx++);
 
   const ValueRef packed_weight = prepack_q8_linear_weight(graph, weight);