Commit fb29875
ssjia
Update on "[ET-VK] Enable IntxWeightOnlyConfig"
## Motivation
Allow Vulkan lowering to be tested via optimum-executorch.
## Context
Similar to the PR below: Int4 weight-only quantization is currently enabled in Vulkan via a custom source transform quantizer that replaces linear layers with a custom linear module, which in turn calls a custom weight-only quantized linear op.
This diff adds a fusion pattern for weight-only quantized linear so that no Vulkan-specific source transforms need to be applied.
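For context, the existing source-transform approach amounts to module swapping along the lines of the sketch below. This is an illustrative approximation, not the actual Vulkan quantizer: the class and function names are hypothetical, and the forward pass is left as a plain float linear rather than the real custom weight-only quantized op.

```python
import torch
import torch.nn as nn

class WeightOnlyQuantizedLinear(nn.Module):
    """Hypothetical stand-in for the custom linear module; the real version
    would store an int4-packed weight and call a custom quantized linear op."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # The real transform would quantize and pack the weight here.
        self.weight = linear.weight
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder: the real module dispatches to the custom weight-only
        # quantized linear op instead of a plain float linear.
        return nn.functional.linear(x, self.weight, self.bias)

def apply_source_transform(model: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear with the custom module."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, WeightOnlyQuantizedLinear(child))
        else:
            apply_source_transform(child)
    return model
```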
## Changes
* Introduce a fusable graph pattern for weight-only quantized linear (see the sketch after this list)
* Add fusion logic for weight-only quantized linear in the fuse patterns pass
* Add `4w` qmode to the export llama script
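As a rough illustration of what the fusion logic matches, the sketch below walks a torch.fx graph and collects linear nodes whose weight argument is produced by a dequantize op, i.e. the weight-only quantized linear pattern. This is a simplified assumption of the pattern, not the actual ExecuTorch pass; the function name and the string-based dequantize check are illustrative only.

```python
import torch

def find_weight_only_quantized_linears(gm: torch.fx.GraphModule):
    """Collect (linear_node, dequant_node) pairs that a fusion pass could
    replace with a single backend-specific weight-only quantized linear op."""
    matches = []
    for node in gm.graph.nodes:
        # Match aten.linear calls...
        if node.op != "call_function" or node.target != torch.ops.aten.linear.default:
            continue
        weight = node.args[1] if len(node.args) > 1 else None
        # ...whose weight comes from a dequantize op (weight-only quantization).
        if (
            isinstance(weight, torch.fx.Node)
            and weight.op == "call_function"
            and "dequantize" in str(weight.target)
        ):
            matches.append((node, weight))
    return matches
```

A real fusion pass would then replace each matched pair with a call to the custom Vulkan weight-only quantized linear operator, passing the still-quantized weight and its quantization parameters directly.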
Differential Revision: [D80293302](https://our.internmc.facebook.com/intern/diff/D80293302/)
[ghstack-poisoned]