Commit 2da394e
[ET-VK][Llama] Apply XNNPACK partitioner as well when lowering to Vulkan
## Context
The final logit linear layer in the Transformer architecture operates on extremely large tensors: both its output and weight tensors have a dimension equal to the vocabulary size, which may be very large. Because of this, image textures cannot be used to execute the op when running with the Vulkan delegate, so an implementation using buffer-based tensors must be used instead.
Unfortunately, Vulkan does not currently have a performant implementation of linear with buffer-based tensors. As a result, if this final linear layer is executed in Vulkan, model inference is extremely slow.
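For a concrete sense of scale, the sketch below checks a Llama-2-7B-like weight shape against a common 2D texture extent limit. The vocabulary size, hidden dimension, and the 16384 `maxImageDimension2D` value are illustrative assumptions, not values taken from this diff:

```python
# Illustrative numbers only: vocab/hidden sizes are Llama-2-7B-like,
# and 16384 is a common (but not guaranteed) Vulkan maxImageDimension2D.
VOCAB_SIZE = 32_000    # logit output dim == vocabulary size
HIDDEN_DIM = 4_096
MAX_IMAGE_DIM = 16_384

weight_shape = (VOCAB_SIZE, HIDDEN_DIM)
fits = all(dim <= MAX_IMAGE_DIM for dim in weight_shape)
print(f"weight {weight_shape} fits in a 2D texture: {fits}")  # -> False
```

Regardless of how the tensor is packed into texels, a dimension tied to the vocabulary size overflows typical texture extents, which is why a buffer-backed path is unavoidable here.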
## Changes
The diff below this one in the stack prevents the final logit linear layer from being delegated to Vulkan by enforcing a GPU buffer limit.
This diff modifies the export_llama script to apply the XNNPACK partitioner after the Vulkan partitioner when lowering to Vulkan, so that any remaining ops are accelerated with XNNPACK (see the sketch below). When 4-bit quantization is enabled, an additional quantizer is applied after the Vulkan quantizer (which skips the final logit linear layer) so that the final logit linear can be quantized as well.
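A minimal sketch of the two-stage lowering described above, assuming current ExecuTorch APIs (`to_edge_transform_and_lower` with a list of partitioners); the module paths shown follow ExecuTorch conventions but are not copied from the actual export_llama script, and the tiny model stands in for Llama:

```python
import torch

from executorch.backends.vulkan.partitioner.vulkan_partitioner import (
    VulkanPartitioner,
)
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackPartitioner,
)
from executorch.exir import to_edge_transform_and_lower


class TinyLM(torch.nn.Module):
    """Stand-in for the Llama model; ends in a logit linear layer."""

    def __init__(self):
        super().__init__()
        self.logit_linear = torch.nn.Linear(64, 128)

    def forward(self, x):
        return self.logit_linear(x)


exported_program = torch.export.export(TinyLM(), (torch.randn(1, 64),))

# Partitioners run in list order: Vulkan claims the ops it supports
# (the final logit linear is rejected once the GPU buffer limit is
# enforced), then XNNPACK picks up whatever remains undelegated.
edge_manager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VulkanPartitioner(), XnnpackPartitioner()],
)
et_program = edge_manager.to_executorch()
```

For the 4-bit path, the analogous stacking happens at the quantizer level: a second quantizer can be composed after the Vulkan quantizer (for example via `torch.ao.quantization.quantizer.composable_quantizer.ComposableQuantizer`, an assumption about the mechanism rather than a quote from the script) so that the logit linear skipped by the Vulkan quantizer still gets annotated.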
## Long Term
This is a temporary measure while an optimized buffer-based linear implementation is developed. Once the Vulkan implementation achieves parity with XNNPACK, the final logit linear will be delegated to Vulkan once more.
Differential Revision: [D65899827](https://our.internmc.facebook.com/intern/diff/D65899827/)
[ghstack-poisoned]
File tree: 2 files changed (+14 −1 lines)

- examples/models/llama
  - source_transformation