Commit cde31aa
ssjia
[ET-VK] Improve q8 matmul by increasing TILE_N4
Title says it all! I found that the latency of int8 matmul can be improved by increasing the output tile's N4 dimension (TILE_N4) to 2. This reduces latency by about 20-25% on a Samsung Galaxy S25.
Differential Revision: [D83253129](https://our.internmc.facebook.com/intern/diff/D83253129/)
ghstack-source-id: 312106549
Pull Request resolved: #145971
Parent: 0c04519
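The commit message does not explain why a wider output tile helps. A common explanation for register tiling in matmul shaders is that each invocation reuses the input values it loads across more output elements. The sketch below models only that arithmetic, under a simple assumed cost model. `TileConfig`, `tile_m = 4`, the matrix sizes, and the load-counting formula are hypothetical illustrations, not the ET-VK shader's actual parameters or code, and the model makes no attempt to predict the measured 20-25% latency change.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical output-tile shape computed by one shader invocation.

    tile_m: output rows per invocation (assumed value, not from ET-VK).
    tile_n4: output columns per invocation, in groups of 4 (one vec4 texel).
    """
    tile_m: int
    tile_n4: int

    @property
    def tile_n(self) -> int:
        # Each N4 unit covers 4 scalar output columns.
        return 4 * self.tile_n4


def num_invocations(m: int, n: int, cfg: TileConfig) -> int:
    """Invocations needed to cover an M x N output with this tile shape."""
    return math.ceil(m / cfg.tile_m) * math.ceil(n / cfg.tile_n)


def loads_per_output(k: int, cfg: TileConfig) -> float:
    """Input elements loaded per output element under a simple model.

    Assumes each invocation reads its tile_m rows of A (tile_m * k values)
    and its tile_n columns of B (k * tile_n values) exactly once; caching,
    shared memory, and int8 packing are ignored.
    """
    loads = k * (cfg.tile_m + cfg.tile_n)
    return loads / (cfg.tile_m * cfg.tile_n)


if __name__ == "__main__":
    M, N, K = 128, 256, 512  # hypothetical problem size
    for tile_n4 in (1, 2):
        cfg = TileConfig(tile_m=4, tile_n4=tile_n4)
        # Prints "1 2048 256.0", then "2 1024 192.0".
        print(tile_n4, num_invocations(M, N, cfg), loads_per_output(K, cfg))
```

Under these assumptions, going from TILE_N4 = 1 to 2 halves the number of invocations and cuts loads per output element by 25%. In general, a wider tile also needs more accumulator registers per invocation, which can lower occupancy, so whether the trade pays off is hardware-specific and has to be measured, as was done here.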
File tree
3 files changed (+6, -2 lines)
- backends/vulkan/runtime/graph/ops
  - glsl
  - impl
Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ (line 78 changed; code lines not preserved)
Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ (line 14 changed; code lines not preserved)
Lines changed: 4 additions & 0 deletions
@@ -77,6 +77,10 @@ (new lines 80-83 added; code lines not preserved)