Commit a3a3d85
Update on "[ET-VK][ez] Fix 8 bit linear compute shader dispatch"
## Context
Currently, for the `q_8w_linear` shader, both the texture and the buffer variants use the same global and local work group settings.
Specifically, the global work group size is set to `{out.numel(), 1, 1}` and the local work group size is set to `{64, 1, 1}`.
However, I believe this results in very poor memory re-use for the texture shader. In this configuration:
* Within a work group, each invocation requests a different row of A, so 64 rows of A are requested in total
* All invocations within a work group request the same row of B
* One work group loads 65 unique rows from A and B (64 + 1)
Compare this to a local work group size of `{8, 8, 1}`:
* Across the work group, 8 rows are loaded from A and 8 rows are loaded from B
* One work group loads 16 unique rows total from A and B
Evidently, the latter configuration has better memory re-use, since each work group loads fewer unique rows (16 vs. 65).
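The counting argument above can be checked with a small Python sketch. It assumes, hypothetically, that invocation `(x, y)` of a work group computes one output element whose row of A is selected by `x` and whose row of B is selected by `y`; the actual texture shader may map invocations to texels differently. The helper name `unique_rows_loaded` is illustrative only.

```python
def unique_rows_loaded(local_size, wg_id=(0, 0)):
    """Count distinct rows of A and B read by one work group.

    Hypothetical mapping: invocation (x, y) computes output element (m, n),
    reading row m of A and row n of B.
    """
    lx, ly = local_size
    rows_a = set()
    rows_b = set()
    for x in range(lx):
        for y in range(ly):
            m = wg_id[0] * lx + x  # row of A needed by this invocation
            n = wg_id[1] * ly + y  # row of B needed by this invocation
            rows_a.add(m)
            rows_b.add(n)
    return len(rows_a) + len(rows_b)


for size in [(64, 1), (8, 8), (4, 16), (2, 32)]:
    print(size, unique_rows_loaded(size))
```

Under this mapping the count is simply `lx + ly`, which is minimized for a fixed 64 invocations when the work group is square.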
## Changes
Modify the `q_8w_linear` shader to use a `{8, 8, 1}` local work group size if possible. If `M` is small, instead use `{4, 16, 1}` or `{2, 32, 1}` to reduce the number of inactive invocations.
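The selection rule can be sketched in Python as follows. This is a hypothetical illustration, not the ExecuTorch implementation: the thresholds (`M >= 8`, `M >= 4`) and the names `pick_local_wg_size` and `inactive_invocations` are assumptions. It keeps 64 invocations per work group and narrows the x extent, assumed here to span the `M` rows of the output, when `M` is small.

```python
import math


def pick_local_wg_size(m):
    """Hypothetical rule: prefer {8, 8, 1}, shrink x when M is small."""
    if m >= 8:
        return (8, 8, 1)
    if m >= 4:
        return (4, 16, 1)
    return (2, 32, 1)


def inactive_invocations(m, local_size):
    """Idle invocations per column of work groups along x.

    The x extent is rounded up to a whole number of work groups; every
    padded x slot beyond M is idle for all ly invocations along y.
    """
    lx, ly, _ = local_size
    padded_x = math.ceil(m / lx) * lx
    return (padded_x - m) * ly


for m in [1, 2, 4, 16]:
    wg = pick_local_wg_size(m)
    print(m, wg, inactive_invocations(m, wg), inactive_invocations(m, (8, 8, 1)))
```

For example, with `M = 2` a `{8, 8, 1}` work group leaves 48 of 64 invocations idle under this model, while `{2, 32, 1}` leaves none.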
Differential Revision: [D71706489](https://our.internmc.facebook.com/intern/diff/D71706489/)
[ghstack-poisoned]