Update on "[ET-VK] Using shared memory offsetting in conv2d pw and saving ivec3 pos instead of ivec2 to improve performance."
This diff changes the conv2d pw op shader to offset shared memory accesses based on the thread-local index, which improves performance. The change also saves pos as an ivec3 instead of an ivec2.
Differential Revision: [D68400786](https://our.internmc.facebook.com/intern/diff/D68400786/)
[ghstack-poisoned]
// Macro to offset the shared memory access index. Padding the position index by 1 for every 16 positions avoids bank access conflicts and thus improves performance.
+ // For performance improvement, reduce register usage by caching positions in shared memory.
+ // Offset index by 1 every 16 points to avoid bank access conflict.
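
The padding described in these comments can be sketched on the host side in C. This is only an illustration, not the shader code from the diff: the macro name `OFFSET_INDEX`, the bank count of 16, and the strided access pattern are all assumptions chosen to show why adding 1 per 16 entries spreads accesses across banks.

```c
/* Hypothetical macro name: pads an index by one slot for every 16 entries,
 * so logical index i lands at physical slot i + i / 16. */
#define OFFSET_INDEX(i) ((i) + ((i) >> 4))

/* Assumed number of shared memory banks, for illustration only;
 * the real value depends on the GPU. */
#define NUM_BANKS 16

/* Simulates 16 threads, where thread tid accesses logical index tid * stride,
 * and returns the largest number of threads that hit the same bank.
 * A result of 1 means the access is conflict free under this model. */
int max_bank_conflict(int stride, int use_offset) {
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (int tid = 0; tid < 16; ++tid) {
        int idx = tid * stride;
        if (use_offset) {
            idx = OFFSET_INDEX(idx);
        }
        int bank = idx % NUM_BANKS;
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

Under these assumptions, a stride of 16 without padding sends all 16 threads to bank 0, while the padded index `17 * tid` maps each thread to its own bank. The cost is one unused slot per 16 entries, so a shared array sized for N entries would need roughly N + N / 16 slots.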