Commit bf5f2f5

Update on "[ET-VK] Using shared memory offsetting in conv2d pw and saving ivec3 pos instead of ivec2 to improve performance."
This diff changes the conv2d pw op shader to offset shared memory accesses based on the thread-local index, which improves performance. The change also saves pos as an ivec3 instead of an ivec2. Differential Revision: [D68400786](https://our.internmc.facebook.com/intern/diff/D68400786/) [ghstack-poisoned]
2 parents 702a755 + 38e02e9 commit bf5f2f5

1 file changed: +2 -4 lines changed

backends/vulkan/runtime/graph/ops/glsl/conv2d_pw.glsl

Lines changed: 2 additions & 4 deletions
@@ -43,11 +43,9 @@ layout(push_constant) uniform restrict Block {
 
 layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;
 
-// macro to offset shared memory access index. Padding position index by 1 offset per 16 positions avoidd bank access conflict and thus improves performance.
+// For performance improvement, reduce register usage by caching positions in shared memory.
+// Offset index by 1 every 16 points to avoid bank access conflict.
 #define offset_pos_index(index) (index + ((index) >> 4))
-
-// shared memory to hold calculated positions, this would reduce register usage thus improving performance.
-// 64 is the number of threads in the local wg
 shared ivec3 pos_shared[offset_pos_index(LOCAL_WG_SIZE * TILE_SIZE_X * TILE_SIZE_Y)];
 
 /*
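
The padding arithmetic in `offset_pos_index` can be checked outside the shader. The following is a minimal C sketch, not code from this commit: `OFFSET_POS_INDEX` mirrors the macro with its argument fully parenthesized, while `NUM_BANKS`, `worst_bank_load`, and the stride-16 access pattern are hypothetical, chosen only to illustrate why adding one slot per 16 entries can spread accesses across banks. Real bank counts and the shader's actual access pattern depend on the GPU and are not stated in the diff.

```c
#include <assert.h>

/* Same arithmetic as the shader macro, with the argument parenthesized. */
#define OFFSET_POS_INDEX(index) ((index) + ((index) >> 4))

/* Assumed bank count, for illustration only; real hardware varies. */
#define NUM_BANKS 16

/*
 * Hypothetical helper: n_threads threads each access element t * stride.
 * Returns the number of accesses that land in the most crowded bank,
 * with or without the padded index mapping.
 */
static int worst_bank_load(int n_threads, int stride, int padded) {
    assert(n_threads >= 0 && stride >= 0);
    int load[NUM_BANKS] = {0};
    int worst = 0;
    for (int t = 0; t < n_threads; ++t) {
        int idx = t * stride;
        if (padded) {
            idx = OFFSET_POS_INDEX(idx);
        }
        int bank = idx % NUM_BANKS;
        load[bank] += 1;
        if (load[bank] > worst) {
            worst = load[bank];
        }
    }
    return worst;
}
```

Under these assumptions, `OFFSET_POS_INDEX(64)` evaluates to 68, so an array of 64 logical entries needs 68 slots once padded. `worst_bank_load(16, 16, 0)` returns 16 because every unpadded access maps to bank 0, while `worst_bank_load(16, 16, 1)` returns 1 because the padded indices 0, 17, 34, ... fall in 16 distinct banks.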
