[XPU][TritonGPUToLLVM] Avoid bank conflicts in sub-group transposes
- Store the whole matrix using one SIMD block store per row, leaving
a single garbage item at the end of each row so that each row takes up
`sub_group_size + 1` elements
- Load each row with vector loads
By introducing this garbage item at the end of each row, we ensure that
loading the matrix avoids bank conflicts, as the offset between the positions
loaded by work-items `i` and `i+j` is `j * (sub_group_size + 1)`, which is
congruent to `j` modulo `sub_group_size` (assuming `sub_group_size` banks).
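
The bank arithmetic above can be checked with a small host-side model. This is
only a sketch, not the lowering itself: it assumes one element per bank word
and `sub_group_size` banks, and the helper names `banks_touched` and
`max_conflict` are hypothetical, introduced just for illustration.

```python
def banks_touched(sub_group_size, row_stride, element):
    """Bank hit by each work-item when work-item i reads `element` of row i.

    Assumes a simplified model: address % sub_group_size selects the bank.
    """
    return [(i * row_stride + element) % sub_group_size
            for i in range(sub_group_size)]


def max_conflict(sub_group_size, row_stride):
    """Worst-case number of work-items hitting the same bank in one access."""
    worst = 0
    for element in range(sub_group_size):
        banks = banks_touched(sub_group_size, row_stride, element)
        worst = max(worst, max(banks.count(b) for b in set(banks)))
    return worst


if __name__ == "__main__":
    size = 16
    # Unpadded rows: every work-item lands on the same bank.
    print("stride", size, "->", max_conflict(size, size), "way conflict")
    # Padded rows (one garbage item per row): every work-item gets its own bank.
    print("stride", size + 1, "->", max_conflict(size, size + 1), "way conflict")
```

Under this model, a row stride of `sub_group_size` sends all work-items to the
same bank for each loaded element, while a stride of `sub_group_size + 1`
spreads them over distinct banks.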
Signed-off-by: victor-eds <[email protected]>