Question about Triton GPU IR SharedEncoding #2026
-
There is very little documentation about the SharedEncoding attribute. I have a new MMA layout attribute for the Intel XMX layout, used for lowering tt.dot to the Intel XMX engine (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md#horizontal-packing-for-a-c-and-result). The `convert_layout` will be decomposed from blocked -> shared -> mma in the optimization passes.
I read https://github.com/openai/triton/blob/5df904233c11a65bd131ead7268f84cca7804275/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td#L52. My question is how to understand the meaning of the swizzling parameters (`vec`, `perPhase`, `maxPhase`).
-
I can't help you!
-
It means the elements of the tensor are stored in shared memory, and the mapping from each element of the tensor to a shared memory address is described by the swizzling parameters (`vec`, `perPhase`, `maxPhase`).
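A minimal sketch of what that mapping could look like, assuming the usual XOR-based swizzle; the function name and the exact formula are my own illustration, not code from the Triton repository:

```python
def shared_offset(row, col, n_cols, vec, per_phase, max_phase):
    """Map a logical (row, col) element to a swizzled shared-memory offset."""
    phase = (row // per_phase) % max_phase  # which swizzle pattern this row uses
    # XOR-swizzle the column index in groups of `vec` contiguous elements.
    swizzled_col = ((col // vec) ^ phase) * vec + (col % vec)
    return row * n_cols + swizzled_col
```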
-
How is the layout characterized by these parameters?
-
Say we have a 16 (M) by 16 (N) tensor A where each element is an f32, and we want to do swizzling along the N dim, i.e. swizzle the elements within each row when putting them in shared memory. Here is how the parameters control the swizzling behavior (a worked example follows the list):
- `perPhase` is calculated as `perPhase = 128 / (elementsPerRow * elementTypeInBytes)`. In this example, `perPhase = 128 / (16 * 4) = 2`, which means every 2 rows have the same swizzling pattern.
- `maxPhase` means how many patterns in total we want. This is usually set according to how shared memory is accessed…
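To make the phase pattern concrete, here is a small self-contained sketch using the numbers above; the XOR-swizzle formula and the values of `vec` and `maxPhase` are my assumptions for illustration:

```python
M, N = 16, 16      # tensor shape
elem_bytes = 4     # f32
per_phase = 128 // (N * elem_bytes)  # 128 / (16 * 4) = 2 rows per phase
max_phase = 4      # assumed: 4 distinct swizzle patterns
vec = 1            # assumed: swizzle single elements at a time

for row in range(8):
    phase = (row // per_phase) % max_phase
    cols = [((c // vec) ^ phase) * vec + (c % vec) for c in range(N)]
    print(f"row {row:2d} phase {phase}: {cols}")
```

Rows 0 and 1 print the identity pattern (phase 0), rows 2 and 3 print columns XOR'd with 1, and so on: consecutive row groups place the same logical column in different shared memory banks, which is what reduces bank conflicts.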
-
Can you explain more about how to minimize the bank conflicts?
-
@zhanglx13 Thanks for the kind help. I think my question is answered now.