Replies: 3 comments
-
It's just tiling. For example, if the original matmul has shape M=512, N=512, K=512, you can split it into 4 blocks. Each block then has shape M=256, K=512, N=256. If the device has enough resources, the block doesn't need to be split again. But when the shape is large, memory can't hold the M×K or K×N data, so data has to be exchanged between registers and DDR. Splitting M or N solves these problems.
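A minimal sketch of this tiling in plain Python (not Triton), assuming list-of-lists matrices and a hypothetical helper named `matmul_tiled`. Small sizes are used so it runs quickly; the 512 and 256 figures above split the same way. Each output block reads a `block_m`-row strip of A and a `block_n`-column strip of B, while K stays whole.

```python
def matmul_tiled(A, B, block_m, block_n):
    """Multiply A (M x K) by B (K x N), one output block at a time.

    Hypothetical helper for illustration only. Returns the product and
    the number of output blocks that were computed.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    blocks = 0
    for m0 in range(0, M, block_m):          # split along M
        for n0 in range(0, N, block_n):      # split along N
            blocks += 1
            # This block only needs rows m0.. of A and columns n0.. of B.
            for i in range(m0, min(m0 + block_m, M)):
                for j in range(n0, min(n0 + block_n, N)):
                    C[i][j] = sum(A[i][k] * B[k][j] for k in range(K))
    return C, blocks


# Scaled-down stand-in for the example above: 8x8x8 split into 4x4 output blocks.
A = [[i + k for k in range(8)] for i in range(8)]
B = [[k - j for j in range(8)] for k in range(8)]

C_tiled, num_blocks = matmul_tiled(A, B, block_m=4, block_n=4)
C_whole, _ = matmul_tiled(A, B, block_m=8, block_n=8)  # one block, no split
```

Halving M and N gives four blocks, and the tiled result matches the unsplit one.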
-
Both formulae are valid, but they may yield different row indices for the last group. Check this setting:
Starting from
-
Got it! Thanks.
-
While studying the Triton Tutorial, I found the "Super-Grouping" code in the Matrix Multiplication section confusing.
Below is the code related to grouping:
I'm wondering about the second-to-last line:
pid_m = first_pid_m + (pid % group_size_m)
Intuitively, the code should be
pid_m = first_pid_m + (pid % num_pid_in_group % group_size_m)
Is that correct? How can we drop num_pid_in_group, that is, ensure that pid % group_size_m is equal to pid % num_pid_in_group % group_size_m? I have found a combination that makes this equation invalid:
However, it breaks the requirement that BLOCK_SIZE_M, BLOCK_SIZE_N, and GROUP_SIZE_M should be powers of 2.
Can anyone help prove the equality when all the prerequisites are fulfilled?
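The equality can also be checked by brute force. The sketch below is plain Python, not Triton. The lines around `pid_m` (`num_pid_in_group`, `group_id`, `first_pid_m`, `group_size_m`, `pid_n`) are reconstructed from my recollection of the tutorial, so treat them as assumptions. `tile_mapping`, `covers_all_tiles`, and the tile counts are hypothetical and chosen only for illustration.

```python
def tile_mapping(num_pid_m, num_pid_n, GROUP_SIZE_M, reduce_first):
    """Return the (pid_m, pid_n) tile assigned to every pid.

    reduce_first=False uses  pid % group_size_m
    reduce_first=True  uses  pid % num_pid_in_group % group_size_m
    The surrounding index arithmetic is assumed, not copied from the tutorial.
    """
    num_pid_in_group = GROUP_SIZE_M * num_pid_n
    mapping = []
    for pid in range(num_pid_m * num_pid_n):
        group_id = pid // num_pid_in_group
        first_pid_m = group_id * GROUP_SIZE_M
        # The last group is smaller when num_pid_m is not a multiple of GROUP_SIZE_M.
        group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)
        local = pid % num_pid_in_group
        row_offset = (local if reduce_first else pid) % group_size_m
        mapping.append((first_pid_m + row_offset, local // group_size_m))
    return mapping


def covers_all_tiles(mapping, num_pid_m, num_pid_n):
    """True if every output tile is visited exactly once."""
    expected = [(m, n) for m in range(num_pid_m) for n in range(num_pid_n)]
    return sorted(mapping) == expected


# Hypothetical setting: 7 row tiles, 1 column tile, GROUP_SIZE_M = 4.
short_form = tile_mapping(7, 1, 4, reduce_first=False)
long_form = tile_mapping(7, 1, 4, reduce_first=True)

print("same mapping:", short_form == long_form)
print("short form covers all tiles:", covers_all_tiles(short_form, 7, 1))
print("long form covers all tiles:", covers_all_tiles(long_form, 7, 1))
```

Under these assumptions the two formulae agree in every full group, because `group_size_m` equals `GROUP_SIZE_M` there and divides `num_pid_in_group`. They can assign different rows in a smaller last group, yet each still visits every tile exactly once.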