Super-Grouping code Question #3016

zh-plus · 2024-01-25T09:48:01Z

While studying the Triton Tutorial, I found the "Super-Grouping" code in the Matrix Multiplication section confusing.

Below is the code related to grouping:

```python
# Program ID
pid = tl.program_id(axis=0)
# Number of program ids along the M axis
num_pid_m = tl.cdiv(M, BLOCK_SIZE_M)
# Number of program ids along the N axis
num_pid_n = tl.cdiv(N, BLOCK_SIZE_N)
# Number of programs in a group
num_pid_in_group = GROUP_SIZE_M * num_pid_n
# Id of the group this program is in
group_id = pid // num_pid_in_group
# Row-id of the first program in the group
first_pid_m = group_id * GROUP_SIZE_M
# If `num_pid_m` isn't divisible by `GROUP_SIZE_M`, the last group is smaller
group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)
# *Within groups*, programs are ordered in a column-major order
# Row-id of the program in the *launch grid*
pid_m = first_pid_m + (pid % group_size_m)
# Col-id of the program in the *launch grid*
pid_n = (pid % num_pid_in_group) // group_size_m
```
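To see what this arithmetic does, here is a plain-Python transcription of the snippet. It is a sketch: the function names `grouped_pid_to_tile` and `launch_order` are made up for illustration, and Python's built-in `min` stands in for the one used in the kernel. It records which `pid` computes each output tile and asserts that no tile is assigned twice.

```python
def grouped_pid_to_tile(pid, num_pid_m, num_pid_n, group_size):
    """Plain-Python copy of the super-grouping arithmetic above (hypothetical helper)."""
    num_pid_in_group = group_size * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size
    # The last group has fewer rows when num_pid_m % group_size != 0
    group_size_m = min(num_pid_m - first_pid_m, group_size)
    pid_m = first_pid_m + (pid % group_size_m)
    pid_n = (pid % num_pid_in_group) // group_size_m
    return pid_m, pid_n


def launch_order(num_pid_m, num_pid_n, group_size):
    """Return a grid where grid[m][n] is the pid that computes tile (m, n)."""
    grid = [[None] * num_pid_n for _ in range(num_pid_m)]
    for pid in range(num_pid_m * num_pid_n):
        m, n = grouped_pid_to_tile(pid, num_pid_m, num_pid_n, group_size)
        # Every tile must be claimed by exactly one program
        assert grid[m][n] is None, f"tile ({m}, {n}) assigned twice"
        grid[m][n] = pid
    return grid


if __name__ == "__main__":
    # 9 x 9 tiles with GROUP_SIZE_M = 3 (arbitrary example values)
    for row in launch_order(9, 9, 3):
        print(" ".join(f"{p:2d}" for p in row))
```

With these values the first column of the printout reads 0, 1, 2, 27, 28, 29, 54, 55, 56: consecutive programs walk down a column of GROUP_SIZE_M tiles before moving on to the next column of the same group.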

I'm confused by the second-to-last line:

`pid_m = first_pid_m + (pid % group_size_m)`

Intuitively, it should be `pid_m = first_pid_m + (pid % num_pid_in_group % group_size_m)`, right?
Why can `num_pid_in_group` be dropped, i.e., why is `pid % group_size_m` equal to `pid % num_pid_in_group % group_size_m`?

I found a combination of parameters for which this equality does not hold:

```python
def cdiv(a, b):
    # Ceiling division, like tl.cdiv
    return (a + b - 1) // b


def get_pidmn(M, N, BLOCK_SIZE_M, BLOCK_SIZE_N, GROUP_SIZE_M, pid):
    num_pid_m = cdiv(M, BLOCK_SIZE_M)
    num_pid_n = cdiv(N, BLOCK_SIZE_N)
    num_pid_in_group = GROUP_SIZE_M * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * GROUP_SIZE_M
    group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)

    # "Intuitive" formula vs. the tutorial's formula
    test_pid_m = first_pid_m + (pid % num_pid_in_group % group_size_m)
    pid_m = first_pid_m + (pid % group_size_m)
    assert test_pid_m == pid_m, (test_pid_m, pid_m)

    pid_n = (pid % num_pid_in_group) // group_size_m

    print(pid_m, pid_n, sep=', ')


if __name__ == '__main__':
    # num_pid_m = 5, num_pid_n = 9; pid 30 lies in the last (smaller) group,
    # where the assert fails (test_pid_m = 4, pid_m = 3)
    get_pidmn(14, 18, 3, 2, 3, 30)
```

However, this example violates the requirement that BLOCK_SIZE_M, BLOCK_SIZE_N, and GROUP_SIZE_M be powers of 2.

Can anyone help prove the equality when all the prerequisites are satisfied?

gavin838 · 2024-09-09T02:20:09Z

It's just tiling. For example, if the original matmul shape is M=512, N=512, K=512, you can split it into 4 blocks. In each block the shape is M=256, K=512, N=256; if the device has enough resources, it doesn't need to be split again. But when the shape is large, memory can't hold the MxK or KxN data, so data has to be exchanged between registers and DDR. Splitting M or N solves these problems.
origin_shape: 512x512x512 (MKN)
first_tiling (Block): 256x512x512
second_tiling (sub_Block): 128x512x512
third_tiling: 128x128x128 (Block_size M/K/N)
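The tiling idea can be illustrated without Triton. Below is a minimal pure-Python sketch (not Triton code; the function names are hypothetical, and integer matrices are used so results compare exactly): the two outer loops pick an output tile, the way `pid_m` and `pid_n` do in the kernel, and the K loop accumulates that tile chunk by chunk. Changing the tile sizes changes only the traversal order, not the result.

```python
import random


def matmul_naive(A, B):
    """Reference matmul on lists of lists: C[i][j] = sum_k A[i][k] * B[k][j]."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]


def matmul_tiled(A, B, bm, bn, bk):
    """Blocked matmul: each bm x bn output tile accumulates over K in bk-wide chunks."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for m0 in range(0, M, bm):            # tile row (plays the role of pid_m)
        for n0 in range(0, N, bn):        # tile column (plays the role of pid_n)
            for k0 in range(0, K, bk):    # K loop executed inside one "program"
                for i in range(m0, min(m0 + bm, M)):
                    for j in range(n0, min(n0 + bn, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + bk, K)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


if __name__ == "__main__":
    random.seed(0)
    M, K, N = 8, 12, 8
    A = [[random.randint(-3, 3) for _ in range(K)] for _ in range(M)]
    B = [[random.randint(-3, 3) for _ in range(N)] for _ in range(K)]
    assert matmul_tiled(A, B, 4, 4, 4) == matmul_naive(A, B)
    print("tiled result matches naive result")
```

Tile sizes that do not divide the matrix shape also work, because each loop clamps its upper bound with `min`, much like the masks used at the matrix edges in a kernel.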

SeepingFragranceLock · 2025-08-04T05:07:07Z

> However, this example violates the requirement that BLOCK_SIZE_M, BLOCK_SIZE_N, and GROUP_SIZE_M be powers of 2.
>
> Can anyone help prove the equality when all the prerequisites are satisfied?

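Both formulas are valid, but they may give different row indices for the last group.
This does not affect the final result: each program computes an independent output tile, so the order in which tiles are assigned to programs does not matter.
For efficiency, `pid_m = first_pid_m + (pid % group_size_m)` is the better choice, since it needs one modulo operation instead of two.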
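The claim that both formulas are valid can be checked with a short argument. This is a sketch using shorthand that is not in the original: G = GROUP_SIZE_M, g = group_size_m, n = num_pid_n, q = group_id. Inside any group, pid = qGn + r with 0 <= r < gn, so r = pid % num_pid_in_group and pid_n = floor(r / g):

```latex
\begin{aligned}
&\text{Full group } (g = G):\quad
  \mathrm{pid} \bmod G = (qGn + r) \bmod G = r \bmod G
  = (\mathrm{pid} \bmod Gn) \bmod G,
  \quad\text{since } G \mid qGn. \\[4pt]
&\text{Last group } (g < G):\quad
  \mathrm{pid} \bmod g = (c + r) \bmod g, \quad c = qGn,
  \quad\text{which can differ from } r \bmod g \text{ when } g \nmid c. \\[4pt]
&\text{For each column } j = \lfloor r / g \rfloor = \texttt{pid\_n}:\quad
  \{\, (c + r) \bmod g \;:\; jg \le r < (j+1)g \,\} = \{0, 1, \dots, g-1\}.
\end{aligned}
```

So in full groups the two formulas are identical, and in the last group the tutorial's formula equals the "intuitive" one with the rows cyclically shifted by c mod g; every tile is still computed exactly once.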

Check this setting (it reuses `cdiv` and `get_pidmn` from the question):

```python
# Reuses cdiv() and get_pidmn() from the question above
M = 100
N = 100
BLOCK_SIZE_M = 8
BLOCK_SIZE_N = 8
GROUP_SIZE_M = 5
pid_total = cdiv(M, BLOCK_SIZE_M) * cdiv(N, BLOCK_SIZE_N)  # 13 * 13 = 169
print(f"pid_total: {pid_total}")
for i in range(120, pid_total):
    print(f"pid: {i}")
    get_pidmn(M, N, BLOCK_SIZE_M, BLOCK_SIZE_N, GROUP_SIZE_M, i)  # assert fails at pid 130
```

Starting from pid = 130, the first program of the last group, the two formulas give different row indices, so the assert in `get_pidmn` fails.
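A stand-alone check along the same lines (a sketch; the helper names `both_pid_m` and `check` are made up for this example) compares the two formulas for every pid and also verifies that each formula assigns every tile to exactly one program. It suggests the mismatch is not caused by non-power-of-2 sizes: it can appear whenever num_pid_m is not a multiple of GROUP_SIZE_M, even when BLOCK_SIZE_M, BLOCK_SIZE_N, and GROUP_SIZE_M are all powers of 2.

```python
from itertools import product


def cdiv(a, b):
    # Ceiling division, like tl.cdiv
    return (a + b - 1) // b


def both_pid_m(pid, num_pid_m, num_pid_n, group_size):
    """Return (tutorial pid_m, 'intuitive' pid_m, pid_n) for one program id."""
    num_pid_in_group = group_size * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size
    group_size_m = min(num_pid_m - first_pid_m, group_size)
    tutorial = first_pid_m + (pid % group_size_m)
    intuitive = first_pid_m + (pid % num_pid_in_group % group_size_m)
    pid_n = (pid % num_pid_in_group) // group_size_m
    return tutorial, intuitive, pid_n


def check(M, N, block_m, block_n, group_size):
    """Return the first pid where the formulas disagree (None if never),
    after asserting that each formula covers every tile exactly once."""
    num_pid_m, num_pid_n = cdiv(M, block_m), cdiv(N, block_n)
    tiles = set(product(range(num_pid_m), range(num_pid_n)))
    seen_tut, seen_int, first_diff = set(), set(), None
    for pid in range(num_pid_m * num_pid_n):
        tut, intu, pid_n = both_pid_m(pid, num_pid_m, num_pid_n, group_size)
        seen_tut.add((tut, pid_n))
        seen_int.add((intu, pid_n))
        if tut != intu and first_diff is None:
            first_diff = pid
    # As many pids as tiles and every tile seen => one program per tile
    assert seen_tut == tiles and seen_int == tiles
    return first_diff


if __name__ == "__main__":
    # BLOCK_SIZE_M, BLOCK_SIZE_N and GROUP_SIZE_M are all powers of 2 here
    print(check(100, 100, 8, 8, 8))    # → 104 (13 tile rows, last group has 5)
    print(check(128, 128, 16, 16, 4))  # → None (8 tile rows, all groups full)
```

Both calls pass the coverage assertion, which is the sense in which both formulas are valid; only the first one shows a row mismatch, and only in its last, smaller group.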

gavin838 · 2025-08-04T05:07:40Z

Got it! Thanks.
