Fix grouped GEMM launch code to derive grid size from filtered kargs #5150

Draft
Copilot wants to merge 9 commits into develop from copilot/sub-pr-4996

Conversation

Copilot AI commented Mar 5, 2026

Kernel::MakeKargs(gemm_descs) skips groups where M/N/K==0, so it can return fewer entries than gemm_descs.size(). The grid size was computed via Kernel::GridSize(gemm_descs) which used the unfiltered count, causing the kernel to be launched over more blocks than there are valid kernel arguments.

Changes

  • Grid size: Kernel::GridSize(gemm_descs) → dim3(kargs.empty() ? 0 : kargs.back().block_end, 1, 1), derived from the filtered kargs vector, consistent with how hipMemcpyWithStream and group_count already used kargs.size()
// Before
const dim3 grids = Kernel::GridSize(gemm_descs);  // uses gemm_descs.size() — can exceed kargs

// After
const dim3 grids = dim3(kargs.empty() ? 0 : kargs.back().block_end, 1, 1);  // aligned with MakeKargs output

The hipMemcpyWithStream byte count and group_count kernel argument were already correctly using kargs.size().
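
To illustrate the mismatch outside the kernel code, here is a minimal host-only sketch. The types `GemmDesc` and `GroupKarg`, the helpers `make_kargs` and `grid_size_x`, and the 128x128 tile size are all hypothetical stand-ins, not the library's actual definitions. The sketch assumes only what the description states: empty groups are skipped, and each kept entry records the end of its block range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real descriptor and kernel-argument types.
struct GemmDesc {
    int M, N, K;
};

struct GroupKarg {
    int M, N, K;
    std::uint32_t block_start; // first block assigned to this group
    std::uint32_t block_end;   // one past the last block assigned to this group
};

// Assumed tile shape, chosen only for illustration.
constexpr int kTileM = 128;
constexpr int kTileN = 128;

inline std::uint32_t ceil_div(int a, int b) {
    assert(a > 0 && b > 0);
    return static_cast<std::uint32_t>((a + b - 1) / b);
}

// Mirrors the filtering described above: groups with M, N, or K equal to 0
// are skipped, so the result may be shorter than the input.
std::vector<GroupKarg> make_kargs(const std::vector<GemmDesc>& descs) {
    std::vector<GroupKarg> kargs;
    std::uint32_t next_block = 0;
    for (const GemmDesc& d : descs) {
        if (d.M == 0 || d.N == 0 || d.K == 0) {
            continue; // empty group: no kernel argument, no blocks
        }
        const std::uint32_t tiles = ceil_div(d.M, kTileM) * ceil_div(d.N, kTileN);
        kargs.push_back({d.M, d.N, d.K, next_block, next_block + tiles});
        next_block += tiles;
    }
    return kargs;
}

// Grid size derived from the filtered vector, as in the "After" line.
std::uint32_t grid_size_x(const std::vector<GroupKarg>& kargs) {
    return kargs.empty() ? 0u : kargs.back().block_end;
}
```

With three descriptors where the middle one has M == 0, `make_kargs` returns two entries and `grid_size_x` covers only the blocks of those two groups. A grid computed from all three descriptors would not be guaranteed to match that count.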


Copilot AI changed the title from "[WIP] Fix issues in grouped gemm implementation based on feedback" to "Fix grouped GEMM launch code to derive grid size from filtered kargs" Mar 5, 2026
assistant-librarian bot added the "external contribution" label (Code contribution from users community) Mar 5, 2026
Base automatically changed from users/tlakshma/ck/te_grouped_gemm to develop March 10, 2026 23:58