@victor-eds
Contributor

Extend sub-group transposition support to allow `N*sub_group_size` elements per thread.

As per block load semantics (a matrix of `sub_group_size` columns), `N` vector loads are needed to load the transposed matrix from local memory.

…near layout

Detect sub-group transpose cases: those in which the warp and lane dimensions are swapped
and no transfer within block-groups is needed. Use sub-group write operations to store
the contents in local memory and vector operations to read them back. These will be translated
to non-transposed stores and transposed loads, respectively. As data only moves within
sub-groups, no barriers are needed.
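
The store/load pattern described above can be modelled on the host. This is only a sketch under assumptions, not the lowering itself: `values[lane][i]` is a hypothetical model of the `i`-th element held by each lane, and the flat list `slm` stands in for local memory. Each sub-group write stores one contiguous row (one element per lane), and each lane then reads back one contiguous vector.

```python
def sub_group_transpose(values, sub_group_size):
    """Simulate one sub_group_size^2 transpose through local memory.

    values[lane][i] is the i-th element held by `lane` (hypothetical model).
    """
    s = sub_group_size
    assert len(values) == s and all(len(v) == s for v in values)

    slm = [None] * (s * s)  # stand-in for the local memory scratch area

    # Sub-group write: for each i, the lanes jointly store one contiguous row.
    for i in range(s):
        for lane in range(s):
            slm[i * s + lane] = values[lane][i]

    # Vector load: each lane reads s contiguous elements starting at its row.
    return [slm[lane * s:(lane + 1) * s] for lane in range(s)]


before = [[10 * lane + i for i in range(4)] for lane in range(4)]
after = sub_group_transpose(before, 4)
```

In this model `after[lane][j] == before[j][lane]`, so the lane index and the per-lane element index trade places without any exchange outside the sub-group.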

For now, handle only the case of a single `sub_group_size^2` block being transposed.

This restriction may be lifted in the future by performing `N*M` iterations for matrices of size
`(N*sub_group_size) x (M*sub_group_size)`.
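
One way to read the `N*M` iteration idea is as a generic blocked transpose, with one `sub_group_size^2` block handled per iteration. The sketch below is a plain host-side illustration under that assumption; `transpose_by_blocks` is a hypothetical helper, not code from this patch.

```python
def transpose_by_blocks(matrix, sub_group_size):
    """Transpose a (N*s) x (M*s) matrix one s x s block at a time."""
    s = sub_group_size
    rows, cols = len(matrix), len(matrix[0])
    assert rows % s == 0 and cols % s == 0
    n, m = rows // s, cols // s

    out = [[None] * rows for _ in range(cols)]
    iterations = 0
    for bi in range(n):
        for bj in range(m):
            # One iteration: transpose a single s x s block into its
            # mirrored position in the output.
            for r in range(s):
                for c in range(s):
                    out[bj * s + c][bi * s + r] = matrix[bi * s + r][bj * s + c]
            iterations += 1
    return out, iterations


matrix = [[6 * r + c for c in range(6)] for r in range(4)]
transposed, iterations = transpose_by_blocks(matrix, 2)
```

With `sub_group_size = 2` and a 4x6 input, `N = 2` and `M = 3`, so the loop performs six block transpositions.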

Signed-off-by: victor-eds <[email protected]>
Extend sub-group transposition support to allow `N*sub_group_size` elements per thread.

As per block load semantics (a matrix of `sub_group_size` columns), `N` vector loads are
needed to load the transposed matrix from local memory.
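
A host-side sketch of why `N` loads are needed: with `N*sub_group_size` elements per thread, the sub-group writes fill a matrix of `N*sub_group_size` rows and `sub_group_size` columns, and each lane then issues `N` vector loads of `sub_group_size` elements. The load addressing used here (one load per stacked `sub_group_size^2` block) is an assumption made for illustration and may differ from the actual lowering.

```python
def sub_group_transpose_extended(values, sub_group_size, n):
    """Simulate the transpose with n * sub_group_size elements per lane."""
    s = sub_group_size
    rows = n * s
    assert len(values) == s and all(len(v) == rows for v in values)

    slm = [None] * (rows * s)  # rows x s matrix, row-major

    # Sub-group writes: one contiguous row of s elements per held element.
    for i in range(rows):
        for lane in range(s):
            slm[i * s + lane] = values[lane][i]

    # Each lane performs n vector loads of s elements (assumed addressing:
    # block k starts at k * s * s, and the lane's row inside it at lane * s).
    result = []
    for lane in range(s):
        held = []
        for k in range(n):
            base = k * s * s + lane * s
            held.extend(slm[base:base + s])
        result.append(held)
    return result


values = [[10 * lane + i for i in range(4)] for lane in range(2)]
result = sub_group_transpose_extended(values, 2, 2)
```

Under this addressing, `result[lane][k*s + j] == values[j][k*s + lane]`, i.e. each stacked block is transposed independently.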

Signed-off-by: victor-eds <[email protected]>
@victor-eds victor-eds requested review from a team, etiotto and whitneywhtsang October 21, 2024 14:49
@victor-eds victor-eds self-assigned this Oct 21, 2024
@victor-eds victor-eds changed the title Sub group slm transpose extend [TritonIntelGPUToLLVM] Extend sub-group transposition support Oct 21, 2024
@victor-eds
Contributor Author

Review only 602175b

@victor-eds
Contributor Author

Part of #2266.

@victor-eds
Contributor Author

Requires #2511 to be merged.

@etiotto etiotto linked an issue Oct 21, 2024 that may be closed by this pull request
@victor-eds victor-eds enabled auto-merge (squash) October 24, 2024 12:11
@victor-eds victor-eds merged commit 0514102 into intel:main Oct 24, 2024
4 checks passed
@victor-eds victor-eds deleted the sub-group-slm-transpose-extend branch October 24, 2024 12:57

Development

Successfully merging this pull request may close these issues.

Port "sub-group transpose reduction" to default path

3 participants