[LLVMGPU] Add ROCDLLoadToTransposeLoadPass to TileAndFuse pipeline #23317
+486
−1
Adds the ROCDLLoadToTransposeLoad pass to the LLVMGPUTileAndFuse pipeline. This is only enabled for ROCDL, and a test flag is added to turn off the feature if needed (mainly for benchmark testing).
Convolution Benchmark Results
All data below are for bf16 convolutions. I didn't put comprehensive GEMM (MxK @ KxN) data in a spreadsheet, but for GEMMs the speedup is in the range of 0-17% for f16 and 0-60% for i8.

Full spreadsheet of results: https://docs.google.com/spreadsheets/d/1QEwemqviUzk4GginGdaDT7u8pP9x_Bku7r1aAvOUOCk/edit?usp=sharing
Weight Backward Convolutions
Equates to a KxM @ KxN GEMM layout.

Benchmark Summary:
Total benchmarks: 162
Significant changes (>2.0%): 142
Improvements (transpose_load faster): 133
Regressions (default faster): 9
Mean % change: -20.44%
Range: -56.29% to 12.22%
Input Backward Convolutions
Equates to an MxK @ KxN GEMM layout.

Benchmark Summary:
Total benchmarks: 146
Significant changes (>2.0%): 43
Improvements (transpose_load faster): 24
Regressions (default faster): 19
Mean % change: -1.09%
Range: -34.02% to 12.08%
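The summary statistics above follow the usual convention that percent change is measured relative to the default timing, so negative values mean transpose_load was faster. As a minimal sketch (with made-up timings, not the actual benchmark data), the numbers in each summary can be derived like this:

```python
def summarize(pairs, threshold=2.0):
    """Summarize (default, transpose_load) timing pairs.

    Percent change is (transpose - default) / default * 100, so
    negative values mean transpose_load was faster (an improvement).
    A change is "significant" if its magnitude exceeds `threshold`.
    """
    changes = [(t - d) / d * 100.0 for d, t in pairs]
    significant = [c for c in changes if abs(c) > threshold]
    return {
        "total": len(changes),
        "significant": len(significant),
        "improvements": sum(1 for c in significant if c < 0),
        "regressions": sum(1 for c in significant if c > 0),
        "mean_pct_change": sum(changes) / len(changes),
        "range": (min(changes), max(changes)),
    }

# Hypothetical timings in microseconds: (default, transpose_load)
pairs = [(100.0, 80.0), (100.0, 103.0), (100.0, 99.0)]
print(summarize(pairs))
```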
Additional notes
ci-extra: test_torch