[AMDGPU] Adding AMDGPU dialect wrapper for ROCDL transpose loads. #145395
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -898,6 +898,27 @@ def AMDGPU_GatherToLDSOp : | |
| let hasVerifier = 1; | ||
| } | ||
|
|
||
| def AMDGPU_TransposeLoadOp : | ||
| AMDGPU_Op<"transpose_load", [SameVariadicOperandSize]>, | ||
| Arguments<(ins Arg<AnyMemRef, "buffer to transpose load from", [MemRead]>:$src, Variadic<Index>:$srcIndices)>, | ||
| Results<(outs MFMAInTypes:$dst)> { | ||
| let summary = "MLIR wrapper for CDNA Transpose Load instructions"; | ||
| let description = [{ | ||
| The `amdgpu.transpose_load` op is a wrapper around the `ds_read_tr` instructions. | ||
|
|
||
| Operands: | ||
| * `$src`: LDS memref to read from. | ||
| * `$srcIndices`: indices into `$src` to read from for this thread. | ||
| * `$dst`: target register this transpose load instruction will write to. | ||
|
|
||
| Note: Lowering is only supported on gfx950 and up. | ||
|
||
| }]; | ||
| let assemblyFormat = [{ | ||
|
Member
I know other ops here don't provide examples, but I think it would be worth adding them going forward. I rely on these all the time.
Member
Author
I like your idea, so I tried to add a very simple example to show the format of the op. The semantics of the instruction are too hard to explain in a few sentences, so I wrote "please refer to the actual document for detailed explanation".
Contributor
Probably call out that you mean the CDNA4 ISA manual.
||
| $src `[` $srcIndices `]` attr-dict `:` type($src) `->` type($dst) | ||
| }]; | ||
| let hasVerifier = 1; | ||
| } | ||
|
|
||
| def AMDGPU_ScaledMFMAOp : | ||
| AMDGPU_Op<"scaled_mfma", [AllTypesMatch<["destC", "destD"]>, | ||
| Pure]>, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| // RUN: mlir-opt %s -convert-amdgpu-to-rocdl=chipset=gfx950 | FileCheck %s | ||
|
||
|
|
||
| #gpu_lds_addrspace = 3 | ||
| #amdgpu_fat_buffer_addrspace = 7 | ||
|
|
||
| // CHECK-LABEL: func @transpose_load_to_rocdl_4xf16 | ||
| func.func @transpose_load_to_rocdl_4xf16(%idx1 : index, %idx2 : index, %wgmem : memref<128x72xf16, #gpu_lds_addrspace>) -> vector<4xf16> { | ||
| // CHECK: rocdl.ds.read.tr16.b64 | ||
| %0 = amdgpu.transpose_load %wgmem[%idx1, %idx2] : memref<128x72xf16, #gpu_lds_addrspace> -> vector<4xf16> | ||
| return %0 : vector<4xf16> | ||
| } | ||
|
|
||
| // CHECK-LABEL: func @transpose_load_to_rocdl_8xi8 | ||
| func.func @transpose_load_to_rocdl_8xi8(%idx1 : index, %idx2 : index, %wgmem : memref<128x128xi8, #gpu_lds_addrspace>) -> vector<8xi8> { | ||
| // CHECK: rocdl.ds.read.tr8.b64 | ||
| %0 = amdgpu.transpose_load %wgmem[%idx1, %idx2] : memref<128x128xi8, #gpu_lds_addrspace> -> vector<8xi8> | ||
| return %0 : vector<8xi8> | ||
| } | ||
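For readers following the review thread's request for an example, here is a minimal sketch of how the op's `assemblyFormat` string lines up with the textual IR used in the tests above. The function name `@transpose_load_example` and the SSA value names are hypothetical. The memref shape, element type, and result type are copied from the first test case, with the LDS address space written as the literal `3` instead of the `#gpu_lds_addrspace` alias. It illustrates the syntax only, not the instruction's semantics.

```mlir
// Hypothetical function name; the op syntax mirrors the 4xf16 test in this PR.
func.func @transpose_load_example(%i : index, %j : index,
    %lds : memref<128x72xf16, 3>) -> vector<4xf16> {
  // Format: $src `[` $srcIndices `]` attr-dict `:` type($src) `->` type($dst)
  //   $src        -> %lds
  //   $srcIndices -> %i, %j (one index per memref dimension)
  //   type($src)  -> memref<128x72xf16, 3>
  //   type($dst)  -> vector<4xf16>
  %v = amdgpu.transpose_load %lds[%i, %j]
      : memref<128x72xf16, 3> -> vector<4xf16>
  return %v : vector<4xf16>
}
```

Per the `RUN` line in the test file, this input would go through `mlir-opt` with `-convert-amdgpu-to-rocdl=chipset=gfx950`. The 4xf16 test expects a `rocdl.ds.read.tr16.b64` op in the output for this combination of types.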