[mlir][amdgpu] Add scaled_ext_packed{8,16} operations #159830
Merged
amd-eochoalo
merged 13 commits into
llvm:main
from
amd-eochoalo:eochoa/2025-09-19/cvt-amd-gpu
Oct 17, 2025
+198
−1
Changes from 10 commits
9c09c35
[mlir][amdgpu] Add scaled_ext_packed{8,16} operations
amd-eochoalo f8b11c4
Use TypesMatchWith and make the scale a constant type
amd-eochoalo e71f8d8
Add note about availability on gfx1250+
amd-eochoalo 4f83cd9
Add verifier for blockSize and firstScaleByte
amd-eochoalo b7763ef
Use ConfinedType
amd-eochoalo d50b6fe
Only use AllOfType
amd-eochoalo 3cdb174
Verify shape matches and better type constraint
amd-eochoalo 30dcfea
Added scale type to the assembly format
amd-eochoalo 67261e5
Use functional-type
amd-eochoalo fadf035
Use : source_ty, scale_ty -> res_ty
amd-eochoalo 0383100
Adds examples and better verification
amd-eochoalo 7aa6169
no else after return and remove global resolution
amd-eochoalo 7a5fea5
indentation and syntax highlighting
amd-eochoalo
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -112,6 +112,69 @@ def AMDGPU_ExtPackedFp8Op : | |
| }]; | ||
| } | ||
|
|
||
| def IsValidBlockSize: AttrConstraint< | ||
| CPred<"::llvm::is_contained({16, 32}, ::llvm::cast<::mlir::IntegerAttr>($_self).getInt())">, | ||
| "whose value is 16 or 32">; | ||
|
|
||
| def AMDGPU_ScaledExtPacked816Op | ||
| : AMDGPU_Op<"scaled_ext_packed816", [Pure, AllShapesMatch<["source", "res"]>]>, | ||
| Arguments<( | ||
| ins AnyTypeOf<[FixedVectorOfShapeAndType<[8], F4E2M1FN>, | ||
| FixedVectorOfShapeAndType<[8], F8E4M3FN>, | ||
| FixedVectorOfShapeAndType<[8], F8E5M2>, | ||
| FixedVectorOfShapeAndType<[16], F6E2M3FN>, | ||
| FixedVectorOfShapeAndType<[16], F6E3M2FN>]>:$source, | ||
| FixedVectorOfShapeAndType<[4], F8E8M0FNU>:$scale, | ||
| ConfinedAttr<I32Attr, [IsValidBlockSize]>:$blockSize, | ||
| ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<1>]>:$firstScaleLane, | ||
| ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<2>]>:$firstScaleByte)>, | ||
| Results<( | ||
| outs AnyTypeOf<[FixedVectorOfShapeAndType<[8], F32>, | ||
| FixedVectorOfShapeAndType<[8], F16>, | ||
| FixedVectorOfShapeAndType<[8], BF16>, | ||
| FixedVectorOfShapeAndType<[16], F32>, | ||
| FixedVectorOfShapeAndType<[16], F16>, | ||
| FixedVectorOfShapeAndType<[16], BF16>]>:$res)> { | ||
|
|
||
| let summary = "Extend a vector of packed floating point values"; | ||
|
|
||
| let description = [{ | ||
| The scales applied to the input microfloats are stored in two bytes which | ||
| come from the `scale` input provided in a *half* of the wave identified | ||
| by `firstScaleLane`. The pair of bytes used is selected by | ||
| `firstScaleByte`. The 16 vectors in consecutive lanes starting from | ||
| `firstScaleLane` (which we'll call the scale vectors) will be used by both | ||
| halves of the wave (with lane L reading from the (L % 16)th scale vector), but | ||
| each half will use a different byte. | ||
|
|
||
| When the block size is 32, `firstScaleByte` can be either 0 or 2, | ||
| selecting halves of the scale vectors. Lanes 0-15 will read from | ||
| `firstScaleByte` and lanes 16-31 will read from `firstScaleByte` + 1. | ||
|
|
||
| However, when the block size is 16, `firstScaleByte` can be 0 or 1. | ||
| Lanes 0-15 read from the `firstScaleByte`th element of the scale vectors, | ||
| while lanes 16-31 read from `firstScaleByte` + 2. | ||
|
|
||
| Note: the layout for the scales generally mirrors the layout that the WMMA | ||
| instructions use for matrix scales. These selection operands allow | ||
| one to choose portions of the matrix to convert. | ||
|
||
|
|
||
| Available on gfx1250+. | ||
| }]; | ||
|
|
||
| let assemblyFormat = [{ | ||
| attr-dict $source | ||
| `scale` `(` $scale `)` | ||
| `blockSize` `(` $blockSize `)` | ||
| `firstScaleLane` `(` $firstScaleLane`)` | ||
| `firstScaleByte` `(` $firstScaleByte `)` | ||
| `:` type($source) `,` type($scale) `->` type($res) | ||
| }]; | ||
|
|
||
|
Review comment: Can you add a verifier that errors out on invalid block size / firstScaleByte combinations? Reply: Thanks for the review! 4f83cd9 |
||
| let hasVerifier = 1; | ||
|
|
||
| } | ||
|
|
||
| def AMDGPU_ScaledExtPackedOp | ||
| : AMDGPU_Op<"scaled_ext_packed", [Pure]>, | ||
| Arguments<( | ||
|
|
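The byte-selection rules in the `scaled_ext_packed816` description can be made concrete with a small standalone C++ sketch. It is not code from this PR: `ScaleSource` and `scaleSourceForLane` are hypothetical names, a 32-lane wave is assumed because the description only mentions lanes 0-31, and the scale vector index is taken as `lane % 16` within the group selected by `firstScaleLane`.

```cpp
#include <cassert>

// Hypothetical helper, not part of the PR: describes where a lane's scale
// comes from, following the op description literally.
struct ScaleSource {
  int scaleVector; // index (0-15) among the 16 scale vectors: lane % 16
  int byteIndex;   // byte (0-3) within that lane's 4-byte scale vector
};

// Assumes a 32-lane wave and an already-validated blockSize/firstScaleByte
// pair (16 with 0 or 1, 32 with 0 or 2).
inline ScaleSource scaleSourceForLane(int lane, int blockSize,
                                      int firstScaleByte) {
  assert(lane >= 0 && lane < 32 && "sketch assumes a 32-lane wave");
  assert((blockSize == 16 || blockSize == 32) && "blockSize must be 16 or 32");

  // Lanes 0-15 read firstScaleByte. Lanes 16-31 read firstScaleByte + 1
  // when blockSize is 32, and firstScaleByte + 2 when blockSize is 16.
  const bool upperHalf = lane >= 16;
  const int upperHalfOffset = (blockSize == 32) ? 1 : 2;
  const int byteIndex = firstScaleByte + (upperHalf ? upperHalfOffset : 0);

  return ScaleSource{lane % 16, byteIndex};
}
```

Under these assumptions, with `blockSize(16)` and `firstScaleByte(1)`, lane 20 would read byte 3 of scale vector 4, while lane 4 would read byte 1 of the same scale vector.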
@@ -860,7 +923,7 @@ def AMDGPU_MFMAOp : | |
| based on the provided `m`, `k`, `n`, and `nBlks` attributes, along with the | ||
| types of the source and destination arguments. | ||
|
|
||
| - For information on the layouts of the input and output matrces (which are stored | ||
| + For information on the layouts of the input and output matrices (which are stored | ||
| in `sourceA`, `sourceB`, `destC`, and `destD`), see the CDNA ISA documentation. | ||
|
|
||
| The `cbsz`, `abid`, and `blgp` parameters control how the lanes of the wave | ||
|
|
||
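The review request in the first hunk, a verifier for block size / `firstScaleByte` combinations, amounts to a check like the one below. The TableGen constraints alone admit any `firstScaleByte` in 0..2 for either block size, so the pairing has to be checked separately. This is a standalone sketch, not the verifier added in 4f83cd9 (which is not shown in this view); `isValidScaleSelection` is a hypothetical name, and an actual MLIR verifier would report failures through the op's diagnostics (for example `emitOpError`) rather than return a `bool`.

```cpp
// Hypothetical standalone check, not the PR's verifier: returns true when
// the (blockSize, firstScaleByte) pair is one the op description allows.
inline bool isValidScaleSelection(int blockSize, int firstScaleByte) {
  switch (blockSize) {
  case 16:
    // Block size 16: firstScaleByte can be 0 or 1.
    return firstScaleByte == 0 || firstScaleByte == 1;
  case 32:
    // Block size 32: firstScaleByte can be 0 or 2.
    return firstScaleByte == 0 || firstScaleByte == 2;
  default:
    // Any other block size is rejected, matching IsValidBlockSize.
    return false;
  }
}
```

With this check, `blockSize(32)` with `firstScaleByte(1)` and `blockSize(16)` with `firstScaleByte(2)` are rejected even though each attribute passes its own range constraint.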