[mlir][amdgpu] Add scaled_ext_packed{8,16} operations #159830
Changes from 3 commits
9c09c35
f8b11c4
e71f8d8
4f83cd9
b7763ef
d50b6fe
3cdb174
30dcfea
67261e5
fadf035
0383100
7aa6169
7a5fea5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -112,6 +112,72 @@ def AMDGPU_ExtPackedFp8Op : | |
| }]; | ||
| } | ||
|
|
||
| def IsValidBlockSize: AttrConstraint< | ||
| CPred<"::llvm::cast<::mlir::IntegerAttr>($_self).getInt() == 16 || ::llvm::cast<::mlir::IntegerAttr>($_self).getInt() == 32">, | ||
| "whose value is 16 or 32">; | ||
|
|
||
| def AMDGPU_ScaledExtPacked816Op | ||
| : AMDGPU_Op<"scaled_ext_packed816", [Pure, TypesMatchWith<"scale type is fixed", | ||
| "source", "scale", | ||
| "ScaledExtPacked816Op::getScaleType($_self.getContext())">]>, | ||
| Arguments<( | ||
| ins AnyTypeOf<[VectorOfLengthAndType<[8], [F4E2M1FN,F8E4M3FN,F8E5M2]>, | ||
| VectorOfLengthAndType<[16], [F6E2M3FN, F6E3M2FN]>]>:$source, | ||
| FixedVectorOfLengthAndType<[4], [F8E8M0FNU]>:$scale, | ||
| ConfinedAttr<I32Attr, [IsValidBlockSize]>:$blockSize, | ||
| ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<1>]>:$firstScaleLane, | ||
| ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<2>]>:$firstScaleByte)>, | ||
| Results<( | ||
| outs AnyTypeOf<[FixedVectorOfLengthAndType<[8], [F32]>, | ||
| FixedVectorOfLengthAndType<[8], [F16]>, | ||
| FixedVectorOfLengthAndType<[8], [BF16]>, | ||
| FixedVectorOfLengthAndType<[16], [F32]>, | ||
| FixedVectorOfLengthAndType<[16], [F16]>, | ||
| FixedVectorOfLengthAndType<[16], [BF16]>]>:$res)> { | ||
|
|
||
| let summary = "Extend a vector of packed floating point values"; | ||
|
|
||
| let description = [{ | ||
| The scales applied to the input microfloats are stored in two bytes which | ||
| come from the `scale` input provided in a *half* of the wave identified | ||
| by `firstScaleLane`. The pair of bytes used is selected by | ||
| `firstScaleByte`. The 16 vectors in consecutive lanes starting from | ||
| `firstScaleLane` (which we'll call the scale vectors) will be used by both | ||
| halves of the wave (with lane L reading from the (L % 16)th scale vector), but | ||
| each half will use a different byte. | ||
|
|
||
| When the block size is 32, `firstScaleByte` can be either 0 or 2, | ||
| selecting halves of the scale vectors. Lanes 0-15 will read from | ||
| `firstScaleByte` and lanes 16-31 will read from `firstScaleByte` + 1. | ||
|
|
||
| However, when the block size is 16, `firstScaleByte` can be 0 or 1. | ||
| Lanes 0-15 read from the `firstScaleByte`th element of the scale vectors, | ||
| while lanes 16-31 read from `firstScaleByte` + 2. | ||
|
||
|
|
||
| Note: the layout for the scales generally mirrors how the WMMA | ||
| instructions use matrix scales. These selection operands allow | ||
| one to choose portions of the matrix to convert. | ||
|
||
|
|
||
| Available on gfx1250+. | ||
| }]; | ||
|
|
||
| let assemblyFormat = [{ | ||
| attr-dict $source | ||
| `scale` `(` $scale `)` | ||
| `blockSize` `(` $blockSize `)` | ||
| `firstScaleLane` `(` $firstScaleLane `)` | ||
| `firstScaleByte` `(` $firstScaleByte `)` | ||
| `:` type($source) `to` type($res) | ||
|
||
| }]; | ||
|
|
||
|
Can you add a verifier that errors out on invalid block size / firstScaleByte combinations?
Thanks for the review! 4f83cd9
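The verifier that landed in 4f83cd9 is not shown in this view. As a rough illustration of the check the review asks for, here is a minimal standalone sketch based only on the op description (block size 32 allows `firstScaleByte` of 0 or 2, block size 16 allows 0 or 1). The helper name `checkScaleSelection` and the error strings are hypothetical and not taken from the patch; a real MLIR verifier would be a `verify()` method that emits diagnostics through `emitOpError` rather than returning strings.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, not part of the patch: returns an error message when
// the blockSize / firstScaleByte pair contradicts the op description, or
// std::nullopt when the pair is allowed.
std::optional<std::string> checkScaleSelection(int32_t blockSize,
                                               int32_t firstScaleByte) {
  // The IsValidBlockSize attribute constraint already restricts this to 16 or 32.
  if (blockSize != 16 && blockSize != 32)
    return "blockSize must be 16 or 32";
  // Block size 32: firstScaleByte selects a half of the scale vector (0 or 2).
  if (blockSize == 32 && firstScaleByte != 0 && firstScaleByte != 2)
    return "blockSize of 32 requires firstScaleByte to be 0 or 2";
  // Block size 16: firstScaleByte can be 0 or 1.
  if (blockSize == 16 && firstScaleByte != 0 && firstScaleByte != 1)
    return "blockSize of 16 requires firstScaleByte to be 0 or 1";
  return std::nullopt;
}
```

The `ConfinedAttr` bounds on `firstScaleByte` (0 to 2) cannot express this dependency on `blockSize` by themselves, which is presumably why a separate verifier was requested.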
||
| let extraClassDeclaration = [{ | ||
| static Type getScaleType(MLIRContext *ctx) { | ||
|
||
| return VectorType::get(4, Float8E8M0FNUType::get(ctx)); | ||
| } | ||
| }]; | ||
|
|
||
| } | ||
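The lane-to-byte selection spelled out in the `description` of `scaled_ext_packed816` can be modeled with a few lines of standalone C++. This is only a sketch of the prose, under the assumptions that the wave has 32 lanes and that the inputs already passed validation; the names `ScaleSource` and `scaleSourceForLane` are hypothetical and do not appear in the patch, and the model does not cover how `firstScaleLane` picks the half of the wave that supplies the scale vectors.

```cpp
#include <cstdint>

// Hypothetical model of the op description, not code from the patch.
struct ScaleSource {
  int32_t scaleVector; // which of the 16 scale vectors the lane reads
  int32_t byte;        // byte index within that 4-byte scale vector
};

// Assumes lane is in [0, 32) and that blockSize / firstScaleByte form a
// valid combination (32 with 0 or 2, or 16 with 0 or 1).
ScaleSource scaleSourceForLane(int32_t blockSize, int32_t firstScaleByte,
                               int32_t lane) {
  // Lanes 16-31 read one byte further for block size 32, two for block size 16.
  int32_t upperHalfOffset = blockSize == 32 ? 1 : 2;
  bool upperHalf = lane >= 16;
  // Lane L reads from the (L % 16)th scale vector in both halves.
  return {lane % 16, firstScaleByte + (upperHalf ? upperHalfOffset : 0)};
}
```

For example, with a block size of 32 and `firstScaleByte` of 2, lane 17 would read byte 3 of scale vector 1 under this model.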
|
|
||
| def AMDGPU_ScaledExtPackedOp | ||
| : AMDGPU_Op<"scaled_ext_packed", [Pure]>, | ||
| Arguments<( | ||
|
|
@@ -860,7 +926,7 @@ def AMDGPU_MFMAOp : | |
| based on the provided `m`, `k`, `n`, and `nBlks` attributes, along with the | ||
| types of the source and destination arguments. | ||
|
|
||
| For information on the layouts of the input and output matrces (which are stored | ||
| For information on the layouts of the input and output matrices (which are stored | ||
| in `sourceA`, `sourceB`, `destC`, and `destD`), see the CDNA ISA documentation. | ||
|
|
||
| The `cbsz`, `abid`, and `blgp` parameters control how the lanes of the wave | ||
|
|
||