
Commit b2b3399

Add scaled WMMA to AMDGPU
1 parent f803e46 commit b2b3399

File tree

5 files changed: +495 −30 lines changed


mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td

Lines changed: 93 additions & 0 deletions
@@ -962,6 +962,15 @@ def MFMAOutTypes : AnyTypeOf<[F64,
 def ScaledMFMAInTypes : AnyTypeOf<[VectorOfLengthAndType<[32], [F8E5M2, F8E4M3FN]>,
                                    VectorOfLengthAndType<[32], [F6E2M3FN, F6E3M2FN, F4E2M1FN]>]>;
 def ScaledMFMAOutTypes : AnyTypeOf<[VectorOfLengthAndType<[4, 16], [F32]>]>;
+
+// scaled_wmma
+def ScaledWMMAInTypes
+    : AnyTypeOf<[VectorOfLengthAndType<[64], [F8E5M2, F8E4M3FN]>,
+                 VectorOfLengthAndType<[64], [F6E2M3FN, F6E3M2FN]>,
+                 VectorOfLengthAndType<[64, 128], [F4E2M1FN]>]>;
+
+def ScaledWMMAOutTypes : AnyTypeOf<[VectorOfLengthAndType<[8, 16], [F32]>]>;
+
 // wmma
 def WMMAInTypes : AnyTypeOf<[VectorOfLengthAndType<[2], [F32]>,
                              VectorOfLengthAndType<[4, 8, 16], [F16, BF16]>,
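The vector lengths in `ScaledWMMAInTypes` line up with per-lane operand storage. A minimal sketch, assuming each of the 32 lanes in a wave (the lane count the op description below also uses) holds an equal slice of the M×K or N×K operand tile; this is an editor's illustration, not code from the commit:

```cpp
// Per-lane operand element counts for the two supported tile shapes,
// assuming a 32-lane wave.
constexpr int kWaveSize = 32;

constexpr int elementsPerLane(int rows, int k) {
  return rows * k / kWaveSize;
}

// 16x16x128: A (16x128) and B (16x128) each need 64 elements per lane,
// matching VectorOfLengthAndType<[64], ...>.
static_assert(elementsPerLane(16, 128) == 64, "f8/f6 operands");
// 32x16x128 (f4 only): A (32x128) needs 128 elements per lane, matching
// the extra [128] length allowed only for F4E2M1FN.
static_assert(elementsPerLane(32, 128) == 128, "fp4 A operand");
```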
@@ -1229,6 +1238,90 @@
   let hasCanonicalizer = 1;
 }
 
+def AMDGPU_ScaledWMMAOp
+    : AMDGPU_Op<"scaled_wmma", [AllTypesMatch<["destC", "destD"]>, Pure]>,
+      Arguments<(ins ConfinedAttr<I32Attr, [IntIsOneOf<[16, 32]>]>:$m,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[16]>]>:$n,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[128]>]>:$k,
+          ScaledWMMAInTypes:$sourceA, ScaledWMMAInTypes:$sourceB,
+          ScaledWMMAOutTypes:$destC,
+          VectorOfLengthAndType<[4, 8], [F8E8M0FNU, F8E4M3FN]>:$scaleA,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[0, 16]>]>:$a_first_scale_lane,
+          VectorOfLengthAndType<[4, 8], [F8E8M0FNU, F8E4M3FN]>:$scaleB,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[0, 16]>]>:$b_first_scale_lane)>,
+      Results<(outs ScaledWMMAOutTypes:$destD)> {
+  // TODO: E5M3FNU scales are supported, but there is not yet MLIR support for
+  // this datatype. Once we have support for that, update the scaleA and scaleB
+  // types here.
+  let summary = "MLIR wrapper for scaled wmma instructions";
+  let description = [{
+    The `amdgpu.scaled_wmma` op is an MLIR wrapper around intrinsics for scaled
+    `wmma` instructions. These instructions perform matrix multiplication with
+    per-block scaling of inputs, supporting fp4, fp6, and fp8 data formats.
+
+    The scale instructions support a block size of 16 or 32 and two tile sizes:
+    - 16x16x128 with mixed f8/f6/f4 formats (output: vector<8xf32>)
+    - 32x16x128 with f4 format only (output: vector<16xf32>)
+
+    Scale parameters (`scaleA`, `scaleB`) are small vectors of f8 scale values
+    (either f8E8M0FNU or f8E4M3FN) that are packed into i32/i64 values during
+    lowering. Each lane can operate on 4 bytes (4 scale values), and the
+    number of scales required for each matrix is determined by:
+      num_scales_A = (M × K) / block_size
+      num_scales_B = (N × K) / block_size
+
+    The index attributes (`a_first_scale_lane`, `b_first_scale_lane`) select
+    which lane to start reading scale values from (0 or 16):
+    - For block size 32, 32 lanes across a single wave are used for the scale
+      values. If the number of scales (num_scales_A or num_scales_B) fits into
+      half of the available lanes (num_scales / scales_per_lane == 16 lanes),
+      then first_scale_lane can be either 0 or 16. If all lanes are required
+      for storing the scale values (num_scales / scales_per_lane == 32 lanes),
+      then first_scale_lane must be 0.
+    - For block size 16, the same rules apply as above, except that 64 lanes
+      across two waves are used for the scale values. When
+      num_scales / scales_per_lane == 32 lanes, 16 lanes from each wave are
+      used, and a first_scale_lane of 0 or 16 decides which ones. When
+      num_scales / scales_per_lane == 64 lanes, first_scale_lane must be set
+      to 0.
+
+    - Block size 32: For a tile size of 16x16x128, each matrix gets 64 scales
+      stored across 16 lanes, with `a_first_scale_lane`/`b_first_scale_lane`
+      selecting lanes 0-15 (index=0) or lanes 16-31 (index=16). For a tile
+      size of 32x16x128, matrix A gets 128 scales in a full VGPR
+      (`a_first_scale_lane` is unused), while matrix B gets 64 scales in half
+      a VGPR.
+    - Block size 16: For a tile size of 16x16x128, each matrix gets 128 scales
+      stored in half of two VGPRs, with `a_first_scale_lane`/`b_first_scale_lane`
+      selecting lanes 0-15 (index=0) or lanes 16-31 (index=16) for each of the
+      VGPRs. For 32x16x128, matrix A gets 256 scales in two VGPRs
+      (`a_first_scale_lane` is unused), while matrix B gets 128 scales stored
+      in half of two VGPRs.
+
+    Example:
+    ```mlir
+    // 16x16x128: fp8 inputs
+    %0 = amdgpu.scaled_wmma 16x16x128 (%scaleVecA * %matA) * (%scaleVecB * %matB) + %matC
+      {a_first_scale_lane = 0 : i32, b_first_scale_lane = 0 : i32}
+      : vector<4xf8E8M0FNU>, vector<64xf8E4M3FN>,
+        vector<4xf8E8M0FNU>, vector<64xf8E4M3FN>, vector<8xf32>

+    // 32x16x128: fp4 inputs with different scale lanes
+    %1 = amdgpu.scaled_wmma 32x16x128 (%scaleVecD * %matD) * (%scaleVecE * %matE) + %matF
+      {a_first_scale_lane = 0 : i32, b_first_scale_lane = 16 : i32}
+      : vector<8xf8E4M3FN>, vector<128xf4E2M1FN>,
+        vector<8xf8E4M3FN>, vector<64xf4E2M1FN>, vector<16xf32>
+    ```
+  }];
+  let assemblyFormat = [{
+    custom<MNKDimensionList>($m, $n, $k) ` `
+    `(` $scaleA `*` $sourceA `)` `*`
+    `(` $scaleB `*` $sourceB `)` `+` $destC
+    attr-dict
+    `:` type($scaleA) `,` type($sourceA) `,` type($scaleB) `,` type($sourceB) `,` type($destC)
+  }];
+  let hasVerifier = 1;
+}
+
 def AMDGPU_MakeDmaBaseOp :
     AMDGPU_Op<"make_dma_base", [Pure, AttrSizedOperandSegments, AllElementTypesMatch<["global", "lds"]>]>,
     Arguments<(ins Arg<AnyMemRef>:$global,
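The scale-count formulas in the op description above can be checked with a little arithmetic. A minimal sketch using only quantities the documentation states (4 scale bytes per lane, the two tile shapes, block size 32); editor's illustration, not code from the commit:

```cpp
// The num_scales formulas from the op description, evaluated for block
// size 32.
constexpr int kScalesPerLane = 4; // each lane packs 4 f8 scale bytes

constexpr int numScales(int rows, int k, int blockSize) {
  return rows * k / blockSize;
}

// 16x16x128: 64 scales per matrix occupy 16 lanes, so *_first_scale_lane
// may be 0 or 16.
static_assert(numScales(16, 128, 32) / kScalesPerLane == 16, "half the wave");
// 32x16x128: matrix A needs 128 scales, occupying all 32 lanes, so
// a_first_scale_lane must be 0.
static_assert(numScales(32, 128, 32) / kScalesPerLane == 32, "full wave");
```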

mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp

Lines changed: 172 additions & 30 deletions
@@ -619,8 +619,8 @@ struct SchedBarrierOpLowering : public ConvertOpToLLVMPattern<SchedBarrierOp> {
 
 } // namespace
 
-/// Converts a MFMA vector operand from MLIR AMDGPU dialect convention to ROCDL
-/// and LLVM AMDGPU intrinsics convention.
+/// Pack small float vector operands (fp4/fp6/fp8/bf16) into the format
+/// expected by scaled matrix multiply intrinsics (MFMA/WMMA).
 ///
 /// Specifically:
 /// 1. If the element type is bfloat16, bitcast it to i16 unless rocdl intrinsic
@@ -634,9 +634,9 @@ struct SchedBarrierOpLowering : public ConvertOpToLLVMPattern<SchedBarrierOp> {
 /// Note that the type of `input` has already been LLVM type converted:
 /// therefore 8-bit and smaller floats are represented as their corresponding
 /// `iN` integers.
-static Value convertMFMAVectorOperand(ConversionPatternRewriter &rewriter,
-                                      Location loc, Value input,
-                                      bool allowBf16 = true) {
+static Value packSmallFloatVectorOperand(ConversionPatternRewriter &rewriter,
+                                         Location loc, Value input,
+                                         bool allowBf16 = true) {
   Type inputType = input.getType();
   if (auto vectorType = dyn_cast<VectorType>(inputType)) {
     if (vectorType.getElementType().isBF16() && !allowBf16)
@@ -660,23 +660,60 @@ static Value convertMFMAVectorOperand(ConversionPatternRewriter &rewriter,
   return input;
 }
 
-/// Converts the scaled MFMA operands, `scalesA` and `scalesB`, from MLIR AMDGPU
-/// dialect convention to ROCDL and LLVM AMDGPU intrinsics convention.
+/// Converts the scaled MFMA/WMMA operands, `scalesA` and `scalesB`, from MLIR
+/// AMDGPU dialect convention to ROCDL and LLVM AMDGPU intrinsics convention.
 ///
 /// Specifically:
 /// 1. If `input` is a i8 value, zero extend it to i32
-/// 2. If `input` is a vector of length 4 and type i8, cast it to i32
+/// 2. If `input` is a vector of length 4 or 8 and type i8, cast it to i32
 ///
 /// Note that the type of `input` has already been LLVM type converted:
 /// therefore 8-bit and smaller floats are represented as their corresponding
 /// `iN` integers.
-static Value castMFMAScaleOperand(ConversionPatternRewriter &rewriter,
-                                  Location loc, Value input) {
-  Type inputType = input.getType();
-  Type outputType = rewriter.getI32Type();
-  if (auto intType = dyn_cast<IntegerType>(inputType))
-    return LLVM::ZExtOp::create(rewriter, loc, outputType, input);
-  return LLVM::BitcastOp::create(rewriter, loc, outputType, input);
+static Value castScaleOperand(ConversionPatternRewriter &rewriter, Location loc,
+                              Value input) {
+  return TypeSwitch<Type, Value>(input.getType())
+      .Case<IntegerType>([&](IntegerType) {
+        // Handle scalar i8: zero extend to i32.
+        return LLVM::ZExtOp::create(rewriter, loc, rewriter.getI32Type(),
+                                    input);
+      })
+      .Case<VectorType>([&](VectorType vectorType) {
+        // Handle vector<4xi8> -> i32 or vector<8xi8> -> i64.
+        int64_t numElements = vectorType.getNumElements();
+        assert((numElements == 4 || numElements == 8) &&
+               "scale operand must be a vector of length 4 or 8");
+        IntegerType outputType =
+            (numElements == 4) ? rewriter.getI32Type() : rewriter.getI64Type();
+        return LLVM::BitcastOp::create(rewriter, loc, outputType, input);
+      })
+      .Default([](Type) -> Value {
+        llvm_unreachable("unexpected input type for scale operand");
+      });
+}
+
+/// Maps f8 scale element types to WMMA scale format codes.
+static std::optional<uint32_t> getWmmaScaleFormat(Type elemType) {
+  return TypeSwitch<Type, std::optional<uint32_t>>(elemType)
+      .Case([](Float8E8M0FNUType) { return 0; })
+      .Case([](Float8E4M3FNType) { return 2; })
+      .Default(std::nullopt);
+}
+
+/// Determines the ROCDL intrinsic name for scaled WMMA based on dimensions
+/// and scale block size (16 or 32).
+static std::optional<StringRef>
+getScaledWmmaIntrinsicName(int64_t m, int64_t n, int64_t k, bool isScale16) {
+  if (m == 16 && n == 16 && k == 128)
+    return isScale16
+               ? ROCDL::wmma_scale16_f32_16x16x128_f8f6f4::getOperationName()
+               : ROCDL::wmma_scale_f32_16x16x128_f8f6f4::getOperationName();
+
+  if (m == 32 && n == 16 && k == 128)
+    return isScale16 ? ROCDL::wmma_scale16_f32_32x16x128_f4::getOperationName()
+                     : ROCDL::wmma_scale_f32_32x16x128_f4::getOperationName();
+
+  return std::nullopt;
 }
 
 /// Push an input operand. If it is a float type, nothing to do. If it is
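For intuition, the bitcasts in `castScaleOperand` amount to byte-level packing. A minimal host-code sketch (the function names and little-endian byte order are the editor's assumptions for illustration, not part of the lowering):

```cpp
// Packing 4 or 8 f8 scale bytes into a single integer, mirroring the
// vector<4xi8> -> i32 and vector<8xi8> -> i64 bitcasts above.
#include <array>
#include <cstdint>
#include <cstring>

uint32_t packScales4(const std::array<uint8_t, 4> &scales) {
  uint32_t packed = 0;
  std::memcpy(&packed, scales.data(), sizeof(packed)); // plain bitcast
  return packed;
}

uint64_t packScales8(const std::array<uint8_t, 8> &scales) {
  uint64_t packed = 0;
  std::memcpy(&packed, scales.data(), sizeof(packed));
  return packed;
}
```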
@@ -925,7 +962,7 @@ static std::optional<StringRef> mfmaOpToIntrinsic(MFMAOp mfma,
   return std::nullopt;
 }
 
-static std::optional<uint32_t> mfmaTypeSelectCode(Type mlirElemType) {
+static std::optional<uint32_t> smallFloatTypeToFormatCode(Type mlirElemType) {
   return llvm::TypeSwitch<Type, std::optional<uint32_t>>(mlirElemType)
       .Case([](Float8E4M3FNType) { return 0u; })
       .Case([](Float8E5M2Type) { return 1u; })
@@ -954,8 +991,8 @@ mfmaOpToScaledIntrinsic(Type aType, Type bType, Type destType, uint32_t m,
   if (!isa<Float32Type>(destType))
     return std::nullopt;
 
-  std::optional<uint32_t> aTypeCode = mfmaTypeSelectCode(aType);
-  std::optional<uint32_t> bTypeCode = mfmaTypeSelectCode(bType);
+  std::optional<uint32_t> aTypeCode = smallFloatTypeToFormatCode(aType);
+  std::optional<uint32_t> bTypeCode = smallFloatTypeToFormatCode(bType);
   if (!aTypeCode || !bTypeCode)
     return std::nullopt;
 
@@ -1219,9 +1256,9 @@ struct MFMAOpLowering : public ConvertOpToLLVMPattern<MFMAOp> {
     }();
     OperationState loweredOp(loc, intrinsicName);
     loweredOp.addTypes(intrinsicOutType);
-    loweredOp.addOperands({convertMFMAVectorOperand(
+    loweredOp.addOperands({packSmallFloatVectorOperand(
                                rewriter, loc, adaptor.getSourceA(), allowBf16),
-                           convertMFMAVectorOperand(
+                           packSmallFloatVectorOperand(
                                rewriter, loc, adaptor.getSourceB(), allowBf16),
                            adaptor.getDestC()});
     if (isScaled) {
@@ -1268,8 +1305,8 @@ struct ScaledMFMAOpLowering : public ConvertOpToLLVMPattern<ScaledMFMAOp> {
     OperationState loweredOp(loc, intrinsicName);
     loweredOp.addTypes(intrinsicOutType);
     loweredOp.addOperands(
-        {convertMFMAVectorOperand(rewriter, loc, adaptor.getSourceA()),
-         convertMFMAVectorOperand(rewriter, loc, adaptor.getSourceB()),
+        {packSmallFloatVectorOperand(rewriter, loc, adaptor.getSourceA()),
+         packSmallFloatVectorOperand(rewriter, loc, adaptor.getSourceB()),
          adaptor.getDestC()});
     Value scalesIdxA =
         createI32Constant(rewriter, loc, adaptor.getScalesIdxA());
@@ -1280,10 +1317,10 @@ struct ScaledMFMAOpLowering : public ConvertOpToLLVMPattern<ScaledMFMAOp> {
         createI32Constant(rewriter, loc, bTypeCode),
         /*scales idx A=*/scalesIdxA,
         /*scales A*/
-        castMFMAScaleOperand(rewriter, loc, adaptor.getScalesA()),
+        castScaleOperand(rewriter, loc, adaptor.getScalesA()),
         /*scales idx B=*/scalesIdxB,
         /*scales B*/
-        castMFMAScaleOperand(rewriter, loc, adaptor.getScalesB())});
+        castScaleOperand(rewriter, loc, adaptor.getScalesB())});
     Value lowered = rewriter.create(loweredOp)->getResult(0);
     rewriter.replaceOp(op, lowered);
     return success();
@@ -1370,6 +1407,111 @@ struct WMMAOpLowering : public ConvertOpToLLVMPattern<WMMAOp> {
   }
 };
 
+struct ScaledWMMAOpLowering : public ConvertOpToLLVMPattern<ScaledWMMAOp> {
+  ScaledWMMAOpLowering(const LLVMTypeConverter &converter, Chipset chipset)
+      : ConvertOpToLLVMPattern<ScaledWMMAOp>(converter), chipset(chipset) {}
+
+  Chipset chipset;
+
+  LogicalResult
+  matchAndRewrite(ScaledWMMAOp op, ScaledWMMAOpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    Location loc = op.getLoc();
+    auto outType =
+        typeConverter->convertType<VectorType>(op.getDestD().getType());
+    if (!outType)
+      return rewriter.notifyMatchFailure(op, "type conversion failed");
+
+    if (chipset < kGfx1250)
+      return op->emitOpError("WMMA scale only supported on gfx1250+");
+
+    int64_t m = op.getM();
+    int64_t n = op.getN();
+    int64_t k = op.getK();
+
+    Type aElemType = getElementTypeOrSelf(op.getSourceA().getType());
+    Type bElemType = getElementTypeOrSelf(op.getSourceB().getType());
+
+    std::optional<uint32_t> aFmtCode = smallFloatTypeToFormatCode(aElemType);
+    std::optional<uint32_t> bFmtCode = smallFloatTypeToFormatCode(bElemType);
+
+    if (!aFmtCode || !bFmtCode)
+      return op.emitOpError("unsupported element types for scaled_wmma");
+
+    // Get scale vector types and determine variant (scale vs scale16).
+    auto scaleAVecType = cast<VectorType>(op.getScaleA().getType());
+    auto scaleBVecType = cast<VectorType>(op.getScaleB().getType());
+
+    if (scaleAVecType.getNumElements() != scaleBVecType.getNumElements())
+      return op.emitOpError("scaleA and scaleB must have equal vector length");
+
+    // Extract scale format from element types.
+    Type scaleAElemType = scaleAVecType.getElementType();
+    Type scaleBElemType = scaleBVecType.getElementType();
+
+    std::optional<uint32_t> scaleAFmt = getWmmaScaleFormat(scaleAElemType);
+    std::optional<uint32_t> scaleBFmt = getWmmaScaleFormat(scaleBElemType);
+
+    if (!scaleAFmt || !scaleBFmt)
+      return op.emitOpError("unsupported scale element types");
+
+    // Determine which intrinsic to use based on dimensions.
+    bool isScale16 = (scaleAVecType.getNumElements() == 8);
+    std::optional<StringRef> intrinsicName =
+        getScaledWmmaIntrinsicName(m, n, k, isScale16);
+    if (!intrinsicName)
+      return op.emitOpError("unsupported scaled_wmma dimensions: ")
+             << m << "x" << n << "x" << k;
+
+    SmallVector<NamedAttribute, 8> attrs;
+
+    // The f4 variant does not have fmtA and fmtB attributes.
+    bool is32x16 = (m == 32 && n == 16 && k == 128);
+    if (!is32x16) {
+      attrs.emplace_back("fmtA", rewriter.getI32IntegerAttr(*aFmtCode));
+      attrs.emplace_back("fmtB", rewriter.getI32IntegerAttr(*bFmtCode));
+    }
+
+    // modC uses default value of 0.
+    attrs.emplace_back("modC", rewriter.getI16IntegerAttr(0));
+
+    // Scale attributes. Convert user-facing firstScaleLane (0 or 16) to the
+    // half of the wave that is being selected (0 or 1).
+    attrs.emplace_back(
+        "scaleAType", rewriter.getI32IntegerAttr(op.getAFirstScaleLane() / 16));
+    attrs.emplace_back("fmtScaleA", rewriter.getI32IntegerAttr(*scaleAFmt));
+    attrs.emplace_back(
+        "scaleBType", rewriter.getI32IntegerAttr(op.getBFirstScaleLane() / 16));
+    attrs.emplace_back("fmtScaleB", rewriter.getI32IntegerAttr(*scaleBFmt));
+
+    // Reuse flags use default value of false.
+    attrs.emplace_back("reuseA", rewriter.getBoolAttr(false));
+    attrs.emplace_back("reuseB", rewriter.getBoolAttr(false));
+
+    // Convert typed float vectors to packed format.
+    Value sourceA =
+        packSmallFloatVectorOperand(rewriter, loc, adaptor.getSourceA());
+    Value sourceB =
+        packSmallFloatVectorOperand(rewriter, loc, adaptor.getSourceB());
+
+    // Pack scale vectors into i32/i64.
+    Value packedScaleA = castScaleOperand(rewriter, loc, adaptor.getScaleA());
+    Value packedScaleB = castScaleOperand(rewriter, loc, adaptor.getScaleB());
+
+    // Create the intrinsic call.
+    OperationState loweredOp(loc, *intrinsicName);
+    loweredOp.addTypes(outType);
+    loweredOp.addOperands(
+        {sourceA, sourceB, adaptor.getDestC(), packedScaleA, packedScaleB});
+    loweredOp.addAttributes(attrs);
+
+    Operation *lowered = rewriter.create(loweredOp);
+    rewriter.replaceOp(op, lowered->getResults());
+
+    return success();
+  }
+};
+
 struct TransposeLoadOpLowering
     : public ConvertOpToLLVMPattern<TransposeLoadOp> {
   TransposeLoadOpLowering(const LLVMTypeConverter &converter, Chipset chipset)
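Two small mappings in the pattern above are easy to miss. A minimal sketch of both, pulled out as standalone helpers for illustration (the helper names are the editor's, not the commit's):

```cpp
#include <cstdint>

// a_first_scale_lane / b_first_scale_lane (0 or 16) becomes a half-of-wave
// selector (0 or 1), exactly op.get*FirstScaleLane() / 16 above.
uint32_t firstScaleLaneToHalf(uint32_t firstScaleLane) {
  return firstScaleLane / 16; // 0 -> lanes 0-15, 16 -> lanes 16-31
}

// The scale vector length picks the intrinsic family: vector<8 x f8> scales
// select the *scale16* (block size 16) variants, vector<4 x f8> the block
// size 32 variants.
bool isScale16Variant(int64_t scaleVectorLength) {
  return scaleVectorLength == 8;
}
```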
@@ -2780,11 +2922,11 @@ void mlir::populateAMDGPUToROCDLConversionPatterns(LLVMTypeConverter &converter,
                ROCDL::RawPtrBufferAtomicCmpSwap>,
            AMDGPUDPPLowering, MemoryCounterWaitOpLowering, LDSBarrierOpLowering,
            SchedBarrierOpLowering, MFMAOpLowering, ScaledMFMAOpLowering,
-           WMMAOpLowering, ExtPackedFp8OpLowering, ScaledExtPackedMatrixOpLowering,
-           ScaledExtPackedOpLowering, PackedScaledTruncOpLowering,
-           PackedTrunc2xFp8OpLowering, PackedStochRoundFp8OpLowering,
-           GatherToLDSOpLowering, TransposeLoadOpLowering, AMDGPUPermlaneLowering,
-           AMDGPUMakeDmaBaseLowering, AMDGPUMakeDmaDescriptorLowering>(converter,
-                                                                       chipset);
+           WMMAOpLowering, ScaledWMMAOpLowering, ExtPackedFp8OpLowering,
+           ScaledExtPackedMatrixOpLowering, ScaledExtPackedOpLowering,
+           PackedScaledTruncOpLowering, PackedTrunc2xFp8OpLowering,
+           PackedStochRoundFp8OpLowering, GatherToLDSOpLowering,
+           TransposeLoadOpLowering, AMDGPUPermlaneLowering,
+           AMDGPUMakeDmaBaseLowering, AMDGPUMakeDmaDescriptorLowering>(converter, chipset);
   patterns.add<AMDGPUSwizzleBitModeLowering>(converter);
 }
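For context, these patterns are consumed through `populateAMDGPUToROCDLConversionPatterns`, shown above. A minimal caller sketch, assuming the standard MLIR conversion setup; the wrapper function name is hypothetical, and the exact populate signature is inferred from this diff:

```cpp
#include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// The converter must outlive the pattern set that captures it.
LogicalResult collectPatterns(LLVMTypeConverter &converter,
                              RewritePatternSet &patterns) {
  // scaled_wmma lowers only for gfx1250+, per ScaledWMMAOpLowering above.
  FailureOr<amdgpu::Chipset> chipset = amdgpu::Chipset::parse("gfx1250");
  if (failed(chipset))
    return failure();
  populateAMDGPUToROCDLConversionPatterns(converter, patterns, *chipset);
  return success();
}
```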
