
Commit 187ea27

[AMD]Enable Transposed Mfma Layout For Fp8 (#7301)
The transposed MFMA layout for fp8/bf8 was disabled because of a regression. However, a recent report shows that the non-transposed MFMA layout generates extra `convert_layout` ops before `split`, which consume additional LDS. Experiments show that the regression is gone, so it is fine to re-enable the transposed layout.
1 parent e71689d commit 187ea27

1 file changed (+2, -7 lines changed)


third_party/amd/lib/TritonAMDGPUTransforms/AccelerateAMDMatmul.cpp

Lines changed: 2 additions & 7 deletions
@@ -451,17 +451,12 @@ class BlockedToMFMA : public OpRewritePattern<tt::DotOp> {
         warpsPerTileMFMA(dotOp, retShape, numWarps, {mDim, nDim});
 
     // Use transposed mfma layout to enable larger vectorization for global
-    // store instructions, except for fp8 matmul kernels due to regression
-    // TODO (lixun): investigate the regression and enable this feature again
+    // store instructions.
     auto aElemTy = mfmaInstr->aElementType;
-    bool isFP8 = llvm::isa<Float8E5M2FNUZType, Float8E4M3FNUZType,
-                           Float8E4M3FNType, Float8E5M2Type>(aElemTy);
-    bool isTransposed =
-        isChainDotHead(dotOp) || isChainDotTail(dotOp) || !isFP8;
     ttg::AMDMfmaEncodingAttr mfmaEnc = ttg::AMDMfmaEncodingAttr::get(
         oldRetType.getContext(),
         /*version*/ mfmaVersion, warpsPerTile,
-        /*instrShape*/ mDim, nDim, isTransposed, CTALayout);
+        /*instrShape*/ mDim, nDim, /*isTransposed=*/true, CTALayout);
 
     Type mfmaAccType;
     if (oldRetType.getElementType().isIntOrIndex())
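
The behavioral effect of this hunk can be summarized with a small stand-alone model. This is a sketch only, not Triton code: `DotInfo`, `isTransposedBefore`, and `isTransposedAfter` are hypothetical names introduced for illustration, and the three boolean fields stand in for the results of `isChainDotHead(dotOp)`, `isChainDotTail(dotOp)`, and the removed `isFP8` check.

```cpp
// Hypothetical model of the layout decision; none of these names exist in Triton.
struct DotInfo {
  bool isChainDotHead; // stands in for isChainDotHead(dotOp)
  bool isChainDotTail; // stands in for isChainDotTail(dotOp)
  bool isFP8;          // stands in for the removed fp8/bf8 element-type check
};

// Before the commit: an fp8/bf8 dot used the transposed layout only when it
// was the head or tail of a chained dot; every non-fp8 dot was transposed.
inline bool isTransposedBefore(const DotInfo &d) {
  return d.isChainDotHead || d.isChainDotTail || !d.isFP8;
}

// After the commit: the transposed layout is requested unconditionally.
inline bool isTransposedAfter(const DotInfo &) { return true; }
```

Under this model, the only input whose result changes is an fp8/bf8 dot that is neither the head nor the tail of a chained dot; all other cases were already transposed before the commit.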
