Commit b285609

[AMD] Always pipeline small loads on RDNA (triton-lang#8063)
On RDNA, we always pipeline through registers, and s_wait_loadcnt can only check completion of loads in the order they were dispatched. Small loads that are not pipelined can therefore force a wait on pipelined loads as well, negating the benefits of pipelining.

Co-authored-by: Paul Trojahn <[email protected]>
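The ordering constraint described above can be illustrated with a toy model. This is only a sketch, assuming that loads retire strictly in issue order and that a wait can only say "allow at most N loads still in flight". `LoadQueue`, `waitFor`, and the load names are hypothetical and are not part of Triton or the AMD backend.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy model (hypothetical): loads retire strictly in issue order, and a wait
// can only bound how many loads remain in flight.
struct LoadQueue {
  std::deque<std::string> inFlight; // oldest load first

  void issue(const std::string &name) { inFlight.push_back(name); }

  // Retire the oldest loads until at most maxOutstanding remain in flight.
  void waitUntilAtMost(std::size_t maxOutstanding) {
    while (inFlight.size() > maxOutstanding)
      inFlight.pop_front();
  }

  // Weakest wait that guarantees `name` has completed: every load issued
  // before it must complete too; only loads issued after it may stay in flight.
  void waitFor(const std::string &name) {
    std::size_t pos = 0;
    while (pos < inFlight.size() && inFlight[pos] != name)
      ++pos;
    if (pos == inFlight.size())
      return; // already completed
    waitUntilAtMost(inFlight.size() - pos - 1);
  }
};

// Small load not pipelined: it is issued after the prefetch for the next
// iteration and consumed right away, so the wait drains the prefetch as well.
std::size_t inFlightAfterUnpipelinedSmallLoad() {
  LoadQueue q;
  q.issue("big_next_iter");      // pipelined prefetch
  q.issue("small_current_iter"); // needed immediately
  q.waitFor("small_current_iter");
  return q.inFlight.size();
}

// Small load pipelined too: it was issued one iteration earlier, so the
// prefetches issued after it can stay in flight while it is consumed.
std::size_t inFlightAfterPipelinedSmallLoad() {
  LoadQueue q;
  q.issue("big_iter0");
  q.issue("small_iter0");
  q.issue("big_iter1");
  q.issue("small_iter1");
  q.waitFor("small_iter0");
  return q.inFlight.size();
}
```

In this model the unpipelined case leaves no loads in flight after the wait, while the pipelined case keeps both prefetches for the next iteration outstanding, which is the overlap that pipelining is meant to preserve.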
1 parent 5c5ab9f commit b285609

1 file changed

+2
-1
lines changed

third_party/amd/lib/TritonAMDGPUTransforms/StreamPipeline.cpp

Lines changed: 2 additions & 1 deletion
@@ -555,7 +555,8 @@ preprocessLoop(triton::AMD::ModuleAxisInfoAnalysis &axisInfoAnalysis,
   isaFamily = triton::AMD::deduceISAFamily(*arch);

   bool pipelineWithoutDot = forOp->hasAttr(mlir::triton::kNumStagesAttrName);
-  bool filterSmallVectors = isaFamily != triton::AMD::ISAFamily::CDNA4;
+  bool filterSmallVectors =
+      isaFamily != triton::AMD::ISAFamily::CDNA4 && !isRDNA(isaFamily);
   llvm::MapVector<Operation *, std::pair<int, Operation *>> loadOpToIndLevel =
       triton::gpu::loadOpsToIndirectionLevel(forOp, pipelineWithoutDot,
                                              axisInfoAnalysis, numStages,
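
The effect of the new condition can be checked in isolation. The following is a minimal sketch, not the Triton code: the enum is a reduced stand-in for `triton::AMD::ISAFamily`, and the `isRDNA` shown here is a hypothetical stand-in for the helper the diff calls. Going by the commit title, a `false` result is assumed to mean that small loads are not filtered out and are therefore pipelined.

```cpp
#include <cassert>

// Reduced stand-in for triton::AMD::ISAFamily (assumption: the real enum
// has more members than are listed here).
enum class ISAFamily { Unknown, CDNA3, CDNA4, RDNA3 };

// Hypothetical stand-in for the isRDNA helper referenced in the diff.
constexpr bool isRDNA(ISAFamily family) { return family == ISAFamily::RDNA3; }

// Mirrors the condition after this commit: small-vector loads are filtered
// out of pipelining everywhere except on CDNA4 and on RDNA.
constexpr bool filterSmallVectors(ISAFamily family) {
  return family != ISAFamily::CDNA4 && !isRDNA(family);
}

static_assert(filterSmallVectors(ISAFamily::CDNA3), "CDNA3 still filters");
static_assert(!filterSmallVectors(ISAFamily::CDNA4), "CDNA4 unchanged");
static_assert(!filterSmallVectors(ISAFamily::RDNA3), "RDNA no longer filters");
```

Only the RDNA case changes relative to the old condition; the other families evaluate as they did before.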
