
Commit 79a8a3b

[PIPELINE] Always follow the order of blocked layout when creating shared layout for cp.async (#5905)
This fixes issue triton-lang/triton#5882. The problem was that even though the load appeared to be vectorizable, changing the order of the shared memory layout prevented the use of wider cp.async copies.
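
To illustrate why the shared memory order matters, here is a simplified, hypothetical model. It is not Triton's implementation. It assumes that global memory is contiguous along the blocked layout's fastest dimension (blockedOrder[0]), that each thread holds sizePerThread[d] consecutive elements along dimension d, and that cp.async copies 4, 8 or 16 bytes per instruction. The function name cpAsyncBytes and its parameters are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified model (not Triton's code): estimate the widest
// cp.async copy, in bytes, that one thread can issue for a load.
// Assumptions: global memory is contiguous along blockedOrder[0]; each thread
// owns sizePerThread[d] consecutive elements along dim d; cp.async supports
// 4-, 8- and 16-byte copies. Returns 0 if no supported size fits.
int cpAsyncBytes(const std::vector<unsigned> &blockedOrder,
                 const std::vector<unsigned> &sharedOrder,
                 const std::vector<unsigned> &sizePerThread,
                 unsigned elemBytes) {
  assert(blockedOrder.size() == sharedOrder.size() &&
         blockedOrder.size() == sizePerThread.size());
  unsigned fastDim = blockedOrder[0];
  // If shared memory uses a different fastest dim, the thread's elements that
  // are contiguous in global memory are strided in shared memory, so only one
  // element at a time is contiguous in both.
  unsigned contigElems =
      (sharedOrder[0] == fastDim) ? sizePerThread[fastDim] : 1;
  unsigned bytes = std::min(16u, contigElems * elemBytes);
  // Round down to a cp.async size this model treats as supported.
  if (bytes >= 16) return 16;
  if (bytes >= 8) return 8;
  if (bytes >= 4) return 4;
  return 0;
}
```

For example, with a hypothetical fp16 load (elemBytes = 2), blockedOrder {0, 2, 1} and sizePerThread {8, 1, 1}, a shared order of {0, 2, 1} yields 16-byte copies in this model. The shared order {2, 1, 0}, which the removed branch would produce for that blocked order, yields 0, meaning that no supported cp.async width fits.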
1 parent 489f6a0 commit 79a8a3b

1 file changed (+1, -12 lines)


lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp

Lines changed: 1 addition & 12 deletions
@@ -372,18 +372,7 @@ static std::optional<ttg::SharedEncodingTrait>
 getSharedEncoding(Operation *loadOp, bool isTMALoad) {
   auto ty = cast<RankedTensorType>(loadOp->getResultTypes()[0]);
   auto ctaLayout = ttg::getCTALayout(ty.getEncoding());
-  auto blockedOrder = ttg::getOrder(ty.getEncoding());
-  SmallVector<unsigned> order;
-  if (blockedOrder.size() == 3) {
-    for (unsigned i = 0; i < blockedOrder.size(); ++i) {
-      if (blockedOrder[i] == 0)
-        continue;
-      order.push_back(blockedOrder[i]);
-    }
-    order.push_back(0);
-  } else {
-    order = blockedOrder;
-  }
+  auto order = ttg::getOrder(ty.getEncoding());
 
   ttg::SharedEncodingTrait localAllocEnc;
   if (llvm::any_of(loadOp->getUsers(), [&](Operation *user) {
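
The removed branch can be reproduced outside MLIR as a small standalone sketch. Here std::vector stands in for SmallVector, and the names oldSharedOrder and newSharedOrder are hypothetical. For rank-3 tensors, the old code always moved dimension 0 to the slowest position. Whenever the blocked layout's fastest dimension was 0, this changed the fastest dimension of the shared layout.

```cpp
#include <cassert>
#include <vector>

// Mirrors the order computation removed from getSharedEncoding: for rank-3
// tensors, dim 0 was moved to the end (slowest position) of the order.
std::vector<unsigned> oldSharedOrder(const std::vector<unsigned> &blockedOrder) {
  if (blockedOrder.size() != 3)
    return blockedOrder;
  std::vector<unsigned> order;
  for (unsigned d : blockedOrder)
    if (d != 0)
      order.push_back(d); // keep the non-zero dims in their blocked order
  order.push_back(0);     // dim 0 always becomes the slowest-varying dim
  assert(order.size() == 3 && "blockedOrder must be a permutation of {0, 1, 2}");
  return order;
}

// After this commit, the shared order is the blocked order, unchanged.
std::vector<unsigned> newSharedOrder(const std::vector<unsigned> &blockedOrder) {
  return blockedOrder;
}
```

For example, oldSharedOrder({0, 2, 1}) returns {2, 1, 0}, and oldSharedOrder({1, 0, 2}) returns {1, 2, 0}. In contrast, newSharedOrder returns its input unchanged, so the shared layout's fastest dimension always matches the blocked layout's.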
