[TMA] Fix lowering TMA load when 2 users of differing encodings (#7398)
A user recently reported a crash during TMA lowering in a kernel that roughly
looks like this:
```
y = Y_desc.load([offset, 0])
for d_offset in tl.range(0, D, BLOCK_D):
    x = X_desc.load([offset, d_offset])
    xt = tl.trans(x)
    acc = tl.dot(xt, y)
    out += tl.dot(x, acc.to(dtype))
```
The error shows up as:
```
error: operand #0 does not dominate this use
xt = tl.trans(x)
```
Here's a minimized version of the faulty ttgir:
```
%36 = "ttg.local_alloc"()
...
%39 = "ttg.local_alloc"(%48)
%40 = "ttg.memdesc_trans"(%39) <{order = array<i32: 1, 0>}>
...
%48 = "ttg.local_load"(%36)
"scf.yield"(%47#0) : (tensor<64x64xf32, #mma1>) -> ()
```
Note that `%39 = "ttg.local_alloc"(%48)` consumes `%48`, which is only defined later by the `ttg.local_load`, hence the dominance violation.
---
In `replaceUsesAndPropagateType` there are two places where the insertion
point is changed, but only one of them is scoped by an insertion guard. The
fix in this PR scopes the other one as well, and a lit test is included to
validate the fixed behavior.
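For illustration, here is a minimal sketch of the pattern the fix relies on. It is not the actual Triton source; the helper name `rewriteUse` and its signature are hypothetical, but `OpBuilder::InsertionGuard` is the standard MLIR idiom for restoring the builder's insertion point when a scope ends:
```
// Minimal sketch, not the actual Triton code: the helper name and signature
// are hypothetical; only the guard idiom is the point.
#include "mlir/IR/Builders.h"

using namespace mlir;

static void rewriteUse(OpBuilder &builder, Operation *user, Value newOperand) {
  // InsertionGuard saves the current insertion point and restores it when
  // this function returns, so the temporary move below cannot leak into
  // later rewrites and place new ops where their operands do not dominate
  // them (the "operand #0 does not dominate this use" error above).
  OpBuilder::InsertionGuard guard(builder);
  builder.setInsertionPoint(user);
  // ... build the replacement ops for `user` here, using `newOperand` ...
}
```
Without the guard, the second insertion-point change in `replaceUsesAndPropagateType` would persist past its intended scope, which is how the `ttg.local_alloc` ended up placed before the `ttg.local_load` that defines its operand.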