[BACKEND] Fix codegen for ScanOp when there are redundant threads (triton-lang#5641)
This was a mildly tricky bug to track down. Groups of threads holding
redundant data weren't being masked out, so they shuffled in data from
threads they weren't supposed to read from and accumulated it. E.g. if
there are 32 threads where the first 16 have unique data and the second
16 are replicas, lane 16 will shuffle in data from lanes 15, 14, 12,
etc. and add it in.
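To make the failure concrete, here is a minimal Python simulation of one warp. It assumes a Hillis-Steele style inclusive add-scan built from shuffle-up steps with offsets 1, 2, 4, 8, and 32 lanes where lanes 16..31 replicate lanes 0..15. The names `warp_inclusive_scan`, `UNIQUE_LANES` and `mask_redundant` are hypothetical. The "masked" branch, which only lets a lane accumulate from its own group, is one way to express the fix. It is not necessarily how Triton's lowering implements it.

```python
WARP_SIZE = 32
UNIQUE_LANES = 16  # hypothetical layout: lanes 16..31 hold copies of lanes 0..15


def warp_inclusive_scan(values, mask_redundant):
    """Simulate an inclusive add-scan built from shuffle-up steps.

    At each step every lane reads, in lockstep, the value that lane
    `lane - offset` held *before* the step, like a warp shuffle-up.
    """
    vals = list(values)
    offset = 1
    while offset < UNIQUE_LANES:
        prev = vals  # snapshot: all lanes shuffle simultaneously
        vals = []
        for lane in range(WARP_SIZE):
            if mask_redundant:
                # Only accumulate from lanes inside this lane's own group of
                # unique data, so replicas compute the same result as originals.
                take = lane % UNIQUE_LANES >= offset
            else:
                # Bug: only the warp-wide lane index is checked, so lane 16
                # pulls partial sums from lanes 15, 14, 12 and 8.
                take = lane >= offset
            vals.append(prev[lane] + prev[lane - offset] if take else prev[lane])
        offset *= 2
    return vals


data = [1] * WARP_SIZE  # the lane at position p in a group should end with p + 1
print(warp_inclusive_scan(data, mask_redundant=False)[16])  # 16 (wrong; should be 1)
print(warp_inclusive_scan(data, mask_redundant=True)[16])   # 1
```

Lanes 0..15 come out identical in both versions. Only the replica lanes differ, which is why the bug stays hidden unless something reads those lanes.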
If the redundant copies of the scan result are simply discarded, e.g.
when the result is stored to global memory, the invalid values are never
observed. The case that exposed this bug was a broadcast of the result,
which made the invalid values observable.
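The difference between the two uses can be sketched in the same simulated warp. This sketch assumes 32 lanes where lanes 16..31 replicate lanes 0..15, and an unmasked shuffle-up scan. It models a store as writing each unique element once from its owning lane. It models the broadcast, as an assumption, as a consumer that reads the copies held by the replica lanes. The helpers `unmasked_scan`, `store` and `read_replicas` are hypothetical.

```python
WARP_SIZE, UNIQUE_LANES = 32, 16  # hypothetical: lanes 16..31 replicate lanes 0..15


def unmasked_scan(vals):
    """Shuffle-up add-scan without masking redundant lanes (the buggy behavior)."""
    offset = 1
    while offset < UNIQUE_LANES:
        # The comprehension reads the old `vals`, mimicking a lockstep shuffle.
        vals = [v + vals[i - offset] if i >= offset else v for i, v in enumerate(vals)]
        offset *= 2
    return vals


def store(lane_vals):
    """Store-like use: each unique element is written once, by its owning lane."""
    return lane_vals[:UNIQUE_LANES]


def read_replicas(lane_vals):
    """Broadcast-like use (assumed): the consumer reads the replica lanes' copies."""
    return lane_vals[UNIQUE_LANES:]


result = unmasked_scan([1] * WARP_SIZE)
print(store(result))          # lanes 0..15 hold 1..16: correct, bad copies are dropped
print(read_replicas(result))  # every replica lane holds 16: corrupted values leak out
```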