[Cherry-pick][RESOLVED] Fix histograms for complex replicated layouts (#7938) (#546)
Summary:
⚠️ **MERGE CONFLICTS DETECTED** ⚠️
This cherry-pick contains merge conflicts that require manual resolution.
Original Commit: 078954b
Original Author: Saagar Jha
Original Date: 2025-08-29 05:05:37 -0700
**Action Required:**
1. Check out this branch locally
2. Resolve the merge conflicts in the affected files
3. Commit the resolved changes
4. Update this PR
Original commit message:
```
Fix histograms for complex replicated layouts (#7938)
The current histogram code assumes that replication across a warp is
done in a way that involves the first n threads having unique data. This
is not a valid assumption; in fact the function it calls to get this
layout, getThreadsPerWarp, describes one such layout and how it's
returned, so the histogram code actually discards that information. To
fix this, we actually remove the uniquing code that masks out threads
possessing duplicate data. Instead we have everyone participate and
adjust for the overcounting that results by computing the "replication
factor". This is much easier than computing the correct mask, which is
nontrivial in the general case.
```
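The approach described in the commit message can be illustrated with a small host-side simulation. This is only a sketch, not the actual Triton lowering: the function names, the 32-thread warp, and the example layout (thread `t` holds element `t // 4`) are all assumptions made for illustration, and it further assumes every element is replicated the same number of times so the division is exact.

```python
WARP_SIZE = 32  # assumed warp width for this illustration


def histogram_first_n(thread_values, num_unique, num_bins):
    """Old assumption (hypothetical model): threads 0..num_unique-1 hold the unique data."""
    hist = [0] * num_bins
    for value in thread_values[:num_unique]:
        hist[value] += 1
    return hist


def histogram_replication_factor(thread_values, num_unique, num_bins):
    """Every thread participates; the overcount is divided out afterwards."""
    # Assumes uniform replication: each element appears exactly `replication` times.
    replication = len(thread_values) // num_unique
    hist = [0] * num_bins
    for value in thread_values:
        hist[value] += 1
    return [count // replication for count in hist]


# Eight unique elements replicated across the warp so that thread t holds
# element t // 4. The first eight threads then hold only elements 0 and 1.
data = [0, 1, 1, 3, 2, 2, 1, 0]
thread_values = [data[t // 4] for t in range(WARP_SIZE)]

masked = histogram_first_n(thread_values, len(data), num_bins=4)
corrected = histogram_replication_factor(thread_values, len(data), num_bins=4)
```

In this layout the "first n threads" mask sees only two of the eight elements, while the replication-factor version recovers the true counts without needing to know which threads hold duplicates.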
This PR was automatically cherry-picked from the upstream triton-lang/triton repository.
The conflicts have been committed with conflict markers for easier resolution.
Pull Request resolved: #546
Reviewed By: agron911
Differential Revision: D85907975
Pulled By: dshi7
fbshipit-source-id: 218021919c1205249fe7a6783a0a186e91a56411