[AutoWS] Support multi-buffering TMEM accumulators with leftover memory #1026

Open
njriasan wants to merge 6 commits into facebookexperimental:main from njriasan:njriasan/multibuffer-tmem

@njriasan njriasan commented Mar 3, 2026

Adds support for multi-buffering TMEM allocations based on leftover memory in the MemoryPlanner. This includes two high-level changes:

  1. In a "persistent kernel", the planner scans the IR and walks the TMEM allocations. While extra columns remain, it assigns them to the allocations in round-robin fashion.
  2. It updates the async task handling to allow swapping between buffers on each iteration of the outer loop, not just the inner loop. This is only implemented for operand D.
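
For readers skimming the description, the two changes above can be sketched roughly as follows. This is a minimal standalone illustration, not the MemoryPlanner code in this PR: `TmemAlloc`, `multiBufferRoundRobin`, and `bufferIndexForOuterIteration` are hypothetical names, and the column budget is passed in by the caller rather than read from the hardware.

```cpp
#include <vector>

// Hypothetical model of one TMEM accumulator allocation
// (an assumption for illustration, not the planner's real type).
struct TmemAlloc {
  int columns; // columns needed by a single copy of the buffer
  int copies;  // number of buffers; 1 means single-buffered
};

// Hand leftover columns out one extra copy at a time, round-robin,
// until a full pass over the allocations cannot grow any of them.
inline void multiBufferRoundRobin(std::vector<TmemAlloc> &allocs,
                                  int totalColumns) {
  int used = 0;
  for (const TmemAlloc &a : allocs)
    used += a.columns * a.copies;
  int leftover = totalColumns - used;

  bool grew = true;
  while (grew) {
    grew = false;
    for (TmemAlloc &a : allocs) {
      // Skip empty allocations so the loop always terminates.
      if (a.columns > 0 && a.columns <= leftover) {
        ++a.copies;
        leftover -= a.columns;
        grew = true;
      }
    }
  }
}

// Pick the buffer for a given outer-loop iteration by cycling
// through the available copies.
inline int bufferIndexForOuterIteration(int outerIter, int copies) {
  return outerIter % copies;
}
```

With a budget of 512 columns and two accumulators of 128 and 64 columns, this sketch ends with three copies of the first and two of the second, using all 512 columns. The larger allocation gets the last spare slot only because it comes first in the walk order.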

I don't have isolated benchmark numbers for this PR because it was part of a broader set of changes that together got us our perf.

@njriasan njriasan requested a review from manman-ren March 3, 2026 01:37
@meta-cla meta-cla bot added the CLA Signed label Mar 3, 2026
// (AsyncCLCTryCancelOp) once CLC support lands in pure Triton.
bool isLikelyPersistentKernel() {
  bool found = false;
  operation->walk([&](tt::GetNumProgramsOp) { found = true; });
Review comment from the author (@njriasan):

Claude generated this heuristic. Honestly, it is fairly consistent with my kernel authoring experience.

@njriasan njriasan requested review from htyu and kvbp2k March 3, 2026 01:38