[AMD] Optimize address increments for buffer loads in loops #8464

alefimov-amd · 2025-10-16T19:51:48Z

This PR moves address computation from the offsets of buffer loads to their base pointers, which reduces the amount of required computation and lowers register pressure.
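
The effect can be illustrated with a small stand-alone model. This is only a sketch in plain C++, not the MLIR pass itself: it assumes every lane advances by the same step on each iteration, and all names (`Lanes`, `sumWithOffsetIncrement`, `sumWithBaseIncrement`) are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a buffer load: each lane reads buffer[base + offset].
// None of these names come from the Triton code base.
using Lanes = std::vector<int32_t>;

// Before: the offset tensor is loop-carried, so every iteration performs
// one addition per lane to advance it.
int64_t sumWithOffsetIncrement(const std::vector<int32_t> &buffer,
                               Lanes offsets, int32_t step, int iterations) {
  int64_t sum = 0;
  for (int it = 0; it < iterations; ++it) {
    for (int32_t off : offsets)
      sum += buffer[off];
    for (int32_t &off : offsets) // per-lane (vector) increment
      off += step;
  }
  return sum;
}

// After: the offsets stay loop-invariant and only a scalar base advances.
int64_t sumWithBaseIncrement(const std::vector<int32_t> &buffer,
                             const Lanes &offsets, int32_t step,
                             int iterations) {
  int64_t sum = 0;
  std::size_t base = 0; // stands in for the scalar base pointer
  for (int it = 0; it < iterations; ++it) {
    for (int32_t off : offsets)
      sum += buffer[base + off];
    base += step; // single scalar increment per iteration
  }
  return sum;
}
```

Both functions read the same elements. The second one keeps the per-lane offsets constant across iterations, which in this model corresponds to the reduced computation and register pressure the description mentions.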

zhanglx13

In general lgtm. Just a few minor issues.

zhanglx13 · 2025-10-28T23:25:48Z

third_party/amd/lib/TritonAMDGPUTransforms/OptimizeBufferOpPtr.cpp

+
+  LogicalResult matchAndRewrite(scf::ForOp forOp,
+                                PatternRewriter &rewriter) const override {
+    LDBG("Analyzing ForOp for for offset pointer optimization: " << forOp);


zhanglx13 · 2025-10-28T23:36:20Z

third_party/amd/lib/TritonAMDGPUTransforms/OptimizeBufferOpPtr.cpp

+  // baseIncrement is a scalar which will be added to base pointer after
+  // optimization offsetInitialized is a value of offset on first loop iteration
+  // incrementOp is an operation that advances offset tensor
+  struct LoadData {


"load is a target buffer load" should also include buffer_load_to_local.
You may want to manually format multi-line comment for baseIncrement.
"offsetInitialized" should be "offsetInitializer"

It would be better to name this struct "LoadInfo", like the one we have in the pipeliner.

The current implementation covers all buffer ops, so I ended up with BufferOpInfo.

zhanglx13 · 2025-10-29T07:08:39Z

third_party/amd/lib/TritonAMDGPUTransforms/OptimizeBufferOpPtr.cpp

+    }
+    Value offsetInitializer = forOp.getInitArgs()[offsetOperandNo];
+    LoadData data = {loadOp, advanceStep, Value(), offsetInitializer,
+                     incrementOp};


Why not call createScalarIncrements here to fill in baseIncrement? So after this analyzeLoad function, you have all info needed for this load.

I want to fully split analysis and transformation. I am worried about a situation where the optimization creates some operations, but a later heuristic decides to skip it.

There is no such heuristic at the moment, but I cannot be sure we will not need one.
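
A minimal sketch of that split, in plain C++ rather than MLIR. `ToyOp`, the fields of `BufferOpInfo`, and the `heuristicAccepts` flag are all hypothetical; only the struct name `BufferOpInfo` comes from this thread.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for an IR operation; not an MLIR type.
struct ToyOp {
  bool hasUniformStep; // whether the offset advances by a scalar step
  int step;
  bool rewritten;
};

// Result of the analysis phase. The fields are hypothetical.
struct BufferOpInfo {
  std::size_t opIndex;
  int baseIncrement;
};

// Phase 1: inspect only. The ops are taken by const reference, so this
// phase cannot create or modify anything.
std::vector<BufferOpInfo> analyze(const std::vector<ToyOp> &ops) {
  std::vector<BufferOpInfo> infos;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].hasUniformStep)
      infos.push_back({i, ops[i].step});
  return infos;
}

// Phase 2: mutate, but only the candidates the caller decided to keep.
void transform(std::vector<ToyOp> &ops,
               const std::vector<BufferOpInfo> &infos) {
  for (const BufferOpInfo &info : infos)
    ops[info.opIndex].rewritten = true;
}

// Driver with a placeholder heuristic between the two phases.
bool optimize(std::vector<ToyOp> &ops, bool heuristicAccepts) {
  std::vector<BufferOpInfo> infos = analyze(ops);
  if (infos.empty() || !heuristicAccepts)
    return false; // nothing was touched, so there is nothing to undo
  transform(ops, infos);
  return true;
}
```

Because `analyze` never mutates its input, a rejection between the two phases leaves the loop exactly as it was.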

zhanglx13 · 2025-10-29T07:16:24Z

third_party/amd/lib/TritonAMDGPUTransforms/OptimizeBufferOpPtr.cpp

+    // Gather buffer loads which could be optimized
+    SmallVector<LoadData> loads;
+    collectLoads<triton::amdgpu::BufferLoadOp>(loads, forOp);
+    collectLoads<triton::amdgpu::BufferLoadToLocalOp>(loads, forOp);


Can we create an interface for all these buffer ops, also including buffer_atomic_rmw and cas? So we don't have to go over the whole for loop again and again.

Included this change in this PR and also created a separate PR for it here: #8600

This PR introduces a common interface for buffer address operands.
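
To illustrate why a shared interface avoids repeated traversals, here is a sketch in plain C++. It does not show the interface from #8600: `BufferAddressInterface`, `getOffsetOperandIndex`, the toy op classes, and the operand indices are all hypothetical.

```cpp
#include <memory>
#include <vector>

// Toy base class for operations in a loop body; not an MLIR type.
struct Op {
  virtual ~Op() = default;
};

// Hypothetical interface shared by all buffer ops.
struct BufferAddressInterface {
  virtual ~BufferAddressInterface() = default;
  virtual int getOffsetOperandIndex() const = 0;
};

struct ToyBufferLoad : Op, BufferAddressInterface {
  int getOffsetOperandIndex() const override { return 1; }
};

struct ToyBufferAtomicRMW : Op, BufferAddressInterface {
  int getOffsetOperandIndex() const override { return 2; }
};

struct ToyAdd : Op {}; // an op that does not implement the interface

// One pass over the body collects every op that implements the interface,
// instead of one templated walk per concrete op type.
std::vector<const BufferAddressInterface *>
collectBufferOps(const std::vector<std::unique_ptr<Op>> &body) {
  std::vector<const BufferAddressInterface *> found;
  for (const auto &op : body)
    if (auto *iface = dynamic_cast<const BufferAddressInterface *>(op.get()))
      found.push_back(iface);
  return found;
}
```

Adding another buffer op then only requires implementing the interface; the collection code does not change.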

zhanglx13

LGTM.
Good to go after fixing the naming issues for the interface.

binarman added 2 commits October 28, 2025 21:52

[AMD] Optimize address increments for buffer loads in loops

f9e56e1

This PR moves address computation from the offsets of buffer loads to their base pointers, which reduces the amount of required computation and lowers register pressure.

move optimization to separate pass

d685304

binarman force-pushed the buffer_load_base_opt branch from 82cef71 to 18bdacd Compare October 28, 2025 22:03

support buffer_load_to_local

226f8ef

alefimov-amd force-pushed the buffer_load_base_opt branch from 18bdacd to 226f8ef Compare October 28, 2025 22:08

zhanglx13 requested changes Oct 29, 2025

binarman added 2 commits October 30, 2025 22:04

[AMD] BufferOp address interface

c5a189a

This PR introduces a common interface for buffer address operands.

address review comments

c2eacf4

alefimov-amd force-pushed the buffer_load_base_opt branch from 8fcf1d2 to c2eacf4 Compare October 30, 2025 22:04

fix compilation crash

ee86c63

zhanglx13 approved these changes Nov 2, 2025
