
[AMD] Enable buffer ops for i64 offsets in ConvertToBufferOps#9619

Merged
antiagainst merged 5 commits into triton-lang:main from nithinsubbiah:buffer_ops on Mar 6, 2026

@nithinsubbiah (Contributor) commented Mar 2, 2026

Previously, ConvertToBufferOps unconditionally rejected any load/store with 64-bit offsets. This prevented buffer_load/buffer_store from being used in kernels (e.g. flex attention) that use 64-bit pointer arithmetic. This patch accepts 64-bit offsets when they can be proved safe, truncates them to 32 bits, and uses the faster buffer instructions instead.
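To illustrate the idea, here is a minimal Python sketch of the accept-then-truncate decision. It is not the pass's actual code: the function names are hypothetical, and the safety criterion used here (the offset's known value range is non-negative and fits in a signed 32-bit integer) is an assumption, since the description above does not spell out how safety is proved.

```python
INT32_MAX = 2**31 - 1  # largest value representable in a signed 32-bit integer


def can_truncate_to_i32(lo: int, hi: int) -> bool:
    """Assumed safety check: every offset in the closed range [lo, hi]
    is non-negative and fits in a signed 32-bit integer."""
    return 0 <= lo <= hi <= INT32_MAX


def truncate_to_i32(offset: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed value,
    mirroring what an integer truncation from 64 to 32 bits does."""
    low = offset & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low


def choose_access(lo: int, hi: int) -> str:
    """Hypothetical decision: use buffer ops only when truncation is provably lossless."""
    return "buffer" if can_truncate_to_i32(lo, hi) else "global"
```

Under this assumed criterion, truncation never changes the value of an accepted offset, which is why the check has to come before the truncation. An offset range that might exceed the 32-bit limit, or might be negative, stays on the regular load/store path.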

Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
@nithinsubbiah requested a review from @yangshuxin on March 3, 2026, 00:14
@yangshuxin (Contributor) commented

The latest revision (rev3) LGTM in terms of correctness. Thanks.

@antiagainst enabled auto-merge (squash) on March 6, 2026, 02:01
@nithinsubbiah (Contributor, Author) commented

oops, thanks for checking clang-format @antiagainst

@antiagainst merged commit 3d0e00c into triton-lang:main on Mar 6, 2026
9 checks passed
3 participants