
Conversation

@ziliangzl
Contributor

Added support for triton::GatherOp conversion. Out-of-bound indices are guarded with cf.assert, consistent with the CUDA backend behavior (which triggers a device assert).
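
For context, the op being lowered comes from tl.gather; a minimal kernel exercising it might look like the sketch below (illustrative only — the kernel name and shapes are not taken from this PR's tests).

```python
import triton
import triton.language as tl


@triton.jit
def gather_kernel(src_ptr, idx_ptr, out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    src = tl.load(src_ptr + offs)      # load a 1-D block of source values
    idx = tl.load(idx_ptr + offs)      # load a 1-D block of indices into that block
    out = tl.gather(src, idx, axis=0)  # block-level gather -> triton::GatherOp
    tl.store(out_ptr + offs, out)
```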

@ziliangzl
Contributor Author

@microsoft-github-policy-service agree

@enjustli
Contributor

enjustli commented Sep 19, 2025

Why not convert triton::GatherOp to tts::GatherOp? 🤔 tts::GatherOp will be converted to affine::AffineLoadOp in the UnstructuredToMemrefPass.

@bmyerz0
Contributor

bmyerz0 commented Sep 19, 2025

> Why not convert triton::GatherOp to tts::GatherOp? 🤔 tts::GatherOp will be converted to affine::AffineLoadOp in the UnstructuredToMemrefPass.

While the two gather ops are both gathers, my understanding is that they operate at different levels (@python3kgae can correct me). tts::GatherOp comes from tl.load/store, an operation on pointers, whereas triton::GatherOp comes from tl.gather, an operation on a tensor block. In our lowering to memref/linalg, we generally associate load/store with memref and other tensor-block ops with tensor/linalg. So I think tl.gather, when lowered to linalg (possibly through a new tts dialect op if necessary), should operate on the tensor dialect, not memref. I admit the naming is confusing, so it could make sense to rename tts::GatherOp in the process.
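
To make the pointer-level vs. block-level distinction concrete, the tl.load pattern that tts::GatherOp covers looks roughly like this (hypothetical kernel, not from this PR), in contrast to the tl.gather kernel sketched earlier, which indexes into an already-loaded block:

```python
import triton
import triton.language as tl


@triton.jit
def pointer_gather(src_ptr, idx_ptr, out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    idx = tl.load(idx_ptr + offs)
    # Pointer-level gather: the indices feed pointer arithmetic for tl.load,
    # i.e. the unstructured tl.load/tl.store path handled via tts::GatherOp.
    val = tl.load(src_ptr + idx)
    tl.store(out_ptr + offs, val)
```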

@ziliangzl
Contributor Author

Hi @bmyerz0, just wanted to check if my understanding is correct: the current lowering of triton::GatherOp to linalg is fine, and the only concern is the naming of tts::GatherOp to avoid confusion. Please let me know if there’s anything else I should address. Thanks!

@bmyerz0
Contributor

bmyerz0 commented Oct 3, 2025

> Hi @bmyerz0, just wanted to check if my understanding is correct: the current lowering of triton::GatherOp to linalg is fine, and the only concern is the naming of tts::GatherOp to avoid confusion. Please let me know if there’s anything else I should address. Thanks!

Yes, I do not think it is good to lower triton::GatherOp to tts::GatherOp. The suggestion is to lower triton::GatherOp directly to linalg.
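
For reference, whichever route is taken, the element-level semantics the lowering has to reproduce are essentially those of numpy.take_along_axis (a sketch under that assumption, not code from this PR):

```python
import numpy as np


def gather_reference(src: np.ndarray, idx: np.ndarray, axis: int) -> np.ndarray:
    # out[..., k, ...] = src[..., idx[..., k, ...], ...] along `axis`;
    # assumes tl.gather follows the usual torch.gather-style indexing.
    return np.take_along_axis(src, idx, axis=axis)
```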

@enjustli
Contributor

enjustli commented Oct 6, 2025

> Hi @bmyerz0, just wanted to check if my understanding is correct: the current lowering of triton::GatherOp to linalg is fine, and the only concern is the naming of tts::GatherOp to avoid confusion. Please let me know if there’s anything else I should address. Thanks!
>
> Yes, I do not think it is good to lower triton::GatherOp to tts::GatherOp. The suggestion is to lower triton::GatherOp directly to linalg.

But on some NPU hardware, a 'gather' op is needed. I thought triton-shared should provide a middle-layer dialect op to preserve this semantic. If we convert 'triton gather' directly to 'linalg.generic', we will lose it. Could we consider 'tts gather'?

@bmyerz0
Contributor

bmyerz0 commented Oct 6, 2025

Yes, for the sake of backend flexibility, I think it is reasonable to make any tts op's lowering to linalg.generic optional. But we should still provide the lowering. And I think that tts should include two different ops, one for tl.load and one for tl.gather.

@ziliangzl
Contributor Author

Maybe we should use tts.gather for tl.load and introduce a new ttx.gather for tl.gather?

@ziliangzl
Contributor Author

The current PR implements a fully functional lowering for triton::GatherOp on the CPU backend. Would it be possible to merge this PR first? Support for a ttx.gather op for NPU hardware can be addressed separately, if needed.

ziliangzl requested a review from bmyerz0 on October 21, 2025, 02:04
([32], [64], 0),
([4, 4], [8, 4], 0),
([128, 64], [256, 64], 0),
([128, 64], [128, 128], 1),

These all appear to increase the size of the tensor. Can you add test cases that contract the size of the tensor?
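
For instance, assuming the tuples read as (source shape, index shape, axis) like the cases above, hypothetical additions along these lines would exercise the contracting case:

```python
# Hypothetical contracting cases: the index/output shape is smaller than the
# source shape along the gather axis.
contracting_cases = [
    ([64], [32], 0),
    ([128, 64], [64, 64], 0),
    ([128, 128], [128, 64], 1),
]
```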

@bmyerz0
Contributor

bmyerz0 commented Oct 21, 2025

> The current PR implements a fully functional lowering for triton::GatherOp on the CPU backend. Would it be possible to merge this PR first? Support for a ttx.gather op for NPU hardware can be addressed separately, if needed.

I think the direct triton::GatherOp-to-linalg lowering that you have is a good solution. It mirrors what we do for other ops on tensors.
