Skip to content

Conversation

@erick-xanadu
Copy link
Contributor

Closes #62644

@erick-xanadu erick-xanadu changed the title Remove Pure attribute with Linalg::IndexOp. Remove Pure attribute from Linalg::IndexOp. Oct 12, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 12, 2023

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: None (erick-xanadu)

Changes

Closes #62644


Full diff: https://github.com/llvm/llvm-project/pull/68894.diff

1 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td (+1-1)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
index da12e7c83b22b89..41d2487dd6f1479 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
@@ -46,7 +46,7 @@ def Linalg_YieldOp : Linalg_Op<"yield", [Pure, ReturnLike, Terminator]>,
   let hasVerifier = 1;
 }
 
-def Linalg_IndexOp : Linalg_Op<"index", [Pure]>,
+def Linalg_IndexOp : Linalg_Op<"index", []>,
     Arguments<(ins ConfinedAttr<I64Attr, [IntMinValue<0>]>:$dim)>,
     Results<(outs Index:$result)> {
   let summary = "linalg index operation";

@MaheshRavishankar
Copy link
Contributor

I don't think this is a good change. I looked at the issue and I think having linalg ops within other linalg ops is not currently supported

Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index ops need to be pure

@erick-xanadu
Copy link
Contributor Author

@MaheshRavishankar ok, I'll close this then.

@joker-eph
Copy link
Collaborator

How can linalg.index be pure? It does not take operand but returns a different value each time right?

@joker-eph joker-eph reopened this Oct 12, 2023
@joker-eph
Copy link
Collaborator

Pinging @nicolasvasilache as well!

@joker-eph
Copy link
Collaborator

New week ping here, @MaheshRavishankar @nicolasvasilache ?

@MaheshRavishankar
Copy link
Contributor

New week ping here, @MaheshRavishankar @nicolasvasilache ?

Well, Pure is meant to allow CSE within the body of linalg.generic . It isn't expected that are linalg operations nested within linalg operations, so that is the issue here. I think marking it Pure still makes sense to me.

@joker-eph
Copy link
Collaborator

joker-eph commented Oct 17, 2023

Well, Pure is meant to allow CSE within the body of linalg.generic .

I understand this is a transformation you'd like to enable, but that's not a semantics rationale!!

It isn't expected that are linalg operations nested within linalg operations

That seems like a totally orthogonal discussion to me... (also to think about whether that composes, with inlining and other transformations)

I think marking it Pure still makes sense to me.

I don't quite get this... How can an operation that produces different values without any input/operand possibly match the definition of "Pure"?

@MaheshRavishankar
Copy link
Contributor

Well, Pure is meant to allow CSE within the body of linalg.generic .

I understand this is a transformation you'd like to enable, but that's not a semantics rationale!!

It isn't expected that are linalg operations nested within linalg operations

That seems like a totally orthogonal discussion to me... (also to think about whether that composes, with inlining and other transformations)

I think marking it Pure still makes sense to me.

I don't quite get this... How can an operation that produces different values without any input/operand possibly match the definition of "Pure"?

Well maybe some context might help. At some point there used to be linalg.indexed_generic which was similar to linalg.generic , i.e. a single block region, but in addition to the basic block arguments that represents values from each of the input and init operands, also had basic block arguments that represented the value of the induction variables of the iteration space of the linalg op. When these two ops were unified, then linalg.index was added to effectively represent the basic block arguments that were dropped from the linalg.indexed_generic.

I am not tied to it being Pure. It just should be CSE-able within a basic block.

@joker-eph
Copy link
Collaborator

Seems like a load to me, but we don't want to prove that the value is constant during the region execution.
I don't think we have anything that easily expresses this though.

All-in-all it does not change much to the problem: we shouldn't "lie" to the system about semantics just to enable generic transformations!

@MaheshRavishankar
Copy link
Contributor

Seems like a load to me, but we don't want to prove that the value is constant during the region execution. I don't think we have anything that easily expresses this though.

Why is it a load. It resolves to induction variable value when the linalg op is lowered to loops.

All-in-all it does not change much to the problem: we shouldn't "lie" to the system about semantics just to enable generic transformations!

Well, it has no side-effects. So not sure it is a lie.

@joker-eph
Copy link
Collaborator

Why is it a load. It resolves to induction variable value when the linalg op is lowered to loops.

Because I don't see another way for the generic to pass the indices index op other than via some private variable.
That is conceptually a linalg op with a 2D indexing space is stack allocating a memref<2 x index> for the current iteration indices, and the linalg.index op is a load...

Well, it has no side-effects. So not sure it is a lie.

Sorry but we're working on a compiler, you don't have a magic wand to escape reality :)

@MaheshRavishankar
Copy link
Contributor

Why is it a load. It resolves to induction variable value when the linalg op is lowered to loops.

Because I don't see another way for the generic to pass the indices index op other than via some private variable. That is conceptually a linalg op with a 2D indexing space is stack allocating a memref<2 x index> for the current iteration indices, and the linalg.index op is a load...

That seems a bit of a convoluted reasoning. Might be making things unnecessarily complicated....

Well, it has no side-effects. So not sure it is a lie.

Sorry but we're working on a compiler, you don't have a magic wand to escape reality :)

We also dont want a compiler to attach unnecessarily pessimizing semantics to something as simple as an induction variable. Thats where compilers (and presumably compiler writers) make their own life difficult.

@joker-eph
Copy link
Collaborator

joker-eph commented Oct 17, 2023

That seems a bit of a convoluted reasoning. Might be making things unnecessarily complicated....

I don't see anything simpler that would still be sound actually.

You can replace "stack alloc" by "private register file" or any conceptual abstract storage space you'd like, but I don't quite see how you can avoid the communication of the information, because it actually exists!! This is just what the ops do here...

We also dont want a compiler to attach unnecessarily pessimizing semantics to something as simple as an induction variable. Thats where compilers (and presumably compiler writers) make their own life difficult.

There is no "pessimizing" semantics here: I can't see any other simpler semantics actually...
Feel free to propose an alternative: but "magic behavior" is just not in scope!

@MaheshRavishankar
Copy link
Contributor

That seems a bit of a convoluted reasoning. Might be making things unnecessarily complicated....

I don't see anything simpler that would still be sound actually.

You can replace "stack alloc" by "private register file" or any conceptual abstract storage space you'd like, but I don't quite see how you can avoid the communication of the information, because it actually exists!! This is just what the ops do here...

We also dont want a compiler to attach unnecessarily pessimizing semantics to something as simple as an induction variable. Thats where compilers (and presumably compiler writers) make their own life difficult.

There is no "pessimizing" semantics here: I can't see any other simpler semantics actually... Feel free to propose an alternative: but "magic behavior" is just not in scope!

The only real alternative is to add basic block arguments to all linalg.generic which represent the induction variable as well, and deprecate linalg.index. Thats a big change though and disruptive. I dont really think it is justified. I understand your concern though. Maybe we just add a new side-effect mode which is meant to represent CSE-able within a basic block. (FTR I am not tied to linalg.index in any way... my only concern is that it exists and sort of works, and the original issue that triggered this path is actually the unsupported path).

@makslevental
Copy link
Contributor

At some point there used to be linalg.indexed_generic which was similar to linalg.generic , i.e. a single block region, but in addition to the basic block arguments that represents values from each of the input and init operands, also had basic block arguments that represented the value of the induction variables of the iteration space of the linalg op.

umm since this is before my time (and I'm too lazy to dig through logs...) did this linalg.indexed_generic support arbitrary index spaces? I.e., the operands were memref<k x index>? If so, why was it dropped? Why not just bring that back? I know lots (many? several?) of people that want arbitrary (non-rectangular) iteration spaces for linalg ops.

Maybe we just add a new side-effect mode which is meant to represent CSE-able within a basic block.

I vaguely remember there being some debate on discord about whether Pure is the right name to begin with (@sanjoy?). I guess we're stuck with it now (to some extent) but I would be in favor of just CSEable as being the trait name corresponding to supporting CSE.

@sanjoy
Copy link
Contributor

sanjoy commented Dec 11, 2023

@makslevental I don't think we want CSEable here, we want CSEableButOnlyWithinABlock which is pretty weird. For instance, it means we can CSE in situation A but not situation B when they are actually equivalent.

// A
block0:
  %a = linalg.index 0
  %b = linalg.index 0

// B
block0:
  %a = linalg.index 0
  br ^block1

block1:
  %b = linalg.index 0

@joker-eph 's suggestion makes sense to me -- model these operations as "reading" from some location that the lining.generic "writes" to before each iteration. Perhaps these can be modeled using a "loop index" Resource, but I'm not sure.

Sending the indices in as a basic block argument seems cleanest of all if that's a viable path.

@joker-eph
Copy link
Collaborator

I thought about the "read": but it's a read from a special resource, and one that is constant during the execution of the particular region. This is likely important to model this to be able to keep existing transformation working, but I don't think we have the expressiveness for describing this just now?

@sanjoy
Copy link
Contributor

sanjoy commented Dec 11, 2023

@joker-eph I was thinking that maybe we could model linalg.index as loading from a specific resource which does not alias the heap, similar to how TransformMappingResource and PayloadIRResource are used in TransformInterfaces.h. Not sure if this can be easily done though.

@makslevental
Copy link
Contributor

makslevental commented Dec 11, 2023

Sending the indices in as a basic block argument seems cleanest of all if that's a viable path.

So I'll reiterate/repeat my question: did linalg.indexed_generic support arbitrary iteration spaces? Because if so, we should bring it back and kill two birds with one stone.

Edit: answering my own question after searching around: no linalg.indexed_generic did not allow arbitrary iterations spaces (it just passed the indices as block args in order that lowering to library functions could make use of them).

@joker-eph
Copy link
Collaborator

@sanjoy technically that would be enough, but that wouldn't make the transformation "local": that is you'd still need to traverse the body between two uses of linalg.index to check that there is no write to this specific resource. That's why the "constant within a region instance" seems appealing: CSE could reason immediately on such ops.

@MaheshRavishankar
Copy link
Contributor

Sending the indices in as a basic block argument seems cleanest of all if that's a viable path.

So I'll reiterate/repeat my question: did linalg.indexed_generic support arbitrary iteration spaces? Because if so, we should bring it back and kill two birds with one stone.

Edit: answering my own question after searching around: no linalg.indexed_generic did not allow arbitrary iterations spaces (it just passed the indices as block args in order that lowering to library functions could make use of them).

Just answering the meta point here. It is obviously more expressive to have operations that are not orthonormal iteration spaces, but that needs to be evaluated in conjunction with complexity of transformations (like tiling and vectorization) to account for increased expressivity. (There is no free lunch). So Nicolas, I think, has been working (definitely has an extensive rfc on it) to allow reasoning about more general iteration spaces (even allowing completely irregular iteration spaces). I don't know the state of this.

@MaheshRavishankar
Copy link
Contributor

@sanjoy technically that would be enough, but that wouldn't make the transformation "local": that is you'd still need to traverse the body between two uses of linalg.index to check that there is no write to this specific resource. That's why the "constant within a region instance" seems appealing: CSE could reason immediately on such ops.

+1 to "constant within a region instance". That seems to capture exactly what is needed here.

@makslevental
Copy link
Contributor

So Nicolas, I think, has been working (definitely has an extensive rfc on it) to allow reasoning about more general iteration spaces (even allowing completely irregular iteration spaces). I don't know the state of this.

Yes I linked the rfc above. Nicolas and I were planning on filling it out earlier this year. I got started, made a little progress in iree-sandbox but unfortunately other things have taken higher priority.

@sanjoy
Copy link
Contributor

sanjoy commented Dec 11, 2023

@joker-eph @MaheshRavishankar "constant within a region instance" (as opposed to "constant within a block") makes sense to me. Ideally we'd be able to express that linalg.index is constant within the innermost linalg.generic it's a part of.

@MaheshRavishankar
Copy link
Contributor

@joker-eph @MaheshRavishankar "constant within a region instance" (as opposed to "constant within a block") makes sense to me. Ideally we'd be able to express that linalg.index is constant within the innermost linalg.generic it's a part of.

Yup. Also there shouldn't be nested linalg.generics anyway. They aren't explicitly illegal today cause it could be a way of expressing programs, but it's not the "preferred way" afaik. So there are two issues here

  1. Having a trait that says an op is constant within a region. This seems generally useful, but I don't know of any other op apart from linalg.index for which this applies.
  2. if we make nested linalg ops illegal then the above issue is potentially mute cause you would never have case where it would be illegal to cse linalg.index ops. This issue came about because someone tried to use nested linalg ops.

@erick-xanadu
Copy link
Contributor Author

we make nested linalg ops illegal then the above issue is potentially mute cause you would never have case where it would be illegal to cse linalg.index ops. This issue came about because someone tried to use nested linalg ops.

Correction: I used inlining and a function call that uses linalg operations was inlined inside a linalg op.

What is the limitation of linalg and why can't linalg be nested? This seems a bit of an arbitrary restriction but I don't have all the context, so I am happy to learn more about this.

@MaheshRavishankar
Copy link
Contributor

we make nested linalg ops illegal then the above issue is potentially mute cause you would never have case where it would be illegal to cse linalg.index ops. This issue came about because someone tried to use nested linalg ops.

Correction: I used inlining and a function call that uses linalg operations was inlined inside a linalg op.

Having an operation of that granularity is also not the preferred path.

What is the limitation of linalg and why can't linalg be nested? This seems a bit of an arbitrary restriction but I don't have all the context, so I am happy to learn more about this.

Its about how it composes with transformations. For example such a nested operation does not vectorize with linalg::vectorize, nor does lower to loops result in conversion of linalg operations to scalar code. That is not to say that isnt one way of modeling a compilation flow, but the path that is developed under the structured op based lowering does not expect nested linalg operations (or more broadly tensor operations within the body of a linalg op).

@joker-eph
Copy link
Collaborator

If we want to decide that nested linalg.generic are illegal, then we have to forbid any "call" from within the body of a linalg.generic operation. Or alternatively you need to tame the inliner to not inline a function that contains a linalg.generic op inside another one (but that does not seem right to me: inlining or not the op is still nested).

So: transformations on linalg.generic must be safe, and if they can't process nested linalg.generic they should check and bail (but it's not clear to me why they would be bothered by a nested linalg.generic and not by a call...).

@erick-xanadu
Copy link
Contributor Author

we make nested linalg ops illegal then the above issue is potentially mute cause you would never have case where it would be illegal to cse linalg.index ops. This issue came about because someone tried to use nested linalg ops

Correction: I used inlining and a function call that uses linalg operations was inlined inside a linalg op.

Having an operation of that granularity is also not the preferred path.

@MaheshRavishankar, when you are saying "having an operation of that granularity" you mean "having a call operation"? You are saying that having a call inside linalg is also not preferred?

nor does lower to loops result in conversion of linalg operations to scalar code

Can you confirm that what you are also saying here is that lowering a nested linalg operations via --convert-linalg-to-loops does not work? Or what do you concretely mean "result in conversion to scalar code"? Can you provide a concrete example?

I ask because we have been using --convert-linalg-to-loops for a long time with no problems (except disabling CSE) to avoid this error.

@MaheshRavishankar
Copy link
Contributor

we make nested linalg ops illegal then the above issue is potentially mute cause you would never have case where it would be illegal to cse linalg.index ops. This issue came about because someone tried to use nested linalg ops

Correction: I used inlining and a function call that uses linalg operations was inlined inside a linalg op.

Having an operation of that granularity is also not the preferred path.

@MaheshRavishankar, when you are saying "having an operation of that granularity" you mean "having a call operation"? You are saying that having a call inside linalg is also not preferred?

Basically I am saying having tensor operations within the body of linalg operations does not seem necessary. It is not to say it is invalid. IIUC having such an operation in the body of a linalg op means that the computation is represented as tensor of tensors. In other words this is an alternative representation of tiled computation, instead of using loops and tiled linalg operations in the body. The latter is what is generated with linalg tiling transformations. The tensor of tensors representation might have some compositionality issues as well. For example if you have a linalg.matmul with tensor of tensors types, the inner tensor type dimensions also have an implicit relationship which can be an issue. In any case, I am not saying it is wrong but such "nested" types seem unnecessary. So I would expect some sharp edges along those paths.

nor does lower to loops result in conversion of linalg operations to scalar code

Can you confirm that what you are also saying here is that lowering a nested linalg operations via --convert-linalg-to-loops does not work? Or what do you concretely mean "result in conversion to scalar code"? Can you provide a concrete example?

I ask because we have been using --convert-linalg-to-loops for a long time with no problems (except disabling CSE) to avoid this error.

Well, that's great to know! Might have been fixed since the last time I saw this cause I wouldn't have expected it to work based on how the lowering to loops used to work.

@erick-xanadu
Copy link
Contributor Author

Hi, I am pinging this since CSE was brought up again internally. It looks like there are several different things discussed here:

  1. Nesting linalg operations is not the preferred way of representing programs. And as a result neither does having a function call inside a linalg operation. It is not ideal to disallow this via the verifier because

it could be a way of expressing programs, but it's not the "preferred way" afaik

So, no change there?

  1. Having linalg.index being pure is not sound. But having CSE work on linalg.index is something that is wanted. It looks like the best solution is to model linalg.index as reading from a loop index resource that is written to by linalg.generic?

Happy to work on this, but I just would like a little bit more direction before starting to avoid working and having my work not being used at all. Thanks!

@dime10
Copy link
Contributor

dime10 commented Nov 27, 2025

Just bumping the discussion here since it's been a while. In my opinion attaching the pure attribute to a non-pure op is still posing an issue 🤔

@MaheshRavishankar
Copy link
Contributor

I don't see why linalg.index should not be pure. It has no side-effecting semantics. It should be CSEd. The issue where you shouldn't CSE this is with nested Linalg ops which AFAICS is illegal udage of Linalg ops.

If someone can summarize/maybe start a discourse post we can get convergence there instead of a PR

@joker-eph
Copy link
Collaborator

I don't see why linalg.index should not be pure. It has no side-effecting semantics.

We had this discussion a few years ago, everything is already explained above in this issue!!

In the current design of linalg.index there must be a side effect: a pure operation always return value computed from its arguments (i.e. a constant when there is no argument).

To model the transfer of information from the enclosing linalg.generic to the linalg.index operation, you need a "channel" so either:

  • There is a hidden resource used to communicated the information: linalg.generic creates/write this resource and linalg.index reads from it (hence it isn't pure).
  • There is a SSA value carrying this state (can we get back something inspired by the linalg.indexed_generic design with a block argument?), in which case linalg.index could be pure.

There is no other option within the design of MLIR today, I floated above that we could explore a different kind of model by introducing a concept of "region instance" to make the effect "local" there, but that requires some very careful design work. In the meantime: linag.index in its current form must have a side-effect.

@MaheshRavishankar
Copy link
Contributor

I don't see why linalg.index should not be pure. It has no side-effecting semantics.

We had this discussion a few years ago, everything is already explained above in this issue!!

In the current design of linalg.index there must be a side effect: a pure operation always return value computed from its arguments (i.e. a constant when there is no argument).

To model the transfer of information from the enclosing linalg.generic to the linalg.index operation, you need a "channel" so either:

  • There is a hidden resource used to communicated the information: linalg.generic creates/write this resource and linalg.index reads from it (hence it isn't pure).
  • There is a SSA value carrying this state (can we get back something inspired by the linalg.indexed_generic design with a block argument?), in which case linalg.index could be pure.

There is no other option within the design of MLIR today, I floated above that we could explore a different kind of model by introducing a concept of "region instance" to make the effect "local" there, but that requires some very careful design work. In the meantime: linag.index in its current form must have a side-effect.

I agree with all your points, but making the op not pure is not the right solution. It is a pessimizing solution cause it is adding a side-effect that does not exist (at least that is not what the ops intent is). I agree this op is ill-defined because of the reasons you mentioned. As far as I can see

  1. "region instance" sounds like the right setup. I am not sure you linked to the right comment, but what I was thinking was to say that there is an op that is valid only if it defined within a region. So it can be CSEed/moved around within the region, but cannot be "hoisted" out of the region for example. I agree this is a big change.
  2. I think one solution is to introduce a single trailing block argument to Linalg operations. The linalg.index can take this block argument as an operand to establish the region dependency. Its not the cleanest solution, but is the least disruptive cause it should only require changes to the verifier and lit tests.

@joker-eph
Copy link
Collaborator

what I was thinking was to say that there is an op that is valid only if it defined within a region. So it can be CSEed/moved around within the region, but cannot be "hoisted" out of the region for example. I agree this is a big change.

What I was describing was a change to the side effect mechanism instead to be able to model an effect that is not "escaping" from the region. Not something mechanical blocking CSE or anything (which would be a workaround for a transformation, not actually fixing the model).

I think one solution is to introduce a single trailing block argument to Linalg operations.

Seems to match my second bullet "There is a SSA value carrying this state...", so LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

linalg.index shouldn't be marked as Pure.

7 participants