
Conversation

@Groverkss
Member

This patch implements speculation for vector.transfer_read/vector.transfer_write ops, allowing these ops to work with LICM.

@llvmbot
Member

llvmbot commented Oct 8, 2024

@llvm/pr-subscribers-mlir

Author: Kunwar Grover (Groverkss)

Changes

This patch implements speculation for vector.transfer_read/vector.transfer_write ops, allowing these ops to work with LICM.


Full diff: https://github.com/llvm/llvm-project/pull/111533.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Vector/IR/VectorOps.td (+2)
  • (modified) mlir/lib/Dialect/Vector/IR/VectorOps.cpp (+12)
  • (modified) mlir/test/Transforms/loop-invariant-code-motion.mlir (+82)
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 45fd1c6e3f9384..b0de7c11b9d436 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -1240,6 +1240,7 @@ def Vector_TransferReadOp :
       DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
       DeclareOpInterfaceMethods<MaskableOpInterface>,
       DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
+      DeclareOpInterfaceMethods<ConditionallySpeculatable>,
       AttrSizedOperandSegments,
       DestinationStyleOpInterface
     ]>,
@@ -1487,6 +1488,7 @@ def Vector_TransferWriteOp :
       DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
       DeclareOpInterfaceMethods<MaskableOpInterface>,
       DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
+      DeclareOpInterfaceMethods<ConditionallySpeculatable>,
       AttrSizedOperandSegments,
       DestinationStyleOpInterface
   ]>,
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index dc92bea09dc160..2dfb59edcf3d2f 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -4245,6 +4245,12 @@ void TransferReadOp::getEffects(
                          SideEffects::DefaultResource::get());
 }
 
+Speculation::Speculatability TransferReadOp::getSpeculatability() {
+  if (!hasPureTensorSemantics())
+    return Speculation::NotSpeculatable;
+  return Speculation::Speculatable;
+}
+
 namespace {
 /// Store to load forwarding for transfer operations with permuation maps.
 /// Even if the permutation maps are different we can still propagate the store
@@ -4627,6 +4633,12 @@ void TransferWriteOp::getEffects(
                          SideEffects::DefaultResource::get());
 }
 
+Speculation::Speculatability TransferWriteOp::getSpeculatability() {
+  if (!hasPureTensorSemantics())
+    return Speculation::NotSpeculatable;
+  return Speculation::Speculatable;
+}
+
 namespace {
 /// Remove dead transfer write from the SSA chain so that it an be eliminated by
 /// DCE
diff --git a/mlir/test/Transforms/loop-invariant-code-motion.mlir b/mlir/test/Transforms/loop-invariant-code-motion.mlir
index 57f4ece9c9f2a4..be76cb8a6156f4 100644
--- a/mlir/test/Transforms/loop-invariant-code-motion.mlir
+++ b/mlir/test/Transforms/loop-invariant-code-motion.mlir
@@ -1209,3 +1209,85 @@ func.func @hoist_linalg_ops_div_by_zero(%a : tensor<128x128xi32>,
 
   func.return %final : tensor<?x128xi32>
 }
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK: vector.transfer_read
+// CHECK: scf.for
+// CHECK-NOT: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops(
+                            %a : tensor<128x128xf32>, 
+                            %lb : index,
+                            %ub : index,
+                            %step : index,
+                            %ida : index,
+                            %idb : index) -> vector<4x4xf32> {
+  %cst_0 = arith.constant 0.0 : f32
+  %cst = arith.constant dense<0.0> : vector<4x4xf32>
+  %final = 
+  scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+    %read = vector.transfer_read %a[%ida, %idb], %cst_0 : tensor<128x128xf32>, vector<4x4xf32>
+    %out = arith.addf %read, %acc : vector<4x4xf32>
+    scf.yield %out : vector<4x4xf32>
+  }
+  func.return %final : vector<4x4xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK: vector.transfer_write
+// CHECK: vector.transfer_read
+// CHECK: scf.for
+// CHECK-NOT: vector.transfer_write
+// CHECK-NOT: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops(
+                            %lb : index,
+                            %ub : index,
+                            %step : index,
+                            %ida : index,
+                            %idb : index) -> vector<4x4xf32> {
+  %c0 = arith.constant 0 : index
+  %cst_0 = arith.constant 0.0 : f32
+  %cst = arith.constant dense<0.0> : vector<4x4xf32>
+  %empty = tensor.empty() : tensor<4x4xf32>
+  %final = 
+  scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+    %a = vector.transfer_write %cst, %empty[%c0, %c0] : vector<4x4xf32>, tensor<4x4xf32>
+    %read = vector.transfer_read %a[%c0, %c0], %cst_0 : tensor<4x4xf32>, vector<4x4xf32>
+    %out = arith.addf %read, %acc : vector<4x4xf32>
+    scf.yield %out : vector<4x4xf32>
+  }
+  func.return %final : vector<4x4xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK-NOT: vector.transfer_read
+// CHECK: scf.for
+// CHECK: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops_memref(
+                            %a : memref<128x128xf32>, 
+                            %lb : index,
+                            %ub : index,
+                            %step : index,
+                            %ida : index,
+                            %idb : index) -> vector<4x4xf32> {
+  %cst_0 = arith.constant 0.0 : f32
+  %cst = arith.constant dense<0.0> : vector<4x4xf32>
+  %final = 
+  scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+    %read = vector.transfer_read %a[%ida, %idb], %cst_0 : memref<128x128xf32>, vector<4x4xf32>
+    %out = arith.addf %read, %acc : vector<4x4xf32>
+    scf.yield %out : vector<4x4xf32>
+  }
+  func.return %final : vector<4x4xf32>
+}

@llvmbot
Member

llvmbot commented Oct 8, 2024

@llvm/pr-subscribers-mlir-vector

Author: Kunwar Grover (Groverkss)

Member

@kuhar kuhar left a comment

What happens with out-of-bounds xfers over tensors?

@Groverkss
Member Author

What happens with out-of-bounds xfers over tensors?

I think it's still speculatable. The op has clearly defined out-of-bounds behavior, which does not cause any side effects.
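
For illustration, a minimal sketch (not from the patch; %t, the indices, and the shapes are made up): lanes that fall outside the tensor simply take the padding value, so there is nothing undefined to speculate.

%pad = arith.constant 0.0 : f32
%c2 = arith.constant 2 : index
// Starting at [2, 2] of a 3x3 tensor, most lanes of this 4x4 read fall out of
// bounds; those lanes are filled with %pad instead of triggering UB.
%v = vector.transfer_read %t[%c2, %c2], %pad {in_bounds = [false, false]}
       : tensor<3x3xf32>, vector<4x4xf32>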

@Groverkss Groverkss requested a review from kuhar October 8, 2024 15:27
Member

@kuhar kuhar left a comment

LGTM. Should we also cover gather in the same sweep?

@Groverkss
Member Author

LGTM. Should we also cover gather in the same sweep?

I thought about doing gather/load/store/scatter as well, but it looks like they don't have MemoryEffectsOpInterface implemented either. I'm going to defer it for now, since I need to see how to test that interface.

@kuhar
Member

kuhar commented Oct 8, 2024

I thought about doing gather/load/store/scatter as well, but it looks like they don't have MemoryEffectsOpInterface implemented either. I'm going to defer it for now, since I need to see how to test that interface.

I think vector.gather is the only other op that accepts tensors, no?
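
For reference, a rough sketch of vector.gather with a tensor base (operand names, shapes, and the index element type are illustrative):

// Gathers 4 elements of %t at the offsets in %idxs; masked-off lanes take the
// corresponding %pass_thru values instead.
%g = vector.gather %t[%c0][%idxs], %mask, %pass_thru
       : tensor<64xf32>, vector<4xi32>, vector<4xi1>, vector<4xf32> into vector<4xf32>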

@banach-space
Contributor

Not really an expert, but going over https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/, it's not clear to me why these Ops fall under ConditionallySpeculatable? Could you expand a bit?

I just want to make sure that this is not working around limitations in the LICM logic (which I am not familiar with).

@kuhar
Member

kuhar commented Oct 8, 2024

Because executing them speculatively is guaranteed not to have any 'implicit behavior' (side effects / trigger UB / etc.): https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/#categorization.

The same way you can speculate adding two numbers together (value semantics, no UB) but not necessarily a load from memory (memory effects) or arbitrary division (potential UB).
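
A sketch of that contrast (all operands are placeholders):

%sum = arith.addi %a, %b : i32              // speculatable: value semantics, no UB
%quo = arith.divsi %a, %b : i32             // not speculatable in general: %b may be zero
%elt = memref.load %m[%i] : memref<16xf32>  // has a memory effect; hoisting it needs more analysis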

@Groverkss
Member Author

Not really an expert, but going over https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/, it's not clear to me why these Ops fall under ConditionallySpeculatable? Could you expand a bit?

I just want to make sure that this is not working around limitations in the LICM logic (which I am not familiar with).

I can give an overview of my reasoning.

LICM can only hoist operations (at least for now) that do not have any memory side effects and are speculatable (do not cause undefined behavior).

Operations working on tensors (linalg.generic, vector.transfer_*, etc.) have no memory side effects and generally don't have undefined behavior (at least from my understanding of tensors having value semantics). So when operating on tensors, these operations are "Pure".

Operations working on memrefs (the memref versions of linalg.generic, vector.transfer_*, etc.) have clearly defined memory side effects but may introduce out-of-bounds behavior. So they are non-speculatable (at least without further analysis).

Since linalg.generic/vector.transfer_... ops work on both memrefs and tensors, they have to be "ConditionallySpeculatable", the condition being whether they have memref or tensor semantics.

I don't think this is working around limitations of LICM; rather, it is how LICM expects operations to provide the information it needs.
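
As a concrete illustration of the memref case (a sketch; %m, %pad, and the indices are placeholders, and the loop may run zero times): hoisting the read out of the loop would execute a memory access that the original program might never perform, and that access may be out of bounds.

// %ida and %idb come from elsewhere and may exceed the memref bounds.
scf.for %i = %lb to %ub step %step {  // possibly zero trip count
  %r = vector.transfer_read %m[%ida, %idb], %pad : memref<128x128xf32>, vector<4x4xf32>
  vector.print %r : vector<4x4xf32>
}
// Hoisting the read above this loop would touch memory at [%ida, %idb]
// unconditionally, even when %lb >= %ub, so it cannot be blindly speculated.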

@Groverkss
Member Author

I thought about doing gather/load/store/scatter as well, but it looks like they don't have MemoryEffectsOpInterface implemented either. I'm going to defer it for now, since I need to see how to test that interface.

I think vector.gather is the only other op that accepts tensors, no?

Interesting... I can send a follow-up PR for that.

@banach-space
Contributor

Thanks for elaborating, that helps!

Because executing them speculatively is guaranteed not to have any 'implicit behavior' (side effects / trigger UB / etc.): https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/#categorization.

A lot of (most?) things in that doc only really matter when dealing with MemRefs. And the implementation here also implies that:

  • when operating on tensors, everything is safe,
  • when operating on memrefs, exercise caution.

This just makes me wonder: shouldn't LICM be checking for tensor semantics instead?

Since linalg.generic/vector.transfer_... ops work on both memrefs and tensors, they have to be "ConditionallySpeculatable", the condition being whether they have memref or tensor semantics.

Or we are missing something like SpeculableWhenUsingTensorSemantics.

@Groverkss
Member Author

Thanks for elaborating, that helps!

Because executing them speculatively is guaranteed not to have any 'implicit behavior' (side effects / trigger UB / etc.): https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/#categorization.

A lot of (most?) things in that doc only really matter when dealing with MemRefs. And the implementation here also implies that:

  • when operating on tensors, everything is safe,
  • when operating on memrefs, exercise caution.

This just makes me wonder: shouldn't LICM be checking for tensor semantics instead?

Since linalg.generic/vector.transfer_... ops work on both memrefs and tensors, they have to be "ConditionallySpeculatable", the condition being whether they have memref or tensor semantics.

Or we are missing something like SpeculableWhenUsingTensorSemantics.

I think "SpeculableWhenUsingTensorSemantics" is exactly what's missing. Although i'm not sure if it's that simple because we would also need some way of saying when the op has tensor semantics.

@kuhar
Member

kuhar commented Oct 8, 2024

A lot of (most?) things in that doc only really matter when dealing with MemRefs. And the implementation here also implies that:
when operating on tensors, everything is safe,
when operating on memrefs, exercise caution.

This is not true in the general case outside of these ops. For example, dividing a tensor by a tensor (elementwise) is not always safe. I think what we have with these 'element access ops' here might be a special case.
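
A sketch of that case (operands are placeholders): elementwise division on tensors has value semantics, yet it is still not speculatable, because some element of the divisor may be zero.

// Elementwise signed division on tensors: value semantics, but UB if any
// element of %b is zero, so this is still not speculatable.
%q = arith.divsi %a, %b : tensor<4xi32>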

@Groverkss Groverkss merged commit 32db6fb into llvm:main Oct 9, 2024
9 checks passed
@banach-space
Contributor

@Groverkss You have landed this change without all of the active reviewers explicitly approving it; please avoid that. As per the contribution guidelines:

Please address the feedback and re-post an updated version of your patch. This cycle continues until all requests and comments have been addressed and a reviewer accepts the patch with a Looks good to me or LGTM. Once that is done the change can be committed.

I read that as "if a reviewer leaves a comment, please address that and wait for them to confirm that their comments have been addressed by posting LGTM". That last part didn't happen, hence this message.

I'm not an expert in this area, but this change looks good to me and I appreciate that you did address my comments. Still, I'd appreciate it if you followed the process in the future, thanks!

@kuhar
Member

kuhar commented Oct 9, 2024

@banach-space overall, I agree with you as to what the best practice should be, but my reading of the guidelines is different:

This cycle continues until all requests and comments have been addressed and a reviewer accepts the patch with a Looks good to me or LGTM

This presents two criteria: (1) there are no outstanding comments that require resolution, and (2) the PR received at least one LGTM. It doesn't require you to receive an LGTM from all reviewers / commenters. IME, this is very often a judgement call as to whether you reasonably believe that all comments have been addressed, and I see waiting for a confirmation as more of a good practice.

@Groverkss
Member Author

Groverkss commented Oct 9, 2024

@Groverkss You have landed this change without all of the active reviewers explicitly approving it; please avoid that. As per the contribution guidelines:

Please address the feedback and re-post an updated version of your patch. This cycle continues until all requests and comments have been addressed and a reviewer accepts the patch with a Looks good to me or LGTM. Once that is done the change can be committed.

I read that as "if a reviewer leaves a comment, please address that and wait for them to confirm that their comments have been addressed by posting LGTM". That last part didn't happen, hence this message.

I'm not an expert in this area, but this change looks good to me and I appreciate that you did address my comments. Still, I'd appreciate it if you followed the process in the future, thanks!

Ah, sorry. I misunderstood the only review comment, a minor actionable comment about the tests, as a signal that it would be fine to land once fixed, since you seemed satisfied with the explanation provided earlier. But I should have been more cautious and waited; I wasn't in any hurry to land it, and it's always nicer to get more eyes on it. I really appreciate the review. Hopefully this clarifies why I landed it, and I'm happy to still address any post-review comments :)

@kuhar
Member

kuhar commented Oct 9, 2024

For example, you typically wouldn't wait for approval from all 'drive-by' reviewers who point out typos and other nits. (Not saying your comment on this PR falls into this category.)

@banach-space
Contributor

This presents two criteria: (...) (2) the PR received at least one LGTM.

Note that your LGTM landed before my comments and further changes in the latest commit. In situations like this, good practice (to me) would be one of:

  • the original reviewer re-confirming their LGTM: "This still looks good to me and all other comments have also been addressed, good to go!", or
  • the outstanding reviewers approving, or
  • the author leaving a note: "Not waiting for further LGTM, landing as is because ...".

I see waiting for a confirmation as more of a good practice.

Agreed and that's exactly what I'm after :-)

I misunderstood the only review comment, a minor actionable comment about the tests, as a signal that it would be fine to land once fixed, since you seemed satisfied with the explanation provided earlier.

And I should've been clearer.

Anyway, all good, thanks for the discussion and for the contribution!

@kuhar
Member

kuhar commented Oct 9, 2024

@banach-space this is a pretty good breakdown, maybe it would be worth spelling this out in the guidelines like that?

@banach-space
Contributor

@banach-space this is a pretty good breakdown, maybe it would be worth spelling this out in the guidelines like that?

Sure, let's see what others think:
