[mlir][vector] Implement speculation for vector.transferx ops #111533
Conversation
@llvm/pr-subscribers-mlir

Author: Kunwar Grover (Groverkss)

Changes: This patch implements speculation for vector.transfer_read/vector.transfer_write ops, allowing these ops to work with LICM.

Full diff: https://github.com/llvm/llvm-project/pull/111533.diff

3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
index 45fd1c6e3f9384..b0de7c11b9d436 100644
--- a/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
+++ b/mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
@@ -1240,6 +1240,7 @@ def Vector_TransferReadOp :
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
DeclareOpInterfaceMethods<MaskableOpInterface>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
+ DeclareOpInterfaceMethods<ConditionallySpeculatable>,
AttrSizedOperandSegments,
DestinationStyleOpInterface
]>,
@@ -1487,6 +1488,7 @@ def Vector_TransferWriteOp :
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
DeclareOpInterfaceMethods<MaskableOpInterface>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
+ DeclareOpInterfaceMethods<ConditionallySpeculatable>,
AttrSizedOperandSegments,
DestinationStyleOpInterface
]>,
diff --git a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
index dc92bea09dc160..2dfb59edcf3d2f 100644
--- a/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
+++ b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp
@@ -4245,6 +4245,12 @@ void TransferReadOp::getEffects(
SideEffects::DefaultResource::get());
}
+Speculation::Speculatability TransferReadOp::getSpeculatability() {
+ if (!hasPureTensorSemantics())
+ return Speculation::NotSpeculatable;
+ return Speculation::Speculatable;
+}
+
namespace {
/// Store to load forwarding for transfer operations with permuation maps.
/// Even if the permutation maps are different we can still propagate the store
@@ -4627,6 +4633,12 @@ void TransferWriteOp::getEffects(
SideEffects::DefaultResource::get());
}
+Speculation::Speculatability TransferWriteOp::getSpeculatability() {
+ if (!hasPureTensorSemantics())
+ return Speculation::NotSpeculatable;
+ return Speculation::Speculatable;
+}
+
namespace {
/// Remove dead transfer write from the SSA chain so that it an be eliminated by
/// DCE
diff --git a/mlir/test/Transforms/loop-invariant-code-motion.mlir b/mlir/test/Transforms/loop-invariant-code-motion.mlir
index 57f4ece9c9f2a4..be76cb8a6156f4 100644
--- a/mlir/test/Transforms/loop-invariant-code-motion.mlir
+++ b/mlir/test/Transforms/loop-invariant-code-motion.mlir
@@ -1209,3 +1209,85 @@ func.func @hoist_linalg_ops_div_by_zero(%a : tensor<128x128xi32>,
func.return %final : tensor<?x128xi32>
}
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK: vector.transfer_read
+// CHECK: scf.for
+// CHECK-NOT: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops(
+ %a : tensor<128x128xf32>,
+ %lb : index,
+ %ub : index,
+ %step : index,
+ %ida : index,
+ %idb : index) -> vector<4x4xf32> {
+ %cst_0 = arith.constant 0.0 : f32
+ %cst = arith.constant dense<0.0> : vector<4x4xf32>
+ %final =
+ scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+ %read = vector.transfer_read %a[%ida, %idb], %cst_0 : tensor<128x128xf32>, vector<4x4xf32>
+ %out = arith.addf %read, %acc : vector<4x4xf32>
+ scf.yield %out : vector<4x4xf32>
+ }
+ func.return %final : vector<4x4xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK: vector.transfer_write
+// CHECK: vector.transfer_read
+// CHECK: scf.for
+// CHECK-NOT: vector.transfer_write
+// CHECK-NOT: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops(
+ %lb : index,
+ %ub : index,
+ %step : index,
+ %ida : index,
+ %idb : index) -> vector<4x4xf32> {
+ %c0 = arith.constant 0 : index
+ %cst_0 = arith.constant 0.0 : f32
+ %cst = arith.constant dense<0.0> : vector<4x4xf32>
+ %empty = tensor.empty() : tensor<4x4xf32>
+ %final =
+ scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+ %a = vector.transfer_write %cst, %empty[%c0, %c0] : vector<4x4xf32>, tensor<4x4xf32>
+ %read = vector.transfer_read %a[%c0, %c0], %cst_0 : tensor<4x4xf32>, vector<4x4xf32>
+ %out = arith.addf %read, %acc : vector<4x4xf32>
+ scf.yield %out : vector<4x4xf32>
+ }
+ func.return %final : vector<4x4xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @hoist_vector_transfer_ops
+// CHECK-NOT: vector.transfer_read
+// CHECK: scf.for
+// CHECK: vector.transfer_read
+// CHECK: arith.addf
+// CHECK: scf.yield
+func.func @hoist_vector_transfer_ops_memref(
+ %a : memref<128x128xf32>,
+ %lb : index,
+ %ub : index,
+ %step : index,
+ %ida : index,
+ %idb : index) -> vector<4x4xf32> {
+ %cst_0 = arith.constant 0.0 : f32
+ %cst = arith.constant dense<0.0> : vector<4x4xf32>
+ %final =
+ scf.for %i = %lb to %ub step %step iter_args(%acc = %cst) -> vector<4x4xf32> {
+ %read = vector.transfer_read %a[%ida, %idb], %cst_0 : memref<128x128xf32>, vector<4x4xf32>
+ %out = arith.addf %read, %acc : vector<4x4xf32>
+ scf.yield %out : vector<4x4xf32>
+ }
+ func.return %final : vector<4x4xf32>
+}

kuhar:
What happens with out-of-bounds xfers over tensors?

Groverkss:
I think it's still speculatable. The op has clearly defined out-of-bounds behavior (out-of-bounds read positions take the padding value), which does not cause any side effects.

kuhar:
LGTM. Should we also cover gather in the same sweep?

Groverkss:
I thought about doing gather/load/store/scatter as well, but it looks like they don't have MemoryEffectsOpInterface implemented either. I'm going to defer it for now, since I need to see how to test that interface.
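
A follow-up for gather could mirror the same tensor/memref split. The sketch below is hypothetical and not part of this patch: it assumes vector.gather first gains MemoryEffectsOpInterface, that getBase() is the accessor for the source operand, and that gathers over tensors are free of out-of-bounds UB, which would still need to be confirmed against the op's documented semantics.

```cpp
// Hypothetical sketch only; none of this is in the patch. Conditional
// speculation for vector.gather, mirroring the transfer ops above.
// Assumption: the source operand accessor is getBase().
Speculation::Speculatability GatherOp::getSpeculatability() {
  // Only the tensor form has value semantics; the memref form may read
  // invalid memory if hoisted past a guard.
  if (!llvm::isa<RankedTensorType>(getBase().getType()))
    return Speculation::NotSpeculatable;
  return Speculation::Speculatable;
}
```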

I think …

banach-space:
Not really an expert, but going over https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/, it's not clear to me why these Ops fall under … I just want to make sure that this is not working around limitations in the LICM logic (which I am not familiar with).

kuhar:
Because executing them speculatively is guaranteed not to have any 'implicit behavior' (side effects / triggering UB / etc.): https://mlir.llvm.org/docs/Rationale/SideEffectsAndSpeculation/#categorization. In the same way, you can speculate adding two numbers together (value semantics, no UB) but not necessarily a load from memory (memory effects) or an arbitrary division (potential UB).

Groverkss:
I can give an overview of my reasoning. LICM can only hoist operations (at least for now) that have no memory side effects and are speculatable (do not cause undefined behavior).

Operations working on tensors (linalg.generic, vector.transfer_*, etc.) have no memory side effects and generally no undefined behavior (at least from my understanding of tensors having value semantics). So when operating on tensors, these operations are "Pure".

Operations working on memrefs (the memref versions of linalg.generic, vector.transfer_*, etc.) have clearly defined memory side effects but may introduce out-of-bounds behavior, so they are non-speculatable (at least without further analysis).

Since linalg.generic/vector.transfer_* ops work on both memrefs and tensors, they have to be "ConditionallySpeculatable", the condition being whether they have memref or tensor semantics. I don't think this is working around limitations of LICM; rather, it is how LICM expects operations to give it the information it needs.
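
For context, the condition LICM applies can be paraphrased roughly as below. This is a simplification of the checks in MLIR's LICM utilities (the real logic also requires operands to be defined outside the loop and recurses into nested regions), not the exact upstream code.

```cpp
#include "mlir/Interfaces/SideEffectInterfaces.h"

// Simplified paraphrase of LICM's hoisting test. Condition 1: the op
// must not read or write memory; tensor-semantics transfer ops report
// no effects via MemoryEffectsOpInterface. Condition 2: executing the
// op early must not introduce UB; that is what the new
// getSpeculatability() hooks answer.
static bool canHoistOutOfLoop(mlir::Operation *op) {
  return mlir::isMemoryEffectFree(op) && mlir::isSpeculatable(op);
}
```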

Groverkss:
Interesting... I can send a follow-up PR for that.

banach-space:
Thanks for elaborating, that helps! A lot of (most?) things in that doc only really matter when dealing with memrefs, and the implementation here also implies that. This just makes me wonder: shouldn't LICM be checking for tensor semantics instead? Or are we missing something like SpeculatableWhenUsingTensorSemantics?
I think "SpeculableWhenUsingTensorSemantics" is exactly what's missing. Although i'm not sure if it's that simple because we would also need some way of saying when the op has tensor semantics. |

kuhar:
This is not true in the general case outside of these ops. For example, dividing a tensor by a tensor (elementwise) is not always safe. I think what we have with these 'element access ops' here might be a special case.
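
A plain C++ analogy for the non-speculatable case (illustrative only, unrelated to the MLIR ops here):

```cpp
// The division is loop-invariant, but hoisting it above the guard would
// change behavior: base / d is undefined when d == 0, so a compiler may
// only speculate it after proving d != 0.
int scaledSum(const int *x, int n, int base, int d) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    if (d != 0) {
      int q = base / d; // invariant, yet not safe to hoist past the guard
      sum += x[i] * q;
    }
  }
  return sum;
}
```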

banach-space:
@Groverkss You have landed this change without all of the active reviewers explicitly approving it; please avoid that. As per the contribution guidelines: …

I read that as "if a reviewer leaves a comment, please address it and wait for them to confirm that their comments have been addressed by posting LGTM". That last part didn't happen, hence this message. I'm not an expert on this area, but this change looks good to me, and I appreciate that you did address my comments. Still, I'd appreciate it if you followed the process in the future, thanks!

kuhar:
@banach-space Overall, I agree with you on what the best practice should be, but my reading of the guidelines is different: …

This presents two criteria: (1) there are no outstanding comments that require resolution, and (2) the PR received at least one LGTM. It doesn't require you to receive an LGTM from all reviewers/commenters. IME, this is very often a judgement call about whether you reasonably believe that all comments have been addressed, and I see waiting for a confirmation as more of a good practice.

Groverkss:
Ah, sorry. I misunderstood the only actionable review comment, a minor one about tests, as a signal that it should be fine to land once fixed, since you seemed satisfied with the explanation provided before. But I should have been more cautious and waited, as I wasn't in any hurry to land it, and it's always nice to get more eyes on it. I really appreciate the review; hopefully this clears up why I landed it, and I'm happy to still address any post-review comments :)

kuhar:
For example, you typically wouldn't wait for approval from all 'drive-by' reviewers who point out typos and other nits. (Not saying your comment on this PR falls into this category.)

banach-space:
Note that your LGTM landed before my comments and the further changes in the latest commit. In situations like this, good practice (to me) would be either: …

Agreed, and that's exactly what I'm after :-) And I should've been clearer. Anyway, all good; thanks for the discussion and for the contribution!

kuhar:
@banach-space This is a pretty good breakdown; maybe it would be worth spelling this out in the guidelines like that?

banach-space:
Sure, let's see what others think: …