-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices #166569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[mlir][tensor] Fix runtime verification for tensor.extract_slice for empty tensor slices #166569
Conversation
|
@llvm/pr-subscribers-mlir-tensor @llvm/pr-subscribers-mlir Author: Hanumanth (Hanumanth04) ChangesI hit another runtime verification issue while working with TFLite models. The verifier is incorrectly rejecting The current runtime verification unconditionally enforces Simple example that demonstrates the issue: func.func @<!-- -->extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
// When called with: tensor size=10, offset=10, size=0
// Runtime verification fails: "offset 0 is out-of-bounds"
%slice = tensor.extract_slice %tensor[%offset] [%size] [1]
: tensor<?xf32> to tensor<?xf32>
return
}The check evaluates Real-world repro from TensorFlow Lite model: This issue manifests in actual TFLite model lowering. Here's a simplified version showing the problematic pattern: In this code, func.func @<!-- -->simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
%c10 = arith.constant 10 : index
%c-1 = arith.constant -1 : index
%0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
%1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
%2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
%3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
%4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
%5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
%c4_1 = tosa.const_shape {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
%6:2 = scf.while (%arg1 = %0, %arg2 = %arg0)
: (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
%7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
%extracted = tensor.extract %7[] : tensor<i1>
scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
} do {
^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
%7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
// First slice
%8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
%10 = index.casts %extracted_0 : i32 to index
%11 = arith.cmpi eq, %10, %c-1 : index
%12 = arith.select %11, %c10, %10 : index
%extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
// Second slice - this is where the failure occurs
%13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
%14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
%extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
%15 = index.castu %extracted_1 : i32 to index
%16 = arith.subi %c10, %15 : index // size = 10 - offset
%extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
%17 = index.castu %extracted_2 : i32 to index
%extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
%18 = index.castu %extracted_3 : i32 to index
// On the last loop iteration: %15=10, %16=0
// %extracted_slice_0 becomes an empty tensor
// Runtime verification fails: "offset 0 is out-of-bounds"
%extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1]
: tensor<10x4x1xf32> to tensor<?x4x1xf32>
%19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32}
: (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
}
return %6#<!-- -->1 : tensor<10x4x1xf32>
}Why I feel this should be allowed: TFLite's For our case: Python/NumPy also allows this: arr = np.arange(10)
empty = arr[10:10] # Valid, returns []Position The fix: Make the offset check conditional on slice size:
Question for reviewers: %tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes. P.S. We have a similar issue with Full diff: https://github.com/llvm/llvm-project/pull/166569.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
index 753cb95b1c906..d35f458cbdb36 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
@@ -155,13 +155,15 @@ struct ExtractSliceOpInterface
RankedTensorType sourceType = extractSliceOp.getSource().getType();
// For each dimension, assert that:
- // 0 <= offset < dim_size
- // 0 <= offset + (size - 1) * stride < dim_size
+ // For empty slices (size == 0) : 0 <= offset <= dim_size
+ // For non-empty slices (size > 0): 0 <= offset < dim_size
+ // 0 <= offset + (size - 1) * stride <
+ // dim_size
Value zero = arith::ConstantIndexOp::create(builder, loc, 0);
Value one = arith::ConstantIndexOp::create(builder, loc, 1);
for (int64_t i : llvm::seq<int64_t>(0, sourceType.getRank())) {
- // Reset insertion point to before the operation for each dimension
+
builder.setInsertionPoint(extractSliceOp);
Value offset = getValueOrCreateConstantIndexOp(
@@ -170,46 +172,63 @@ struct ExtractSliceOpInterface
builder, loc, extractSliceOp.getMixedSizes()[i]);
Value stride = getValueOrCreateConstantIndexOp(
builder, loc, extractSliceOp.getMixedStrides()[i]);
-
- // Verify that offset is in-bounds.
Value dimSize = builder.createOrFold<tensor::DimOp>(
loc, extractSliceOp.getSource(), i);
- Value offsetInBounds =
- generateInBoundsCheck(builder, loc, offset, zero, dimSize);
- cf::AssertOp::create(builder, loc, offsetInBounds,
+
+ // Verify that offset is in-bounds (conditional on slice size).
+ Value sizeIsZero = arith::CmpIOp::create(
+ builder, loc, arith::CmpIPredicate::eq, size, zero);
+ auto offsetCheckIf = scf::IfOp::create(
+ builder, loc, sizeIsZero,
+ [&](OpBuilder &b, Location loc) {
+ // For empty slices, offset can be at the boundary: 0 <= offset <=
+ // dimSize.
+ Value offsetGEZero = arith::CmpIOp::create(
+ b, loc, arith::CmpIPredicate::sge, offset, zero);
+ Value offsetLEDimSize = arith::CmpIOp::create(
+ b, loc, arith::CmpIPredicate::sle, offset, dimSize);
+ Value emptyOffsetValid =
+ arith::AndIOp::create(b, loc, offsetGEZero, offsetLEDimSize);
+ scf::YieldOp::create(b, loc, emptyOffsetValid);
+ },
+ [&](OpBuilder &b, Location loc) {
+ // For non-empty slices, offset must be a valid index: 0 <= offset <
+ // dimSize.
+ Value offsetInBounds =
+ generateInBoundsCheck(b, loc, offset, zero, dimSize);
+ scf::YieldOp::create(b, loc, offsetInBounds);
+ });
+
+ Value offsetCondition = offsetCheckIf.getResult(0);
+ cf::AssertOp::create(builder, loc, offsetCondition,
generateErrorMessage(op, "offset " +
std::to_string(i) +
" is out-of-bounds"));
- // Only verify if size > 0
+ // Verify that the slice endpoint is in-bounds (only for non-empty
+ // slices).
Value sizeIsNonZero = arith::CmpIOp::create(
builder, loc, arith::CmpIPredicate::sgt, size, zero);
+ auto ifOp = scf::IfOp::create(
+ builder, loc, sizeIsNonZero,
+ [&](OpBuilder &b, Location loc) {
+ // Verify that slice does not run out-of-bounds.
+ Value sizeMinusOne = arith::SubIOp::create(b, loc, size, one);
+ Value sizeMinusOneTimesStride =
+ arith::MulIOp::create(b, loc, sizeMinusOne, stride);
+ Value lastPos =
+ arith::AddIOp::create(b, loc, offset, sizeMinusOneTimesStride);
+ Value lastPosInBounds =
+ generateInBoundsCheck(b, loc, lastPos, zero, dimSize);
+ scf::YieldOp::create(b, loc, lastPosInBounds);
+ },
+ [&](OpBuilder &b, Location loc) {
+ Value trueVal =
+ arith::ConstantOp::create(b, loc, b.getBoolAttr(true));
+ scf::YieldOp::create(b, loc, trueVal);
+ });
- auto ifOp = scf::IfOp::create(builder, loc, builder.getI1Type(),
- sizeIsNonZero, /*withElseRegion=*/true);
-
- // Populate the "then" region (for size > 0).
- builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
-
- // Verify that slice does not run out-of-bounds.
- Value sizeMinusOne = arith::SubIOp::create(builder, loc, size, one);
- Value sizeMinusOneTimesStride =
- arith::MulIOp::create(builder, loc, sizeMinusOne, stride);
- Value lastPos =
- arith::AddIOp::create(builder, loc, offset, sizeMinusOneTimesStride);
- Value lastPosInBounds =
- generateInBoundsCheck(builder, loc, lastPos, zero, dimSize);
- scf::YieldOp::create(builder, loc, lastPosInBounds);
-
- // Populate the "else" region (for size == 0).
- builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
- Value trueVal =
- arith::ConstantOp::create(builder, loc, builder.getBoolAttr(true));
- scf::YieldOp::create(builder, loc, trueVal);
-
- builder.setInsertionPointAfter(ifOp);
Value finalCondition = ifOp.getResult(0);
-
cf::AssertOp::create(
builder, loc, finalCondition,
generateErrorMessage(
diff --git a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
index a77fa310a3699..745eea37f7fca 100644
--- a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
@@ -39,6 +39,11 @@ func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index,
return
}
+func.func @extract_slice_empty_tensor(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index, %offset: index) {
+ tensor.extract_slice %arg0[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
+ return
+}
+
func.func @main() {
%0 = arith.constant 0 : index
@@ -115,5 +120,9 @@ func.func @main() {
%dim_2 = arith.constant 1 : index
func.call @extract_slice_zero_size_dim(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> ()
+ // CHECK-NOT: ERROR: Runtime op verification failed
+ %offset = arith.constant 10 : index
+ func.call @extract_slice_empty_tensor(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2, %offset) : (tensor<10x4x1xf32>, index, index, index, index) -> ()
+
return
}
|
|
Hi @matthias-springer , could you please look at this PR when you get a chance? Thanks! |
|
A pedantic comment about the comparison to numpy/python. |
Thanks @noclowns for looking into this PR. Appreciate it. Good point! You're right that Python/NumPy has looser semantics. I'll remove the example from the description to avoid confusion. The core issue is that the verifier is rejecting empty slices at boundary positions, which are valid operations since no memory is accessed. I'll update the PR description to focus on that. |
I hit another runtime verification issue (similar to #164878) while working with TFLite models. The verifier is incorrectly rejecting
tensor.extract_sliceoperations when extracting an empty slice (size=0) that starts exactly at the tensor boundary.The current runtime verification unconditionally enforces
offset < dim_size. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions.Simple example that demonstrates the issue:
For the above example, the check evaluates
10 < 10which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access.Real-world repro from the TensorFlow Lite models:
This issue manifests while lowering TFLite models and a lot of our system tests are failing due to this. Here's a simplified version showing the problematic pattern:
In this code,
%extracted_slice_0becomes an empty tensor when SSA value%15reaches 10 (on the final loop iteration), making%16 = 0. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification.The fix:
Make the offset check conditional on slice size:
0 <= offset <= dim_size0 <= offset < dim_sizeQuestion for reviewers:
Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR:
Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes.
P.S. We have a similar issue with
memref.subview. I will send a separate patch for the issue.