Skip to content

Conversation

@Hanumanth04
Copy link
Contributor

@Hanumanth04 Hanumanth04 commented Nov 5, 2025

I hit another runtime verification issue (similar to #164878) while working with TFLite models. The verifier is incorrectly rejecting tensor.extract_slice operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces offset < dim_size. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions.

Simple example that demonstrates the issue:

func.func @extract_empty_slice(%tensor: tensor<?xf32>, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor<?xf32> to tensor<?xf32>
  return
}

For the above example, the check evaluates 10 < 10 which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access.

Real-world repro from the TensorFlow Lite models:

This issue manifests while lowering TFLite models and a lot of our system tests are failing due to this. Here's a simplified version showing the problematic pattern:

In this code, %extracted_slice_0 becomes an empty tensor when SSA value %15 reaches 10 (on the final loop iteration), making %16 = 0. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification.

func.func @simplified_repro_from_tensorflowlite_model(%arg0: tensor<10x4x1xf32>) -> tensor<10x4x1xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() <{values = dense<0> : tensor<i32>}> : () -> tensor<i32>
  %1 = "tosa.const"() <{values = dense<1> : tensor<i32>}> : () -> tensor<i32>
  %2 = "tosa.const"() <{values = dense<10> : tensor<i32>}> : () -> tensor<i32>
  %3 = "tosa.const"() <{values = dense<-1> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "tosa.const"() <{values = dense<0> : tensor<2xi32>}> : () -> tensor<2xi32>
  %5 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x4x1xf32>}> : () -> tensor<1x4x1xf32>
  %c4_1 = tosa.const_shape  {values = dense<1> : tensor<1xindex>} : () -> !tosa.shape<1>
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor<i32>, tensor<10x4x1xf32>) -> (tensor<i32>, tensor<10x4x1xf32>) {
    %7 = tosa.greater %2, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
    %extracted = tensor.extract %7[] : tensor<i1>
    scf.condition(%extracted) %arg1, %arg2 : tensor<i32>, tensor<10x4x1xf32>
  } do {
  ^bb0(%arg1: tensor<i32>, %arg2: tensor<10x4x1xf32>):
    %7 = tosa.add %arg1, %1 : (tensor<i32>, tensor<i32>) -> tensor<i32>
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_0 = tensor.extract %9[%c0] : tensor<3xi32>
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor<i32>, !tosa.shape<1>) -> tensor<1xi32>
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor<1xi32>, tensor<2xi32>) -> tensor<3xi32>
    
    %extracted_1 = tensor.extract %14[%c0] : tensor<3xi32>
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor<3xi32>
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor<3xi32>
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor<10x4x1xf32> to tensor<?x4x1xf32>
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor<?x4x1xf32>, tensor<1x4x1xf32>, tensor<?x4x1xf32>) -> tensor<10x4x1xf32>
    
    scf.yield %7, %19 : tensor<i32>, tensor<10x4x1xf32>
  }
  
  return %6#1 : tensor<10x4x1xf32>
}

The fix:

Make the offset check conditional on slice size:

  • Empty slice (size == 0): allow 0 <= offset <= dim_size
  • Non-empty slice (size > 0): require 0 <= offset < dim_size

Question for reviewers:
Should we also relax the static verifier to allow this edge case? Currently, the static verifier rejects the following IR:

%tensor = arith.constant dense<1.0> : tensor<10xf32>
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor<10xf32> to tensor<0xf32>

Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes.

P.S. We have a similar issue with memref.subview. I will send a separate patch for the issue.

@llvmbot
Copy link
Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Hanumanth (Hanumanth04)

Changes

I hit another runtime verification issue while working with TFLite models. The verifier is incorrectly rejecting tensor.extract_slice operations when extracting an empty slice (size=0) that starts exactly at the tensor boundary.

The current runtime verification unconditionally enforces offset &lt; dim_size. This makes sense for non-empty slices, but it's too strict for empty slices, causing false positives that lead to spurious runtime assertions.

Simple example that demonstrates the issue:

func.func @<!-- -->extract_empty_slice(%tensor: tensor&lt;?xf32&gt;, %offset: index, %size: index) {
  // When called with: tensor size=10, offset=10, size=0
  // Runtime verification fails: "offset 0 is out-of-bounds"
  %slice = tensor.extract_slice %tensor[%offset] [%size] [1] 
    : tensor&lt;?xf32&gt; to tensor&lt;?xf32&gt;
  return
}

The check evaluates 10 &lt; 10 which is false, so verification fails. However, I believe this operation should be valid - we're extracting zero elements, so there's no actual out-of-bounds access.

Real-world repro from TensorFlow Lite model:

This issue manifests in actual TFLite model lowering. Here's a simplified version showing the problematic pattern:

In this code, %extracted_slice_0 becomes an empty tensor when SSA value %15 reaches 10 (on the final loop iteration), making %16 = 0. The operation extracts zero elements along dimension 0, which is semantically valid but fails runtime verification.

func.func @<!-- -->simplified_repro_from_tensorflowlite_model(%arg0: tensor&lt;10x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt; {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c10 = arith.constant 10 : index
  %c-1 = arith.constant -1 : index
  
  %0 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %1 = "tosa.const"() &lt;{values = dense&lt;1&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %2 = "tosa.const"() &lt;{values = dense&lt;10&gt; : tensor&lt;i32&gt;}&gt; : () -&gt; tensor&lt;i32&gt;
  %3 = "tosa.const"() &lt;{values = dense&lt;-1&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %4 = "tosa.const"() &lt;{values = dense&lt;0&gt; : tensor&lt;2xi32&gt;}&gt; : () -&gt; tensor&lt;2xi32&gt;
  %5 = "tosa.const"() &lt;{values = dense&lt;0.000000e+00&gt; : tensor&lt;1x4x1xf32&gt;}&gt; : () -&gt; tensor&lt;1x4x1xf32&gt;
  %c4_1 = tosa.const_shape  {values = dense&lt;1&gt; : tensor&lt;1xindex&gt;} : () -&gt; !tosa.shape&lt;1&gt;
  
  %6:2 = scf.while (%arg1 = %0, %arg2 = %arg0) 
    : (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) -&gt; (tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;) {
    %7 = tosa.greater %2, %arg1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i1&gt;
    %extracted = tensor.extract %7[] : tensor&lt;i1&gt;
    scf.condition(%extracted) %arg1, %arg2 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  } do {
  ^bb0(%arg1: tensor&lt;i32&gt;, %arg2: tensor&lt;10x4x1xf32&gt;):
    %7 = tosa.add %arg1, %1 : (tensor&lt;i32&gt;, tensor&lt;i32&gt;) -&gt; tensor&lt;i32&gt;
    
    // First slice
    %8 = tosa.reshape %arg1, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %9 = tosa.concat %8, %3 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_0 = tensor.extract %9[%c0] : tensor&lt;3xi32&gt;
    %10 = index.casts %extracted_0 : i32 to index
    %11 = arith.cmpi eq, %10, %c-1 : index
    %12 = arith.select %11, %c10, %10 : index
    
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [%12, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    // Second slice - this is where the failure occurs
    %13 = tosa.reshape %7, %c4_1 : (tensor&lt;i32&gt;, !tosa.shape&lt;1&gt;) -&gt; tensor&lt;1xi32&gt;
    %14 = tosa.concat %13, %4 {axis = 0 : i32} : (tensor&lt;1xi32&gt;, tensor&lt;2xi32&gt;) -&gt; tensor&lt;3xi32&gt;
    
    %extracted_1 = tensor.extract %14[%c0] : tensor&lt;3xi32&gt;
    %15 = index.castu %extracted_1 : i32 to index
    %16 = arith.subi %c10, %15 : index  // size = 10 - offset
    
    %extracted_2 = tensor.extract %14[%c1] : tensor&lt;3xi32&gt;
    %17 = index.castu %extracted_2 : i32 to index
    
    %extracted_3 = tensor.extract %14[%c2] : tensor&lt;3xi32&gt;
    %18 = index.castu %extracted_3 : i32 to index
    
    // On the last loop iteration: %15=10, %16=0
    // %extracted_slice_0 becomes an empty tensor
    // Runtime verification fails: "offset 0 is out-of-bounds"
    %extracted_slice_0 = tensor.extract_slice %arg2[%15, %17, %18] [%16, 4, 1] [1, 1, 1] 
      : tensor&lt;10x4x1xf32&gt; to tensor&lt;?x4x1xf32&gt;
    
    %19 = tosa.concat %extracted_slice, %5, %extracted_slice_0 {axis = 0 : i32} 
      : (tensor&lt;?x4x1xf32&gt;, tensor&lt;1x4x1xf32&gt;, tensor&lt;?x4x1xf32&gt;) -&gt; tensor&lt;10x4x1xf32&gt;
    
    scf.yield %7, %19 : tensor&lt;i32&gt;, tensor&lt;10x4x1xf32&gt;
  }
  
  return %6#<!-- -->1 : tensor&lt;10x4x1xf32&gt;
}

Why I feel this should be allowed:

TFLite's tfl.slice uses the rule: offset + size &lt;= dim_size

For our case: 10 + 0 &lt;= 10 is true, so TFLite accepts it.

Python/NumPy also allows this:

arr = np.arange(10)
empty = arr[10:10]  # Valid, returns []

Position len(arr) is a valid starting point for an empty slice - you're not actually reading anything, so the boundary position is safe.

The fix:

Make the offset check conditional on slice size:

  • Empty slice (size == 0): allow 0 &lt;= offset &lt;= dim_size
  • Non-empty slice (size > 0): require 0 &lt;= offset &lt; dim_size

Question for reviewers:
Should we also relax the static verifier to allow this edge case? Currently, the static verifier would reject:

%tensor = arith.constant dense&lt;1.0&gt; : tensor&lt;10xf32&gt;
%slice = tensor.extract_slice %tensor[10] [0] [1] : tensor&lt;10xf32&gt; to tensor&lt;0xf32&gt;

Since we're allowing it at runtime for dynamic shapes, it seems inconsistent to reject it statically. However, I wanted to get feedback before making that change - this PR focuses only on the runtime verification fix for dynamic shapes.

P.S. We have a similar issue with memref.subview. I will send a separate patch for the issue.


Full diff: https://github.com/llvm/llvm-project/pull/166569.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp (+52-33)
  • (modified) mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir (+9)
diff --git a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
index 753cb95b1c906..d35f458cbdb36 100644
--- a/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
+++ b/mlir/lib/Dialect/Tensor/Transforms/RuntimeOpVerification.cpp
@@ -155,13 +155,15 @@ struct ExtractSliceOpInterface
     RankedTensorType sourceType = extractSliceOp.getSource().getType();
 
     // For each dimension, assert that:
-    // 0 <= offset < dim_size
-    // 0 <= offset + (size - 1) * stride < dim_size
+    // For empty slices (size == 0)   : 0 <= offset <= dim_size
+    // For non-empty slices (size > 0): 0 <= offset < dim_size
+    //                                  0 <= offset + (size - 1) * stride <
+    //                                  dim_size
     Value zero = arith::ConstantIndexOp::create(builder, loc, 0);
     Value one = arith::ConstantIndexOp::create(builder, loc, 1);
 
     for (int64_t i : llvm::seq<int64_t>(0, sourceType.getRank())) {
-      // Reset insertion point to before the operation for each dimension
+
       builder.setInsertionPoint(extractSliceOp);
 
       Value offset = getValueOrCreateConstantIndexOp(
@@ -170,46 +172,63 @@ struct ExtractSliceOpInterface
           builder, loc, extractSliceOp.getMixedSizes()[i]);
       Value stride = getValueOrCreateConstantIndexOp(
           builder, loc, extractSliceOp.getMixedStrides()[i]);
-
-      // Verify that offset is in-bounds.
       Value dimSize = builder.createOrFold<tensor::DimOp>(
           loc, extractSliceOp.getSource(), i);
-      Value offsetInBounds =
-          generateInBoundsCheck(builder, loc, offset, zero, dimSize);
-      cf::AssertOp::create(builder, loc, offsetInBounds,
+
+      // Verify that offset is in-bounds (conditional on slice size).
+      Value sizeIsZero = arith::CmpIOp::create(
+          builder, loc, arith::CmpIPredicate::eq, size, zero);
+      auto offsetCheckIf = scf::IfOp::create(
+          builder, loc, sizeIsZero,
+          [&](OpBuilder &b, Location loc) {
+            // For empty slices, offset can be at the boundary: 0 <= offset <=
+            // dimSize.
+            Value offsetGEZero = arith::CmpIOp::create(
+                b, loc, arith::CmpIPredicate::sge, offset, zero);
+            Value offsetLEDimSize = arith::CmpIOp::create(
+                b, loc, arith::CmpIPredicate::sle, offset, dimSize);
+            Value emptyOffsetValid =
+                arith::AndIOp::create(b, loc, offsetGEZero, offsetLEDimSize);
+            scf::YieldOp::create(b, loc, emptyOffsetValid);
+          },
+          [&](OpBuilder &b, Location loc) {
+            // For non-empty slices, offset must be a valid index: 0 <= offset <
+            // dimSize.
+            Value offsetInBounds =
+                generateInBoundsCheck(b, loc, offset, zero, dimSize);
+            scf::YieldOp::create(b, loc, offsetInBounds);
+          });
+
+      Value offsetCondition = offsetCheckIf.getResult(0);
+      cf::AssertOp::create(builder, loc, offsetCondition,
                            generateErrorMessage(op, "offset " +
                                                         std::to_string(i) +
                                                         " is out-of-bounds"));
 
-      // Only verify if size > 0
+      // Verify that the slice endpoint is in-bounds (only for non-empty
+      // slices).
       Value sizeIsNonZero = arith::CmpIOp::create(
           builder, loc, arith::CmpIPredicate::sgt, size, zero);
+      auto ifOp = scf::IfOp::create(
+          builder, loc, sizeIsNonZero,
+          [&](OpBuilder &b, Location loc) {
+            // Verify that slice does not run out-of-bounds.
+            Value sizeMinusOne = arith::SubIOp::create(b, loc, size, one);
+            Value sizeMinusOneTimesStride =
+                arith::MulIOp::create(b, loc, sizeMinusOne, stride);
+            Value lastPos =
+                arith::AddIOp::create(b, loc, offset, sizeMinusOneTimesStride);
+            Value lastPosInBounds =
+                generateInBoundsCheck(b, loc, lastPos, zero, dimSize);
+            scf::YieldOp::create(b, loc, lastPosInBounds);
+          },
+          [&](OpBuilder &b, Location loc) {
+            Value trueVal =
+                arith::ConstantOp::create(b, loc, b.getBoolAttr(true));
+            scf::YieldOp::create(b, loc, trueVal);
+          });
 
-      auto ifOp = scf::IfOp::create(builder, loc, builder.getI1Type(),
-                                    sizeIsNonZero, /*withElseRegion=*/true);
-
-      // Populate the "then" region (for size > 0).
-      builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
-
-      // Verify that slice does not run out-of-bounds.
-      Value sizeMinusOne = arith::SubIOp::create(builder, loc, size, one);
-      Value sizeMinusOneTimesStride =
-          arith::MulIOp::create(builder, loc, sizeMinusOne, stride);
-      Value lastPos =
-          arith::AddIOp::create(builder, loc, offset, sizeMinusOneTimesStride);
-      Value lastPosInBounds =
-          generateInBoundsCheck(builder, loc, lastPos, zero, dimSize);
-      scf::YieldOp::create(builder, loc, lastPosInBounds);
-
-      // Populate the "else" region (for size == 0).
-      builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
-      Value trueVal =
-          arith::ConstantOp::create(builder, loc, builder.getBoolAttr(true));
-      scf::YieldOp::create(builder, loc, trueVal);
-
-      builder.setInsertionPointAfter(ifOp);
       Value finalCondition = ifOp.getResult(0);
-
       cf::AssertOp::create(
           builder, loc, finalCondition,
           generateErrorMessage(
diff --git a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
index a77fa310a3699..745eea37f7fca 100644
--- a/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
+++ b/mlir/test/Integration/Dialect/Tensor/extract_slice-runtime-verification.mlir
@@ -39,6 +39,11 @@ func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index,
     return
 }
 
+func.func @extract_slice_empty_tensor(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index, %offset: index) {
+    tensor.extract_slice %arg0[%offset, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
+    return
+}
+
 
 func.func @main() {
   %0 = arith.constant 0 : index
@@ -115,5 +120,9 @@ func.func @main() {
   %dim_2 = arith.constant 1 : index
   func.call @extract_slice_zero_size_dim(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> ()
 
+  // CHECK-NOT: ERROR: Runtime op verification failed
+  %offset = arith.constant 10 : index  
+  func.call @extract_slice_empty_tensor(%cst10x4x1xf32, %dim_0, %dim_1, %dim_2, %offset) : (tensor<10x4x1xf32>, index, index, index, index) -> ()
+
   return
 }

@Hanumanth04
Copy link
Contributor Author

Hi @matthias-springer , could you please look at this PR when you get a chance? Thanks!

@noclowns
Copy link
Contributor

noclowns commented Nov 7, 2025

A pedantic comment about the comparison to numpy/python.
Python/NumPy allows fully out-of-bounds slicing without throwing errors, so it's different, much looser semantics, than what you're seeking to implement. Your other points seem valid.

arr = np.arange(10)
empty = arr[15:4]  # returns [], valid or not
empty = arr[15:15]  # returns [], valid or not
clamped = arr[9:15] # returns [9], the out-of-bounds arg value is clamped

@Hanumanth04
Copy link
Contributor Author

A pedantic comment about the comparison to numpy/python. Python/NumPy allows fully out-of-bounds slicing without throwing errors, so it's different, much looser semantics, than what you're seeking to implement. Your other points seem valid.

arr = np.arange(10)
empty = arr[15:4]  # returns [], valid or not
empty = arr[15:15]  # returns [], valid or not
clamped = arr[9:15] # returns [9], the out-of-bounds arg value is clamped

Thanks @noclowns for looking into this PR. Appreciate it.

Good point! You're right that Python/NumPy has looser semantics. I'll remove the example from the description to avoid confusion.

The core issue is that the verifier is rejecting empty slices at boundary positions, which are valid operations since no memory is accessed. I'll update the PR description to focus on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants