Commit 2034585
[Codegen][GenericVectorization] Fix incorrect usage of std::accumulate that led to overflow (iree-org#21920)

The `std::accumulate` call that checks whether the vector exceeds the size limit uses the literal integer `1` as the initial value, which effectively makes the entire calculation use `int32_t` (on most platforms) as the accumulator type. This can cause signed overflow when one of the dimension sizes needs more than 32 bits (but fits in 64), silently bypassing the check.

This issue was originally observed with a tensor of dynamic shape like `<1x8x?xf32>`, where ValueBoundConstraintSet somehow derived an enormously large upper bound, overflowing the check fixed in this PR. One might argue that something went wrong in ValueBoundConstraintSet, which I partly agree with. Nevertheless, the `std::multiplies<int64_t>()` used in this `std::accumulate` hints that we _do_ want `int64_t` as the accumulator type, so we should fix this issue anyway.

Signed-off-by: Min Hsu <[email protected]>
1 parent ec3e28e commit 2034585
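To make the pitfall concrete, here is a minimal standalone C++ sketch (illustration only, not IREE code; the `sizes` vector and variable names are made up) showing that `std::accumulate` deduces its accumulator type from the initial value, not from the binary operation:

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  // Dimension sizes whose product fits in int64_t but not in int32_t.
  std::vector<int64_t> sizes = {1, 9007199254740991LL, 1}; // 2^53 - 1

  // Buggy: the accumulator type is deduced from the initial value `1`,
  // i.e. int. Each int64_t partial product from std::multiplies<int64_t>
  // is narrowed back to int, discarding the high bits (implementation-
  // defined before C++20; modular wrap-around since C++20).
  int buggy = std::accumulate(sizes.begin(), sizes.end(), 1,
                              std::multiplies<int64_t>());

  // Fixed: `1LL` makes the accumulator (and the result) int64_t throughout.
  int64_t fixed = std::accumulate(sizes.begin(), sizes.end(), 1LL,
                                  std::multiplies<int64_t>());

  std::cout << buggy << "\n"; // -1 on typical two's-complement targets
  std::cout << fixed << "\n"; // 9007199254740991
}
```

In GenericVectorization this meant the `>= maxVectorSize` guard compared a wrapped-around negative number against the limit and never triggered, letting the oversized vectorization proceed.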

File tree

2 files changed: +34 -1 lines changed


compiler/src/iree/compiler/Codegen/Common/GenericVectorization.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -178,7 +178,7 @@ void GenericVectorizationPass::runOnOperation() {
       // Do not vectorize the op if the vector size is greater than or equal
       // to limit.
       if (enableVectorMasking) {
-        if (std::accumulate(vectorSizes.begin(), vectorSizes.end(), 1,
+        if (std::accumulate(vectorSizes.begin(), vectorSizes.end(), 1LL,
                             std::multiplies<int64_t>()) >= maxVectorSize)
           continue;
       } else {
```

compiler/src/iree/compiler/Codegen/Common/test/generic_vectorization.mlir

Lines changed: 33 additions & 0 deletions
```diff
@@ -778,3 +778,36 @@ func.func @linalg_ext_gather(%source : tensor<1024x128xi32>, %indices : tensor<1
 
 // CHECK-LABEL: @linalg_ext_gather
 // CHECK: transfer_gather
+
+// -----
+
+func.func @negative_no_vectorize_large_vector(%arg0 : tensor<1x9007199254740991x1xf16>, %output : tensor<1x9007199254740991xf32>) -> tensor<1x9007199254740991xf32> {
+  %cst_2 = arith.constant 0.000000e+00 : f32
+  %cst_0 = arith.constant 8.000000e+00 : f16
+  %r = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"]} ins(%arg0 : tensor<1x9007199254740991x1xf16>) outs(%output : tensor<1x9007199254740991xf32>) {
+  ^bb0(%in: f16, %out: f32):
+    %76 = arith.truncf %cst_2 : f32 to f16
+    %77 = arith.divf %in, %cst_0 : f16
+    %78 = arith.addf %77, %76 : f16
+    %79 = arith.extf %78 : f16 to f32
+    %80 = arith.maxnumf %79, %out : f32
+    linalg.yield %80 : f32
+  } -> tensor<1x9007199254740991xf32>
+  return %r : tensor<1x9007199254740991xf32>
+}
+
+// CHECK-MASK-LABEL: func.func @negative_no_vectorize_large_vector(
+// CHECK-MASK-SAME:    %[[ARG0:.*]]: tensor<1x9007199254740991x1xf16>,
+// CHECK-MASK-SAME:    %[[ARG1:.*]]: tensor<1x9007199254740991xf32>) -> tensor<1x9007199254740991xf32> {
+// CHECK-MASK:    %[[VAL_0:.*]] = arith.constant 0.000000e+00 : f16
+// CHECK-MASK:    %[[VAL_1:.*]] = arith.constant 8.000000e+00 : f16
+// CHECK-MASK:    %[[VAL_2:.*]] = linalg.generic {indexing_maps = [#{{.*}}, #{{.*}}], iterator_types = ["parallel", "parallel", "reduction"]} ins(%[[ARG0]] : tensor<1x9007199254740991x1xf16>) outs(%[[ARG1]] : tensor<1x9007199254740991xf32>) {
+// CHECK-MASK:    ^bb0(%[[VAL_3:.*]]: f16, %[[VAL_4:.*]]: f32):
+// CHECK-MASK:      %[[VAL_5:.*]] = arith.divf %[[VAL_3]], %[[VAL_1]] : f16
+// CHECK-MASK:      %[[VAL_6:.*]] = arith.addf %[[VAL_5]], %[[VAL_0]] : f16
+// CHECK-MASK:      %[[VAL_7:.*]] = arith.extf %[[VAL_6]] : f16 to f32
+// CHECK-MASK:      %[[VAL_8:.*]] = arith.maxnumf %[[VAL_7]], %[[VAL_4]] : f32
+// CHECK-MASK:      linalg.yield %[[VAL_8]] : f32
+// CHECK-MASK:    } -> tensor<1x9007199254740991xf32>
+// CHECK-MASK:    return %[[VAL_2]] : tensor<1x9007199254740991xf32>
+// CHECK-MASK:  }
```
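Worth noting: the CHECK-MASK lines expect the op to survive as a plain linalg.generic, with no masked vector operations appearing, i.e. the pass now declines to vectorize it because the 9007199254740991-element dimension pushes the product past maxVectorSize. (The arith.truncf of the f32 zero constant is folded into an f16 zero constant in the expected output.)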
