Skip to content

Conversation

@sushgokh
Copy link
Contributor

…nteger division

The last resort to vectorize a bundle of integer divisions is considered scalarizing it. Currently, the cost estimates for scalarizing a vector division can be considerably overestimated as is the scenario with this motivating test case i.e. vector cost should not deviate much from the scalar cost.

Future patch will try to improve the scalarization cost.

@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Sushant Gokhale (sushgokh)

Changes

…nteger division

The last resort to vectorize a bundle of integer divisions is considered scalarizing it. Currently, the cost estimates for scalarizing a vector division can be considerably overestimated as is the scenario with this motivating test case i.e. vector cost should not deviate much from the scalar cost.

Future patch will try to improve the scalarization cost.


Full diff: https://github.com/llvm/llvm-project/pull/114107.diff

1 Files Affected:

  • (added) llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll (+23)
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll
new file mode 100644
index 00000000000000..12bbf6e043ca91
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll
@@ -0,0 +1,23 @@
+; RUN: opt -mtriple=aarch64 -passes=slp-vectorizer -debug-only=SLP -S -disable-output < %s 2>&1 | FileCheck %s
+
+define <4 x i8> @v4i8(<4 x i8> %a, <4 x i8> %b)
+{
+; CHECK: SLP: Found cost = 18 for VF=4
+  %a0 = extractelement <4 x i8> %a, i64 0
+  %a1 = extractelement <4 x i8> %a, i64 1
+  %a2 = extractelement <4 x i8> %a, i64 2
+  %a3 = extractelement <4 x i8> %a, i64 3
+  %b0 = extractelement <4 x i8> %b, i64 0
+  %b1 = extractelement <4 x i8> %b, i64 1
+  %b2 = extractelement <4 x i8> %b, i64 2
+  %b3 = extractelement <4 x i8> %b, i64 3
+  %1 = sdiv i8 %a0, undef
+  %2 = sdiv i8 %a1, 1
+  %3 = sdiv i8 %a2, 2
+  %4 = sdiv i8 %a3, 4
+  %r0 = insertelement <4 x i8> poison, i8 %1, i32 0
+  %r1 = insertelement <4 x i8> %r0, i8 %2, i32 1
+  %r2 = insertelement <4 x i8> %r1, i8 %3, i32 2
+  %r3 = insertelement <4 x i8> %r2, i8 %4, i32 3
+  ret <4 x i8> %r3
+}


define <4 x i8> @v4i8(<4 x i8> %a, <4 x i8> %b)
{
; CHECK: SLP: Found cost = 18 for VF=4
Copy link
Member

@alexey-bataev alexey-bataev Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to avoid checks for debug outputs, use -pass-remarks-output= instead

…nteger division

The last resort to vectorize a bundle of integer divisions is considered scalarizing it.
Currently, the cost estimates for scalarizing a vector division can be considerably overestimated
as is the scenario with this motivating test case i.e. vector cost should not deviate much from
the  scalar cost.

Future patch will try to improve the scalarization cost.
@sushgokh sushgokh force-pushed the nfc-tests-for-slp-div branch from b1297dd to d38f624 Compare November 4, 2024 05:35
@sushgokh
Copy link
Contributor Author

ping

1 similar comment
@sushgokh
Copy link
Contributor Author

ping

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK. LGTM

Feel free to put up costmodel changes too if you have them already, even if they include the tests again. It can be good to separate the tests out but don't let the code review block other patches. If nothing else there might be other tests that are useful to add or change that becomes clearer with the cost model is changed.

@sushgokh
Copy link
Contributor Author

Thanks @davemgreen for the approval and your opinion.

@sushgokh sushgokh merged commit 7e85cb8 into llvm:main Nov 19, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants