@sushgokh
Contributor

The current cost model does not take into account the cost of materializing a constant vector. As a result, some cases, as the tests show, are vectorized even though this may not always be profitable. A future patch will try to address this issue.
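To make the concern concrete, here is a minimal sketch of the profitability check for a case shaped like `v2f32_diff_consts`. It is a toy model, not LLVM's cost model: the `ToyCosts` class, the unit costs, and the helper names are all hypothetical, and real costs come from the target's cost hooks. It assumes the decision reduces to "vector cost minus scalar cost is below a threshold".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCosts:
    """Hypothetical unit costs; these are NOT values taken from LLVM."""
    scalar_op: int = 1      # one scalar fmul
    vector_op: int = 1      # one vector fmul
    insert: int = 1         # one insertelement
    const_vector: int = 0   # 0 models "constant vectors are free"


def slp_delta(num_lanes: int, costs: ToyCosts) -> int:
    """Return vector cost minus scalar cost for a toy buildvector-of-fmul tree."""
    # Scalar form: one fmul per lane, then one insertelement per lane
    # to build the result vector.
    scalar = num_lanes * costs.scalar_op + num_lanes * costs.insert
    # Vector form: one insertelement per lane to gather the inputs,
    # a single vector fmul, plus the cost of the constant operand.
    vector = num_lanes * costs.insert + costs.vector_op + costs.const_vector
    return vector - scalar


def is_profitable(delta: int, threshold: int = 0) -> bool:
    """Assumed decision rule: vectorize only if the delta is below -threshold."""
    return delta < -threshold


free_consts = slp_delta(2, ToyCosts())                 # constants cost nothing
paid_consts = slp_delta(2, ToyCosts(const_vector=2))   # assumed materialization cost

print(free_consts, is_profitable(free_consts))
print(paid_consts, is_profitable(paid_consts))
```

With the constant treated as free, the two-lane case looks profitable in this toy model. Once a nonzero (assumed) materialization cost is charged, the same case no longer does, which is the situation the tests are meant to document.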

@llvmbot
Member

llvmbot commented Nov 19, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Sushant Gokhale (sushgokh)

Changes

The current cost model does not take into account the cost of materializing a constant vector. As a result, some cases, as the tests show, are vectorized even though this may not always be profitable. A future patch will try to address this issue.


Full diff: https://github.com/llvm/llvm-project/pull/116790.diff

1 Files Affected:

  • (added) llvm/test/Transforms/SLPVectorizer/materialize-vector-of-consts.ll (+100)
diff --git a/llvm/test/Transforms/SLPVectorizer/materialize-vector-of-consts.ll b/llvm/test/Transforms/SLPVectorizer/materialize-vector-of-consts.ll
new file mode 100644
index 00000000000000..2f58bd25b75647
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/materialize-vector-of-consts.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: %if aarch64-registered-target %{ opt -passes=slp-vectorizer -mtriple=aarch64 -S %s | FileCheck %s %}
+
+define <2 x float> @v2f32_diff_consts(float %a, float %b)
+; CHECK-LABEL: define <2 x float> @v2f32_diff_consts(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[B]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x float> [[TMP2]], <float 2.200000e+01, float 2.300000e+01>
+; CHECK-NEXT:    ret <2 x float> [[TMP3]]
+;
+{
+  %1 = fmul float %a, 22.0
+  %2 = fmul float %b, 23.0
+  %3 = insertelement <2 x float> poison, float %1, i32 0
+  %4 = insertelement <2 x float> %3, float %2, i32 1
+  ret <2 x float> %4
+}
+
+define <2 x float> @v2f32_const_splat(float %a, float %b)
+; CHECK-LABEL: define <2 x float> @v2f32_const_splat(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[B]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x float> [[TMP2]], splat (float 2.200000e+01)
+; CHECK-NEXT:    ret <2 x float> [[TMP3]]
+;
+{
+  %1 = fmul float %a, 22.0
+  %2 = fmul float %b, 22.0
+  %3 = insertelement <2 x float> poison, float %1, i32 0
+  %4 = insertelement <2 x float> %3, float %2, i32 1
+  ret <2 x float> %4
+}
+
+define <4 x double> @v4f64_illegal_type(double %a, double %b, double %c, double %d)
+; CHECK-LABEL: define <4 x double> @v4f64_illegal_type(
+; CHECK-SAME: double [[A:%.*]], double [[B:%.*]], double [[C:%.*]], double [[D:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <4 x double> poison, double [[A]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[B]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[C]], i32 2
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x double> [[TMP3]], double [[D]], i32 3
+; CHECK-NEXT:    [[TMP5:%.*]] = fmul <4 x double> [[TMP4]], <double 2.100000e+01, double 2.200000e+01, double 2.300000e+01, double 2.400000e+01>
+; CHECK-NEXT:    ret <4 x double> [[TMP5]]
+;
+{
+  %1 = fmul double %a, 21.0
+  %2 = fmul double %b, 22.0
+  %3 = fmul double %c, 23.0
+  %4 = fmul double %d, 24.0
+  %5 = insertelement <4 x double> poison, double %1, i32 0
+  %6 = insertelement <4 x double> %5, double %2, i32 1
+  %7 = insertelement <4 x double> %6, double %3, i32 2
+  %8 = insertelement <4 x double> %7, double %4, i32 3
+  ret <4 x double> %8
+}
+
+define <2 x double> @v2f64_dup_const_vector_case1(double %a, double %b, double %c, double %d)
+; CHECK-LABEL: define <2 x double> @v2f64_dup_const_vector_case1(
+; CHECK-SAME: double [[A:%.*]], double [[B:%.*]], double [[C:%.*]], double [[D:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[B]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 2.100000e+01, double 2.200000e+01>
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[D]], i32 1
+; CHECK-NEXT:    [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], <double 2.100000e+01, double 2.200000e+01>
+; CHECK-NEXT:    [[TMP7:%.*]] = fadd <2 x double> [[TMP3]], [[TMP6]]
+; CHECK-NEXT:    ret <2 x double> [[TMP7]]
+;
+{
+  %1 = fmul double %a, 21.0
+  %2 = fmul double %b, 22.0
+  %3 = fmul double %c, 21.0
+  %4 = fmul double %d, 22.0
+  %5 = insertelement <2 x double> poison, double %1, i32 0
+  %6 = insertelement <2 x double> %5, double %2, i32 1
+  %7 = insertelement <2 x double> poison, double %3, i32 0
+  %8 = insertelement <2 x double> %7, double %4, i32 1
+  %9 = fadd <2 x double> %6, %8
+  ret <2 x double> %9
+}
+
+define <2 x double> @v2f64_dup_const_vector_case2(double %a, double %b, double %c, double %d)
+; CHECK-LABEL: define <2 x double> @v2f64_dup_const_vector_case2(
+; CHECK-SAME: double [[A:%.*]], double [[B:%.*]], double [[C:%.*]], double [[D:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[B]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 2.100000e+01, double 2.200000e+01>
+; CHECK-NEXT:    [[TMP4:%.*]] = fadd <2 x double> [[TMP3]], <double 2.100000e+01, double 2.200000e+01>
+; CHECK-NEXT:    ret <2 x double> [[TMP4]]
+;
+{
+  %1 = fmul double %a, 21.0
+  %2 = fmul double %b, 22.0
+  %3 = fadd double %1, 21.0
+  %4 = fadd double %2, 22.0
+  %5 = insertelement <2 x double> poison, double %3, i32 0
+  %6 = insertelement <2 x double> %5, double %4, i32 1
+  ret <2 x double> %6
+}

Collaborator

@davemgreen davemgreen left a comment

I have been a little sceptical about costing constants, given how they are currently represented in LLVM and how often they can be "free". They can certainly be more expensive at times, and I look forward to being proved wrong about it.

The tests LGTM.

@sushgokh sushgokh merged commit 197fb27 into llvm:main Nov 21, 2024
8 of 10 checks passed
@sushgokh sushgokh deleted the nfc-materializing-const-vect branch November 21, 2024 04:59