SLP Vectorizer fails to vectorize a horizontal pattern when it's repeated

|  |  |
| --- | --- |
| Bugzilla Link | [42070](https://llvm.org/bz42070) |
| Version | trunk |
| OS | All |
| CC | @RKSimon |

## Extended Description 
This seems loosely related to llvm/llvm-project#34796, since the problematic pattern is a common result of loop unrolling.

For some reason, the SLP vectorizer fails to vectorize a horizontal reduction pattern when it is repeated. For example, the following code:

```c
float foo(float * __restrict x, float * __restrict y, unsigned len) {
    float acc = 0;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    x += 4; y += 4;
    acc += *x++ * *y++;
    acc += *x++ * *y++;
    return acc;
}
```

is compiled into:

```llvm
define dso_local float @foo(float* noalias nocapture readonly, float* noalias nocapture readonly, i32) local_unnamed_addr #0 {
  %4 = getelementptr inbounds float, float* %0, i64 1
  %5 = load float, float* %0, align 4, !tbaa !2
  %6 = getelementptr inbounds float, float* %1, i64 1
  %7 = load float, float* %1, align 4, !tbaa !2
  %8 = fmul float %5, %7
  %9 = fadd float %8, 0.000000e+00
  %10 = load float, float* %4, align 4, !tbaa !2
  %11 = load float, float* %6, align 4, !tbaa !2
  %12 = fmul float %10, %11
  %13 = fadd float %9, %12
  %14 = getelementptr inbounds float, float* %0, i64 6
  %15 = getelementptr inbounds float, float* %1, i64 6
  %16 = bitcast float* %14 to <2 x float>*
  %17 = load <2 x float>, <2 x float>* %16, align 4, !tbaa !2
  %18 = bitcast float* %15 to <2 x float>*
  %19 = load <2 x float>, <2 x float>* %18, align 4, !tbaa !2
  %20 = fmul <2 x float> %17, %19
  %21 = extractelement <2 x float> %20, i32 0
  %22 = fadd float %13, %21
  %23 = extractelement <2 x float> %20, i32 1
  %24 = fadd float %22, %23
  ret float %24
}
```

Note that only the second half (after `x += 4; y += 4;`) was vectorized, even though each half can be vectorized just fine on its own. It looks like the SLP vectorizer first attempts to reduce all of the loads and adds together, fails because the middle increment makes the loads non-consecutive (indices 0, 1, 6, 7), and then never tries to vectorize the first half.
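For comparison, here is a hand-vectorized sketch of the shape one would expect if each half were vectorized separately. It is only an illustration: it uses the GCC/Clang `vector_size` extension, and `foo_expected`, `load2` and `v2f` are hypothetical names, not part of LLVM. Each contiguous pair is loaded and multiplied as a 2-wide vector, and the lanes are then added one at a time in the original order, mirroring what the IR above already does for the second half, so no floating-point reassociation is needed.

```c
#include <string.h>

/* Hypothetical 2 x float vector type (GCC/Clang vector extension). */
typedef float v2f __attribute__((vector_size(2 * sizeof(float))));

/* Hypothetical helper: load two consecutive floats as one vector.
   memcpy avoids assuming 8-byte alignment of p. */
static v2f load2(const float *p) {
    v2f v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Illustrative hand-vectorized equivalent of foo: both contiguous pairs
   use a vector multiply, while the adds stay in source order. */
float foo_expected(const float *restrict x, const float *restrict y) {
    float acc = 0;
    v2f lo = load2(x) * load2(y);          /* x[0..1] * y[0..1] */
    acc += lo[0];
    acc += lo[1];
    v2f hi = load2(x + 6) * load2(y + 6);  /* x[6..7] * y[6..7] */
    acc += hi[0];
    acc += hi[1];
    return acc;
}
```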

This can have a significant effect on performance in the presence of loop unrolling.
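As an illustration of how unrolling can produce this shape, consider the hypothetical loop below (`dot_pairs` is an illustrative name). Each iteration multiplies one contiguous pair and then advances both pointers by 6, i.e. the two consumed elements plus the same 4-element skip as in `foo`. Unrolling it by 2 gives a body that reads `x[0]`, `x[1]`, `x[6]`, `x[7]` relative to the current pointer, the same access pattern as `foo`, so one would expect it to hit the same SLP limitation. That expectation is an assumption and has not been checked against a particular LLVM revision.

```c
/* Hypothetical strided dot product: each iteration consumes one
   contiguous pair of elements and then skips ahead. */
float dot_pairs(const float *restrict x, const float *restrict y,
                unsigned nblocks) {
    float acc = 0;
    for (unsigned b = 0; b < nblocks; ++b) {
        acc += x[0] * y[0];
        acc += x[1] * y[1];
        x += 6;  /* 2 consumed + 4 skipped, as in foo */
        y += 6;
    }
    return acc;
}
```

With `nblocks == 2` this computes the same sum as `foo`.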

SLP Vectorizer fails to vectorize a horizontal pattern when it's repeated #41415
