SLP: missed vectorization due to cost model (benchmark from SPECFP 2006) #51468

@davidbolvansky

Bugzilla Link: 52126
Version: trunk
OS: Linux
CC: @alexey-bataev, @fhahn, @RKSimon

Extended Description

433.milc

typedef struct {
    double real;
    double imag;
} complex;

typedef struct {
    complex e[3][3];
} su3_matrix;

#define CSUM(a, b)                \
    {                             \
        (a).real += (b).real;     \
        (a).imag += (b).imag;     \
    }

#define CMUL(a, b, c)                                             \
    {                                                             \
        (c).real = (a).real * (b).real - (a).imag * (b).imag;     \
        (c).imag = (a).real * (b).imag + (a).imag * (b).real;     \
    }

void mult_su3_nn2(su3_matrix *a, su3_matrix *b, su3_matrix *c) {
    int i, j, k;
    complex x, y;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            x.real = x.imag = 0.0;
            for (k = 0; k < 3; k++) {
                CMUL(a->e[i][k], b->e[k][j], y);
                CSUM(x, y);
            }
            c->e[i][j] = x;
        }
}

Flags: -Ofast -mavx
https://godbolt.org/z/b6nq5sKaW

example.cpp:10:5: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-missed=loop-vectorize]
for(i=0;i<3;i++)
^
example.cpp:10:5: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-missed=loop-vectorize]

GCC and ICC vectorize mult_su3_nn_unrolled. LLVM does not vectorize it with AVX/AVX2, only with AVX-512, and even then it does not use vaddsubpd as GCC and ICC do. (GCC recently added addsub pattern detection to its SLP vectorizer.)
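
For readers unfamiliar with the addsub pattern, here is a rough scalar sketch of the shape a vectorizer would need to recognize. It is not taken from GCC, ICC, or LLVM output. The names cplx, addsub2, mult_su3_nn_ref, mult_su3_nn_addsub, fill_demo and max_abs_diff are hypothetical and introduced only for illustration; cplx stands in for the report's complex typedef to avoid clashing with <complex.h>. The helper addsub2 emulates one 2-lane step in which lane 0 subtracts and lane 1 adds, which is the lane behavior of vaddsubpd.

```c
/* Types mirror the typedefs in the report; `cplx` is a hypothetical rename
   of `complex`, used only to avoid clashing with <complex.h>. */
typedef struct { double real, imag; } cplx;
typedef struct { cplx e[3][3]; } su3_matrix;

/* Reference: the loop nest from the report with CMUL/CSUM expanded by hand. */
void mult_su3_nn_ref(const su3_matrix *a, const su3_matrix *b, su3_matrix *c) {
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            cplx x = {0.0, 0.0};
            for (int k = 0; k < 3; k++) {
                const cplx *u = &a->e[i][k], *v = &b->e[k][j];
                cplx y;
                y.real = u->real * v->real - u->imag * v->imag;
                y.imag = u->real * v->imag + u->imag * v->real;
                x.real += y.real;
                x.imag += y.imag;
            }
            c->e[i][j] = x;
        }
}

/* Hypothetical helper: one 2-lane addsub step.
   Lane 0 subtracts, lane 1 adds. */
static void addsub2(const double p[2], const double q[2], double out[2]) {
    out[0] = p[0] - q[0];
    out[1] = p[1] + q[1];
}

/* Same computation, written so each complex multiply is two 2-lane
   products followed by a single addsub step. */
void mult_su3_nn_addsub(const su3_matrix *a, const su3_matrix *b, su3_matrix *c) {
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            double acc[2] = {0.0, 0.0};
            for (int k = 0; k < 3; k++) {
                const cplx *u = &a->e[i][k], *v = &b->e[k][j];
                /* broadcast(u.real) * {v.real, v.imag} */
                double p[2] = {u->real * v->real, u->real * v->imag};
                /* broadcast(u.imag) * {v.imag, v.real}, i.e. v with lanes swapped */
                double q[2] = {u->imag * v->imag, u->imag * v->real};
                double y[2];
                addsub2(p, q, y);
                acc[0] += y[0];
                acc[1] += y[1];
            }
            c->e[i][j].real = acc[0];
            c->e[i][j].imag = acc[1];
        }
}

/* Hypothetical test helper: fill a matrix with small integer-valued doubles
   so every product and sum is exactly representable. */
void fill_demo(su3_matrix *m, double seed) {
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            m->e[i][j].real = seed + 3.0 * i + j;
            m->e[i][j].imag = seed - j + i;
        }
}

/* Hypothetical test helper: largest absolute component difference. */
double max_abs_diff(const su3_matrix *p, const su3_matrix *q) {
    double worst = 0.0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            double dr = p->e[i][j].real - q->e[i][j].real;
            double di = p->e[i][j].imag - q->e[i][j].imag;
            if (dr < 0) dr = -dr;
            if (di < 0) di = -di;
            if (dr > worst) worst = dr;
            if (di > worst) worst = di;
        }
    return worst;
}
```

Calling both functions on the same inputs and comparing the results with max_abs_diff is a quick way to confirm that the rewritten form computes the same matrix product. Whether a given compiler actually emits vaddsubpd for either form depends on its cost model and target flags, so the generated assembly still has to be checked.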
