[clang] Optimisation regression in trunk for x86-64 AVX2 shuffle

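Looks like there's an optimisation regression between clang 21.1.0 and trunk in handling AVX2 shuffle operations. It seems to manifest only when shuffling the result of a comparison (e.g. `_mm256_cmpeq_epi32`). Trunk no longer emits a single 256-bit `vpshufd`. It now narrows the 32-bit lanes to 16 bits in an xmm register (`vextracti128` + `vpackssdw`), shuffles them there with `vpshuflw`/`vpshufhw`, and then sign-extends back to a 256-bit ymm result with `vpmovsxwd`. That is five instructions where clang 21.1.0 emits one.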


Test case (Compiler Explorer: https://gcc.godbolt.org/z/xGGs8hW5P)
Compiled with `-O2 -mavx2`

```
#include <immintrin.h>

__m256i foo(__m256i a, __m256i b) {
    __m256i x = _mm256_cmpeq_epi32(a, b);       // each 32-bit lane is 0 or -1
    return _mm256_shuffle_epi32(x, 0b11101111); // 239: dwords 3,3,2,3 of each 128-bit half
}
```
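To make the expected result concrete, here is a small scalar C++ model of what `foo` computes. It is only a sketch for reasoning about lane selection, not how clang lowers the intrinsics. The alias `V8` and the helpers `cmpeq_epi32`/`shuffle_epi32` are hypothetical names introduced here. The model shows that the immediate `0b11101111` (the `239` in both listings) picks dwords 3, 3, 2, 3 within each 128-bit half.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-in for an __m256i viewed as eight 32-bit lanes
// (index 0 = lowest). Used only to reason about lane selection.
using V8 = std::array<int32_t, 8>;

// Models _mm256_cmpeq_epi32: a lane becomes all ones (-1) when equal, else 0.
constexpr V8 cmpeq_epi32(V8 a, V8 b) {
    V8 r{};
    for (int i = 0; i < 8; ++i) r[i] = (a[i] == b[i]) ? -1 : 0;
    return r;
}

// Models _mm256_shuffle_epi32 (vpshufd ymm): the four 2-bit selectors in imm
// are applied independently within each 128-bit half (lanes 0-3 and 4-7).
constexpr V8 shuffle_epi32(V8 x, unsigned imm) {
    V8 r{};
    for (int half = 0; half < 2; ++half)
        for (int j = 0; j < 4; ++j)
            r[4 * half + j] = x[4 * half + ((imm >> (2 * j)) & 3u)];
    return r;
}

// 0b11101111 is the 239 seen in the listings; read from the low bits up, its
// selectors are 3, 3, 2, 3, so each half becomes {x[3], x[3], x[2], x[3]}.
static_assert(0b11101111 == 239, "same immediate as in the asm");
```

For example, with `a = {0, 1, 2, 3, 4, 5, 6, 7}` and `b = {0, 9, 2, 9, 9, 5, 9, 7}` the comparison gives `{-1, 0, -1, 0, 0, -1, 0, -1}` and the shuffle gives `{0, 0, -1, 0, -1, -1, 0, -1}`.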

Clang trunk (`clang version 22.0.0git (https://github.com/llvm/llvm-project.git 03e66aeb96928592ee6cd51913bf72a6e21066fc)`):

```
foo(long long vector[4], long long vector[4]):
  vpcmpeqd ymm0, ymm0, ymm1
  vextracti128 xmm1, ymm0, 1
  vpackssdw xmm0, xmm0, xmm1
  vpshuflw xmm0, xmm0, 239
  vpshufhw xmm0, xmm0, 239
  vpmovsxwd ymm0, xmm0
  ret
```

Clang 21.1.0:

```
foo(long long vector[4], long long vector[4]):
  vpcmpeqd ymm0, ymm0, ymm1
  vpshufd ymm0, ymm0, 239
  ret
```
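The two listings above should compute the same value. The sketch below models both instruction sequences in scalar C++ to check that, assuming the standard semantics of each instruction as summarised in the comments. `via_vpshufd`, `via_pack_shuffle_extend`, `sat16`, `same` and `agrees_on_all_masks` are hypothetical helper names. The checks suggest the trunk sequence is correct for every possible `vpcmpeqd` result, because each lane is 0 or -1 and the signed-saturating `vpackssdw` loses nothing, but that it is not a general substitute for `vpshufd`. The regression is in instruction count, not correctness.

```cpp
#include <array>
#include <cstdint>

using V8 = std::array<int32_t, 8>;  // ymm viewed as eight 32-bit lanes
using W8 = std::array<int16_t, 8>;  // xmm viewed as eight 16-bit lanes

// Signed saturation of a dword to a word, as performed by vpackssdw.
constexpr int16_t sat16(int32_t v) {
    return static_cast<int16_t>(v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : v);
}

// Clang 21.1.0 listing: vpshufd ymm, shuffling dwords within each 128-bit half.
constexpr V8 via_vpshufd(V8 x, unsigned imm) {
    V8 r{};
    for (int h = 0; h < 2; ++h)
        for (int j = 0; j < 4; ++j)
            r[4 * h + j] = x[4 * h + ((imm >> (2 * j)) & 3u)];
    return r;
}

// Trunk listing: vextracti128 + vpackssdw narrow all eight dwords to saturated
// words, vpshuflw/vpshufhw shuffle words 0-3 and 4-7 with the same selectors,
// and vpmovsxwd sign-extends the eight words back to dwords.
constexpr V8 via_pack_shuffle_extend(V8 x, unsigned imm) {
    W8 w{};
    for (int i = 0; i < 8; ++i) w[i] = sat16(x[i]);   // vpackssdw
    W8 s{};
    for (int h = 0; h < 2; ++h)                       // vpshuflw, vpshufhw
        for (int j = 0; j < 4; ++j)
            s[4 * h + j] = w[4 * h + ((imm >> (2 * j)) & 3u)];
    V8 r{};
    for (int i = 0; i < 8; ++i) r[i] = s[i];          // vpmovsxwd
    return r;
}

// Element-wise equality (std::array's operator== is only constexpr in C++20).
constexpr bool same(V8 a, V8 b) {
    for (int i = 0; i < 8; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

// Tries all 256 possible vpcmpeqd results (every lane 0 or -1).
constexpr bool agrees_on_all_masks(unsigned imm) {
    for (unsigned m = 0; m < 256; ++m) {
        V8 x{};
        for (int i = 0; i < 8; ++i) x[i] = ((m >> i) & 1u) ? -1 : 0;
        if (!same(via_vpshufd(x, imm), via_pack_shuffle_extend(x, imm))) return false;
    }
    return true;
}

// Equivalent for every compare mask with the test case's immediate...
static_assert(agrees_on_all_masks(239), "lossless for 0/-1 lanes");
// ...but not for arbitrary dwords, where the saturating pack drops bits.
static_assert(!same(via_vpshufd(V8{70000, 0, 0, 0, 0, 0, 0, 0}, 0),
                    via_pack_shuffle_extend(V8{70000, 0, 0, 0, 0, 0, 0, 0}, 0)),
              "not a general replacement for vpshufd");
```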

Issue #165813