x86 missing optimization for variable shift left (without avx512)

Given the following LLVM IR:
```
define <16 x i16> @mulbyconst(<16 x i16> %"a") #0 {
top:
  %0 = mul <16 x i16> %"a", <i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4, i16 8, i16 4>
  ret <16 x i16> %0
}
```
LLVM compiles this to a single `vpsllvw` instruction with AVX512, but in the absence of AVX512, it instead compiles to two `vpsllw` and a `vpblendw` (as shown in https://godbolt.org/z/PMehWerEd).

The issue is that although AVX2 CPUs lack the `vpsllvw` instruction (because AVX2 is a bit of a mess), they do have `vpmullw`, so this could have compiled to a single `vpmullw` by the alternating constant vector of `8` and `4` from the original multiply. This missed optimization is especially annoying because LLVM went through a bunch of work to canonicalize the per-lane multiplication by powers of 2 into a variable shift left, even though just leaving it as a multiply would have been more efficient on this target.
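For reference, here is a minimal scalar sketch in C of why the two forms are interchangeable: for 16-bit lanes, shifting left by `k` gives the same low 16 bits as multiplying by `1 << k`. The function names and the `LANES` constant are made up for illustration; this models the per-lane semantics of a `vpsllvw`-style shift and a `vpmullw`-style multiply, not the actual LLVM lowering.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* illustrative: number of i16 lanes in a 256-bit vector */

/* Per-lane shift left, the form the multiply is canonicalized to. */
static void shl_lanes(const uint16_t *a, const uint16_t *sh, uint16_t *out) {
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint16_t)((uint32_t)a[i] << sh[i]);
}

/* Per-lane multiply keeping only the low 16 bits of each product. */
static void mul_lanes(const uint16_t *a, const uint16_t *m, uint16_t *out) {
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint16_t)((uint32_t)a[i] * (uint32_t)m[i]);
}

/* Returns 1 if shifting by <3, 2, 3, 2, ...> matches multiplying by
   <8, 4, 8, 4, ...> for the given 16 input lanes, 0 otherwise. */
int shift_matches_multiply(const uint16_t *a) {
    uint16_t sh[LANES], m[LANES], shifted[LANES], multiplied[LANES];
    for (size_t i = 0; i < LANES; i++) {
        sh[i] = (i % 2 == 0) ? 3 : 2;   /* shift amounts for 8 and 4 */
        m[i] = (uint16_t)(1u << sh[i]); /* equivalent multipliers */
    }
    shl_lanes(a, sh, shifted);
    mul_lanes(a, m, multiplied);
    for (size_t i = 0; i < LANES; i++)
        if (shifted[i] != multiplied[i])
            return 0;
    return 1;
}
```

Since the results agree lane for lane (including when high bits overflow out of the lane), a backend without a per-lane 16-bit variable shift could in principle pick the multiply form when the shift amounts are constants.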
