Open
Labels
backend:X86, good first issue, missed-optimization
Description
Vector multiplication of 8-bit elements by most constants is currently implemented by widening to 16 bits:
multiplyBy10_clang:
movdqa xmm1, xmm0
punpckhbw xmm1, xmm1
movdqa xmm2, xmmword ptr [rip + .LCPI0_0]
pmullw xmm1, xmm2
movdqa xmm3, xmmword ptr [rip + .LCPI0_1]
pand xmm1, xmm3
punpcklbw xmm0, xmm0
pmullw xmm0, xmm2
pand xmm0, xmm3
packuswb xmm0, xmm1
ret
However, it is often more efficient, in both code size and dependency-chain length, to perform a short sequence of shifts and adds instead. For example, x * 10 = (x << 3) + (x << 1):
multiplyBy10_shiftAndAdd:
movdqa xmm1, xmm0
paddb xmm0, xmm0
psllw xmm1, 3
pand xmm1, xmmword ptr [rip + .LCPI0_0]
paddb xmm0, xmm1
ret
This method is already implemented, but only for constants that are almost powers of two. Notably, gcc always uses this method (although its sequences are often non-optimal).
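To make the arithmetic behind the shift-and-add listing easier to check, here is a small scalar model in C. It is only a sketch, not LLVM code: the function names are hypothetical, a single 16-bit lane holding two packed bytes stands in for the full XMM register, and the mask loaded from .LCPI0_0 is assumed to be 0xF8 in every byte. SSE2 has no per-byte shift, which is why the listing uses psllw followed by pand.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply one byte by 10 using shift-and-add: x * 10 = (x << 3) + (x << 1).
   Casting back to uint8_t gives the same wrap-around (mod 256) as paddb. */
static uint8_t mul10_byte(uint8_t x) {
    return (uint8_t)((uint8_t)(x << 3) + (uint8_t)(x << 1));
}

/* Model of one 16-bit lane holding two packed bytes (hypothetical helper). */
static uint16_t mul10_lane(uint16_t lane) {
    /* psllw 3 shifts the whole 16-bit word, so the top three bits of the low
       byte leak into the high byte; the pand with 0xF8 per byte (assumed mask)
       clears them again. */
    uint16_t x8 = (uint16_t)((uint16_t)(lane << 3) & 0xF8F8u);

    /* paddb xmm0, xmm0: per-byte doubling, no carry between bytes. */
    uint8_t lo2 = (uint8_t)((lane & 0xFF) * 2);
    uint8_t hi2 = (uint8_t)((lane >> 8) * 2);

    /* Final paddb: per-byte add of the two partial products. */
    uint8_t lo = (uint8_t)((x8 & 0xFF) + lo2);
    uint8_t hi = (uint8_t)((x8 >> 8) + hi2);
    return (uint16_t)((hi << 8) | lo);
}

/* Exhaustively compare the lane model against plain per-byte multiplication.
   Returns 1 if every input matches, 0 otherwise. */
static int check_all_lanes(void) {
    for (unsigned v = 0; v <= 0xFFFFu; ++v) {
        uint8_t lo = (uint8_t)(v & 0xFF);
        uint8_t hi = (uint8_t)(v >> 8);
        uint16_t expect =
            (uint16_t)(((uint8_t)(hi * 10) << 8) | (uint8_t)(lo * 10));
        if (mul10_lane((uint16_t)v) != expect) return 0;
        if (mul10_byte(lo) != (uint8_t)(lo * 10)) return 0;
    }
    return 1;
}
```

Calling check_all_lanes() covers every pair of byte values, including the cases where the product wraps past 255, so it checks that the masked word shift behaves like a true per-byte shift.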