Closed
Labels
llvm:optimizations, question (A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!)
Description
Minimal example:
#include <memory.h>

volatile char bytes[16];

int main(void) {
    for (int i = 0; i < 500; i++) {
        memset((void*)bytes, 0, 16);
    }
}

https://godbolt.org/z/5s9M5MvhK

options: -O3 -fno-unroll-loops
main:
        mov     eax, 500
        xorps   xmm0, xmm0
.LBB0_1:
        movaps  xmmword ptr [rip + bytes], xmm0
        dec     eax
        jne     .LBB0_1
        xor     eax, eax
        ret
bytes:
        .zero   16

options: -O3
main:
        mov     eax, 500
        xorps   xmm0, xmm0
.LBB0_1:
        movaps  xmmword ptr [rip + bytes], xmm0
        add     eax, -10
        jne     .LBB0_1
        xor     eax, eax
        ret
bytes:
        .zero   16

When compiled with -O3 -fno-unroll-loops, the generated assembly is as expected. With -O3 alone, however, the loop counter is stepped by 10 per iteration, as if the loop had been unrolled by a factor of 10, but the loop body is not copied 10 times. As a result, the vector store (the vectorised memset) executes 50 times instead of 500, i.e. 10x fewer times than expected.
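A possible reading of this, not confirmed anywhere in the report: the cast to (void*) drops the volatile qualifier, so the stores that memset performs are not volatile accesses, and the optimizer may be free to merge repeated identical stores after unrolling. As far as I understand the C standard, accessing a volatile object through a non-volatile lvalue is undefined behavior anyway. A minimal sketch of a variant that keeps every store a volatile access is below; `volatile_zero` and `zero_repeatedly` are hypothetical helper names introduced only for illustration.

```c
#include <stddef.h>

volatile char bytes[16];

/* Hypothetical helper (not from the report): zeroes a buffer one byte at a
   time through a volatile-qualified lvalue, so each store is a volatile
   access rather than a plain store hidden behind memset. */
static void volatile_zero(volatile char *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = 0;
    }
}

/* Repeats the zeroing `rounds` times and returns the number of byte
   stores that were requested in the source. */
static unsigned long zero_repeatedly(int rounds) {
    unsigned long stores = 0;
    for (int i = 0; i < rounds; i++) {
        volatile_zero(bytes, sizeof bytes);
        stores += sizeof bytes;
    }
    return stores;
}
```

Calling `zero_repeatedly(500)` requests 500 * 16 = 8000 byte stores. The trade-off in this sketch is that byte-wise volatile stores are unlikely to be combined into a single 16-byte vector store, so it is a way to check the hypothesis rather than a drop-in replacement for the memset version.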