Description
I was doing some tests/benchmarks regarding switch vs array look-ups and found this change in behavior from Clang 18.1.0 to Clang 19.1.0 (and current trunk):
- https://godbolt.org/z/o4TYYdr4e (original code)
- https://godbolt.org/z/1bh3KdPzY (more minimal reproduction, without the benchmark boilerplate)
Clang 18.1.0 optimizes that big switch as a constant lookup table:
lea rcx, [rip + .Lswitch.table.main]
.LBB0_12:
movzx edx, byte ptr [rbx + rax - 3]
xor edx, 128
mov rdx, qword ptr [rcx + 8*rdx]
inc byte ptr [rsp + rdx + 112]
On the other hand, Clang 19.1.0 generates a separate label for each switch case, and every label feeds into a main one:
.LBB0_28:
lea rcx, [rsp + 665] ; case 1: return 553;
jmp .LBB0_283
.LBB0_29:
lea rcx, [rsp + 653] ; case 2: return 541;
jmp .LBB0_283
; ..............................
.LBB0_283:
inc byte ptr [rcx]
inc rbp
cmp rbp, 300000000
je .LBB0_20
movzx ecx, byte ptr [rbx + rbp]
movsxd rdx, dword ptr [rax + 4*rcx]
add rdx, rax
mov rcx, r13
jmp rdx
This can tank performance, for example when the branch predictor can't accurately predict which label the indirect jump will target on the current iteration. In my example I'm generating random indexes, and with perf stat I'm seeing almost 300 million branch misses (one for each increment() invocation).
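For readers who don't have the Godbolt links open, here is a minimal sketch of the pattern under discussion. It is not the original reproduction: only the values 553 and 541 come from the assembly comments above, while the other case values, the names slot_for, slot_from_table and kSlots, and the counter array size are assumptions made up for illustration. It shows a dense switch that returns constants, plus the hand-written constant table that such a switch is equivalent to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the large switch in the report. Only cases 1
// and 2 (553 and 541) appear in the assembly above; the rest are assumed.
constexpr std::size_t slot_for(std::uint8_t key) {
    switch (key) {
        case 1: return 553;
        case 2: return 541;
        case 3: return 529;  // assumed value
        case 4: return 517;  // assumed value
        default: return 0;
    }
}

// The same mapping written as an explicit constant table, which is roughly
// what a switch-to-lookup-table transformation produces.
constexpr std::array<std::size_t, 5> kSlots = {0, 553, 541, 529, 517};

constexpr std::size_t slot_from_table(std::uint8_t key) {
    const auto index = static_cast<std::size_t>(key);
    return index < kSlots.size() ? kSlots[index] : 0;
}

// Assumed shape of increment(): bump the counter selected by the switch.
inline void increment(std::array<std::uint8_t, 1024>& counters,
                      std::uint8_t key) {
    ++counters[slot_for(key)];
}

// Checks that both formulations agree for every possible byte value.
void check_equivalence() {
    for (int k = 0; k < 256; ++k) {
        const auto key = static_cast<std::uint8_t>(k);
        assert(slot_for(key) == slot_from_table(key));
    }
}
```

With the table form, each iteration is a load plus an increment with no data-dependent branch. With one label per case, each iteration ends in an indirect jump whose target depends on the random input byte.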
Assuming that this change isn't an intentional trade-off made to benefit some other use cases, this is a regression.
On my machine, the results of running that binary (same source code as the one in Godbolt) compiled with Clang 18 vs Clang 19 are as follows:
clang 18 = elapsed: 254ms sum: 28928
clang 19 = elapsed: 2813ms sum: 29184
So the binary generated by Clang 19 is about 11 times slower.
NOTE: The issue seems to be related to inlining, because if I add __attribute__((noinline)) to the increment() function, then Clang 19 optimizes it with a lookup table, just like Clang 18, and the result is much faster than what I get with inlining allowed:
elapsed: 548ms sum: 28928
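The workaround from the note can be sketched like this. Only the `__attribute__((noinline))` attribute (GCC/Clang syntax) and the values 553 and 541 are taken from the report; the function name increment_noinline, the counters array, its size, and the remaining case values are assumptions for illustration. In the report the attribute is simply added to the existing increment().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counter storage; the real benchmark keeps its own buffer.
static std::uint8_t counters[1024];

// Marking the function noinline keeps the switch in its own function body,
// which per the report lets Clang 19 emit a lookup table again.
__attribute__((noinline)) void increment_noinline(std::uint8_t key) {
    std::size_t slot = 0;
    switch (key) {
        case 1: slot = 553; break;
        case 2: slot = 541; break;
        case 3: slot = 529; break;  // assumed value
        case 4: slot = 517; break;  // assumed value
        default: break;
    }
    ++counters[slot];
}

// Tiny usage example: one call bumps exactly one counter.
void demo() {
    increment_noinline(1);
    assert(counters[553] == 1);
}
```

This trades the cost of a call per element for the table lookup, which matches the report's numbers: faster than the inlined Clang 19 build, but slower than the Clang 18 build where the table lookup was inlined.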