Skip to content

Performance flaw in tight loops in clang++ #152255

@rrrlasse

Description

@rrrlasse

In a tight while-loop iterating over a large buffer, clang++ inserts a series of addl instructions inside the loop that hurts performance alot.

This is a problem in many kinds of code that runs over large buffers, such as data compression, etc.

Assume we have this construction:

while (*(src + b) == *(buf + b) && src + b < last) {
        b++;
}
if ((b <= 6)) {
    ret = (b << 13);
}
else if (b <= 14) {
    ret = (b << 18);
}
else if (b <= 62) {
    ret = (b << 22);
}
else {
    ret = (b << 50);
}

The while-loop disassembles to this:

.LBB1_4:                                #   Parent Loop BB1_1 Depth=1
	leaq	(%rax,%r8), %rdx
	leaq	1(%r8), %rsi
	addl	$8192, %ebp                     # imm = 0x2000
	addl	$262144, %edi                   # imm = 0x40000
	addl	$4194304, %ecx                  # imm = 0x400000
	cmpq	$99999999, %rdx                 # imm = 0x5F5E0FF
	jg	.LBB1_6
# %bb.5:                                #   in Loop: Header=BB1_4 Depth=3
	movzbl	(%r8,%r12), %r9d
	movq	%rsi, %r8
	cmpb	%r9b, (%r12,%rdx)
	je	.LBB1_4
.LBB1_6:    

It looks like clang++ attempts to optimize for the later if-else statements by precomputing the left-shifted values of b.

If we place the while loop in a separate function we can avoid this and get following timings:

With addl: 43 milliseconds
Without addl: 23 milliseconds

GCC and Visual C++ don't have this problem and run at around 23 milliseconds in both cases.

Compile command line is: clang++ -O3 -std=c++20 demo.cpp.

Version:
Ubuntu clang version 20.1.2 (0ubuntu1)
Target: x86_64-pc-linux-gnu
Thread model: posix

Sample code that includes a benchmark: demo.txt

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions