Commit 683beaa

[AArch64][BOLT] Ensure tentative code layout for cold BBs runs.
When split functions is used, BOLT may skip tentative code layout estimation in some cases, such as:
- when there is no profile data for some blocks (i.e. cold blocks)
- when there are cold functions in lite mode
- when skip functions is used

However, when rewriting the binary we still need to compute PC-relative distances between hot and cold basic blocks. Without cold layout estimation, BOLT uses '0x0' as the address of the first cold block, leading to incorrect estimations of any PC-relative addresses.

This affects large binaries, as the relaxStub method expands more branches than necessary using the short-jump sequence, because it wrongly believes it has exceeded the branch distance boundary.

This increases code size with a larger and slower sequence; however, performance regression is expected to be minimal since this only affects called cold code.

Example of such an unnecessary relaxation:

from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
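To illustrate why a cold address of '0x0' skews the estimate, here is a minimal sketch of a range check for the AArch64 `b` instruction, which encodes a signed 26-bit word offset and so reaches roughly +/-128 MiB. The helper names and the addresses are hypothetical and chosen for illustration only; this is not BOLT's actual relaxStub logic.

```cpp
#include <cstdint>

// AArch64 `b` holds a signed 26-bit immediate scaled by 4,
// so the reachable byte offset is in [-2^27, 2^27).
constexpr int64_t BranchRange = int64_t{1} << 27; // 128 MiB

// Signed distance from the branch to its target (hypothetical helper).
constexpr int64_t branchOffset(uint64_t Src, uint64_t Dst) {
  return static_cast<int64_t>(Dst) - static_cast<int64_t>(Src);
}

// True if a plain `b` at Src can reach Dst without relaxation.
constexpr bool fitsInDirectBranch(uint64_t Src, uint64_t Dst) {
  return branchOffset(Src, Dst) >= -BranchRange &&
         branchOffset(Src, Dst) < BranchRange;
}

// Hypothetical addresses: a hot branch at 256 MiB, with the cold block
// actually placed 4 MiB after it.
constexpr uint64_t HotBranch = 0x10000000;
constexpr uint64_t ColdEstimated = 0x10400000;
constexpr uint64_t ColdMissing = 0x0; // what is used without cold layout

// With a tentative cold layout the target is in range.
static_assert(fitsInDirectBranch(HotBranch, ColdEstimated),
              "4 MiB fits in a direct branch");
// With the 0x0 placeholder the same branch looks 256 MiB away,
// so it would be relaxed unnecessarily.
static_assert(!fitsInDirectBranch(HotBranch, ColdMissing),
              "distance to 0x0 exceeds the branch range");
```

In this model, any hot branch located more than 128 MiB into the address space appears out of range of a cold block assumed to be at '0x0', which is consistent with the commit message's note that large binaries are the ones affected.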
1 parent 0df1465 commit 683beaa

2 files changed: +9 -6 lines changed

bolt/lib/Passes/LongJmp.cpp

Lines changed: 6 additions & 3 deletions
@@ -11,6 +11,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "bolt/Passes/LongJmp.h"
+#include "bolt/Utils/CommandLineOpts.h"
 
 #define DEBUG_TYPE "longjmp"
 
@@ -324,7 +325,6 @@ uint64_t LongJmpPass::tentativeLayoutRelocColdPart(
 uint64_t LongJmpPass::tentativeLayoutRelocMode(
     const BinaryContext &BC, std::vector<BinaryFunction *> &SortedFunctions,
     uint64_t DotAddress) {
-
   // Compute hot cold frontier
   uint32_t LastHotIndex = -1u;
   uint32_t CurrentIndex = 0;
@@ -354,9 +354,12 @@ uint64_t LongJmpPass::tentativeLayoutRelocMode(
   for (BinaryFunction *Func : SortedFunctions) {
     if (!BC.shouldEmit(*Func)) {
       HotAddresses[Func] = Func->getAddress();
-      continue;
+      // Don't perform any tentative address estimation of a function's cold
+      // layout if it won't be emitted, unless we are ignoring a large number of
+      // functions (ie, on lite mode) and we haven't done such estimation yet.
+      if (opts::processAllFunctions() || ColdLayoutDone)
+        continue;
     }
-
     if (!ColdLayoutDone && CurrentIndex >= LastHotIndex) {
       DotAddress =
           tentativeLayoutRelocColdPart(BC, SortedFunctions, DotAddress);
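
The changed guard in the last hunk can be modelled as a pair of predicates. This is a simplified sketch, not code from BOLT: the function names `skipsFunctionBefore` and `skipsFunctionAfter` are hypothetical, and the boolean parameters stand in for `BC.shouldEmit(*Func)`, `opts::processAllFunctions()` and `ColdLayoutDone` as they appear in the diff.

```cpp
// Before the change: any function that will not be emitted is skipped,
// so the loop may never reach the cold layout estimation.
constexpr bool skipsFunctionBefore(bool ShouldEmit) { return !ShouldEmit; }

// After the change: a non-emitted function is skipped only when all
// functions are being processed, or the cold layout has already been done.
constexpr bool skipsFunctionAfter(bool ShouldEmit, bool ProcessAllFunctions,
                                  bool ColdLayoutDone) {
  return !ShouldEmit && (ProcessAllFunctions || ColdLayoutDone);
}

// Lite mode (not processing all functions), cold layout still pending:
// the old guard skips the function, the new guard lets the loop continue
// on to the cold layout check.
static_assert(skipsFunctionBefore(false), "old guard always skips");
static_assert(!skipsFunctionAfter(false, false, false),
              "new guard falls through while cold layout is pending");

// Once the cold layout is done, or when all functions are processed,
// non-emitted functions are skipped as before.
static_assert(skipsFunctionAfter(false, false, true), "skipped after layout");
static_assert(skipsFunctionAfter(false, true, false), "skipped when all processed");

// Emitted functions are never skipped by either guard.
static_assert(!skipsFunctionBefore(true), "emitted functions proceed");
static_assert(!skipsFunctionAfter(true, true, true), "emitted functions proceed");
```

The only case where the two guards differ is a non-emitted function seen in lite mode before the cold layout has run, which matches the scenario described in the added source comment.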

bolt/test/AArch64/split-funcs-lite.test

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ REQUIRES: system-linux
 
 RUN: %clang %cflags %p/../Inputs/asm_main.c -Wl,-q -o %t
 
-RUN: not --crash llvm-bolt %t -o %t.bolt -lite=1 -split-functions -split-all-cold \
-RUN:   --skip-funcs="main,foo" 2>&1 | FileCheck %s
+RUN: llvm-bolt %t -o %t.bolt -lite=1 -split-functions -split-all-cold \
+RUN:   --skip-funcs="_init,_start,call_weak_fn/1,deregister_tm_clones/1,register_tm_clones/1,__do_global_dtors_aux/1,frame_dummy/1,main,foo,_fini" 2>&1 | FileCheck %s
 
-CHECK: Did not perform tentative code layout for cold blocks.
+CHECK-NOT: Did not perform tentative code layout for cold blocks.
