-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[lld][BP] Avoid ordering ICF'ed sections #126327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CC @Colibrow |
|
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-macho Author: Ellis Hoag (ellishg) ChangesICF runs before BPSectionOrderer. When a section is ICF'ed, it seems that the original sections are marked as not live, but are still kept around. Prior to this patch, those ICF'ed sections would be passed to BP and ordered before being skipped when writing the output. Now, these sections are no longer passed to BP, saving runtime and possibly improving BP's output. In a large binary, I found that the number of sections ordered using BP decreased, while the number of duplicate sections drastically decreased as expected. Full diff: https://github.com/llvm/llvm-project/pull/126327.diff 3 Files Affected:
diff --git a/lld/MachO/BPSectionOrderer.cpp b/lld/MachO/BPSectionOrderer.cpp
index 689afd67712a417..f693e5e59f8faea 100644
--- a/lld/MachO/BPSectionOrderer.cpp
+++ b/lld/MachO/BPSectionOrderer.cpp
@@ -117,6 +117,8 @@ DenseMap<const InputSection *, int> lld::macho::runBalancedPartitioning(
auto *isec = subsec.isec;
if (!isec || isec->data.empty())
continue;
+ if (isa<ConcatInputSection>(isec) && !isec->isLive(0))
+ continue;
size_t idx = sections.size();
sections.emplace_back(isec);
for (auto *sym : BPOrdererMachO::getSymbols(*isec)) {
diff --git a/lld/test/ELF/bp-section-orderer.s b/lld/test/ELF/bp-section-orderer.s
index 2e18107c02ca306..6a61e7e8cef2b8d 100644
--- a/lld/test/ELF/bp-section-orderer.s
+++ b/lld/test/ELF/bp-section-orderer.s
@@ -1,3 +1,4 @@
+# NOTE: Code has been autogenerated by utils/update_test_body.py
# REQUIRES: aarch64
# RUN: rm -rf %t && split-file %s %t && cd %t
@@ -25,30 +26,33 @@
# RUN: ld.lld -o out.s a.o --irpgo-profile=a.profdata --bp-startup-sort=function
# RUN: llvm-nm -jn out.s | tr '\n' , | FileCheck %s --check-prefix=STARTUP
-# STARTUP: s5,s4,s3,s2,s1,A,B,C,F,E,D,_start,d4,d3,d2,d1,{{$}}
+# STARTUP: s5,s4,s3,s2,s1,A,B,C,F,E,D,merged1,merged2,_start,d4,d3,d2,d1,{{$}}
# RUN: ld.lld -o out.os a.o --irpgo-profile=a.profdata --bp-startup-sort=function --symbol-ordering-file a.txt
# RUN: llvm-nm -jn out.os | tr '\n' , | FileCheck %s --check-prefix=ORDER-STARTUP
-# ORDER-STARTUP: s2,s1,s5,s4,s3,A,F,E,D,B,C,_start,d3,d2,d4,d1,{{$}}
+# ORDER-STARTUP: s2,s1,s5,s4,s3,A,F,E,D,B,C,merged1,merged2,_start,d3,d2,d4,d1,{{$}}
# RUN: ld.lld -o out.cf a.o --verbose-bp-section-orderer --bp-compression-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-FUNC
+# RUN: ld.lld -o out.cf.icf a.o --verbose-bp-section-orderer --bp-compression-sort=function --icf=all 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-ICF-FUNC
# RUN: llvm-nm -jn out.cf | tr '\n' , | FileCheck %s --check-prefix=CFUNC
-# CFUNC: s5,s4,s3,s2,s1,F,C,E,D,B,A,_start,d4,d3,d2,d1,{{$}}
+# CFUNC: s5,s4,s3,s2,s1,A,F,merged1,merged2,C,E,D,B,_start,d4,d3,d2,d1,{{$}}
# RUN: ld.lld -o out.cd a.o --verbose-bp-section-orderer --bp-compression-sort=data 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-DATA
# RUN: llvm-nm -jn out.cd | tr '\n' , | FileCheck %s --check-prefix=CDATA
-# CDATA: s5,s3,s4,s2,s1,F,C,E,D,B,A,_start,d4,d1,d3,d2,{{$}}
+# CDATA: s5,s3,s4,s2,s1,F,C,E,D,B,A,merged1,merged2,_start,d4,d1,d3,d2,{{$}}
# RUN: ld.lld -o out.cb a.o --verbose-bp-section-orderer --bp-compression-sort=both 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
-# RUN: llvm-nm -jn out.cb | tr '\n' , | FileCheck %s --check-prefix=CDATA
+# RUN: llvm-nm -jn out.cb | tr '\n' , | FileCheck %s --check-prefix=CBOTH
+# CBOTH: s5,s3,s4,s2,s1,A,F,merged1,merged2,C,E,D,B,_start,d4,d1,d3,d2,{{$}}
# RUN: ld.lld -o out.cbs a.o --verbose-bp-section-orderer --bp-compression-sort=both --irpgo-profile=a.profdata --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
# RUN: llvm-nm -jn out.cbs | tr '\n' , | FileCheck %s --check-prefix=CBOTH-STARTUP
-# CBOTH-STARTUP: s5,s3,s4,s2,s1,A,B,C,F,E,D,_start,d4,d1,d3,d2,{{$}}
+# CBOTH-STARTUP: s5,s3,s4,s2,s1,A,B,C,F,E,D,merged1,merged2,_start,d4,d1,d3,d2,{{$}}
-# BP-COMPRESSION-FUNC: Ordered 7 sections using balanced partitioning
+# BP-COMPRESSION-FUNC: Ordered 9 sections using balanced partitioning
+# BP-COMPRESSION-ICF-FUNC: Ordered 8 sections using balanced partitioning
# BP-COMPRESSION-DATA: Ordered 9 sections using balanced partitioning
-# BP-COMPRESSION-BOTH: Ordered 16 sections using balanced partitioning
+# BP-COMPRESSION-BOTH: Ordered 18 sections using balanced partitioning
#--- a.proftext
:ir
@@ -125,6 +129,9 @@ int C(int a) { A(); return a + 2; }
int B(int a) { A(); return a + 1; }
void A() {}
+int merged1(int a) { return F(a + 101); }
+int merged2(int a) { return F(a + 101); }
+
int _start() { return 0; }
#--- gen
@@ -236,6 +243,44 @@ A: // @A
.Lfunc_end5:
.size A, .Lfunc_end5-A
// -- End function
+ .section .text.merged1,"ax",@progbits
+ .globl merged1 // -- Begin function merged1
+ .p2align 2
+ .type merged1,@function
+merged1: // @merged1
+// %bb.0: // %entry
+ sub sp, sp, #32
+ stp x29, x30, [sp, #16] // 16-byte Folded Spill
+ add x29, sp, #16
+ stur w0, [x29, #-4]
+ ldur w8, [x29, #-4]
+ add w0, w8, #101
+ bl F
+ ldp x29, x30, [sp, #16] // 16-byte Folded Reload
+ add sp, sp, #32
+ ret
+.Lfunc_end6:
+ .size merged1, .Lfunc_end6-merged1
+ // -- End function
+ .section .text.merged2,"ax",@progbits
+ .globl merged2 // -- Begin function merged2
+ .p2align 2
+ .type merged2,@function
+merged2: // @merged2
+// %bb.0: // %entry
+ sub sp, sp, #32
+ stp x29, x30, [sp, #16] // 16-byte Folded Spill
+ add x29, sp, #16
+ stur w0, [x29, #-4]
+ ldur w8, [x29, #-4]
+ add w0, w8, #101
+ bl F
+ ldp x29, x30, [sp, #16] // 16-byte Folded Reload
+ add sp, sp, #32
+ ret
+.Lfunc_end7:
+ .size merged2, .Lfunc_end7-merged2
+ // -- End function
.section .text._start,"ax",@progbits
.globl _start // -- Begin function _start
.p2align 2
@@ -244,8 +289,8 @@ _start: // @_start
// %bb.0: // %entry
mov w0, wzr
ret
-.Lfunc_end6:
- .size _start, .Lfunc_end6-_start
+.Lfunc_end8:
+ .size _start, .Lfunc_end8-_start
// -- End function
.type s5,@object // @s5
.section .rodata.s5,"a",@progbits
@@ -330,6 +375,7 @@ d1:
.section ".note.GNU-stack","",@progbits
.addrsig
+ .addrsig_sym F
.addrsig_sym C
.addrsig_sym B
.addrsig_sym A
diff --git a/lld/test/MachO/bp-section-orderer.s b/lld/test/MachO/bp-section-orderer.s
index 2eaff04bdc04700..f0ac1bd99f7227a 100644
--- a/lld/test/MachO/bp-section-orderer.s
+++ b/lld/test/MachO/bp-section-orderer.s
@@ -42,13 +42,15 @@
# RUN: %no-fatal-warnings-lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --compression-sort=both --irpgo-profile-sort=%t/a.profdata 2>&1 | FileCheck %s --check-prefix=COMPRESSION-BOTH
# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --bp-compression-sort=function 2>&1 | FileCheck %s --check-prefix=COMPRESSION-FUNC
+# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --bp-compression-sort=function --icf=all 2>&1 | FileCheck %s --check-prefix=COMPRESSION-ICF-FUNC
# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --bp-compression-sort=data 2>&1 | FileCheck %s --check-prefix=COMPRESSION-DATA
# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --bp-compression-sort=both 2>&1 | FileCheck %s --check-prefix=COMPRESSION-BOTH
# RUN: %lld -arch arm64 -lSystem -e _main -o %t/a.out %t/a.o --verbose-bp-section-orderer --bp-compression-sort=both --irpgo-profile=%t/a.profdata --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=COMPRESSION-BOTH
-# COMPRESSION-FUNC: Ordered 7 sections using balanced partitioning
+# COMPRESSION-FUNC: Ordered 9 sections using balanced partitioning
+# COMPRESSION-ICF-FUNC: Ordered 7 sections using balanced partitioning
# COMPRESSION-DATA: Ordered 7 sections using balanced partitioning
-# COMPRESSION-BOTH: Ordered 14 sections using balanced partitioning
+# COMPRESSION-BOTH: Ordered 16 sections using balanced partitioning
#--- a.s
.text
@@ -78,6 +80,12 @@ F:
add w0, w0, #3
bl l_C.__uniq.111111111111111111111111111111111111111.llvm.2222222222222222222
ret
+merged1:
+ add w0, w0, #101
+ ret
+merged2:
+ add w0, w0, #101
+ ret
.data
s1:
|
|
Thanks for updating the ELF test! If you want to update the ELF code, you could do the following. In ELF,
While writing this comment, I realize that the tests probably should be enhanced to test section garbage collection (dead stripping). // used is to suppress compiler garbage collection in ELF; retain is to suppress linker garbage collection; used is not needed for non-internal linkage symbols // used is for both compiler/linker GC in Mach-O; retain is ignored for Mach-O #define RETAIN [[gnu::used,gnu::retain]]
RETAIN void A() {}
RETAIN void B() {} |
| auto *isec = subsec.isec; | ||
| if (!isec || isec->data.empty()) | ||
| continue; | ||
| if (isa<ConcatInputSection>(isec) && !isec->isLive(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first offset, 0, is dead, is it guaranteed that other offsets are also dead? This might be the case for code, but I'm not sure about data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add a comment here.
llvm-project/lld/MachO/InputSection.h
Lines 113 to 114 in 62ae876
| // ConcatInputSections are entirely live or dead, so the offset is irrelevant. | |
| bool isLive(uint64_t off) const override { return live; } |
I actually don't know if isec could be some other InputSection, but I didn't see any examples in my large test binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
off is for mergeable string and constant literals. It is unneeded for // ConcatInputSections are entirely live or dead, so the offset is irrelevant.
While I haven't verified, I suspect that we only need to pass ConcatInputSection to BP if Mach-O works like ELF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it probably should use ConcatInputSection instead of InputSection. I think I was following the orderfile implementation which uses InputSection. I think this change should be a separate PR, though.
Thanks for the useful info! I had assumed that ELF was skipping dead sections already because the test I added already passes. |
|
Friendly ping |
kyulee-com
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems that the original sections are marked as not live, but are still kept around. Prior to this patch, those ICF'ed sections would be passed to BP and ordered before being skipped when writing the output. Now, these sections are no longer passed to BP, saving runtime and possibly improving BP's output. In a large binary, I found that the number of sections ordered using BP decreased, while the number of duplicate sections drastically decreased as expected. ``` Functions for startup: 50755 -> 50520 Functions for compression: 165734 -> 105328 Duplicate functions: 1827231 -> 55230 ```
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems that the original sections are marked as not live, but are still kept around. Prior to this patch, those ICF'ed sections would be passed to BP and ordered before being skipped when writing the output. Now, these sections are no longer passed to BP, saving runtime and possibly improving BP's output. In a large binary, I found that the number of sections ordered using BP decreased, while the number of duplicate sections drastically decreased as expected. ``` Functions for startup: 50755 -> 50520 Functions for compression: 165734 -> 105328 Duplicate functions: 1827231 -> 55230 ```
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems that the original sections are marked as not live, but are still kept around. Prior to this patch, those ICF'ed sections would be passed to BP and ordered before being skipped when writing the output. Now, these sections are no longer passed to BP, saving runtime and possibly improving BP's output.
In a large binary, I found that the number of sections ordered using BP decreased, while the number of duplicate sections drastically decreased as expected.