Commit 945c9d4

xur-llvm authored and masahir0y committed
kbuild: Add Propeller configuration for kernel build
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires Clang from LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
   build config
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   then
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

"<autofdo_profile>" is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 500009, for this purpose.
   For Intel platforms:
      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
        -o <perf_file> -- <loadtest>
   For AMD platforms:
      The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
      # To see if Zen3 supports LBR:
      $ cat /proc/cpuinfo | grep " brs"
      # To see if Zen4 supports LBR:
      $ cat /proc/cpuinfo | grep amd_lbr_v2
      # If the result is yes, then collect the profile using:
      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
        -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
   $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "create_llvm_prof" is the profile conversion tool, and a prebuilt
   binary for Linux can be found at
   https://github.com/google/autofdo/releases/tag/v0.30.1 (it can also
   be built from source).

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   and
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
        CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>

Co-developed-by: Han Shen <[email protected]>
Signed-off-by: Han Shen <[email protected]>
Signed-off-by: Rong Xu <[email protected]>
Suggested-by: Sriraman Tallam <[email protected]>
Suggested-by: Krzysztof Pszeniczny <[email protected]>
Suggested-by: Nick Desaulniers <[email protected]>
Suggested-by: Stephane Eranian <[email protected]>
Tested-by: Yonghong Song <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
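
The AMD capability check from step 3 of the commit message can be wrapped in a small helper. This is a minimal sketch and not part of the commit: the function name amd_branch_sampling_flag and its optional file argument are assumptions made for illustration. Only the flag names brs and amd_lbr_v2 come from the message. It relies on grep -w, which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): report which AMD branch
# sampling flag, if any, appears in a cpuinfo-style file.
# Usage: amd_branch_sampling_flag [path]   (defaults to /proc/cpuinfo)
amd_branch_sampling_flag() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Only inspect "flags" lines and match whole words, so that a flag
    # such as "ibrs" is not mistaken for "brs".
    if grep '^flags' "$cpuinfo" | grep -qw 'amd_lbr_v2'; then
        echo amd_lbr_v2
    elif grep '^flags' "$cpuinfo" | grep -qw 'brs'; then
        echo brs
    else
        echo none
    fi
}
```

Calling amd_branch_sampling_flag with no argument inspects the running machine. A result of none on an AMD system suggests the perf command in step 3 will not be able to collect branch records.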
1 parent d614b5a commit 945c9d4

11 files changed, +237 -3 lines changed

Documentation/dev-tools/index.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Documentation/dev-tools/testing-overview.rst
    checkuapi
    gpio-sloppy-logic-analyzer
    autofdo
+   propeller
 
 
 .. only:: subproject and html
Documentation/dev-tools/propeller.rst

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Using Propeller with the Linux kernel
+=====================================
+
+This enables Propeller build support for the kernel when using Clang
+compiler. Propeller is a profile-guided optimization (PGO) method used
+to optimize binary executables. Like AutoFDO, it utilizes hardware
+sampling to gather information about the frequency of execution of
+different code paths within a binary. Unlike AutoFDO, this information
+is then used right before linking phase to optimize (among others)
+block layout within and across functions.
+
+A few important notes about adopting Propeller optimization:
+
+#. Although it can be used as a standalone optimization step, it is
+   strongly recommended to apply Propeller on top of AutoFDO,
+   AutoFDO+ThinLTO or Instrument FDO. The rest of this document
+   assumes this paradigm.
+
+#. Propeller uses another round of profiling on top of
+   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
+   "build-afdo - train-afdo - build-propeller - train-propeller -
+   build-optimized".
+
+#. Propeller requires LLVM 19 release or later for Clang/Clang++
+   and the linker (ld.lld).
+
+#. In addition to LLVM toolchain, Propeller requires a profiling
+   conversion tool: https://github.com/google/autofdo, release
+   v0.30.1 or later: https://github.com/google/autofdo/releases/tag/v0.30.1.
+
+The Propeller optimization process involves the following steps:
+
+#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
+   you would normally do, but with a set of compile-time / link-time
+   flags, so that a special metadata section is created within the
+   kernel binary. The special section is only intended to be used by
+   the profiling tool; it is not part of the runtime image, nor does it
+   change kernel run time text sections.
+
+#. Profiling: The above kernel is then run with a representative
+   workload to gather execution frequency data. This data is collected
+   using hardware sampling, via perf. Propeller is most effective on
+   platforms supporting advanced PMU features like LBR on Intel
+   machines. This step is the same as profiling the kernel for AutoFDO
+   (the exact perf parameters can be different).
+
+#. Propeller profile generation: Perf output file is converted to a
+   pair of Propeller profiles via an offline tool.
+
+#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
+   binary as you would normally do, but with a compile-time /
+   link-time flag to pick up the Propeller compile time and link time
+   profiles. This build step uses 3 profiles - the AutoFDO profile,
+   the Propeller compile-time profile and the Propeller link-time
+   profile.
+
+#. Deployment: The optimized kernel binary is deployed and used
+   in production environments, providing improved performance
+   and reduced latency.
+
+Preparation
+===========
+
+Configure the kernel with::
+
+   CONFIG_AUTOFDO_CLANG=y
+   CONFIG_PROPELLER_CLANG=y
+
+Customization
+=============
+
+The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
+for Propeller builds. One can, however, enable or disable Propeller build
+for individual files and directories by adding a line similar to the
+following to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o)::
+
+   PROPELLER_PROFILE_foo.o := y
+
+- For enabling all files in one directory::
+
+   PROPELLER_PROFILE := y
+
+- For disabling one file::
+
+   PROPELLER_PROFILE_foo.o := n
+
+- For disabling all files in one directory::
+
+   PROPELLER_PROFILE := n
+
+
+Workflow
+========
+
+Here is an example workflow for building an AutoFDO+Propeller kernel:
+
+1) Assuming an AutoFDO profile is already collected following
+   instructions in the AutoFDO document, build the kernel on the host
+   machine, with AutoFDO and Propeller build configs ::
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and ::
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
+
+2) Install the kernel on the test machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+   event period. We suggest using a suitable prime number, like 500009,
+   for this purpose.
+
+   - For Intel platforms::
+
+      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   - For AMD platforms::
+
+      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   Note you can repeat the above steps to collect multiple <perf_file>s.
+
+4) (Optional) Download the raw perf file(s) to the host machine.
+
+5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
+   generate Propeller profile. ::
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
+                         --format=propeller --propeller_output_module_name \
+                         --out=<propeller_profile_prefix>_cc_profile.txt \
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
+
+   This command generates a pair of Propeller profiles:
+   "<propeller_profile_prefix>_cc_profile.txt" and
+   "<propeller_profile_prefix>_ld_profile.txt".
+
+   If more than one perf_file was collected in the previous step,
+   you can create a temp list file "<perf_file_list>" with each line
+   containing one perf file name and run::
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
+                         --format=propeller --propeller_output_module_name \
+                         --out=<propeller_profile_prefix>_cc_profile.txt \
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+6) Rebuild the kernel using the AutoFDO and Propeller
+   profiles. ::
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and ::
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
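
Step 5 of the documented workflow describes a list file with one perf file name per line, passed to create_llvm_prof as --profile=@<perf_file_list>. The following is a minimal sketch of how such a list could be produced. The function names write_perf_file_list and propeller_profile_arg are hypothetical and not part of the commit or of the autofdo tooling.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the commit).

# write_perf_file_list <list_file> <perf_file>...
# Write each perf file name on its own line, as step 5 describes.
write_perf_file_list() {
    list_file="$1"
    shift
    : > "$list_file"                 # create or truncate the list
    for perf_file in "$@"; do
        printf '%s\n' "$perf_file" >> "$list_file"
    done
}

# propeller_profile_arg <list_file>
# Print the --profile argument in the "@<perf_file_list>" form.
propeller_profile_arg() {
    printf '%s\n' "--profile=@$1"
}
```

For example, write_perf_file_list perf.list run1.data run2.data followed by propeller_profile_arg perf.list yields the argument to substitute into the second create_llvm_prof command of step 5. The file names here are placeholders.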

MAINTAINERS

Lines changed: 7 additions & 0 deletions
@@ -18503,6 +18503,13 @@ S: Maintained
 F:	include/linux/psi*
 F:	kernel/sched/psi.c
 
+PROPELLER BUILD
+M:	Rong Xu <[email protected]>
+M:	Han Shen <[email protected]>
+S:	Supported
+F:	Documentation/dev-tools/propeller.rst
+F:	scripts/Makefile.propeller
+
 PRINTK
 M:	Petr Mladek <[email protected]>
 R:	Steven Rostedt <[email protected]>

Makefile

Lines changed: 1 addition & 0 deletions
@@ -1024,6 +1024,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
 include-$(CONFIG_KCOV) += scripts/Makefile.kcov
 include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
 include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
+include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller
 include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
 
 include $(addprefix $(srctree)/, $(include-y))

arch/Kconfig

Lines changed: 19 additions & 0 deletions
@@ -831,6 +831,25 @@ config AUTOFDO_CLANG
 
 	  If unsure, say N.
 
+config ARCH_SUPPORTS_PROPELLER_CLANG
+	bool
+
+config PROPELLER_CLANG
+	bool "Enable Clang's Propeller build"
+	depends on ARCH_SUPPORTS_PROPELLER_CLANG
+	depends on CC_IS_CLANG && CLANG_VERSION >= 190000
+	help
+	  This option enables Clang's Propeller build. When the Propeller
+	  profiles are specified in variable CLANG_PROPELLER_PROFILE_PREFIX
+	  during the build process, Clang uses the profiles to optimize
+	  the kernel.
+
+	  If no profile is specified, Propeller options are still passed
+	  to Clang to facilitate the collection of perf data for creating
+	  the Propeller profiles in subsequent builds.
+
+	  If unsure, say N.
+
 config ARCH_SUPPORTS_CFI_CLANG
 	bool
 	help
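
The dependency CLANG_VERSION >= 190000 compares against a single integer. The sketch below shows how a dotted version string maps onto that scale, assuming the encoding major * 10000 + minor * 100 + patch (consistent with 190000 standing for 19.0.0). The function names clang_version_code and propeller_clang_ok are hypothetical and not part of the commit.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the commit).

# clang_version_code <major.minor.patch>
# Assumed encoding: major * 10000 + minor * 100 + patch.
clang_version_code() {
    old_ifs=$IFS
    IFS=.
    set -- $1                        # split "19.1.0" into 19 1 0
    IFS=$old_ifs
    echo $(( 10000 * ${1:-0} + 100 * ${2:-0} + ${3:-0} ))
}

# propeller_clang_ok <major.minor.patch>
# Succeed when the version meets the >= 190000 requirement.
propeller_clang_ok() {
    [ "$(clang_version_code "$1")" -ge 190000 ]
}
```

With this encoding, version 19.1.0 maps to 190100 and satisfies the dependency, while 18.1.8 maps to 180108 and does not.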

arch/x86/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ config X86
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_AUTOFDO_CLANG
+	select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
 	select ARCH_USE_MEMTEST

arch/x86/kernel/vmlinux.lds.S

Lines changed: 4 additions & 0 deletions
@@ -443,6 +443,10 @@ SECTIONS
 
 	STABS_DEBUG
 	DWARF_DEBUG
+#ifdef CONFIG_PROPELLER_CLANG
+	.llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
+#endif
+
 	ELF_DETAILS
 
 	DISCARDS
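
The linker script change keeps the .llvm_bb_addr_map section in the kernel image when CONFIG_PROPELLER_CLANG is set. One way to confirm that a first-stage build carries it is to scan a section listing. This is a minimal sketch: the function name has_bb_addr_map is hypothetical, and feeding it from llvm-readelf -S -W is an assumption about the reader's toolchain rather than something the commit prescribes.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit).

# has_bb_addr_map
# Succeed if the section listing on stdin names .llvm_bb_addr_map.
has_bb_addr_map() {
    grep -q -F '.llvm_bb_addr_map'
}

# Example use (assumes llvm-readelf is installed and vmlinux exists):
#   llvm-readelf -S -W vmlinux | has_bb_addr_map && echo "section present"
```

If the section is missing from a kernel built without CLANG_PROPELLER_PROFILE_PREFIX, the profile conversion step is unlikely to have the basic block information it needs.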

include/asm-generic/vmlinux.lds.h

Lines changed: 3 additions & 3 deletions
@@ -95,14 +95,14 @@
  * With LTO_CLANG, the linker also splits sections by default, so we need
  * these macros to combine the sections during the final link.
  *
- * With AUTOFDO_CLANG, by default, the linker splits text sections and
- * regroups functions into subsections.
+ * With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits
+ * text sections and regroups functions into subsections.
  *
  * RODATA_MAIN is not used because existing code already defines .rodata.x
  * sections to be brought in with rodata.
  */
 #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-defined(CONFIG_AUTOFDO_CLANG)
+defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
 #else
 #define TEXT_MAIN .text
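
With CONFIG_PROPELLER_CLANG set, TEXT_MAIN expands to the two patterns .text and .text.[0-9a-zA-Z_]*. The sketch below approximates that matching with a shell case statement, whose glob syntax is close to the linker script's for these patterns. The function name matches_text_main is hypothetical, and the section names in the usage note are illustrative examples, not taken from the commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit).

# matches_text_main <section_name>
# Succeed if the name matches ".text" or ".text.[0-9a-zA-Z_]*".
matches_text_main() {
    case "$1" in
        .text|.text.[0-9a-zA-Z_]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under this approximation, names such as .text and .text.foo are gathered by TEXT_MAIN, while .text..refcount (a second dot right after ".text.") and .data are not.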

scripts/Makefile.lib

Lines changed: 10 additions & 0 deletions
@@ -201,6 +201,16 @@ _c_flags += $(if $(patsubst n%,, \
 	$(CFLAGS_AUTOFDO_CLANG))
 endif
 
+#
+# Enable Propeller build flags except some files or directories we don't want to
+# enable (depends on variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE).
+#
+ifdef CONFIG_PROPELLER_CLANG
+_c_flags += $(if $(patsubst n%,, \
+	$(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \
+	$(CFLAGS_PROPELLER_CLANG))
+endif
+
 # $(src) for including checkin headers from generated source files
 # $(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)

scripts/Makefile.propeller

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang Propeller features.
+ifdef CLANG_PROPELLER_PROFILE_PREFIX
+  CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections
+  KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering
+else
+  CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels
+endif
+
+# Propeller requires debug information to embed module names in the profiles.
+# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO,
+# as the option should already be set.
+ifndef CONFIG_DEBUG_INFO
+  ifndef CONFIG_AUTOFDO_CLANG
+    CFLAGS_PROPELLER_CLANG += -gmlt
+  endif
+endif
+
+ifdef CONFIG_LTO_CLANG_THIN
+  ifdef CLANG_PROPELLER_PROFILE_PREFIX
+    KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt
+  else
+    KBUILD_LDFLAGS += --lto-basic-block-sections=labels
+  endif
+endif
+
+export CFLAGS_PROPELLER_CLANG
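
The first conditional in scripts/Makefile.propeller selects between two compiler flag sets depending on whether CLANG_PROPELLER_PROFILE_PREFIX is defined. The sketch below mirrors only that selection in shell to make the two build modes explicit. The function name propeller_cflags is hypothetical, and the -gmlt and ThinLTO handling is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit); it mirrors only the first
# ifdef/else block of scripts/Makefile.propeller.

# propeller_cflags [profile_prefix]
# With a prefix: use the compile-time profile (optimized build).
# Without one: emit basic block labels (profiling build).
propeller_cflags() {
    prefix="$1"
    if [ -n "$prefix" ]; then
        echo "-fbasic-block-sections=list=${prefix}_cc_profile.txt -ffunction-sections"
    else
        echo "-fbasic-block-sections=labels"
    fi
}
```

Calling propeller_cflags with no argument corresponds to the first build in the workflow, and calling it with the prefix given to create_llvm_prof corresponds to the final rebuild.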
