
Commit 315ad87

xur-llvm authored and masahir0y committed
kbuild: Add AutoFDO support for Clang build
Add the build support for using Clang's AutoFDO. Building the kernel
with AutoFDO does not reduce the optimization level from the
compiler. AutoFDO uses hardware sampling to gather information about
the frequency of execution of different code paths within a binary.
This information is then used to guide the compiler's optimization
decisions, resulting in a more efficient binary. Experiments
showed that the kernel can improve up to 10% in latency.

The support requires a Clang compiler from LLVM 17 or later. This
submission is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1,
and BRBE on ARM 1 is part of planned future work.

Here is an example workflow for AutoFDO kernel:

1) Build the kernel on the host machine with LLVM enabled, for example,
       $ make menuconfig LLVM=1
   Turn on AutoFDO build config:
       CONFIG_AUTOFDO_CLANG=y
   With a configuration that has LLVM enabled, use the following
   command:
       scripts/config -e AUTOFDO_CLANG
   After getting the config, build with
       $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 500009, for this purpose.
   For Intel platforms:
       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
         -o <perf_file> -- <loadtest>
   For AMD platforms:
   The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
   For Zen3:
       $ cat /proc/cpuinfo | grep " brs"
   For Zen4:
       $ cat /proc/cpuinfo | grep amd_lbr_v2
       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
         -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo), version v0.30.1 or later.
   The llvm-profgen tool is included in the LLVM compiler itself. It's
   important to note that the version of llvm-profgen doesn't need to
   match the version of Clang. It needs to be from the LLVM 19 release
   or later, or from the LLVM trunk.
       $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \
         -o <profile_file>
   or
       $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
         --format=extbinary --out=<profile_file>

   Note that multiple AutoFDO profile files can be merged into one via:
       $ llvm-profdata merge -o <profile_file> <profile_1> ... <profile_n>

6) Rebuild the kernel using the AutoFDO profile file with the same config
   as step 1 (note that CONFIG_AUTOFDO_CLANG needs to be enabled):
       $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>

Co-developed-by: Han Shen <[email protected]>
Signed-off-by: Han Shen <[email protected]>
Signed-off-by: Rong Xu <[email protected]>
Suggested-by: Sriraman Tallam <[email protected]>
Suggested-by: Krzysztof Pszeniczny <[email protected]>
Suggested-by: Nick Desaulniers <[email protected]>
Suggested-by: Stephane Eranian <[email protected]>
Tested-by: Yonghong Song <[email protected]>
Tested-by: Yabin Cui <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Peter Jung <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
1 parent 397a479 commit 315ad87
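
Step 3 of the workflow picks a different perf event depending on the CPU. The following is a minimal sketch of that choice, not part of the commit: `pick_perf_event` and `perf_record_cmd` are hypothetical helper names, and the sketch assumes that any Intel CPU offers LBR, which is not guaranteed. It only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the kernel tree.

# Print the perf event arguments for the CPU described by a cpuinfo-style
# file (default: /proc/cpuinfo). Returns non-zero if no supported branch
# sampling feature is found.
pick_perf_event() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -q 'vendor_id.*GenuineIntel' "$cpuinfo"; then
        # Assumption: the Intel CPU supports LBR.
        echo "-e BR_INST_RETIRED.NEAR_TAKEN:k"
    elif grep -qE '^flags.* (brs|amd_lbr_v2)( |$)' "$cpuinfo"; then
        # Zen3 exposes "brs", Zen4 exposes "amd_lbr_v2". The leading
        # space keeps "ibrs" from matching "brs".
        echo "--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k"
    else
        echo "no supported branch sampling feature found" >&2
        return 1
    fi
}

# Print (do not run) the full perf record command from the workflow.
# $1: cpuinfo file, $2: sample period, $3: perf output file, $4: load test
perf_record_cmd() {
    event=$(pick_perf_event "$1") || return 1
    echo "perf record $event -a -N -b -c $2 -o $3 -- $4"
}
```

For example, `perf_record_cmd /proc/cpuinfo 500009 perf.data ./loadtest` prints the command line to run on the test machine.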


9 files changed, 231 additions and 0 deletions

Documentation/dev-tools/autofdo.rst

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Using AutoFDO with the Linux kernel
+===================================
+
+This enables AutoFDO build support for the kernel when using
+the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
+is a type of profile-guided optimization (PGO) used to enhance the
+performance of binary executables. It gathers information about the
+frequency of execution of various code paths within a binary using
+hardware sampling. This data is then used to guide the compiler's
+optimization decisions, resulting in a more efficient binary. AutoFDO
+is a powerful optimization technique, and data indicates that it can
+significantly improve kernel performance. It's especially beneficial
+for workloads affected by front-end stalls.
+
+For AutoFDO builds, unlike non-FDO builds, the user must supply a
+profile. Acquiring an AutoFDO profile can be done in several ways.
+AutoFDO profiles are created by converting hardware sampling using
+the "perf" tool. It is crucial that the workload used to create these
+perf files is representative; they must exhibit runtime
+characteristics similar to the workloads that are intended to be
+optimized. Failure to do so will result in the compiler optimizing
+for the wrong objective.
+
+The AutoFDO profile often encapsulates the program's behavior. If the
+performance-critical code is architecture-independent, the profile
+can be applied across platforms to achieve performance gains. For
+instance, using the profile generated on Intel architecture to build
+a kernel for AMD architecture can also yield performance improvements.
+
+There are two methods for acquiring a representative profile:
+(1) Sample real workloads using a production environment.
+(2) Generate the profile using a representative load test.
+When enabling the AutoFDO build configuration without providing an
+AutoFDO profile, the compiler only modifies the DWARF information in
+the kernel without impacting runtime performance. It's advisable to
+use a kernel binary built with the same AutoFDO configuration to
+collect the perf profile. While it's possible to use a kernel built
+with different options, it may result in inferior performance.
+
+One can collect profiles using an AutoFDO build of the previous kernel.
+AutoFDO employs relative line numbers to match the profiles, offering
+some tolerance for source changes. This mode is commonly used in a
+production environment for profile collection.
+
+In a profile collection based on a load test, the AutoFDO collection
+process consists of the following steps:
+
+#. Initial build: The kernel is built with AutoFDO options
+   without a profile.
+
+#. Profiling: The above kernel is then run with a representative
+   workload to gather execution frequency data. This data is
+   collected using hardware sampling, via perf. AutoFDO is most
+   effective on platforms supporting advanced PMU features like
+   LBR on Intel machines.
+
+#. AutoFDO profile generation: The perf output file is converted to
+   the AutoFDO profile via offline tools.
+
+This support requires Clang from LLVM 17 or later.
+
+Preparation
+===========
+
+Configure the kernel with::
+
+   CONFIG_AUTOFDO_CLANG=y
+
+Customization
+=============
+
+The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
+AutoFDO builds. One can, however, enable or disable AutoFDO build for
+individual files and directories by adding a line similar to the following
+to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o) ::
+
+     AUTOFDO_PROFILE_foo.o := y
+
+- For enabling all files in one directory ::
+
+     AUTOFDO_PROFILE := y
+
+- For disabling one file ::
+
+     AUTOFDO_PROFILE_foo.o := n
+
+- For disabling all files in one directory ::
+
+     AUTOFDO_PROFILE := n
+
+Workflow
+========
+
+Here is an example workflow for AutoFDO kernel:
+
+1) Build the kernel on the host machine with LLVM enabled,
+   for example, ::
+
+      $ make menuconfig LLVM=1
+
+   Turn on AutoFDO build config::
+
+      CONFIG_AUTOFDO_CLANG=y
+
+   With a configuration that has LLVM enabled, use the following command::
+
+      $ scripts/config -e AUTOFDO_CLANG
+
+   After getting the config, build with ::
+
+      $ make LLVM=1
+
+2) Install the kernel on the test machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+   event period. We suggest using a suitable prime number, like 500009,
+   for this purpose.
+
+   - For Intel platforms::
+
+      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   - For AMD platforms:
+
+     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check:
+
+     For Zen3::
+
+        $ cat /proc/cpuinfo | grep " brs"
+
+     For Zen4::
+
+        $ cat /proc/cpuinfo | grep amd_lbr_v2
+
+     The following command generates the perf data file::
+
+        $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+4) (Optional) Download the raw perf file to the host machine.
+
+5) To generate an AutoFDO profile, two offline tools are available:
+   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
+   of the AutoFDO project and can be found on GitHub
+   (https://github.com/google/autofdo), version v0.30.1 or later.
+   The llvm-profgen tool is included in the LLVM compiler itself. It's
+   important to note that the version of llvm-profgen doesn't need to match
+   the version of Clang. It needs to be from the LLVM 19 release
+   or later, or from the LLVM trunk. ::
+
+      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
+
+   or ::
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
+
+   Note that multiple AutoFDO profile files can be merged into one via::
+
+      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
+
+6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1
+   (note that CONFIG_AUTOFDO_CLANG needs to be enabled)::
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
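
Steps 5 and 6 of the documented workflow can be wrapped in small helpers. This is a rough sketch that only prints the commands given in the documentation; `profile_cmd` and `rebuild_cmd` are hypothetical names introduced for illustration, and the file names passed to them are placeholders.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the kernel tree.

# Print (do not run) the conversion command for the chosen offline tool.
# $1: tool name, $2: vmlinux, $3: perf data file, $4: output profile
profile_cmd() {
    case "$1" in
        llvm-profgen)
            echo "llvm-profgen --kernel --binary=$2 --perfdata=$3 -o $4"
            ;;
        create_llvm_prof)
            echo "create_llvm_prof --binary=$2 --profile=$3 --format=extbinary --out=$4"
            ;;
        *)
            echo "unknown tool: $1" >&2
            return 1
            ;;
    esac
}

# Print (do not run) the rebuild command that consumes the profile.
# $1: AutoFDO profile file
rebuild_cmd() {
    echo "make LLVM=1 CLANG_AUTOFDO_PROFILE=$1"
}
```

Printing rather than executing keeps the sketch safe to try on a machine without perf data; the output can be reviewed and then run by hand.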

Documentation/dev-tools/index.rst

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ Documentation/dev-tools/testing-overview.rst
    ktap
    checkuapi
    gpio-sloppy-logic-analyzer
+   autofdo
 
 
 .. only:: subproject and html

MAINTAINERS

Lines changed: 7 additions & 0 deletions
@@ -3665,6 +3665,13 @@ F: kernel/audit*
 F:	lib/*audit.c
 K:	\baudit_[a-z_0-9]\+\b
 
+AUTOFDO BUILD
+M:	Rong Xu <[email protected]>
+M:	Han Shen <[email protected]>
+S:	Supported
+F:	Documentation/dev-tools/autofdo.rst
+F:	scripts/Makefile.autofdo
+
 AUXILIARY BUS DRIVER
 M:	Greg Kroah-Hartman <[email protected]>
 R:	Dave Ertman <[email protected]>

Makefile

Lines changed: 1 addition & 0 deletions
@@ -1023,6 +1023,7 @@ include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
 include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
 include-$(CONFIG_KCOV) += scripts/Makefile.kcov
 include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
+include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
 include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
 
 include $(addprefix $(srctree)/, $(include-y))

arch/Kconfig

Lines changed: 20 additions & 0 deletions
@@ -811,6 +811,26 @@ config LTO_CLANG_THIN
 	  If unsure, say Y.
 endchoice
 
+config ARCH_SUPPORTS_AUTOFDO_CLANG
+	bool
+
+config AUTOFDO_CLANG
+	bool "Enable Clang's AutoFDO build (EXPERIMENTAL)"
+	depends on ARCH_SUPPORTS_AUTOFDO_CLANG
+	depends on CC_IS_CLANG && CLANG_VERSION >= 170000
+	help
+	  This option enables Clang’s AutoFDO build. When
+	  an AutoFDO profile is specified in variable
+	  CLANG_AUTOFDO_PROFILE during the build process,
+	  Clang uses the profile to optimize the kernel.
+
+	  If no profile is specified, AutoFDO options are
+	  still passed to Clang to facilitate the collection
+	  of perf data for creating an AutoFDO profile in
+	  subsequent builds.
+
+	  If unsure, say N.
+
 config ARCH_SUPPORTS_CFI_CLANG
 	bool
 	help
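
The `CLANG_VERSION >= 170000` dependency compares against an integer version code. The sketch below assumes the encoding is major * 10000 + minor * 100 + patch, which is what the value 170000 for LLVM 17 suggests; `clang_version_code` and `clang_supports_autofdo` are hypothetical helper names, and version strings with suffixes such as `-rc1` are not handled.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of the kernel tree.

# Convert "major.minor.patch" into an integer version code.
# Assumption: code = major * 10000 + minor * 100 + patch.
clang_version_code() {
    IFS=. read -r major minor patch <<EOF
$1
EOF
    echo $((10000 * major + 100 * ${minor:-0} + ${patch:-0}))
}

# Succeed if the given Clang version meets the 170000 threshold
# used by the AUTOFDO_CLANG Kconfig entry.
clang_supports_autofdo() {
    [ "$(clang_version_code "$1")" -ge 170000 ]
}
```

With this encoding, `clang_supports_autofdo 16.0.6` fails while `clang_supports_autofdo 17.0.0` succeeds, matching the "LLVM 17 or later" requirement.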

arch/x86/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -126,6 +126,7 @@ config X86
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_RT
+	select ARCH_SUPPORTS_AUTOFDO_CLANG
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
 	select ARCH_USE_MEMTEST

scripts/Makefile.autofdo

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang AutoFDO features.
+
+CFLAGS_AUTOFDO_CLANG := -fdebug-info-for-profiling -mllvm -enable-fs-discriminator=true -mllvm -improved-fs-discriminator=true
+
+ifndef CONFIG_DEBUG_INFO
+  CFLAGS_AUTOFDO_CLANG += -gmlt
+endif
+
+ifdef CLANG_AUTOFDO_PROFILE
+  CFLAGS_AUTOFDO_CLANG += -fprofile-sample-use=$(CLANG_AUTOFDO_PROFILE)
+endif
+
+ifdef CONFIG_LTO_CLANG_THIN
+  ifdef CLANG_AUTOFDO_PROFILE
+    KBUILD_LDFLAGS += --lto-sample-profile=$(CLANG_AUTOFDO_PROFILE)
+  endif
+  KBUILD_LDFLAGS += --mllvm=-enable-fs-discriminator=true --mllvm=-improved-fs-discriminator=true -plugin-opt=thinlto
+endif
+
+export CFLAGS_AUTOFDO_CLANG
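
To see which compiler flags the two build phases end up with, the conditionals in scripts/Makefile.autofdo can be mirrored in shell. This is only an illustration of the logic, not how kbuild evaluates it; `autofdo_cflags` and its arguments are hypothetical, and the ThinLTO linker flags are left out.

```shell
#!/bin/sh
# Sketch only: mirrors the CFLAGS conditionals of scripts/Makefile.autofdo.
# The function name and argument convention are hypothetical.

# $1: "y" if CONFIG_DEBUG_INFO is set, empty otherwise
# $2: path of the AutoFDO profile, or empty for the initial build
autofdo_cflags() {
    flags="-fdebug-info-for-profiling -mllvm -enable-fs-discriminator=true -mllvm -improved-fs-discriminator=true"
    # Without CONFIG_DEBUG_INFO, minimal line tables are requested.
    if [ -z "$1" ]; then
        flags="$flags -gmlt"
    fi
    # The profile is only consumed when one is supplied.
    if [ -n "$2" ]; then
        flags="$flags -fprofile-sample-use=$2"
    fi
    echo "$flags"
}
```

Calling `autofdo_cflags "" ""` corresponds to the initial profiling build, and `autofdo_cflags y kernel.afdo` to the optimized rebuild with a placeholder profile name.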

scripts/Makefile.lib

Lines changed: 10 additions & 0 deletions
@@ -191,6 +191,16 @@ _c_flags += $(if $(patsubst n%,, \
 			-D__KCSAN_INSTRUMENT_BARRIERS__)
 endif
 
+#
+# Enable AutoFDO build flags except some files or directories we don't want to
+# enable (depends on variables AUTOFDO_PROFILE_obj.o and AUTOFDO_PROFILE).
+#
+ifeq ($(CONFIG_AUTOFDO_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+	$(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(is-kernel-object)), \
+	$(CFLAGS_AUTOFDO_CLANG))
+endif
+
 # $(src) for including checkin headers from generated source files
 # $(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
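
The `patsubst n%` expression in the scripts/Makefile.lib hunk decides per object whether the AutoFDO flags apply. A shell sketch of the same rule follows; `autofdo_enabled` is a hypothetical name, and the third argument stands in for `$(is-kernel-object)`, assumed here to expand to `y` for kernel-space objects and to nothing otherwise.

```shell
#!/bin/sh
# Sketch only: reproduces the precedence of the make expression
#   $(if $(patsubst n%,,<per-file><per-dir><is-kernel-object>), flags)
# The function name is hypothetical.

# $1: value of AUTOFDO_PROFILE_<obj>.o ("y", "n" or empty)
# $2: value of AUTOFDO_PROFILE         ("y", "n" or empty)
# $3: "y" for a kernel-space object, empty otherwise (assumption)
autofdo_enabled() {
    combined="$1$2$3"
    case "$combined" in
        # Empty, or first non-empty setting is "n": flags are not added.
        ""|n*) return 1 ;;
        # Anything else: flags are added.
        *) return 0 ;;
    esac
}
```

Because the three values are concatenated, the first non-empty one wins: a per-file `y` overrides a directory-wide `n`, and a per-file `n` disables AutoFDO for that object even though it is a kernel object.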

tools/objtool/check.c

Lines changed: 1 addition & 0 deletions
@@ -4557,6 +4557,7 @@ static int validate_ibt(struct objtool_file *file)
 		    !strcmp(sec->name, "__jump_table") ||
 		    !strcmp(sec->name, "__mcount_loc") ||
 		    !strcmp(sec->name, ".kcfi_traps") ||
+		    !strcmp(sec->name, ".llvm.call-graph-profile") ||
 		    strstr(sec->name, "__patchable_function_entries"))
 			continue;
