content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md
Lines changed: 5 additions & 6 deletions
```diff
@@ -1,29 +1,28 @@
 ---
-title: Optimizing Arm binaries and libraries with LLVM-BOLT and profile merging
+title: Optimize Arm applications and shared libraries with BOLT
 
 draft: true
 cascade:
     draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: Performance engineers, software developers working on Arm platforms who want to optimize both application binaries and shared libraries using LLVM-BOLT.
+who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT.
 
 learning_objectives:
-- Instrument and optimize binaries for individual workload features using LLVM-BOLT.
+- Instrument and optimize application binaries for individual workload features using BOLT.
 - Collect separate BOLT profiles and merge them for comprehensive code coverage.
 - Optimize shared libraries independently.
 - Integrate optimized shared libraries into applications.
 - Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios.
 
 prerequisites:
-- An Arm based system running Linux with BOLT and Linux Perf installed. The Linux kernel should be version 5.15 or later.
-- (Optional) A second, more powerful Linux system to build the software executable and run BOLT.
+- An Arm-based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
```
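To confirm that a machine meets the updated prerequisites, a quick check along these lines may help. It is a sketch that assumes BOLT was installed with the `llvm-bolt` and `merge-fdata` tools on your `PATH` and that Linux Perf provides a `perf` command. It also prints the CPU count and memory so you can compare them with the 4 CPUs and 16 GB of RAM recommended elsewhere in this commit. The function name `check_prereqs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether this machine matches the Learning Path prerequisites.
# Assumption: BOLT provides llvm-bolt and merge-fdata, Perf provides perf.
check_prereqs() {
  local status=0
  echo "architecture: $(uname -m)"
  if [ "$(uname -m)" != aarch64 ]; then
    echo "warning: this is not an Arm (aarch64) Linux system"
    status=1
  fi
  echo "cpus: $(nproc)"
  # MemTotal in /proc/meminfo is reported in KiB
  awk '/^MemTotal:/ { printf "memory: %.1f GiB\n", $2 / 1048576 }' /proc/meminfo
  # Look for each required tool on PATH
  for tool in llvm-bolt merge-fdata perf; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

check_prereqs || echo "One or more prerequisites are missing."
```

The function returns a non-zero status when a tool is missing or the system is not `aarch64`, so it can also gate a setup script.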
```diff
 [BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to re-order the executable code layout to reduce memory overhead and improve performance.
 
-In this Learning Path, you'll learn how to:
-- Collect and merge BOLT profiles from multiple workload features (e.g., read-only and write-only)
-- Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack
-
-While MySQL and sysbench are used as examples, this method applies to **any feature-rich application** that:
-- Exhibits multiple runtime paths
-- Uses dynamic libraries
-- Requires full-stack binary optimization for performance-critical deployment
-
-The workflow includes:
-1. Profiling each workload feature separately
-2. Profiling external libraries independently
-3. Merging profiles for broader code coverage
-4. Applying BOLT to each binary and library
-5. Linking bolted libraries with the merged-profile binary
+Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
+
+You should use an Arm Linux system with at least 4 CPUs and 16 GB of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions may also work.
+
+## What will I do in this Learning Path?
+
+In this Learning Path, you learn how to use BOLT to optimize applications and shared libraries. MySQL is the example application, and two shared libraries that MySQL uses are also optimized with BOLT.
+
+1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only
+
+A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types helps the optimized binary perform well under different usage patterns.
+
+2. Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`
+
+This means you can apply BOLT optimizations not just to your main application but also to the shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.
+
+3. Merge profile data for broader code coverage
+
+By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.
+
+4. Run BOLT on each application binary and library
+
+With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.
+
+5. Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack
+
+After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.
+
+
+## What are good applications for BOLT?
+
+MySQL and sysbench are used as example applications, but you can use this method for **any feature-rich application** that:
+
+1. Exhibits multiple runtime paths
+
+Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage.
+
+2. Uses dynamic libraries
+
+Many modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.
+
+3. Requires full-stack binary optimization for performance-critical deployment
+
+In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits.
```
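Optimizing external libraries such as `libssl.so` and `libcrypto.so` separately starts with knowing which copies of them the application actually loads. The sketch below assumes the usual `ldd` output format (`name => path (address)`) and filters it down to the `libssl` and `libcrypto` entries. The function name `bolt_candidate_libs` and the sample library paths are hypothetical; on a real system you would pipe `ldd /path/to/mysqld` into the function.

```shell
#!/usr/bin/env bash
# Sketch: pick the libssl/libcrypto entries out of ldd-style output.
# Each input line looks like: "libssl.so.3 => /some/path/libssl.so.3 (0x...)".
bolt_candidate_libs() {
  # Field 1 is the soname, field 2 is "=>", field 3 is the resolved path
  awk '$2 == "=>" && $1 ~ /^lib(ssl|crypto)\.so/ { print $3 }'
}

# Illustrative input only; on a real system use: ldd /path/to/mysqld | bolt_candidate_libs
printf '%s\n' \
  'linux-vdso.so.1 (0x0000ffff8a0c0000)' \
  'libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3 (0x0000ffff89e00000)' \
  'libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3 (0x0000ffff89800000)' \
  'libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff89600000)' |
  bolt_candidate_libs
```

The paths it prints are the library files you would profile and pass to `llvm-bolt` separately from the application binary.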
content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md
Lines changed: 3 additions & 8 deletions
```diff
@@ -6,11 +6,11 @@ weight: 3
 layout: learningpathall
 ---
 
-In this step, you will instrument an application binary (such as `mysqld`) with BOLT to collect runtime profile data for a specific feature — for example, a **read-only workload**.
+In this step, you will use BOLT to instrument the MySQL application binary and collect profile data for specific workloads.
 
-The collected profile will later be merged with others and used to optimize the application's code layout.
+The collected profiles are later merged and used to optimize the application's code layout.
 
-### Step 1: Build or obtain the uninstrumented binary
+### Build the uninstrumented binary
 
 Make sure your application binary is:
 
@@ -26,8 +26,6 @@ readelf -s /path/to/mysqld | grep main
 
 If the symbols are missing, rebuild the binary with debug info and no stripping.
 
----
-
 ### Step 2: Instrument the binary with BOLT
 
 Use `llvm-bolt` to create an instrumented version of the binary:
```
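A minimal sketch of the instrumentation step, assuming the uninstrumented binary is at `/path/to/mysqld` and that you create one instrumented copy per workload so each run writes its own profile. The `-instrument`, `-o`, and `--instrumentation-file` options are `llvm-bolt` options; the `.instrumented-<name>` output names, the `profile-writeonly.fdata` name, and the `DRY_RUN` switch are hypothetical conventions for this example. By default, BOLT's instrumentation runtime writes the `.fdata` file when the instrumented process exits, so start the instrumented `mysqld`, drive it with the matching workload (for example sysbench's `oltp_read_only` or `oltp_write_only` test), and shut the server down cleanly before looking for the profile.

```shell
#!/usr/bin/env bash
# Sketch: build one BOLT-instrumented copy of mysqld per workload.
# MYSQLD and PROFILE_DIR are placeholders; commands are printed, not run,
# unless DRY_RUN=0 is set in the environment.
MYSQLD=${MYSQLD:-/path/to/mysqld}
PROFILE_DIR=${PROFILE_DIR:-/path/to}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# instrument_for NAME: writes $MYSQLD.instrumented-NAME. When that binary
# exits, BOLT's runtime writes its counters to $PROFILE_DIR/profile-NAME.fdata.
instrument_for() {
  local name=$1
  run llvm-bolt "$MYSQLD" -instrument \
    -o "$MYSQLD.instrumented-$name" \
    --instrumentation-file="$PROFILE_DIR/profile-$name.fdata"
}

instrument_for readonly
instrument_for writeonly
```

Set `DRY_RUN=0` in the environment to execute the `llvm-bolt` commands instead of printing them.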
```diff
@@ -84,6 +82,3 @@ ls -lh /path/to/profile-readonly.fdata
 
 You should see a non-empty file. This file will later be merged with other profiles (e.g., for write-only traffic) to generate a complete merged profile.
```
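Once the read-only and write-only profiles exist, BOLT's `merge-fdata` tool can combine them (it writes the merged profile to standard output), and the result is passed to `llvm-bolt` with `-data`. This is a sketch: the optimization options are similar to the starting point suggested in the BOLT README, not necessarily the ones this Learning Path uses, and `profile-writeonly.fdata`, `profile-merged.fdata`, `mysqld.bolt`, and the `DRY_RUN` switch are placeholder names.

```shell
#!/usr/bin/env bash
# Sketch: merge per-workload BOLT profiles, then optimize mysqld with them.
# MYSQLD, PROFILE_DIR and the output names are placeholders; commands are
# printed, not run, unless DRY_RUN=0 is set in the environment.
MYSQLD=${MYSQLD:-/path/to/mysqld}
PROFILE_DIR=${PROFILE_DIR:-/path/to}
DRY_RUN=${DRY_RUN:-1}
MERGED="$PROFILE_DIR/profile-merged.fdata"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# merge-fdata combines .fdata files and writes the result to stdout.
merge_profiles() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ merge-fdata $* > $MERGED"
  else
    merge-fdata "$@" > "$MERGED"
  fi
}

# Rewrite mysqld with a code layout driven by the merged profile.
optimize_binary() {
  run llvm-bolt "$MYSQLD" -o "$MYSQLD.bolt" -data="$MERGED" \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -split-eh -dyno-stats
}

merge_profiles "$PROFILE_DIR/profile-readonly.fdata" \
               "$PROFILE_DIR/profile-writeonly.fdata"
optimize_binary
```

Keeping the merge as a separate step means you can add profiles from further workloads later without re-running the earlier ones.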