Commit f15479c

Merge pull request #2039 from jasonrandrews/review
tech review of BOLT Learning Path in progress
2 parents 2fb0e2a + d70b2b5 commit f15479c

3 files changed: +54 -31 lines changed

content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md

Lines changed: 5 additions & 6 deletions
@@ -1,29 +1,28 @@
 ---
-title: Optimizing Arm binaries and libraries with LLVM-BOLT and profile merging
+title: Optimize Arm applications and shared libraries with BOLT
 
 draft: true
 cascade:
   draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: Performance engineers, software developers working on Arm platforms who want to optimize both application binaries and shared libraries using LLVM-BOLT.
+who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT.
 
 learning_objectives:
-- Instrument and optimize binaries for individual workload features using LLVM-BOLT.
+- Instrument and optimize application binaries for individual workload features using BOLT.
 - Collect separate BOLT profiles and merge them for comprehensive code coverage.
 - Optimize shared libraries independently.
 - Integrate optimized shared libraries into applications.
 - Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios.
 
 prerequisites:
-- An Arm based system running Linux with BOLT and Linux Perf installed. The Linux kernel should be version 5.15 or later.
-- (Optional) A second, more powerful Linux system to build the software executable and run BOLT.
+- An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
 
 author: Gayathri Narayana Yegna Narayanan
 
 ### Tags
-skilllevels: Introductory
+skilllevels: Advanced
 subjects: Performance and Architecture
 armips:
 - Neoverse
Lines changed: 46 additions & 17 deletions
@@ -1,5 +1,5 @@
 ---
-title: Overview of BOLT Merge
+title: BOLT overview
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -8,20 +8,49 @@ layout: learningpathall
 
 [BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to re-order the executable code layout to reduce memory overhead and improve performance.
 
-In this Learning Path, you'll learn how to:
-- Collect and merge BOLT profiles from multiple workload features (e.g., read-only and write-only)
-- Independently optimize application binaries and external user-space libraries (e.g., `libssl.so`, `libcrypto.so`)
-- Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack
-
-While MySQL and sysbench are used as examples, this method applies to **any feature-rich application** that:
-- Exhibits multiple runtime paths
-- Uses dynamic libraries
-- Requires full-stack binary optimization for performance-critical deployment
-
-The workflow includes:
-1. Profiling each workload feature separately
-2. Profiling external libraries independently
-3. Merging profiles for broader code coverage
-4. Applying BOLT to each binary and library
-5. Linking bolted libraries with the merged-profile binary
+Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
+
+You should use an Arm Linux system with at least 4 CPUs and 16 GB of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible.
+
+## What will I do in this Learning Path?
+
+In this Learning Path, you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application, and two shared libraries used by MySQL are also optimized using BOLT.
+
+1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only
+
+A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns.
+
+2. Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`
+
+This means you can apply BOLT optimizations not just to your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.
+
+3. Merge profile data for broader code coverage
+
+By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.
+
+4. Run BOLT on each binary application and library
+
+With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.
+
+5. Link the final optimized binary with the separately bolted libraries to deploy a fully optimized runtime stack
+
+After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.
+
+
+## What are good applications for BOLT?
+
+MySQL and sysbench are used as example applications, but you can use this method for **any feature-rich application** that:
+
+1. Exhibits multiple runtime paths
+
+Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage.
+
+2. Uses dynamic libraries
+
+Many modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.
+
+3. Requires full-stack binary optimization for performance-critical deployment
+
+In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits.
+
 

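For reviewers who want a concrete picture of the workflow this overview file describes (instrument, collect one profile per workload, merge, optimize), the following is a minimal shell sketch and not the Learning Path's own script. It assumes `llvm-bolt` and `merge-fdata` are on the `PATH`. The `profile-writeonly.fdata` and `profile-merged.fdata` names, the `PROFILES` directory, the `run` helper, and the particular optimization flags are illustrative assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the profile-merge workflow. Every path below is a placeholder
# (an assumption), not a value taken from the Learning Path.
BIN="${BIN:-/path/to/mysqld}"
PROFILES="${PROFILES:-/path/to/profiles}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

bolt_workflow() {
    # Instrument the binary so that running it writes a profile.
    # (Repeat with a different --instrumentation-file for each workload.)
    run llvm-bolt "$BIN" -instrument \
        --instrumentation-file="$PROFILES/profile-readonly.fdata" \
        -o "$BIN.instrumented"

    # Run each workload against the instrumented binary here (not shown),
    # producing one .fdata file per workload.

    # Merge the per-workload profiles into a single profile.
    run merge-fdata "$PROFILES/profile-readonly.fdata" \
        "$PROFILES/profile-writeonly.fdata" \
        -o "$PROFILES/profile-merged.fdata"

    # Optimize the original binary using the merged profile.
    run llvm-bolt "$BIN" -o "$BIN.bolt" \
        -data="$PROFILES/profile-merged.fdata" \
        -reorder-blocks=ext-tsp -reorder-functions=hfsort \
        -split-functions -dyno-stats
}

bolt_workflow
```

Keeping dry-run as the default lets the command sequence be inspected before anything touches a real binary. Set `DRY_RUN=0` along with real `BIN` and `PROFILES` values to execute it.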
content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md

Lines changed: 3 additions & 8 deletions
@@ -6,11 +6,11 @@ weight: 3
 layout: learningpathall
 ---
 
-In this step, you will instrument an application binary (such as `mysqld`) with BOLT to collect runtime profile data for a specific feature — for example, a **read-only workload**.
+In this step, you will use BOLT to instrument the MySQL application binary and to collect profile data for specific workloads.
 
-The collected profile will later be merged with others and used to optimize the application's code layout.
+The collected profiles will be merged with others and used to optimize the application's code layout.
 
-### Step 1: Build or obtain the uninstrumented binary
+### Build the uninstrumented binary
 
 Make sure your application binary is:
 
@@ -26,8 +26,6 @@ readelf -s /path/to/mysqld | grep main
 
 If the symbols are missing, rebuild the binary with debug info and no stripping.
 
----
-
 ### Step 2: Instrument the binary with BOLT
 
 Use `llvm-bolt` to create an instrumented version of the binary:
@@ -84,6 +82,3 @@ ls -lh /path/to/profile-readonly.fdata
 
 You should see a non-empty file. This file will later be merged with other profiles (e.g., for write-only traffic) to generate a complete merged profile.
 
----
-
-