Commit 60c76b8

Merge pull request #2128 from madeline-underwood/bolt
Bolt_JA to sign off
2 parents be9f8cb + 06fe4ac commit 60c76b8

7 files changed: +165 -128 lines changed

content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md

Lines changed: 7 additions & 11 deletions
@@ -1,23 +1,19 @@
 ---
 title: Optimize Arm applications and shared libraries with BOLT

-draft: true
-cascade:
-draft: true
-
 minutes_to_complete: 30

-who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT.
+who_is_this_for: This is an advanced topic for performance engineers and software developers targeting Arm platforms who want to optimize application binaries and shared libraries using BOLT.

 learning_objectives:
-- Instrument and optimize application binaries for individual workload features using BOLT.
-- Collect separate BOLT profiles and merge them for comprehensive code coverage.
-- Optimize shared libraries independently.
-- Integrate optimized shared libraries into applications.
-- Evaluate and compare application and library performance across baseline, isolated, and merged optimization scenarios.
+- Instrument and optimize application binaries for individual workload features using BOLT
+- Collect and merge separate BOLT profiles to improve code coverage
+- Optimize shared libraries independently of application binaries
+- Integrate optimized shared libraries into applications
+- Evaluate and compare performance across baseline, isolated, and merged optimization scenarios

 prerequisites:
-- An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
+- An Arm-based Linux system with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed

 author: Gayathri Narayana Yegna Narayanan

Binary file not shown.
Lines changed: 28 additions & 30 deletions
@@ -1,63 +1,61 @@
 ---
-title: BOLT overview
+title: Overview
 weight: 2

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-[BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses Linux Perf data to re-order the executable code layout to reduce memory overhead and improve performance.
+## What is BOLT?

-Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.
+[BOLT](https://github.com/llvm/llvm-project/blob/main/bolt/README.md) is a post-link binary optimizer that uses profiling data from [Linux Perf](/install-guides/perf/) to identify frequently executed functions and basic blocks. Based on this data, BOLT reorders code to improve instruction cache locality, reduce branch mispredictions, and shorten critical execution paths.

-You should use an Arm Linux system with at least 8 CPUs and 16 Gb of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible.
+This often results in faster startup times, lower CPU cycles per instruction (CPI), and improved throughput - especially for large, performance-sensitive applications like databases, web servers, or system daemons.

-## What will I do in this Learning Path?
+{{% notice Note %}}
+BOLT complements compile-time optimizations like LTO (Link-Time Optimization) and PGO (Profile-Guided Optimization). It applies after linking, giving it visibility into the final binary layout, which traditional compiler optimizations do not.
+{{% /notice %}}

-In this Learning Path you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application and two share libraries which are used by MySQL are also optimized using BOLT.
+Before you begin, ensure that you have the following installed:

-Here is an outline of the steps:
+- [BOLT](/install-guides/bolt/)
+- [Linux Perf](/install-guides/perf/)

-1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only
+You should use an Arm-based Linux system with at least 8 CPUs and 16 GB of RAM. This Learning Path was tested on Ubuntu 24.04, but other Linux distributions are also supported.

-A read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns.
+## What will I do in this Learning Path?

-2. Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`
+In this Learning Path, you'll learn how to use BOLT to optimize both applications and shared libraries. You'll walk through a real-world example using MySQL and two of its dependent libraries:

-This means you can apply BOLT optimizations not just to your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.
+- `libssl.so`
+- `libcrypto.so`

-3. Merge profile data for broader code coverage
+You will:

-By combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.
+- **Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only** - a read-only workload typically involves operations that only retrieve or query data, such as running SELECT statements in a database without modifying any records. In contrast, a write-only workload focuses on operations that modify data, such as INSERT, UPDATE, or DELETE statements. Profiling both types ensures that the optimized binary performs well under different usage patterns.

-4. Run BOLT on each binary application and library
+- **Independently optimize application binaries and external user-space libraries, such as `libssl.so` and `libcrypto.so`** - this means that you can apply BOLT optimizations to not just your main application, but also to shared libraries it depends on, resulting in a more comprehensive performance improvement across your entire stack.

-With the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.
+- **Merge profile data for broader code coverage** - by combining the profile data collected from different workloads and libraries, you create a single, comprehensive profile that represents a wide range of application behaviors. This merged profile allows BOLT to optimize code paths that are exercised under different scenarios, leading to better overall performance and coverage than optimizing for a single workload.

-5. Link the final optimized binary with the separately optimized libraries to deploy a fully optimized runtime stack
+- **Run BOLT on each binary application and library** - with the merged profile, you apply BOLT optimizations separately to each binary and shared library. This step ensures that both your main application and its dependencies are optimized based on real-world usage patterns, resulting in a more efficient and responsive software stack.

-After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.
+- **Link the final optimized binary with the separately optimized libraries to deploy a fully optimized runtime stack** - after optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.

 ## What is BOLT profile merging?

-BOLT profile merging is the process of combining profiling from multiple runs into a single profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations.
-
-![Why BOLT Profile Merging?](Bolt-merge.png)
-
-## What are good applications for BOLT?
-
-MySQL and Sysbench are used as example applications, but you can use this method for any feature-rich application that:
+BOLT profile merging combines profiling data from multiple runs into one unified profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations.

-1. Exhibits multiple runtime paths
+![Diagram showing how BOLT profile merging combines multiple runtime profiles into a single optimized view#center](bolt-merge.png "Why BOLT profile merging improves optimization coverage")

-Applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage.
+## What types of applications benefit from BOLT?

-2. Uses dynamic libraries
+Although this Learning Path uses MySQL and Sysbench as examples, you can apply the same method to any feature-rich application that:

-Most modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.
+- **Exhibits multiple runtime paths** - applications often have different code paths depending on the workload or user actions. Optimizing for just one path can leave performance gains untapped in others. By profiling and merging data from various workloads, you ensure broader optimization coverage.

-3. Requires full-stack binary optimization for performance-critical deployment
+- **Uses dynamic libraries** - most modern applications rely on shared libraries for functionality. Optimizing shared libraries alongside the main binary ensures consistent performance across your stack.

-In scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits.
+- **Requires full-stack binary optimization for performance-critical deployment** - in scenarios where every bit of performance matters, such as high-throughput servers or latency-sensitive applications, optimizing the entire binary stack can yield significant benefits.
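
The profile merging idea described in the changed overview page can be illustrated with a small conceptual sketch. It is not BOLT's profile format or tooling: it assumes a profile is simply a mapping from function name to sample count, and the function names, counts, and threshold are hypothetical. It only shows why summing counts from a read-only and a write-only run exposes hot code that either run alone would miss.

```python
from collections import Counter


def merge_profiles(*profiles):
    """Sum per-function sample counts across several profiling runs."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)  # Counter.update adds counts, it does not overwrite
    return merged


def hot_functions(profile, threshold):
    """Return functions whose sample count meets the threshold, hottest first."""
    return [name for name, count in profile.most_common() if count >= threshold]


# Hypothetical sample counts; names and numbers are illustrative only.
read_only = Counter({"parse_query": 900, "read_row": 700, "write_row": 5})
write_only = Counter({"parse_query": 850, "write_row": 650, "flush_log": 400})

merged = merge_profiles(read_only, write_only)

print("hot in read-only run:", hot_functions(read_only, 100))
print("hot in merged profile:", hot_functions(merged, 100))
```

With only the read-only counts, `write_row` and `flush_log` fall below the threshold, so a layout decision based on that run alone would treat them as cold. The merged counts rank them as hot, which is the broader coverage the overview page refers to.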