Commit 109b69d

Merge pull request #2062 from jasonrandrews/review
Review BOLT merging Learning Path
2 parents 9c04723 + 5f294d1 commit 109b69d

6 files changed (+66 -46 lines changed)

content/install-guides/bolt.md

Lines changed: 4 additions & 5 deletions
@@ -145,19 +145,19 @@ You are now ready to [verify BOLT is installed](#verify).
 For Arm Linux use the file with `aarch64` in the name:

 ```bash
-wget https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.5/clang+llvm-17.0.5-aarch64-linux-gnu.tar.xz
+wget https://github.com/llvm/llvm-project/releases/download/llvmorg-19.1.7/clang+llvm-19.1.7-aarch64-linux-gnu.tar.xz
 ```

 2. Extract the downloaded file

 ```bash
-tar -xvf clang+llvm-17.0.5-aarch64-linux-gnu.tar.xz
+tar -xvf clang+llvm-19.1.7-aarch64-linux-gnu.tar.xz
 ```

 3. Add the path to BOLT in your `.bashrc` file

 ```bash
-echo 'export PATH="$PATH:$HOME/clang+llvm-17.0.5-aarch64-linux-gnu/bin"' >> ~/.bashrc
+echo 'export PATH="$PATH:$HOME/clang+llvm-19.1.7-aarch64-linux-gnu/bin"' >> ~/.bashrc
 source ~/.bashrc
 ```

@@ -201,9 +201,8 @@ The output is similar to:

 ```output
 LLVM (http://llvm.org/):
-LLVM version 18.0.0git
+LLVM version 19.1.7
 Optimized build with assertions.
-BOLT revision 99c15eb49ba0b607314b3bd221f0760049130d97

 Registered Targets:
 aarch64 - AArch64 (little endian)
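
Review note: the version string `19.1.7` now appears in three commands in this hunk (download, extract, `PATH`). A minimal sketch of deriving all three from one variable follows. The variable names `LLVM_VERSION`, `LLVM_DIR`, and `LLVM_URL` are illustrative assumptions and are not part of the install guide. The sketch only prints the commands, so it has no side effects.

```bash
# Sketch only: LLVM_VERSION, LLVM_DIR and LLVM_URL are hypothetical names,
# not something the install guide defines.
LLVM_VERSION="19.1.7"
LLVM_DIR="clang+llvm-${LLVM_VERSION}-aarch64-linux-gnu"
LLVM_URL="https://github.com/llvm/llvm-project/releases/download/llvmorg-${LLVM_VERSION}/${LLVM_DIR}.tar.xz"

# Print the three commands from the guide instead of running them.
echo "wget ${LLVM_URL}"
echo "tar -xvf ${LLVM_DIR}.tar.xz"
echo "export PATH=\"\$PATH:\$HOME/${LLVM_DIR}/bin\""
```

Changing `LLVM_VERSION` alone would then update the URL, the archive name, and the `PATH` entry together.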

content/learning-paths/servers-and-cloud-computing/bolt-merge/_index.md

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 ---
 title: Optimize Arm applications and shared libraries with BOLT

+draft: true
+cascade:
+    draft: true
+
 minutes_to_complete: 30

 who_is_this_for: Performance engineers and software developers working on Arm platforms who want to optimize both application binaries and shared libraries using BOLT.

content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md

Lines changed: 33 additions & 10 deletions
@@ -18,31 +18,32 @@ Install the required dependencies:

 ```bash
 sudo apt update
-sudo apt install -y build-essential cmake libncurses5-dev libssl-dev libboost-all-dev bison pkg-config libaio-dev libtirpc-dev git
+sudo apt install -y build-essential cmake libncurses5-dev libssl-dev libboost-all-dev \
+bison pkg-config libaio-dev libtirpc-dev git ninja-build liblz4-dev
 ```

 Download the MySQL source code. You can change to another version in the `checkout` command below if needed.

 ```bash
 git clone https://github.com/mysql/mysql-server.git
 cd mysql-server
-git checkout mysql-8.4.5
+git checkout mysql-8.0.37
 ```

 Configure the build for debug:

 ```bash
 mkdir build && cd build
-cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_DEBUG=1 -DCMAKE_C_FLAGS="-fno-omit-frame-pointer" \
--DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer" -DCMAKE_POSITION_INDEPENDENT_CODE=OFF \
--DCMAKE_EXE_LINKER_FLAGS="-Wl,--emit-relocs" \
--DCMAKE_EXE_LINKER_FLAGS="-no-pie"
+cmake .. -DCMAKE_C_FLAGS="-O3 -mcpu=neoverse-n2 -Wno-enum-constexpr-conversion -fno-reorder-blocks-and-partition" \
+-DCMAKE_CXX_FLAGS="-O3 -mcpu=neoverse-n2 -Wno-enum-constexpr-conversion -fno-reorder-blocks-and-partition" \
+-DCMAKE_CXX_LINK_FLAGS="-Wl,--emit-relocs" -DCMAKE_C_LINK_FLAGS="-Wl,--emit-relocs" -G Ninja \
+-DWITH_BOOST=$HOME/boost -DDOWNLOAD_BOOST=On -DWITH_ZLIB=bundled -DWITH_LZ4=system -DWITH_SSL=system
 ```

-Build mysqld:
+Build MySQL:

 ```bash
-make -j$(nproc)
+ninja
 ```

 After the build completes, the `mysqld` binary is located at `$HOME/mysql-server/build/runtime_output_directory/mysqld`
@@ -54,11 +55,16 @@ You can run `mysqld` directly from the build directory as shown, or run `make in
 After building mysqld, install MySQL server and client utilities system-wide:

 ```bash
-sudo make install
+sudo ninja install
 ```

 This will make the `mysql` client and other utilities available in your PATH.

+```bash
+echo 'export PATH="$PATH:/usr/local/mysql/bin"' >> ~/.bashrc
+source ~/.bashrc
+```
+
 Ensure the binary is unstripped and includes debug symbols for BOLT instrumentation.

 To work with BOLT, your application binary should be:
@@ -125,6 +131,8 @@ taskset -c 6 $HOME/mysql-server/build/runtime_output_directory/mysqld.instrument

 Adjust `--datadir`, `--socket`, and `--port` as needed for your environment. Make sure the server is running and accessible before proceeding.

+With the database running, open a second terminal to run the client commands.
+
 ## Install sysbench

 You will need sysbench to generate workloads for MySQL. On most Arm Linux distributions, you can install it using your package manager:
@@ -156,6 +164,22 @@ EXIT;

 ## Run the instrumented binary under a feature-specific workload

+Run `sysbench` with the `prepare` option:
+
+```bash
+sysbench \
+--db-driver=mysql \
+--mysql-host=127.0.0.1 \
+--mysql-db=bench \
+--mysql-user=bench \
+--mysql-password=bench \
+--mysql-port=3306 \
+--tables=8 \
+--table-size=10000 \
+--threads=1 \
+/usr/share/sysbench/oltp_read_only.lua prepare
+```
+
 Use a workload generator to stress the binary in a feature-specific way. For example, to simulate **read-only traffic** with sysbench:

 ```bash
@@ -176,7 +200,6 @@ taskset -c 7 sysbench \
 On an 8-core system, cores are numbered 0-7. Adjust the `taskset -c` values as needed for your system. Avoid using the same core for both mysqld and sysbench to reduce contention.
 {{% /notice %}}

-
 The `.fdata` file defined in `--instrumentation-file` will be populated with runtime execution data.

 ## Verify the profile was created
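
Review note: the notice in the last hunk tells readers to adjust the `taskset -c` values and to keep `mysqld` and `sysbench` on different cores. A minimal sketch of choosing the two highest-numbered cores from the core count follows. The helper name `pick_cores` is hypothetical and does not come from the Learning Path; for 8 cores it yields the same `6` and `7` that the documented commands use.

```bash
# Sketch only: pick_cores is a hypothetical helper, not part of the Learning Path.
# Given a core count, print the two highest core indices (cores are numbered from 0).
pick_cores() {
  if [ "$1" -lt 2 ]; then
    echo "need at least 2 cores to separate mysqld and sysbench" >&2
    return 1
  fi
  echo "$(( $1 - 2 )) $(( $1 - 1 ))"
}

# Use the number of cores on this machine, as reported by nproc.
if cores=$(pick_cores "$(nproc)"); then
  set -- $cores
  echo "mysqld: taskset -c $1"
  echo "sysbench: taskset -c $2"
fi
```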

content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md

Lines changed: 17 additions & 16 deletions
@@ -50,13 +50,6 @@ merge-fdata $HOME/mysql-server/build/profile-readonly.fdata $HOME/mysql-server/b
 -o $HOME/mysql-server/build/profile-merged.fdata
 ```

-**Example command from an actual setup:**
-
-```bash
-/home/ubuntu/llvm-latest/build/bin/merge-fdata prof-instrumentation-readonly.fdata prof-instrumentation-writeonly.fdata \\
--o prof-instrumentation-readwritemerged.fdata
-```
-
 Output:

 ```
@@ -79,15 +72,23 @@ ls -lh $HOME/mysql-server/build/profile-merged.fdata
 Use LLVM-BOLT to generate the final optimized binary using the merged `.fdata` file:

 ```bash
-llvm-bolt build/bin/mysqld \\
--o build/bin/mysqldreadwrite_merged.bolt_instrumentation \\
--data=/home/ubuntu/mysql-server-8.0.33/sysbench/prof-instrumentation-readwritemerged.fdata \\
--reorder-blocks=ext-tsp \\
--reorder-functions=hfsort \\
--split-functions \\
--split-all-cold \\
--split-eh \\
--dyno-stats \\
+llvm-bolt $HOME/mysql-server/build/runtime_output_directory/mysqld \
+-instrument \
+-o $HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented \
+--instrumentation-file=$HOME/mysql-server/build/profile-readonly.fdata \
+--instrumentation-sleep-time=5 \
+--instrumentation-no-counters-clear \
+--instrumentation-wait-forks
+
+llvm-bolt $HOME/mysql-server/build/runtime_output_directory/mysqld \
+-o $HOME/mysql-server/build/mysqldreadwrite_merged.bolt_instrumentation \
+-data=$HOME/mysql-server/build/prof-instrumentation-readwritemerged.fdata \
+-reorder-blocks=ext-tsp \
+-reorder-functions=hfsort \
+-split-functions \
+-split-all-cold \
+-split-eh \
+-dyno-stats \
 --print-profile-stats 2>&1 | tee bolt_orig.log
 ```
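
Review note: the optimization command in this hunk reads its profile from the path given to `-data=`, while the earlier `merge-fdata` step writes `profile-merged.fdata`. A minimal sketch of a guard that confirms the chosen profile exists and is non-empty before `llvm-bolt` runs follows. The helper name `require_profile` and the `PROFILE` variable are hypothetical, and the default path is taken from the `merge-fdata` step as an assumption.

```bash
# Sketch only: require_profile and PROFILE are hypothetical names.
# Succeeds when the given file exists and is larger than zero bytes.
require_profile() {
  if [ ! -s "$1" ]; then
    echo "profile missing or empty: $1" >&2
    return 1
  fi
  echo "profile ok: $1"
}

# Assumed default: the output path used by the merge-fdata step.
PROFILE="${PROFILE:-$HOME/mysql-server/build/profile-merged.fdata}"
require_profile "$PROFILE" || echo "skipping llvm-bolt until the profile is available"
```

Running the guard with the same path that is passed to `-data=` would surface a mismatch between the two file names before the optimization step starts.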

content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md

Lines changed: 6 additions & 6 deletions
@@ -10,14 +10,15 @@ layout: learningpathall
 If system libraries like `/usr/lib/libssl.so` are stripped, rebuild OpenSSL from source with relocations:

 ```bash
+cd $HOME
 git clone https://github.com/openssl/openssl.git
 cd openssl
 ./config -O2 -Wl,--emit-relocs --prefix=$HOME/bolt-libs/openssl
 make -j$(nproc)
 make install
 ```

-### BOLT-Instrument libssl.so.3
+### Instrument libssl

 Use `llvm-bolt` to instrument `libssl.so.3`:

@@ -33,11 +34,10 @@ llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \

 Then launch MySQL using the **instrumented shared library** and run a **read+write** sysbench test to populate the profile:

-### Optimize 'libssl.so' Using Its Profile
+### Optimize libssl using the profile

 After running the read+write test, ensure `libssl-readwrite.fdata` is populated.

-
 Run BOLT on the uninstrumented `libssl.so` with the collected read-write profile:

 ```bash
@@ -53,7 +53,7 @@ llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \
 --print-profile-stats
 ```

-### Replace the Library at Runtime
+### Replace the library at runtime

 Copy the optimized version over the original and export the path:

@@ -64,7 +64,7 @@ export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib

 This ensures MySQL will dynamically load the optimized `libssl.so`.

-### Run Final Workload and Validate Performance
+### Run the final workload and validate performance

 Start the BOLT-optimized MySQL binary and link it against the optimized `libssl.so`. Run the combined workload:

@@ -86,7 +86,7 @@ taskset -c 7 sysbench \

 In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following a similar process as `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged scenarios.

-### BOLT optimization for 'libcrypto.so'
+### BOLT optimization for libcrypto

 Follow these steps to instrument and optimize `libcrypto.so`:

content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md

Lines changed: 2 additions & 9 deletions
@@ -18,8 +18,6 @@ This step presents the performance comparisons across various BOLT optimization
 | Latency 95th % (ms) | 1.04 | 0.83 | 1.79 |
 | Total time (s) | 9.93 | 4.73 | 15.40 |

----
-
 ### 2. Performance Comparison: Merged vs Non-Merged Instrumentation

 | Metric | Regular BOLT R+W (No Merge, system libssl) | Merged BOLT (BOLTed Read+Write + BOLTed libssl) |
@@ -40,8 +38,6 @@ Second run:
 | Latency 95th % (ms) | 1.39 | 1.37 |
 | Total time (s) | 239.9 | 239.9 |

----
-
 ### 3. BOLTed READ, BOLTed WRITE, MERGED BOLT (Read+Write+BOLTed Libraries)

 | Metric | Bolted Read-Only | Bolted Write-Only | Merged BOLT (Read+Write+libssl) | Merged BOLT (Read+Write+libcrypto) | Merged BOLT (Read+Write+libssl+libcrypto) |
@@ -52,21 +48,18 @@ Second run:
 | Latency 95th % (ms) | 0.77 | 0.55 | 1.37 | 1.34 | 1.34 |
 | Total time (s) | 239.8 | 239.72 | 239.9 | 239.9 | 239.9 |

----
-
 {{% notice Note %}}
 All sysbench and .fdata file paths, as well as taskset usage, should match the conventions in previous steps: use sysbench from PATH (no src/), use /usr/share/sysbench/ for Lua scripts, and use $HOME-based paths for all .fdata and library files. On an 8-core system, use taskset -c 7 for sysbench and avoid contention with mysqld.
 {{% /notice %}}

-### Key Metrics to Analyze
+### Key metrics to analyze

 - **TPS (Transactions Per Second)**: Higher is better.
 - **QPS (Queries Per Second)**: Higher is better.
 - **Latency (Average and 95th Percentile)**: Lower is better.

----
-
 ### Conclusion
+
 - BOLT substantially improves performance over non-optimized binaries due to better instruction cache utilization and reduced execution path latency.
 - Merging feature-specific profiles does not negatively affect performance; instead, it captures a broader set of runtime behaviors, making the binary better tuned for varied real-world workloads.
 - Separately optimizing external user-space libraries, even though providing smaller incremental gains, further complements the overall application optimization, delivering a fully optimized execution environment.
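
Review note: the tables in this file list raw measurements, and the "higher is better" / "lower is better" guidance is easier to apply as a relative change. A minimal sketch of that arithmetic follows. The helper name `pct_change` is hypothetical, and the two inputs are simply the first two "Total time (s)" values visible in this diff, used as sample numbers; the hunk does not show the column headers, so no interpretation of the columns is implied.

```bash
# Sketch only: pct_change is a hypothetical helper.
# Prints the relative change from the first value to the second, in percent.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Sample inputs: two "Total time (s)" values from the first table row in the diff.
pct_change 9.93 4.73
```

A negative result means the second value is smaller than the first, which is an improvement for latency and total time and a regression for TPS and QPS.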
