diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png b/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png
new file mode 100644
index 0000000000..8bbee946af
Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/bolt-merge/bolt-merge.png differ
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md
index cb346ca09a..19d4027fbf 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-1.md
@@ -10,11 +10,13 @@ layout: learningpathall

 Make sure you have [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed.

-You should use an Arm Linux system with at least 4 CPUs and 16 Gb of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible.
+You should use an Arm Linux system with at least 8 CPUs and 16 GB of RAM. Ubuntu 24.04 is used for testing, but other Linux distributions are possible.

 ## What will I do in this Learning Path?

-In this Learning Path you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the applcation and two share libraries which are used by MySQL are also optimized using BOLT.
+In this Learning Path, you learn how to use BOLT to optimize applications and shared libraries. MySQL is used as the application, and two shared libraries used by MySQL are also optimized using BOLT.
+
+Here is an outline of the steps:

 1. Collect and merge BOLT profiles from multiple workloads, such as read-only and write-only

@@ -36,10 +38,15 @@ In this Learning Path you learn how to use BOLT to optimize applications and sha

 After optimizing each component, you combine them to create a deployment where both the application and its libraries benefit from BOLT's enhancements.

+## What is BOLT profile merging?
+
+BOLT profile merging is the process of combining profiling data from multiple runs into a single profile. This merged profile enables BOLT to optimize binaries for a broader set of real-world behaviors, ensuring that the final optimized application or library performs well across diverse workloads, not just a single use case. By merging profiles, you capture a wider range of code paths and execution patterns, leading to more robust and effective optimizations.
+
+![Why BOLT Profile Merging?](bolt-merge.png)

 ## What are good applications for BOLT?

-MySQL and sysbench are used as example applications, but you can use this method for **any feature-rich application** that:
+MySQL and sysbench are used as example applications, but you can use this method for any feature-rich application that:

 1. Exhibits multiple runtime paths

@@ -47,7 +54,7 @@ MySQL and sysbench are used as example applications, but you can use this method

 2. Uses dynamic libraries

-   Many modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.
+   Most modern applications rely on shared libraries for functionality. Optimizing these libraries alongside the main binary ensures consistent performance improvements throughout the application.

 3. 
Requires full-stack binary optimization for performance-critical deployment
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md
index 754e6e89e1..2b5f34c08f 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-2.md
@@ -1,5 +1,5 @@
 ---
-title: BOLT Optimization - First feature
+title: Instrument MySQL with BOLT
 weight: 3

 ### FIXED, DO NOT MODIFY
@@ -10,33 +10,92 @@ In this step, you will use BOLT to instrument the MySQL application binary and t

 The collected profiles will be merged with others and used to optimize the application's code layout.

-### Build the uninstrumented binary
+## Build mysqld from source

-Make sure your application binary is:
+Follow these steps to build the MySQL server (`mysqld`) from source.

-- Built from source (e.g., `mysqld`)
+Install the required dependencies:
+
+```bash
+sudo apt update
+sudo apt install -y build-essential cmake libncurses5-dev libssl-dev libboost-all-dev bison pkg-config libaio-dev libtirpc-dev git
+```
+
+Download the MySQL source code. You can change to another version in the `checkout` command below if needed.
+
+```bash
+git clone https://github.com/mysql/mysql-server.git
+cd mysql-server
+git checkout mysql-8.4.5
+```
+
+Configure the build with debug information:
+
+```bash
+mkdir build && cd build
+cmake .. \
+  -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_DEBUG=1 -DCMAKE_C_FLAGS="-fno-omit-frame-pointer" \
+  -DCMAKE_CXX_FLAGS="-fno-omit-frame-pointer" -DCMAKE_POSITION_INDEPENDENT_CODE=OFF \
+  -DCMAKE_EXE_LINKER_FLAGS="-Wl,--emit-relocs -no-pie"
+```
+
+Build mysqld:
+
+```bash
+make -j$(nproc)
+```
+
+After the build completes, the `mysqld` binary is located at `$HOME/mysql-server/build/runtime_output_directory/mysqld`.
+
+{{% notice Note %}}
+You can run `mysqld` directly from the build directory as shown, or run `make install` to install it system-wide. For testing and instrumentation, running from the build directory is usually preferred.
+{{% /notice %}}
+
+After building mysqld, install MySQL server and client utilities system-wide:
+
+```bash
+sudo make install
+```
+
+This installs the `mysql` client and other utilities. The default install prefix is `/usr/local/mysql`, so add `/usr/local/mysql/bin` to your `PATH` if the `mysql` command is not found.
+
+Ensure the binary is unstripped and includes debug symbols for BOLT instrumentation.
+
+To work with BOLT, your application binary should be:
+
+- Built from source
 - Unstripped, with symbol information available
 - Compiled with frame pointers enabled (`-fno-omit-frame-pointer`)

 You can verify this with:

 ```bash
-readelf -s /path/to/mysqld | grep main
+readelf -s $HOME/mysql-server/build/runtime_output_directory/mysqld | grep main
+```
+
+The partial output is:
+
+```output
+ 23837: 000000000950dfe8 8 OBJECT GLOBAL DEFAULT 27 mysql_main
+ 34522: 000000000915bfd0 8 OBJECT GLOBAL DEFAULT 26 server_main_callback
+ 42773: 00000000051730e4 80 FUNC GLOBAL DEFAULT 13 _Z18my_main_thre[...]
+ 44882: 000000000357dc98 40 FUNC GLOBAL DEFAULT 13 main
+ 61046: 0000000005ffd5c0 40 FUNC GLOBAL DEFAULT 13 _Z21record_main_[...]
 ```

 If the symbols are missing, rebuild the binary with debug info and no stripping.

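The `grep main` check only confirms that a symbol table is present. As a rough sketch, the helper below classifies a section listing so you can also see whether relocations survived the link. The function name `bolt_readiness` is hypothetical, and treating `.rela.text` as the sign that `--emit-relocs` took effect is an assumption that may not hold for every linker or target.

```bash
#!/bin/sh
# Hypothetical helper: reads a section listing (as printed by `readelf -S -W`)
# on stdin and reports whether the binary looks usable by BOLT.
bolt_readiness() {
  sections=$(cat)
  # No .symtab means the binary was stripped.
  case "$sections" in
    *.symtab*) ;;
    *) echo "stripped: no .symtab section"; return 1 ;;
  esac
  # Assumption: --emit-relocs leaves a .rela.text section in the output.
  case "$sections" in
    *.rela.text*) echo "ok: symbols and relocations present" ;;
    *) echo "partial: symbols present, no .rela.text section" ;;
  esac
}

# Demo on a hand-written listing, not real readelf output.
printf '%s\n' '[13] .text' '[30] .rela.text' '[38] .symtab' | bolt_readiness
```

With a real binary, you could pipe in the listing, for example `readelf -S -W $HOME/mysql-server/build/runtime_output_directory/mysqld | bolt_readiness`.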
-### Step 2: Instrument the binary with BOLT
+## Instrument the binary with BOLT

 Use `llvm-bolt` to create an instrumented version of the binary:

 ```bash
-llvm-bolt /path/to/mysqld \\
-  -instrument \\
-  -o /path/to/mysqld.instrumented \\
-  --instrumentation-file=/path/to/profile-readonly.fdata \\
-  --instrumentation-sleep-time=5 \\
-  --instrumentation-no-counters-clear \\
+llvm-bolt $HOME/mysql-server/build/runtime_output_directory/mysqld \
+  -instrument \
+  -o $HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented \
+  --instrumentation-file=$HOME/mysql-server/build/profile-readonly.fdata \
+  --instrumentation-sleep-time=5 \
+  --instrumentation-no-counters-clear \
   --instrumentation-wait-forks
 ```

@@ -46,38 +105,86 @@ llvm-bolt /path/to/mysqld \\
 - `--instrumentation-file`: Path where the profile output will be saved
 - `--instrumentation-wait-forks`: Ensures the instrumentation continues through forks (important for daemon processes)

----

-### Step 3: Run the instrumented binary under a feature-specific workload
+
+## Start the instrumented MySQL server
+
+Before running the workload, start the instrumented MySQL server in a separate terminal. You may need to initialize a new data directory if this is your first run:
+
+```bash
+# Initialize a new data directory (if needed)
+$HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented --initialize-insecure --datadir=$HOME/mysql-bolt-data
+
+# Start the instrumented server
+# On an 8-core system, use available cores (e.g., 6 for mysqld, 7 for sysbench)
+taskset -c 6 $HOME/mysql-server/build/runtime_output_directory/mysqld.instrumented \
+  --datadir=$HOME/mysql-bolt-data \
+  --socket=$HOME/mysql-bolt.sock \
+  --port=3306 \
+  --user=$(whoami) &
+```
+
+Adjust `--datadir`, `--socket`, and `--port` as needed for your environment. Make sure the server is running and accessible before proceeding.
+
+## Install sysbench
+
+You will need sysbench to generate workloads for MySQL. 
On Ubuntu and other Debian-based distributions, you can install it using the package manager:
+
+```bash
+sudo apt update
+sudo apt install -y sysbench
+```
+
+Alternatively, see the [sysbench GitHub page](https://github.com/akopytov/sysbench) for build-from-source instructions if a package is not available for your platform.
+
+## Create a test database and user
+
+For sysbench to work, you need a test database and user. Connect to the MySQL server as the root user (or another admin user) and run:
+
+```bash
+mysql -u root --socket=$HOME/mysql-bolt.sock
+```
+
+Then, in the MySQL shell:
+
+```sql
+CREATE DATABASE IF NOT EXISTS bench;
+CREATE USER IF NOT EXISTS 'bench'@'localhost' IDENTIFIED BY 'bench';
+GRANT ALL PRIVILEGES ON bench.* TO 'bench'@'localhost';
+FLUSH PRIVILEGES;
+EXIT;
+```
+
+## Run the instrumented binary under a feature-specific workload

 Use a workload generator to stress the binary in a feature-specific way. For example, to simulate **read-only traffic** with sysbench:

 ```bash
-taskset -c 9 ./src/sysbench \\
-  --db-driver=mysql \\
-  --mysql-host=127.0.0.1 \\
-  --mysql-db=bench \\
-  --mysql-user=bench \\
-  --mysql-password=bench \\
-  --mysql-port=3306 \\
-  --tables=8 \\
-  --table-size=10000 \\
-  --threads=1 \\
-  src/lua/oltp_read_only.lua run
+taskset -c 7 sysbench \
+  --db-driver=mysql \
+  --mysql-host=127.0.0.1 \
+  --mysql-db=bench \
+  --mysql-user=bench \
+  --mysql-password=bench \
+  --mysql-port=3306 \
+  --tables=8 \
+  --table-size=10000 \
+  --threads=1 \
+  /usr/share/sysbench/oltp_read_only.lua run
 ```

-> Adjust this command as needed for your workload and CPU/core binding.
+{{% notice Note %}}
+On an 8-core system, cores are numbered 0-7. Adjust the `taskset -c` values as needed for your system. Avoid using the same core for both mysqld and sysbench to reduce contention. If the `bench` tables do not exist yet, run the same command once with `prepare` in place of `run` to create them.
+{{% /notice %}}

-The `.fdata` file defined in `--instrumentation-file` will be populated with runtime execution data.
----
+The `.fdata` file defined in `--instrumentation-file` will be populated with runtime execution data.

-### Step 4: Verify the profile was created
+## Verify the profile was created

 After running the workload:

 ```bash
-ls -lh /path/to/profile-readonly.fdata
+ls -lh $HOME/mysql-server/build/profile-readonly.fdata
 ```

 You should see a non-empty file. This file will later be merged with other profiles (e.g., for write-only traffic) to generate a complete merged profile.
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
index f1ea41f09c..2b701ba094 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-3.md
@@ -1,38 +1,39 @@
 ---
-title: BOLT Optimization - Second Feature & BOLT Merge to combine
+title: Run a new workload using BOLT and merge the results
 weight: 4

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-In this step, you'll collect profile data for a **write-heavy** workload and also **instrument external libraries** such as `libcrypto.so` and `libssl.so` used by the application (e.g., MySQL).
+Next, you will collect profile data for a **write-only** workload and merge it with the profile from the **read-only** workload in the previous section.
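
To make the idea of merging concrete, here is a toy sketch that adds up per-key counts across two files. The two-column `key count` layout, the function names, and the `merge_counts` helper are all invented for illustration. Real `.fdata` files use a different layout and should be merged with `merge-fdata`, not with this script; the sketch only conveys the intuition that entries seen in both runs end up with a combined weight.

```bash
#!/bin/sh
# Toy illustration only: this is NOT the real .fdata format.
# Sums the second column for every distinct key in the first column.
merge_counts() {
  awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' "$@" | sort
}

# Two hypothetical "profiles", one per workload.
tmp=$(mktemp -d)
printf 'fn_read 90\nfn_common 10\n' > "$tmp/readonly.txt"
printf 'fn_write 70\nfn_common 30\n' > "$tmp/writeonly.txt"

merge_counts "$tmp/readonly.txt" "$tmp/writeonly.txt"
rm -r "$tmp"
```

In this toy run, `fn_common` appears in both inputs and is reported once with a count of 40, while `fn_read` and `fn_write` keep the counts from the single run that exercised them.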
-
-### Step 1: Run Write-Only Workload for Application Binary
+## Run Write-Only Workload for Application Binary

 Use the same BOLT-instrumented MySQL binary and drive it with a write-only workload to capture `profile-writeonly.fdata`:

 ```bash
-taskset -c 9 ./src/sysbench \\
-  --db-driver=mysql \\
-  --mysql-host=127.0.0.1 \\
-  --mysql-db=bench \\
-  --mysql-user=bench \\
-  --mysql-password=bench \\
-  --mysql-port=3306 \\
-  --tables=8 \\
-  --table-size=10000 \\
-  --threads=1 \\
-  src/lua/oltp_write_only.lua run
+# On an 8-core system, use available cores (e.g., 7 for sysbench)
+taskset -c 7 sysbench \
+  --db-driver=mysql \
+  --mysql-host=127.0.0.1 \
+  --mysql-db=bench \
+  --mysql-user=bench \
+  --mysql-password=bench \
+  --mysql-port=3306 \
+  --tables=8 \
+  --table-size=10000 \
+  --threads=1 \
+  /usr/share/sysbench/oltp_write_only.lua run
 ```

 Make sure that the `--instrumentation-file` is set appropriately to save `profile-writeonly.fdata`.

----
-### Step 2: Verify the Second Profile Was Generated
+
+
+### Verify the Second Profile Was Generated

 ```bash
-ls -lh /path/to/profile-writeonly.fdata
+ls -lh $HOME/mysql-server/build/profile-writeonly.fdata
 ```

 Both `.fdata` files should now exist and contain valid data:
@@ -40,15 +41,13 @@ Both `.fdata` files should now exist and contain valid data:
 - `profile-readonly.fdata`
 - `profile-writeonly.fdata`

----
-
-### Step 3: Merge the Feature Profiles
+### Merge the Feature Profiles

 Use `merge-fdata` to combine the feature-specific profiles into one comprehensive `.fdata` file:

 ```bash
-merge-fdata /path/to/profile-readonly.fdata /path/to/profile-writeonly.fdata \\
-  -o /path/to/profile-merged.fdata
+merge-fdata $HOME/mysql-server/build/profile-readonly.fdata $HOME/mysql-server/build/profile-writeonly.fdata \
+  -o $HOME/mysql-server/build/profile-merged.fdata
 ```

 **Example command from an actual setup:**
@@ -67,18 +66,15 @@ Profile from 2 files merged.

 This creates a single merged profile (`profile-merged.fdata`) covering both read-only and write-only workload behaviors.

----
-
-### Step 4: Verify the Merged Profile
+### Verify the Merged Profile

 Check the merged `.fdata` file:

 ```bash
-ls -lh /path/to/profile-merged.fdata
+ls -lh $HOME/mysql-server/build/profile-merged.fdata
 ```

----
-### Step 5: Generate the Final Binary with the Merged Profile
+### Generate the Final Binary with the Merged Profile

 Use LLVM-BOLT to generate the final optimized binary using the merged `.fdata` file:

diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md
index 376c249164..283eecb617 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-4.md
@@ -1,11 +1,11 @@
 ---
-title: BOLT the Libraries separately
+title: Instrument shared libraries with BOLT
 weight: 5

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-### Step 1: Instrument Shared Libraries (e.g., libcrypto, libssl)
+### Instrument Shared Libraries (e.g., libcrypto, libssl)

 If system libraries like `/usr/lib/libssl.so` are stripped, rebuild OpenSSL from source with relocations:

@@ -17,27 +17,23 @@ make -j$(nproc)
 make install
 ```

----
-
-### Step 2: BOLT-Instrument libssl.so.3
+### BOLT-Instrument libssl.so.3

 Use `llvm-bolt` to instrument `libssl.so.3`:

 ```bash
-llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \\
-  -instrument \\
-  -o $HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented \\
-  --instrumentation-file=libssl-readwrite.fdata \\
-  --instrumentation-sleep-time=5 \\
-  --instrumentation-no-counters-clear \\
+llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \
+  -instrument \
+  -o $HOME/bolt-libs/openssl/lib/libssl.so.3.instrumented \
+  --instrumentation-file=$HOME/bolt-libs/openssl/lib/libssl-readwrite.fdata \
+  --instrumentation-sleep-time=5 \
+  --instrumentation-no-counters-clear \
   --instrumentation-wait-forks
 ```

 Then launch MySQL using the **instrumented shared library** and run a **read+write** sysbench test to populate the profile:

----
-
-### Step 3: Optimize 'libssl.so' Using Its Profile
+### Optimize 'libssl.so' Using Its Profile

 After running the read+write test, ensure `libssl-readwrite.fdata` is populated.

@@ -45,69 +41,64 @@ After running the read+write test, ensure `libssl-readwrite.fdata` is populated.
 Run BOLT on the uninstrumented `libssl.so` with the collected read-write profile:

 ```bash
-llvm-bolt /path/to/libssl.so.3 \\
-  -o /path/to/libssl.so.optimized \\
-  -data=/path/to/prof-instrumentation-libssl-readwrite.fdata \\
-  -reorder-blocks=ext-tsp \\
-  -reorder-functions=hfsort \\
-  -split-functions \\
-  -split-all-cold \\
-  -split-eh \\
-  -dyno-stats \\
+llvm-bolt $HOME/bolt-libs/openssl/lib/libssl.so.3 \
+  -o $HOME/bolt-libs/openssl/lib/libssl.so.optimized \
+  -data=$HOME/bolt-libs/openssl/lib/libssl-readwrite.fdata \
+  -reorder-blocks=ext-tsp \
+  -reorder-functions=hfsort \
+  -split-functions \
+  -split-all-cold \
+  -split-eh \
+  -dyno-stats \
   --print-profile-stats
 ```

----
-
-### Step 3: Replace the Library at Runtime
+### Replace the Library at Runtime

 Copy the optimized version over the original and export the path:

 ```bash
-cp /path/to/libssl.so.optimized /path/to/libssl.so.3
-export LD_LIBRARY_PATH=/path/to/
+cp $HOME/bolt-libs/openssl/lib/libssl.so.optimized $HOME/bolt-libs/openssl/lib/libssl.so.3
+export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib
 ```

 This ensures MySQL will dynamically load the optimized `libssl.so`.

----
-
-### Step 4: Run Final Workload and Validate Performance
+### Run Final Workload and Validate Performance

 Start the BOLT-optimized MySQL binary and link it against the optimized `libssl.so`. 
Run the combined workload:

 ```bash
-taskset -c 9 ./src/sysbench \\
-  --db-driver=mysql \\
-  --mysql-host=127.0.0.1 \\
-  --mysql-db=bench \\
-  --mysql-user=bench \\
-  --mysql-password=bench \\
-  --mysql-port=3306 \\
-  --tables=8 \\
-  --table-size=10000 \\
-  --threads=1 \\
-  src/lua/oltp_read_write.lua run
+# On an 8-core system, use available cores (e.g., 7 for sysbench)
+taskset -c 7 sysbench \
+  --db-driver=mysql \
+  --mysql-host=127.0.0.1 \
+  --mysql-db=bench \
+  --mysql-user=bench \
+  --mysql-password=bench \
+  --mysql-port=3306 \
+  --tables=8 \
+  --table-size=10000 \
+  --threads=1 \
+  /usr/share/sysbench/oltp_read_write.lua run
 ```

----
-In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following a similar process as `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged
- scenarios.
+In the next step, you'll optimize an additional critical external library (`libcrypto.so`) using BOLT, following the same process used for `libssl.so`. Afterward, you'll interpret performance results to validate and compare optimizations across baseline and merged scenarios.

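When you compare runs, a relative change is often easier to read than raw throughput numbers. The helper below is a minimal sketch for that comparison. The `pct_gain` name and the sample values are hypothetical and are not measurements from this Learning Path.

```bash
#!/bin/sh
# Hypothetical helper: percent change in transactions per second
# from a baseline run ($1) to an optimized run ($2).
pct_gain() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    if (base <= 0) { print "baseline must be positive"; exit 1 }
    printf "%.2f\n", (opt - base) / base * 100
  }'
}

# Made-up numbers for illustration only.
pct_gain 1000 1150
```

A positive result means the second run was faster than the baseline, and a negative result means it was slower.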
-### Step 1: BOLT optimization for 'libcrypto.so'
+### BOLT optimization for 'libcrypto.so'

 Follow these steps to instrument and optimize `libcrypto.so`:

 #### Instrument `libcrypto.so`:

 ```bash
-llvm-bolt /path/to/libcrypto.so.3 \\
-  -instrument \\
-  -o /path/to/libcrypto.so.3.instrumented \\
-  --instrumentation-file=libcrypto-readwrite.fdata \\
-  --instrumentation-sleep-time=5 \\
-  --instrumentation-no-counters-clear \\
+llvm-bolt $HOME/bolt-libs/openssl/lib/libcrypto.so.3 \
+  -instrument \
+  -o $HOME/bolt-libs/openssl/lib/libcrypto.so.3.instrumented \
+  --instrumentation-file=$HOME/bolt-libs/openssl/lib/libcrypto-readwrite.fdata \
+  --instrumentation-sleep-time=5 \
+  --instrumentation-no-counters-clear \
   --instrumentation-wait-forks
 ```

@@ -115,39 +106,39 @@ Run MySQL under the read-write workload to populate `libcrypto-readwrite.fdata`:

 ```bash
 export LD_LIBRARY_PATH=/path/to/libcrypto-instrumented
-taskset -c 9 ./src/sysbench \\
-  --db-driver=mysql \\
-  --mysql-host=127.0.0.1 \\
-  --mysql-db=bench \\
-  --mysql-user=bench \\
-  --mysql-password=bench \\
-  --mysql-port=3306 \\
-  --tables=8 \\
-  --table-size=10000 \\
-  --threads=1 \\
-  src/lua/oltp_read_write.lua run
+taskset -c 7 sysbench \
+  --db-driver=mysql \
+  --mysql-host=127.0.0.1 \
+  --mysql-db=bench \
+  --mysql-user=bench \
+  --mysql-password=bench \
+  --mysql-port=3306 \
+  --tables=8 \
+  --table-size=10000 \
+  --threads=1 \
+  /usr/share/sysbench/oltp_read_write.lua run
 ```

-#### Optimize the `libcrypto.so` library:
+#### Optimize the crypto library

 ```bash
-llvm-bolt /path/to/original/libcrypto.so.3 \\
-  -o /path/to/libcrypto.so.optimized \\
-  -data=libcrypto-readwrite.fdata \\
-  -reorder-blocks=ext-tsp \\
-  -reorder-functions=hfsort \\
-  -split-functions \\
-  -split-all-cold \\
-  -split-eh \\
-  -dyno-stats \\
+llvm-bolt $HOME/bolt-libs/openssl/lib/libcrypto.so.3 \
+  -o $HOME/bolt-libs/openssl/lib/libcrypto.so.optimized \
+  -data=$HOME/bolt-libs/openssl/lib/libcrypto-readwrite.fdata \
+  -reorder-blocks=ext-tsp \
+  -reorder-functions=hfsort \
+  -split-functions \
+  -split-all-cold \
+  -split-eh \
+  -dyno-stats \
   --print-profile-stats
 ```

 Replace the original at runtime:

 ```bash
-cp /path/to/libcrypto.so.optimized /path/to/libcrypto.so.3
-export LD_LIBRARY_PATH=/path/to/
+cp $HOME/bolt-libs/openssl/lib/libcrypto.so.optimized $HOME/bolt-libs/openssl/lib/libcrypto.so.3
+export LD_LIBRARY_PATH=$HOME/bolt-libs/openssl/lib
 ```

 Run a final validation workload to ensure functionality and measure performance improvements.
diff --git a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md
index 07cd298c5f..9bb95550d5 100644
--- a/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md
+++ b/content/learning-paths/servers-and-cloud-computing/bolt-merge/how-to-5.md
@@ -1,5 +1,5 @@
 ---
-title: Performance Results - Baseline, BOLT Merge, and Full Optimization
+title: Review the performance results
 weight: 6

 ### FIXED, DO NOT MODIFY
@@ -54,6 +54,10 @@ Second run:

 ---

+{{% notice Note %}}
+All sysbench commands, `.fdata` paths, and `taskset` usage should match the conventions in the previous steps: run `sysbench` from your `PATH` (not `./src/`), use `/usr/share/sysbench/` for the Lua scripts, and use `$HOME`-based paths for all `.fdata` and library files. On an 8-core system, pin sysbench with `taskset -c 7` and keep it off the core used by mysqld.
+{{% /notice %}}
+
 ### Key Metrics to Analyze

 - **TPS (Transactions Per Second)**: Higher is better.