Commit 0fc68fb

Merge pull request #1946 from madeline-underwood/profile_guided_opt
Profile guided opt_JA to review
2 parents 89c52e8 + 2f7dbfc commit 0fc68fb

6 files changed (+41 -37 lines changed)

content/learning-paths/servers-and-cloud-computing/cpp-profile-guided-optimisation/_index.md

Lines changed: 5 additions & 9 deletions
@@ -1,17 +1,13 @@
 ---
-title: Optimizing Performance with Profile-Guided Optimization and Google Benchmark
-
-draft: true
-cascade:
-draft: true
+title: Optimize C++ performance with Profile-Guided Optimization and Google Benchmark
 
 minutes_to_complete: 15
 
-who_is_this_for: Developers who are looking to optimize C++ performance using characteristics observed at runtime.
+who_is_this_for: Developers looking to optimize C++ performance based on runtime behavior.
 
 learning_objectives:
-- Learn how to microbenchmark a function using Google Benchmark.
-- Learn how to use profile guided optimization to build binaries optimized for real-world workloads.
+- Microbenchmark a function using Google Benchmark.
+- Apply profile-guided optimization to build performance-tuned binaries.
 
 prerequisites:
 - Basic C++ understanding.
@@ -32,7 +28,7 @@ operatingsystems:
 
 further_reading:
 - resource:
-title: G++ Profile Guided Optimization Documentation
+title: G++ profile-guided optimization documentation
 link: https://gcc.gnu.org/onlinedocs/gcc-13.3.0/gcc/Instrumentation-Options.html
 type: documentation
 - resource:
Lines changed: 7 additions & 4 deletions
@@ -1,5 +1,5 @@
 ---
-title: Introduction to Profile-Guided Optimization
+title: Profile-Guided Optimization
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -8,16 +8,19 @@ layout: learningpathall
 
 ### What is Profile-Guided Optimization (PGO) and how does it work?
 
-Profile-Guided Optimization (PGO) is a compiler optimization technique that enhances program performance by utilizing real-world execution data. In GCC/G++, PGO involves a two-step process: first, compile the program with the `-fprofile-generate` flag to produce an instrumented binary that collects profiling data during execution; and second, recompile the program with the `-fprofile-use` flag, allowing the compiler to leverage the collected data to make informed optimization decisions. This approach identifies frequently executed paths—known as “hot” paths—and optimizes them more aggressively, while potentially reducing emphasis on less critical code paths.
+Profile-Guided Optimization (PGO) is a compiler optimization technique that enhances program performance by utilizing real-world execution data. In GCC/G++, PGO involves a two-step process:
+
+- First, compile the program with the `-fprofile-generate` flag to produce an instrumented binary that collects profiling data during execution.
+- Second, recompile the program with the `-fprofile-use` flag, allowing the compiler to leverage the collected data to make informed optimization decisions. This approach identifies frequently executed paths, known as “hot” paths, and optimizes them more aggressively, while potentially reducing emphasis on less critical code paths.
 
 ### When should I use Profile-Guided Optimization?
 
-PGO is particularly beneficial in the later stages of development when real-world workloads are available. It is most effective for applications where performance is critical and runtime behavior is complex or data-dependent. For instance, consider optimizing “hot” functions that execute frequently. Doing so ensures that the most impactful parts of your code are optimized based on actual usage patterns.
+PGO is particularly beneficial in the later stages of development when real-world workloads are available. It is especially useful for applications where performance is critical and runtime behavior is complex or data-dependent. For instance, consider optimizing “hot” functions that execute frequently. Doing so ensures that the most impactful parts of your code are optimized based on actual usage patterns.
 
 ### What are the limitations of Profile-Guided Optimization and when should I avoid it?
 
 While PGO offers substantial performance benefits, it has limitations. The profiling data must accurately represent typical usage scenarios; otherwise, the optimizations may not deliver the desired performance improvements and could even degrade performance.
 
 Additionally, the process requires extra build steps, potentially increasing compile times for large codebases. Therefore, use PGO only on performance-critical sections that are heavily influenced by actual runtime behavior. PGO might not be ideal for early-stage development or applications with highly variable or unpredictable usage patterns.
 
-Please refer to the [GCC documentation](https://gcc.gnu.org/onlinedocs/gcc-13.3.0/gcc/Instrumentation-Options.html) for further details on enabling and using PGO.
+For further details on enabling and using PGO, see the [GCC documentation](https://gcc.gnu.org/onlinedocs/gcc-13.3.0/gcc/Instrumentation-Options.html).
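The core idea of PGO is that branch frequencies are measured, not guessed. GCC's own documentation for `__builtin_expect` recommends preferring actual profile feedback over hand-written hints of this kind. The sketch below shows the sort of manual "this branch is cold" hint that PGO makes unnecessary. It assumes a GCC or Clang compiler, and the function `count_above` is hypothetical, not part of this learning path. With `-fprofile-use`, the compiler can instead take branch probabilities from the recorded `.gcda` data.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical example: count how many values exceed a threshold.
// __builtin_expect (a GCC/Clang extension) tells the compiler the expected
// value of the condition, here "usually false". This is a manual guess about
// which path is hot; PGO replaces guesses like this with measured counts.
std::uint64_t count_above(const std::vector<int>& values, int threshold) {
    std::uint64_t count = 0;
    for (int v : values) {
        if (__builtin_expect(v > threshold, 0)) {
            ++count;  // hinted as the cold (rarely taken) path
        }
    }
    return count;
}
```

If the hint is wrong for real inputs, the compiler optimizes the wrong path. This is the same risk the text describes for unrepresentative profile data.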

content/learning-paths/servers-and-cloud-computing/cpp-profile-guided-optimisation/how-to-2.md

Lines changed: 8 additions & 5 deletions
@@ -1,5 +1,5 @@
 ---
-title: Introduction to Google Benchmark
+title: Google Benchmark
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -8,9 +8,11 @@ layout: learningpathall
 
 ## Google Benchmark
 
-Google Benchmark is a C++ library specifically designed for microbenchmarking – measuring the performance of small code snippets with high accuracy. Microbenchmarking is essential for identifying bottlenecks and optimizing critical sections of code, especially in performance-sensitive applications. Google Benchmark simplifies this process by providing a framework that handles common tasks like managing iterations, timing execution, and performing statistical analysis. This allows you to focus on the code being measured rather than writing boilerplate code for testing scenarios or trying to prevent unwanted compiler optimizations.
+Google Benchmark is a C++ library specifically designed for microbenchmarking – measuring the performance of small code snippets with high accuracy. Microbenchmarking is essential for identifying bottlenecks and optimizing critical sections, especially in performance-sensitive applications.
 
-To use Google Benchmark, you define a function that contains the code you want to measure. This function should accept a `benchmark::State&` parameter and iterate over it to perform the benchmarking. You then register this function using the `BENCHMARK` macro and include `BENCHMARK_MAIN()` to create the main function for the benchmark executable.
+Google Benchmark simplifies this process by providing a framework that manages iterations, times execution, and performs statistical analysis. This allows you to focus on the code being measured, rather than writing boilerplate or trying to prevent unwanted compiler optimizations manually.
+
+To use Google Benchmark, define a function that accepts a `benchmark::State&` parameter and iterate over it to perform the benchmarking. Register the function using the `BENCHMARK` macro and include `BENCHMARK_MAIN()` to generate the benchmark's entry point.
 
 Here's a basic example:
 
@@ -28,7 +30,7 @@ BENCHMARK_MAIN();
 
 ### Filtering and Preventing Compiler Optimizations
 
-Google Benchmark provides tools to ensure accurate measurements by preventing the compiler from optimizing away parts of your benchmarked code:
+Google Benchmark provides tools to ensure accurate measurements by preventing unintended compiler optimizations and allowing flexible benchmark selection.
 
 1. **Preventing Optimizations**: Use `benchmark::DoNotOptimize(value);` to force the compiler to read and store a variable or expression, ensuring it is not optimized away.
 
@@ -37,6 +39,7 @@ Google Benchmark provides tools to ensure accurate measurements by preventing th
 ```bash
 ./benchmark_binary --benchmark_filter=BM_String.*
 ```
-This eliminates the need to repeatedly comment out lines of source code.
+
+This eliminates the need to repeatedly comment out lines of source code.
 
 For more detailed information and advanced usage, refer to the [official documentation](https://github.com/google/benchmark).
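To show the boilerplate that Google Benchmark handles for you, here is a minimal hand-rolled timing loop. It assumes a GCC or Clang compiler. The names `keep_alive` and `mean_ns_per_call` are hypothetical and are not Google Benchmark's API. `keep_alive` mimics the idea behind `benchmark::DoNotOptimize`: an empty inline-assembly statement makes the compiler treat a value as used.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the idea behind benchmark::DoNotOptimize:
// the empty asm statement (a GCC/Clang extension) claims to read `value`,
// so the computation that produced it cannot be removed as dead code.
template <typename T>
inline void keep_alive(T const& value) {
    __asm__ __volatile__("" : : "r,m"(value) : "memory");
}

// Hypothetical hand-rolled benchmark loop: calls `fn` (which must return a
// value) `iterations` times and returns the mean wall-clock nanoseconds per
// call. The iteration count is fixed by the caller, and no statistics are
// computed; Google Benchmark automates both.
template <typename Fn>
double mean_ns_per_call(Fn fn, std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        keep_alive(fn());
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Unlike the framework, this sketch reports only wall-clock time. The benchmark output in this learning path reports separate Time, CPU, and Iterations columns, with the iteration count chosen automatically.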

content/learning-paths/servers-and-cloud-computing/cpp-profile-guided-optimisation/how-to-3.md

Lines changed: 6 additions & 6 deletions
@@ -1,16 +1,16 @@
 ---
-title: Division Example
+title: Example operation
 weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Introduction
+## Optimizing costly division operations with Google Benchmark and PGO
 
 In this section, you'll learn how to use Google Benchmark and Profile-Guided Optimization to improve the performance of a simple division operation. This example demonstrates how even seemingly straightforward operations can benefit from optimization techniques.
 
-Integer division is an excellent operation to benchmark because it's typically much more expensive than other arithmetic operations like addition, subtraction, or multiplication. On most CPU architectures, including Arm, division instructions have higher latency and lower throughput compared to other arithmetic operations. By applying Profile-Guided Optimization to code containing division operations, we can potentially achieve significant performance improvements.
+Integer division is ideal for benchmarking because it's significantly more expensive than operations like addition, subtraction, or multiplication. On most CPU architectures, including Arm, division instructions have higher latency and lower throughput compared to other arithmetic operations. By applying Profile-Guided Optimization to code containing division operations, we can potentially achieve significant performance improvements.
 
 ## What tools are needed to run a Google Benchmark example?
 
@@ -63,7 +63,7 @@ Run the program:
 ./div_bench.base
 ```
 
-The output is:
+### Example output
 
 ```output
 Running ./div_bench.base
@@ -81,9 +81,9 @@ Benchmark Time CPU Iterations
 baseDiv/1500 7.90 us 7.90 us 88512
 ```
 
-### Inspect Assembly
+### Inspect assembly
 
-To inspect what assembly instructions are being executed the most frequently, you can use the `perf` command. This is useful for identifying bottlenecks and understanding the performance characteristics of your code.
+To inspect what assembly instructions are being executed most frequently, you can use the `perf` command. This is useful for identifying bottlenecks and understanding the performance characteristics of your code.
 
 Install Perf using the [install guide](https://learn.arm.com/install-guides/perf/) before proceeding.
 
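Dividing by a divisor that is known at compile time is much cheaper than a general divide. Compilers can replace the divide with a multiplication by a precomputed constant followed by a shift. The sketch below hand-derives such a constant for the divisor 1500 used in `baseDiv/1500`. The constant names and the helper `div_by_1500` are illustrative assumptions, and GCC may choose different constants and instructions on Arm. The `static_assert` checks the error bound that makes the result exact for every 32-bit unsigned input.

```cpp
#include <cstdint>

// Illustrative, hand-derived constants (an assumption for this sketch, not
// taken from GCC output): M = ceil(2^42 / 1500) = 2932031008.
constexpr std::uint64_t kMagic1500 = 2932031008ULL;
constexpr unsigned kShift1500 = 42;

// M * 1500 exceeds 2^42 by only 896, so for every x < 2^32 the error term
// x * 896 / 2^42 stays below 1 and (x * M) >> 42 equals floor(x / 1500).
static_assert(kMagic1500 * 1500 - (std::uint64_t{1} << kShift1500) == 896,
              "magic constant does not match the divisor");

// Divide a 32-bit unsigned value by 1500 without a divide instruction:
// one 64-bit multiply and one shift. M < 2^32, so the product fits in 64 bits.
std::uint32_t div_by_1500(std::uint32_t x) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * kMagic1500) >> kShift1500);
}
```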

content/learning-paths/servers-and-cloud-computing/cpp-profile-guided-optimisation/how-to-4.md

Lines changed: 11 additions & 9 deletions
@@ -6,9 +6,9 @@ weight: 5
 layout: learningpathall
 ---
 
-### Building binary with PGO
+### Build with PGO
 
-To generate a binary optimized using the runtime profile, first build an instrumented binary that records usage data. Run the following command, which includes the `-fprofile-generate` flag, to build the instrumented binary:
+To generate a binary optimized using runtime profile data, first build an instrumented binary that records usage data. Run the following command, which includes the `-fprofile-generate` flag, to build the instrumented binary:
 
 ```bash
 g++ -O3 -std=c++17 -fprofile-generate div_bench.cpp -lbenchmark -lpthread -o div_bench.opt
@@ -20,21 +20,23 @@ Next, run the instrumented binary to generate the profile data:
 ./div_bench.opt
 ```
 
-This execution creates profile data files (typically with a `.gcda` extension) in the same directory. To incorporate this profile data into the compilation, rebuild the program using the `-fprofile-use` flag:
+This execution creates profile data files (typically with a `.gcda` extension) in the same directory.
+
+Now recompile the program using the `-fprofile-use` flag to apply optimizations based on the collected data:
 
 ```bash
 g++ -O3 -std=c++17 -fprofile-use div_bench.cpp -lbenchmark -lpthread -o div_bench.opt
 ```
 
-### Running the optimized binary
+### Run the optimized binary
 
-Run again with the optimized binary:
+Now run the optimized binary:
 
 ```bash
 ./div_bench.opt
 ```
 
-Running the newly created `div_bench.opt` binary, you observe the following improvement:
+The following output shows the performance improvement:
 
 ```output
 Running ./div_bench.opt
@@ -52,13 +54,13 @@ Benchmark Time CPU Iterations
 baseDiv/1500 2.86 us 2.86 us 244429
 ```
 
-As the terminal output above shows, the average execution time is reduced from 7.90 to 2.86 microseconds. **This improvement occurs because the profile data informed the compiler that the input divisor was consistently 1500 during the profiled runs, allowing it to apply specific optimizations.**
+As the terminal output above shows, the average execution time is reduced from 7.90 to 2.86 microseconds. This improvement occurs because the profile data informed the compiler that the input divisor was consistently 1500 during the profiled runs, allowing it to apply specific optimizations.
 
 Next, let's examine how the code was optimized at the assembly level.
 
-### Inspect Assembly
+### Inspect assembly
 
-Run the following commands to record `perf` data for the optimized binary and create a report:
+Use `perf` to inspect how the compiler optimized the binary:
 
 ```bash
 sudo perf record -o perf-division-opt ./div_bench.opt

content/learning-paths/servers-and-cloud-computing/cpp-profile-guided-optimisation/how-to-5.md

Lines changed: 4 additions & 4 deletions
@@ -6,15 +6,15 @@ weight: 6
 layout: learningpathall
 ---
 
-### Building locally with make
+### Build locally with make
 
 PGO can be integrated into a `Makefile` and continuous integration (CI) systems using simple command-line instructions, as shown in the sample `Makefile` below.
 
 {{% notice Caution %}}
-PGO requires additional build steps which will inevitably increase compile time which can be an issue for large code bases. As such, PGO is not suitable for all sections of code. You should PGO only for sections of code which are heavily influenced by run-time behavior and are performance critical. Therefore, PGO might not be ideal for early-stage development or for applications with highly variable or unpredictable usage patterns.
+PGO adds build steps that can increase compile time, especially for large code bases. As such, PGO is not suitable for all sections of code. Use PGO only for sections of code that are performance critical and heavily influenced by runtime behavior. PGO might not be ideal for early-stage development or for applications with highly variable or unpredictable usage patterns.
 {{% /notice %}}
 
-Use a text editor to create a `Makefile` for the example.
+Use a text editor to create a file named `Makefile` containing the following content:
 
 ```makefile
 # Simple Makefile for building and benchmarking div_bench with and without PGO
@@ -69,7 +69,7 @@ You can run the following commands in your terminal:
 * `make run`: Builds both binaries (if they don't exist) and then runs them, displaying the benchmark results for comparison.
 * `make clean`: Removes the compiled binaries (`div_bench.base`, `div_bench.opt`) and any generated profile data files (`*.gcda`).
 
-### Building with GitHub Actions
+### Build with GitHub Actions
 
 Alternatively, you can integrate PGO into your Continuous Integration (CI) workflow using GitHub Actions. The YAML file below provides a basic example that compiles and runs the benchmark on a GitHub-hosted Ubuntu 24.04 Arm-based runner. This setup can be extended with automated tests to check for performance regressions.
 
