Commit 2972c16

Update BOLT SPE documentation.
1 parent dec3dfd commit 2972c16

File tree

2 files changed

+55
-30
lines changed

content/learning-paths/servers-and-cloud-computing/bolt/_index.md

Lines changed: 19 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,16 +5,16 @@ minutes_to_complete: 30
55

66
who_is_this_for: This is an introductory topic for software developers who want to learn how to use BOLT on an Arm executable.
77

8-
learning_objectives:
9-
- Build an application which is ready to be optimized by BOLT
8+
learning_objectives:
9+
- Build an application which is ready to be optimized by BOLT
1010
- Profile an application and collect performance information
11-
- Run BOLT to create an optimized executable
11+
- Run BOLT to create an optimized executable
1212

1313
prerequisites:
14-
- An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available.
14+
- An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available. For [SPE](./bolt-spe) the version should be 6.14 or later.
1515
- (Optional) A second, more powerful Linux system to build the software executable and run BOLT.
1616

17-
author_primary: Jonathan Davies
17+
author: Jonathan Davies
1818

1919
### Tags
2020
skilllevels: Introductory
@@ -25,9 +25,23 @@ armips:
2525
tools_software_languages:
2626
- BOLT
2727
- perf
28+
- Runbook
29+
2830
operatingsystems:
2931
- Linux
3032

33+
further_reading:
34+
- resource:
35+
title: BOLT README
36+
link: https://github.com/llvm/llvm-project/tree/main/bolt
37+
type: documentation
38+
- resource:
39+
title: BOLT - A Practical Binary Optimizer for Data Centers and Beyond
40+
link: https://research.facebook.com/publications/bolt-a-practical-binary-optimizer-for-data-centers-and-beyond/
41+
type: website
42+
43+
44+
3145
### FIXED, DO NOT MODIFY
3246
# ================================================================================
3347
weight: 1 # _index.md always has weight of 1 to order correctly

content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md

Lines changed: 36 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -1,60 +1,66 @@
11
---
2-
title: Using BOLT with SPE
2+
title: Use BOLT with SPE
33
weight: 6
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

99
## BOLT with SPE
10+
The steps to use BOLT with Perf SPE are listed below.
1011

11-
The steps to optimize an executable with BOLT using Perf SPE is below.
12+
### Collect Perf data with SPE
1213

13-
### Collect Perf data with SPE
14+
First, make sure you are using Linux Perf v6.14 or later, which supports the `brstack` field and captures all branch information.
1415

15-
Run your executable in the normal use case and collect a SPE performance profile. This will output a `perf.data` file containing the profile and will be used to optimize the executable.
16+
```bash { output_lines = "2" }
17+
perf --version
18+
perf version 6.14
19+
```
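The version check above can also be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available and that `perf --version` prints the version as its third field; `version_ge` is a hypothetical helper name, not part of Perf or BOLT.

```bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="6.14"
if command -v perf >/dev/null 2>&1; then
    # `perf --version` prints e.g. "perf version 6.14"; take the third field.
    current="$(perf --version | awk '{print $3}')"
    if version_ge "$current" "$required"; then
        echo "perf $current is new enough for SPE brstack"
    else
        echo "perf $current is older than $required" >&2
    fi
else
    echo "perf not found in PATH" >&2
fi
```

Distribution builds sometimes append suffixes to the version string, so treat the result as a hint rather than a guarantee.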
20+
21+
Next, run your executable in the normal use case to collect an SPE performance profile. This generates a `perf.data` file containing the profile, which will be used to optimize the executable.
1622

17-
Record samples while running your application. Substitute the actual name of your application for `executable`:
23+
Record samples while running your application, replacing `executable` below with the name of your application:
1824

1925
```bash { target="ubuntu:latest" }
20-
perf record -e arm_spe/branch_filter=1/u -o perf.data-- ./executable
26+
perf record -e 'arm_spe/branch_filter=1/u' -o perf.data -- ./executable
2127
```
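Before running the record command above, it can help to confirm that the kernel exposes an SPE PMU. The sketch below assumes the PMU appears as an `arm_spe_*` entry under `/sys/bus/event_source/devices`; `has_spe_pmu` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper: succeeds if an arm_spe_* PMU is listed under the
# given directory (defaults to the standard perf event source location).
has_spe_pmu() {
    dir="${1:-/sys/bus/event_source/devices}"
    for entry in "$dir"/arm_spe_*; do
        # If the glob matched nothing, the literal pattern does not exist.
        [ -e "$entry" ] && return 0
    done
    return 1
}

if has_spe_pmu; then
    echo "SPE PMU found"
else
    echo "No arm_spe PMU found; SPE recording is unlikely to work" >&2
fi
```

A missing entry can mean the hardware lacks SPE, the kernel driver is not loaded, or the system is a virtual machine that does not expose it.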
28+
Once the execution is complete, perf will print a summary that includes the size of the `perf.data` file:
2229

23-
Perf prints the size of the `perf.data` file:
24-
25-
```output
30+
```output
2631
[ perf record: Woken up 79 times to write data ]
2732
[ perf record: Captured and wrote 4.910 MB perf.data ]
2833
```
2934

30-
### Convert the Profile into BOLT format
35+
The `jitter=1` event option (set inside the `arm_spe/.../` event specifier) can help avoid sampling resonance, while the `-c`/`--count` option of `perf record` controls the sampling period.
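As a rough illustration of combining these options, the sketch below assembles a record command with jitter enabled and an explicit sampling period, then prints it for review. The period of `10000` is an arbitrary placeholder, not a recommended value, and the variable names are hypothetical.

```bash
# Assumed values for illustration only; tune the period for your workload.
period=10000
event='arm_spe/branch_filter=1,jitter=1/u'

# Build the command as a string so it can be reviewed before running it.
record_cmd="perf record -e '$event' -c $period -o perf.data -- ./executable"
echo "$record_cmd"
```

A longer period reduces profile size and overhead at the cost of fewer samples.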
3136

32-
`perf2bolt` converts the profile into a BOLT data format. For the given sample data, `perf2bolt` finds all instruction pointers in the profile, maps them back to the assembly instructions, and outputs a count of how many times each assembly instruction was sampled.
37+
### Convert the Profile to BOLT format
3338

34-
If you application is named `executable`, run the commend below to convert the profile data:
39+
`perf2bolt` converts the profile into BOLT's data format. It maps branch events from the profile to assembly instructions and outputs branch execution traces with sample counts.
40+
41+
If your application is named `executable`, run the command below to convert the profile data:
3542

3643
```bash { target="ubuntu:latest" }
37-
perf2bolt -p perf.data -o perf.fdata -nl ./executable
44+
perf2bolt -p perf.data -o perf.fdata --spe ./executable
3845
```
3946

40-
Below is example output from `perf2bolt`, it has read all samples and created the file `perf.fdata`.
47+
Below is example output from `perf2bolt`. It has read all samples from `perf.data` and created the converted profile `perf.fdata`.
4148

4249
```output
4350
BOLT-INFO: shared object or position-independent executable detected
4451
PERF2BOLT: Starting data aggregation job for perf.data
45-
PERF2BOLT: spawning perf job to read events without LBR
52+
PERF2BOLT: spawning perf job to read SPE brstack events
4653
PERF2BOLT: spawning perf job to read mem events
4754
PERF2BOLT: spawning perf job to read process events
4855
PERF2BOLT: spawning perf job to read task events
4956
BOLT-INFO: Target architecture: aarch64
50-
BOLT-INFO: BOLT version: c66c15a76dc7b021c29479a54aa1785928e9d1bf
57+
BOLT-INFO: BOLT version: b1516a9d688fed835dce5efc614302649c3baf0e
5158
BOLT-INFO: first alloc address is 0x0
52-
BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
59+
BOLT-INFO: creating new program header table at address 0x4600000, offset 0x4600000
5360
BOLT-INFO: enabling relocation mode
54-
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
5561
BOLT-INFO: enabling strict relocation mode for aggregation purposes
5662
BOLT-INFO: pre-processing profile using perf data aggregator
57-
BOLT-INFO: binary build-id is: 21dbca691155f1e57825e6381d727842f3d43039
63+
BOLT-INFO: binary build-id is: 8bb7beda9bae10bc546eace62775dd2958a9c940
5864
PERF2BOLT: spawning perf job to read buildid list
5965
PERF2BOLT: matched build-id and file name
6066
PERF2BOLT: waiting for perf mmap events collection to finish...
@@ -63,13 +69,18 @@ PERF2BOLT: waiting for perf task events collection to finish...
6369
PERF2BOLT: parsing perf-script task events output
6470
PERF2BOLT: input binary is associated with 1 PID(s)
6571
PERF2BOLT: waiting for perf events collection to finish...
66-
PERF2BOLT: parsing basic events (without LBR)...
72+
PERF2BOLT: SPE branch events in LBR-format...
73+
PERF2BOLT: read 3592267 samples and 3046129 LBR entries
74+
PERF2BOLT: ignored samples: 0 (0.0%)
6775
PERF2BOLT: waiting for perf mem events collection to finish...
6876
PERF2BOLT: parsing memory events...
69-
PERF2BOLT: processing basic events (without LBR)...
70-
PERF2BOLT: read 79 samples
71-
PERF2BOLT: out of range samples recorded in unknown regions: 5 (6.3%)
72-
PERF2BOLT: wrote 14 objects and 0 memory objects to perf.fdata
77+
PERF2BOLT: processing branch events...
78+
PERF2BOLT: traces mismatching disassembled function contents: 0
79+
PERF2BOLT: out of range traces involving unknown regions: 0
80+
PERF2BOLT: wrote 21027 objects and 0 memory objects to perf.fdata
81+
BOLT-INFO: 2178 out of 72028 functions in the binary (3.0%) have non-empty execution profile
82+
BOLT-INFO: 12 functions with profile could not be optimized
83+
BOLT-INFO: Functions with density >= 0.0 account for 99.00% total sample counts
7384
```
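If you save the `perf2bolt` output to a file, the summary line reporting how many functions have a profile can be extracted with a short script. This is a sketch that assumes the log wording matches the example output above; `profile_coverage` and `perf2bolt.log` are hypothetical names.

```bash
# Hypothetical helper: print the percentage of functions with a non-empty
# execution profile, as reported in a saved perf2bolt log file ($1).
profile_coverage() {
    sed -n 's/.*functions in the binary (\([0-9.]*\)%) have non-empty execution profile.*/\1/p' "$1"
}

# Example usage, assuming the output was saved beforehand, e.g.:
#   perf2bolt -p perf.data -o perf.fdata --spe ./executable > perf2bolt.log 2>&1
#   profile_coverage perf2bolt.log
```

For the example log shown above, the helper would print `3.0`. The message format may differ between BOLT versions, so an empty result does not necessarily mean the conversion failed.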
7485

7586
### Run BOLT to generate the optimized executable
@@ -150,4 +161,4 @@ BOLT-INFO: setting __hot_end to 0x4002b0
150161
BOLT-INFO: patched build-id (flipped last bit)
151162
```
152163

153-
The optimized executable is now available as `new_executable`.
164+
The optimized executable is now available as `new_executable`.
