Commit 448f9e3

Merge pull request #2058 from paschalis-mpeis/bolt-spe-brstack
Update BOLT SPE documentation.
2 parents 1c0de35 + bd9e9e6 commit 448f9e3

2 files changed

+38
-32
lines changed

content/learning-paths/servers-and-cloud-computing/bolt/_index.md

Lines changed: 4 additions & 4 deletions
@@ -5,13 +5,13 @@ minutes_to_complete: 30

  who_is_this_for: This is an introductory topic for software developers who want to learn how to use BOLT on an Arm executable.

- learning_objectives:
- - Build an application which is ready to be optimized by BOLT
+ learning_objectives:
+ - Build an application which is ready to be optimized by BOLT
  - Profile an application and collect performance information
- - Run BOLT to create an optimized executable
+ - Run BOLT to create an optimized executable

  prerequisites:
- - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available.
+ - An Arm based system running Linux with [BOLT](/install-guides/bolt/) and [Linux Perf](/install-guides/perf/) installed. The Linux kernel should be version 5.15 or later. Earlier kernel versions can be used, but some Linux Perf features may be limited or not available. For [SPE](./bolt-spe) the version should be 6.14 or later.
  - (Optional) A second, more powerful Linux system to build the software executable and run BOLT.

  author: Jonathan Davies

content/learning-paths/servers-and-cloud-computing/bolt/bolt-spe.md

Lines changed: 34 additions & 28 deletions
@@ -7,59 +7,60 @@ layout: learningpathall
  ---

  ## BOLT with SPE
+ The steps to use BOLT with Perf SPE are listed below.

- {{% notice Important Note %}}
- Currently, BOLT may not generate a faster binary when using Perf SPE due to limitations within `perf` and BOLT itself.
- For more information and the latest updates see: [[AArch64] BOLT does not support SPE branch data](https://github.com/llvm/llvm-project/issues/115333).
- {{% /notice %}}
+ ### Collect Perf data with SPE

- The steps to use BOLT with Perf SPE are listed below.
+ First, make sure you are using Linux Perf version v6.14 or later, which supports the 'brstack' field and captures all branch information.

- ### Collect Perf data with SPE
+ ```bash { output_lines = "2" }
+ perf --version
+ perf version 6.14
+ ```

- Run your executable in the normal use case and collect a SPE performance profile. This will output a `perf.data` file containing the profile and will be used to optimize the executable.
+ Next, run your executable in the normal use case to collect an SPE performance profile. This generates a `perf.data` file containing the profile, which will be used to optimize the executable.

- Record samples while running your application. Substitute the actual name of your application for `executable`:
+ Record samples while running your application, replacing `executable` below:

  ```bash { target="ubuntu:latest" }
- perf record -e arm_spe/branch_filter=1/u -o perf.data-- ./executable
+ perf record -e 'arm_spe/branch_filter=1/u' -o perf.data -- ./executable
  ```
+ Once the execution is complete, perf will print a summary that includes the size of the `perf.data` file:

- Perf prints the size of the `perf.data` file:
-
- ```output
+ ```bash { target="ubuntu:latest" }
  [ perf record: Woken up 79 times to write data ]
  [ perf record: Captured and wrote 4.910 MB perf.data ]
  ```

- ### Convert the Profile into BOLT format
+ The `-jitter=1` flag can help avoid resonance, while `-c`/`--count` controls the sampling period.
+
+ ### Convert the Profile to BOLT format

- `perf2bolt` converts the profile into a BOLT data format. For the given sample data, `perf2bolt` finds all instruction pointers in the profile, maps them back to the assembly instructions, and outputs a count of how many times each assembly instruction was sampled.
+ `perf2bolt` converts the profile into BOLT's data format. It maps branch events from the profile to assembly instructions and outputs branch execution traces with sample counts.

- If you application is named `executable`, run the commend below to convert the profile data:
+ If you application is named `executable`, run the command below to convert the profile data:

  ```bash { target="ubuntu:latest" }
- perf2bolt -p perf.data -o perf.fdata -nl ./executable
+ perf2bolt -p perf.data -o perf.fdata --spe ./executable
  ```

- Below is example output from `perf2bolt`, it has read all samples and created the file `perf.fdata`.
+ Below is example output from `perf2bolt`, it has read all samples from `perf.data` and created the converted profile `perf.fdata`.

  ```output
  BOLT-INFO: shared object or position-independent executable detected
  PERF2BOLT: Starting data aggregation job for perf.data
- PERF2BOLT: spawning perf job to read events without LBR
+ PERF2BOLT: spawning perf job to read SPE brstack events
  PERF2BOLT: spawning perf job to read mem events
  PERF2BOLT: spawning perf job to read process events
  PERF2BOLT: spawning perf job to read task events
  BOLT-INFO: Target architecture: aarch64
- BOLT-INFO: BOLT version: c66c15a76dc7b021c29479a54aa1785928e9d1bf
+ BOLT-INFO: BOLT version: b1516a9d688fed835dce5efc614302649c3baf0e
  BOLT-INFO: first alloc address is 0x0
- BOLT-INFO: creating new program header table at address 0x200000, offset 0x200000
+ BOLT-INFO: creating new program header table at address 0x4600000, offset 0x4600000
  BOLT-INFO: enabling relocation mode
- BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
  BOLT-INFO: enabling strict relocation mode for aggregation purposes
  BOLT-INFO: pre-processing profile using perf data aggregator
- BOLT-INFO: binary build-id is: 21dbca691155f1e57825e6381d727842f3d43039
+ BOLT-INFO: binary build-id is: 8bb7beda9bae10bc546eace62775dd2958a9c940
  PERF2BOLT: spawning perf job to read buildid list
  PERF2BOLT: matched build-id and file name
  PERF2BOLT: waiting for perf mmap events collection to finish...
@@ -68,13 +69,18 @@ PERF2BOLT: waiting for perf task events collection to finish...
  PERF2BOLT: parsing perf-script task events output
  PERF2BOLT: input binary is associated with 1 PID(s)
  PERF2BOLT: waiting for perf events collection to finish...
- PERF2BOLT: parsing basic events (without LBR)...
+ PERF2BOLT: SPE branch events in LBR-format...
+ PERF2BOLT: read 3592267 samples and 3046129 LBR entries
+ PERF2BOLT: ignored samples: 0 (0.0%)
  PERF2BOLT: waiting for perf mem events collection to finish...
  PERF2BOLT: parsing memory events...
- PERF2BOLT: processing basic events (without LBR)...
- PERF2BOLT: read 79 samples
- PERF2BOLT: out of range samples recorded in unknown regions: 5 (6.3%)
- PERF2BOLT: wrote 14 objects and 0 memory objects to perf.fdata
+ PERF2BOLT: processing branch events...
+ PERF2BOLT: traces mismatching disassembled function contents: 0
+ PERF2BOLT: out of range traces involving unknown regions: 0
+ PERF2BOLT: wrote 21027 objects and 0 memory objects to perf.fdata
+ BOLT-INFO: 2178 out of 72028 functions in the binary (3.0%) have non-empty execution profile
+ BOLT-INFO: 12 functions with profile could not be optimized
+ BOLT-INFO: Functions with density >= 0.0 account for 99.00% total sample counts
  ```

  ### Run BOLT to generate the optimized executable
@@ -155,4 +161,4 @@ BOLT-INFO: setting __hot_end to 0x4002b0
  BOLT-INFO: patched build-id (flipped last bit)
  ```

- The optimized executable is now available as `new_executable`.
+ The optimized executable is now available as `new_executable`.
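
The change above adds a requirement of Linux Perf 6.14 or later for SPE. That requirement could be checked in a script before recording. The following is only a minimal sketch and is not part of the commit: the helper name `version_ge` is hypothetical, and it assumes `perf --version` prints `perf version <major>.<minor>` as in the sample output shown in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): succeeds when version $1
# is at least version $2, comparing the major and minor fields numerically.
version_ge() {
  local have_major have_minor want_major want_minor
  IFS=. read -r have_major have_minor _ <<< "$1"
  IFS=. read -r want_major want_minor _ <<< "$2"
  # Drop any non-numeric suffix (for example "-rc1") and default to 0.
  have_major=${have_major%%[!0-9]*}; have_major=${have_major:-0}
  have_minor=${have_minor%%[!0-9]*}; have_minor=${have_minor:-0}
  want_major=${want_major%%[!0-9]*}; want_major=${want_major:-0}
  want_minor=${want_minor%%[!0-9]*}; want_minor=${want_minor:-0}
  (( have_major > want_major )) && return 0
  (( have_major < want_major )) && return 1
  (( have_minor >= want_minor ))
}

# Only query perf when it is installed; the third field of
# "perf version 6.14" is assumed to be the version string.
if command -v perf >/dev/null 2>&1; then
  current=$(perf --version | awk '{print $3}')
  if version_ge "$current" 6.14; then
    echo "perf $current meets the 6.14 requirement for SPE"
  else
    echo "perf $current is older than 6.14"
  fi
fi
```

The fields are compared as integers rather than as strings, so that 6.8 is correctly treated as older than 6.14.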
