@shawbyoung shawbyoung commented Jul 24, 2024

Extended function and block pseudo probe stale matching

This change brings flexible profile recovery using pseudo probes:

  • using multiple profiles for one binary function, e.g. if the binary
    function inlines profile functions,
  • using parts of one profile across multiple binary functions, e.g. if
    the profile has more aggressive inlining compared to the input binary.

Prerequisites

  • The YAML profile must be created with the perf2bolt flag
    --profile-write-pseudo-probes.
  • Pseudo probe matching is enabled with
    --stale-matching-with-pseudo-probes, which augments hash-based
    matching and requires profile inference to be enabled with
    --infer-stale-profile.
  • The binary must contain pseudo probes, enabled with the Clang option
    -fpseudo-probe-for-profiling.

Performance testing

  • Using two LTO+CSSPGO builds of Clang (#79942, "[Clang][CMake] Add
    CSSPGO support to LLVM_BUILD_INSTRUMENTED") from the same source
    with different profiles causing different inlining decisions,
  • Clang-BOLT testing harness https://github.com/aaupov/llvm-devmtg-2022
  • 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
  • Bootstrapped build of Clang, wall time:
    • no BOLT: 301.87 +- 1.44 seconds, 0.91 IPC
    • stale profile: 295.242 +- 0.683 seconds, 0.93 IPC
    • hash matching: 289.035 +- 0.656 seconds, 0.95 IPC
    • probe matching: 283.558 +- 0.927 seconds, 0.96 IPC
    • fresh profile: 275.131 +- 0.444 seconds, 0.99 IPC

Depends on:

Co-authored-by: [email protected]

Created using spr 1.3.4
aaupov added a commit that referenced this pull request Sep 4, 2024
To be used for pseudo probe function matching (#100446).

Test Plan: updated pseudoprobe-decoding-inline.test

Pull Request: #107137
…BF, determine best matching binary inline subtrees, find corresponding binary functions, attach the profile to binary function without a profile and most probes

github-actions bot commented Nov 8, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

aaupov added a commit that referenced this pull request Nov 12, 2024
Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.
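The inline tree matching step described above can be sketched roughly as follows. This is a minimal illustration, not BOLT's actual implementation: the `InlineTreeNode` struct, its field names, and the `matchInlineTrees` function are hypothetical, and the sketch assumes each tree is stored as a flat vector in which a parent always precedes its children.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened inline tree node (not BOLT's real data structure).
struct InlineTreeNode {
  uint64_t GUID;       // function GUID
  uint64_t Checksum;   // function checksum
  int Parent;          // index of the parent node, -1 for the root
  uint32_t InlineSite; // call site probe id in the parent, unused for the root
};

// Returns, for each profile node, the index of the matched binary node or -1.
// Assumption: parents precede their children in both vectors.
std::vector<int> matchInlineTrees(const std::vector<InlineTreeNode> &Profile,
                                  const std::vector<InlineTreeNode> &Binary) {
  std::vector<int> Match(Profile.size(), -1);
  for (size_t P = 0; P < Profile.size(); ++P) {
    const InlineTreeNode &PN = Profile[P];
    int WantParent = -1;
    if (PN.Parent >= 0) {
      WantParent = Match[PN.Parent];
      if (WantParent < 0)
        continue; // the parent is unmatched, so the child cannot match
    }
    for (size_t B = 0; B < Binary.size(); ++B) {
      const InlineTreeNode &BN = Binary[B];
      // Same function: GUID and checksum agree.
      bool SameFunction = BN.GUID == PN.GUID && BN.Checksum == PN.Checksum;
      // Same place: matched parent and, for inlined nodes, same inline site.
      bool SamePlace = BN.Parent == WantParent &&
                       (PN.Parent < 0 || BN.InlineSite == PN.InlineSite);
      if (SameFunction && SamePlace) {
        Match[P] = static_cast<int>(B);
        break;
      }
    }
  }
  return Match;
}
```

A profile node whose checksum differs from the binary node with the same GUID stays unmatched in this sketch, and so do all of its descendants.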

Pseudo probe matching happens between exact hash matching and call/loose
matching.
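The exact versus loose block matching rule described above can be illustrated with a short sketch. The `BlockMatch` type and `matchBlockByProbes` function are hypothetical names, not BOLT's API. The sketch also assumes that the block with the most votes wins (a plurality) and that ties go to the lowest block index; the source does not specify tie-breaking.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical result: the chosen binary block and whether all probes agreed.
struct BlockMatch {
  uint32_t BinaryBlockIndex;
  bool IsExact; // true if every probe mapped to the same binary block
};

// ProbeBlocks holds, for each probe of one profile basic block, the index of
// the binary basic block that the mapped binary probe belongs to.
std::optional<BlockMatch>
matchBlockByProbes(const std::vector<uint32_t> &ProbeBlocks) {
  if (ProbeBlocks.empty())
    return std::nullopt; // no probes, nothing to match

  // Count votes per binary block.
  std::map<uint32_t, unsigned> Votes;
  for (uint32_t Block : ProbeBlocks)
    ++Votes[Block];

  // Pick the block with the most votes. std::map iterates in ascending key
  // order and the comparison is strict, so ties go to the lowest index
  // (an assumption of this sketch).
  uint32_t Best = 0;
  unsigned BestVotes = 0;
  for (const auto &[Block, Count] : Votes) {
    if (Count > BestVotes) {
      Best = Block;
      BestVotes = Count;
    }
  }
  return BlockMatch{Best, Votes.size() == 1};
}
```

For example, probes mapping to blocks `{3, 3, 3}` give an exact match on block 3, while `{1, 2, 2}` gives a loose match on block 2.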

Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
  bar();
}
```
In the profiled binary, bar is not inlined, so the profile has a
top-level function bar. In the new binary that the profile is applied
to, bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).
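To make the N:M relationship concrete, here is a minimal sketch that collects, for each binary function, the set of profiles it could draw on. It is purely illustrative: it keys everything by function name and uses a hypothetical `InlineMap` and `collectProfilesPerBinaryFunction`, whereas the matching described here works on pseudo probe inline trees rather than name lists.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: for each binary function, the names of the functions
// inlined into it.
using InlineMap = std::map<std::string, std::vector<std::string>>;

// For each binary function, list the profiles to consult: its own profile
// (if present) followed by the profiles of its inlinees (if present).
std::map<std::string, std::vector<std::string>>
collectProfilesPerBinaryFunction(const InlineMap &BinaryInlinees,
                                 const std::set<std::string> &ProfiledFunctions) {
  std::map<std::string, std::vector<std::string>> Result;
  for (const auto &[Function, Inlinees] : BinaryInlinees) {
    std::vector<std::string> &Profiles = Result[Function];
    if (ProfiledFunctions.count(Function))
      Profiles.push_back(Function);
    for (const std::string &Inlinee : Inlinees)
      if (ProfiledFunctions.count(Inlinee))
        Profiles.push_back(Inlinee);
  }
  return Result;
}
```

With bar inlined into both foo and a second function baz (a name made up for this example), foo would use the profiles for foo and bar, and baz would use the profiles for baz and bar, so the single bar profile serves two binary functions.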

In this diff, ProbeMatchSpecs contain only one BinaryFunctionProfile
(the existing name-based matching).

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:
- Setup:
  - Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
    (#79942),
  - BOLT fresh: BOLTed Clang using fresh profile,
  - BOLT stale (hash): BOLTed Clang using stale profile (collected on
    Clang 10K commits back), `-infer-stale-profile` (hash+call block
    matching)
  - BOLT stale (+probe): BOLTed Clang using stale profile,
    `-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
    (hash+call+pseudo probe block matching)
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
  - BOLT profiles are collected on Clang compiling large preprocessed
    C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
  aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
  - Baseline no-BOLT: 429.52 +- 2.61s,
  - BOLT stale (hash): 413.21 +- 2.19s,
  - BOLT stale (+probe): 409.69 +- 1.41s,
  - BOLT fresh: 384.50 +- 1.80s.

---------

Co-authored-by: Amir Ayupov <[email protected]>
@aaupov aaupov marked this pull request as ready for review October 29, 2025 06:25