@@ -2319,6 +2319,8 @@ are listed below.
23192319 on ELF targets when using the integrated assembler. This flag currently
23202320 only has an effect on ELF targets.
23212321
2322+ .. _funique_internal_linkage_names:
2323+
23222324.. option:: -f[no]-unique-internal-linkage-names
23232325
23242326 Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2448,27 +2450,41 @@ usual build cycle when using sample profilers for optimization:
24482450 usual build flags that you always build your application with. The only
24492451 requirement is that DWARF debug info including source line information is
24502452 generated. This DWARF information is important for the profiler to be able
2451- to map instructions back to source line locations.
2453+ to map instructions back to source line locations. The usefulness of this
2454+ DWARF information can be improved with the ``-fdebug-info-for-profiling``
2455+ and ``-funique-internal-linkage-names`` options.
24522456
2453- On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
2457+ On Linux:
24542458
24552459 .. code-block:: console
24562460
2457- $ clang++ -O2 -gline-tables-only code.cc -o code
2461+ $ clang++ -O2 -gline-tables-only \
2462+ -fdebug-info-for-profiling -funique-internal-linkage-names \
2463+ code.cc -o code
24582464
24592465 While MSVC-style targets default to CodeView debug information, DWARF debug
24602466 information is required to generate source-level LLVM profiles. Use
24612467 ``-gdwarf`` to include DWARF debug information:
24622468
2463- .. code-block:: console
2469+ .. code-block:: winbatch
2470+
2471+ > clang-cl /O2 -gdwarf -gline-tables-only ^
2472+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
2473+ code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
2474+
2475+ .. note::
24642476
2465- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
2477+ :ref:`-funique-internal-linkage-names <funique_internal_linkage_names>`
2478+ generates unique names based on given command-line source file paths. If
2479+ your build system uses absolute source paths and these paths may change
2480+ between steps 1 and 4, then the uniqued function names may change and result
2481+ in unused profile data. Consider omitting this option in such cases.
24662482
246724832. Run the executable under a sampling profiler. The specific profiler
24682484 you use does not really matter, as long as its output can be converted
24692485 into the format that the LLVM optimizer understands.
24702486
2471- Two such profilers are the the Linux Perf profiler
2487+ Two such profilers are the Linux Perf profiler
24722488 (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
24732489 available as part of `Intel VTune
24742490 <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2482,7 +2498,9 @@ usual build cycle when using sample profilers for optimization:
24822498
24832499 .. code-block:: console
24842500
2485- $ perf record -b ./code
2501+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
2502+
2503+ If the event above is unavailable, ``branches:u`` is probably next-best.
24862504
24872505 Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
24882506 Record (LBR) to record call chains. While this is not strictly required,
@@ -2532,21 +2550,42 @@ usual build cycle when using sample profilers for optimization:
25322550 that executes faster than the original one. Note that you are not
25332551 required to build the code with the exact same arguments that you
25342552 used in the first step. The only requirement is that you build the code
2535- with ``-gline-tables-only`` and ``-fprofile-sample-use``.
2553+ with the same debug info options and ``-fprofile-sample-use``.
2554+
2555+ On Linux:
25362556
25372557 .. code-block:: console
25382558
2539- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
2559+ $ clang++ -O2 -gline-tables-only \
2560+ -fdebug-info-for-profiling -funique-internal-linkage-names \
2561+ -fprofile-sample-use=code.prof code.cc -o code
25402562
2541- [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
2542- edge counters. The profile inference algorithm (profi) can be used to infer
2543- missing blocks and edge counts, and improve the quality of profile data.
2544- Enable it with ``-fsample-profile-use-profi``.
2563+ On Windows:
25452564
2546- .. code-block:: console
2565+ .. code-block:: winbatch
2566+
2567+ > clang-cl /O2 -gdwarf -gline-tables-only ^
2568+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
2569+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
2570+
2571+ [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
2572+ edge counters. The profile inference algorithm (profi) can be used to infer
2573+ missing blocks and edge counts, and improve the quality of profile data.
2574+ Enable it with ``-fsample-profile-use-profi``. For example, on Linux:
2575+
2576+ .. code-block:: console
2577+
2578+ $ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
2579+ -fdebug-info-for-profiling -funique-internal-linkage-names \
2580+ -fprofile-sample-use=code.prof code.cc -o code
2581+
2582+ On Windows:
2583+
2584+ .. code-block:: winbatch
25472585
2548- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
2549- -fsample-profile-use-profi code.cc -o code
2586+ > clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
2587+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
2588+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
25502589
25512590 Sample Profile Formats
25522591""""""""""""""""""""""
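The note added in this change warns that ``-funique-internal-linkage-names`` derives names from the source file paths given on the command line, so a path that differs between the profiled build and the optimized build can leave profile data unmatched. The sketch below only illustrates that effect. The helper names, the ``.__uniq.`` suffix layout, and the use of MD5 are assumptions made for illustration, not a statement of Clang's actual algorithm.

```python
import hashlib


def uniq_suffix(source_path: str) -> str:
    """Hypothetical stand-in for a path-derived suffix (assumed format)."""
    digest = hashlib.md5(source_path.encode("utf-8")).hexdigest()
    return ".__uniq." + str(int(digest, 16))


def uniqued_name(symbol: str, source_path: str) -> str:
    """Append the path-derived suffix to an internal-linkage symbol name."""
    return symbol + uniq_suffix(source_path)


# Absolute paths that differ between the profiled build and the optimized build.
step1 = uniqued_name("helper", "/home/alice/src/code.cc")
step4 = uniqued_name("helper", "/builds/ci-1234/src/code.cc")

# The same relative path passed to both builds.
relative_a = uniqued_name("helper", "code.cc")
relative_b = uniqued_name("helper", "code.cc")

print("absolute paths give the same name:", step1 == step4)
print("relative paths give the same name:", relative_a == relative_b)
```

Under these assumptions, a profile keyed on the first name would not match the function in the second build, which is the "unused profile data" case the note describes. Passing the same path spelling to both builds keeps the names stable.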