Profiling
pprof helps diagnose which functions use the most CPU.
Note that this works better when compiling the lib together with a test file rather than linking against it.
- compile with `-g` and link with `-lprofiler`
- run with `LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=cpu.profile ./bin`
- run `google-pprof --text ./bin cpu.profile` (switch to `--gv`, `--pdf`, `--svg`, ... depending on what output you prefer)
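
Putting those steps together, here is a minimal sketch of the whole pprof workflow, assuming the library sources live in `lib/` and the test file is `test.cpp` (the file and binary names are assumptions, adjust them to the actual layout):

```sh
# Build the library sources together with the test file, with debug symbols,
# and link against gperftools' profiler (file names are assumptions).
g++ -g -O2 -o bin test.cpp lib/*.cpp -lprofiler

# Run with the CPU profiler enabled; cpu.profile is written when the binary exits.
LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=cpu.profile ./bin

# Text report of the functions using the most CPU
# (use --gv, --pdf or --svg for graphical output).
google-pprof --text ./bin cpu.profile
```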
cachegrind helps diagnose data & instruction cache misses.
Note that this works better when compiling the lib together with a test file rather than linking against it.
- compile with `-g`
- run `valgrind --tool=cachegrind ./bin`
- run `cg_annotate cachegrind.out.xxxx` to get a global report by file
- run `cg_annotate cachegrind.out.xxxx /abs/path/to/file.cpp` to get a line-by-line report for a specific file
- run `cg_annotate cachegrind.out.xxxx --auto=yes` to get a full line-by-line report for all files
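
For reference, a minimal sketch of that cachegrind loop under the same assumptions (`xxxx` stands for the pid of the profiled run):

```sh
# Build the library sources together with the test file, with debug symbols
# (file names are assumptions).
g++ -g -O2 -o bin test.cpp lib/*.cpp

# Run under cachegrind; this writes cachegrind.out.<pid> in the current directory.
valgrind --tool=cachegrind ./bin

# Per-file summary, then a line-by-line report for one file,
# then a full line-by-line report for all files (replace xxxx with the pid).
cg_annotate cachegrind.out.xxxx
cg_annotate cachegrind.out.xxxx /abs/path/to/file.cpp
cg_annotate cachegrind.out.xxxx --auto=yes
```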


The abbreviations in the cachegrind reports are:

- `I`: instruction cache
- `I1`: L1 instruction cache
- `LLi`: last-level instruction cache (called L2 by valgrind; it simulates the last cache level, i.e. L3/L4/... for machines with more than 2 cache levels)
- `D`: data cache
- `D1`: L1 data cache
- `LLd`: last-level data cache (same as `LLi`)
- `r`: read
- `w`: write
- `mr`: miss on read
- `mw`: miss on write
- `Dmr`, `D1mr`, `LLdmr`, ...: combinations of cache type and access type (e.g. `D1mr` is a miss on read for the L1 data cache)
Misses are easy to spot, especially in the line-by-line detail.
If one line has a high rate of data misses, it may be worth reorganizing the way the data is accessed to take advantage of cache locality.
Not sure how to address instruction misses.
callgrind helps diagnose which instructions are executed the most.
Note that this works better when compiling the lib together with a test file rather than linking against it.
- compile with `-g`
- run `valgrind --tool=callgrind ./bin`
- run `callgrind_annotate callgrind.out.xxxx` to get a global report by file
- run `callgrind_annotate callgrind.out.xxxx ./relative/path/to/file.cpp` to get a line-by-line report for a specific file
- run `callgrind_annotate callgrind.out.xxxx --auto=yes` to get a full line-by-line report for all files
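
Again, a minimal sketch of the callgrind loop under the same assumptions (file and binary names are placeholders, `xxxx` is the pid of the run):

```sh
# Build the library sources together with the test file, with debug symbols
# (file names are assumptions).
g++ -g -O2 -o bin test.cpp lib/*.cpp

# Run under callgrind; this writes callgrind.out.<pid> in the current directory.
valgrind --tool=callgrind ./bin

# Per-file summary, then a line-by-line report for one file,
# then a full line-by-line report for all files (replace xxxx with the pid).
callgrind_annotate callgrind.out.xxxx
callgrind_annotate callgrind.out.xxxx ./relative/path/to/file.cpp
callgrind_annotate callgrind.out.xxxx --auto=yes
```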


Interpretation is straightforward here: it simply reports how many times a given instruction has been run.
Optimizing highly called areas or reducing the number of times they are executed is the right way to address it (when relevant, of course: some instructions are called very frequently but have a small impact on the overall runtime, so this should be combined with the pprof report).
kcachegrind is an alternative to cg_annotate and callgrind_annotate for navigating through the reports. It provides a graphical interface that can be much more convenient depending on the context.
- run `kcachegrind callgrind.out.xxx` or `kcachegrind cachegrind.out.xxx`
A top-down approach can be useful to focus on the relevant areas.
Typically:
- identify slow extended APIs based on the Google Test runtimes (see the sketch after this list)
- identify slow components with `pprof`
- once identified, find the root cause using `cachegrind` and `callgrind`
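
As a hedged sketch of that top-down loop, assuming the test binary is a Google Test runner called `bin` and that `SlowSuite` is a hypothetical name for the suite that shows up as slow in the test timings:

```sh
# 1. Spot slow extended APIs from the Google Test timings
#    (the runner prints per-test elapsed time; SlowSuite is a hypothetical name).
./bin --gtest_filter='SlowSuite.*'

# 2. Narrow down to the slow components with pprof.
LD_PRELOAD=/usr/lib/libprofiler.so CPUPROFILE=cpu.profile ./bin --gtest_filter='SlowSuite.*'
google-pprof --text ./bin cpu.profile

# 3. Find the root cause of the hot spots with cachegrind and callgrind.
valgrind --tool=cachegrind ./bin --gtest_filter='SlowSuite.*'
valgrind --tool=callgrind ./bin --gtest_filter='SlowSuite.*'
```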
Need more information? Open an issue.

