Commit 55e2a50

[add] Added documentation about on-cpu profiling of local benchmarks (#157)
1 parent 9d485e8 commit 55e2a50

5 files changed: +85 -2 lines changed

docs/Readme.md

Lines changed: 83 additions & 1 deletion
@@ -92,4 +92,86 @@ Some considerations:
9292
a standalone redis-server, copy the dataset and module files to the DB VM and make usage of the tool to run the query variations.
9393
- After each benchmark the defined KPIs limits are checked and will influence the exit code of the runner script. Even if we fail a benchmark variation, all other benchmark definitions are run.
9494
- At the end of each benchmark an output json file is stored with this benchmarks folder and will be named like `<start time>-<deployment type>-<git org>-<git repo>-<git branch>-<test name>-<git sha>.json`
95-
- In the case of a uncaught exception after we've deployed the environment the benchmark script will always try to teardown the created environment.
95+
- In the case of a uncaught exception after we've deployed the environment the benchmark script will always try to teardown the created environment.
96+
97+
# Attaching profiling tools/probers ( perf (a.k.a. perf_events), bpf tooling, vtune ) while running local benchmarks
98+
99+
**Note:** This part of the guide is only valid for Linux-based machines,
100+
and requires at least perf (and ideally pprof + perf_to_profile + graphviz).
101+
102+
As soon as we enable bpf tooling automation this will be valid for Darwin-based systems as well (macOS), **so sit tight!**
103+
104+
While running benchmarks locally you can attach profilers (currently only for on-CPU time) to understand exactly which functions take the most CPU cycles to complete.
105+
Currently, the benchmark automation supports two profilers:
106+
- perf (a.k.a. perf_events), enabled by default and with the profiler key: `perf:record`
107+
- Intel(R) VTune(TM), with the profiler key: `vtune`
108+
109+
## Trigger a profile
110+
To trigger a profile while running a benchmark you just need to add `PROFILE=1` to the previous env variables.
111+
112+
Here's an example within RedisTimeSeries project:
113+
```
114+
make benchmark PROFILE=1 BENCHMARK=tsbs-scale100_single-groupby-1-1-1.yml
115+
```
116+
117+
Depending on the used profiler you will get:
118+
- 1) Table of Top CPU entries in text form for the profiled process(es)
119+
120+
![Table of Top CPU entries](top-entries-table.png)
121+
122+
- 2) Call graph identifying the top hotspots
123+
124+
![Call graph identifying the top hotspots ](call-graph-sample.png)
125+
126+
- 3) FlameGraph – convert profiling data to a flamegraph
127+
128+
![Flame graph ](flame-graph-sample.png)
129+
130+
131+
If you run the benchmark automation without specifying a benchmark (for example: `make benchmark PROFILE=1`)
132+
the automation will trigger all benchmarks and consequently profile each individual one.
133+
134+
At the end, you should have an artifact table like the following:
135+
136+
```bash
137+
# Profiler artifacts
138+
| Test Case | Profiler | Artifact | Local file |s3 link|
139+
|------------------------|-----------|-------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
140+
|tsbs-scale100_high-cpu-1|perf:record|Flame Graph (primary 1 of 1) |/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out.flamegraph.svg | |
141+
|tsbs-scale100_high-cpu-1|perf:record|perf output (primary 1 of 1) |/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out | |
142+
|tsbs-scale100_high-cpu-1|perf:record|perf report top self-cpu (primary 1 of 1) |/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out.perf-report.top-cpu.txt | |
143+
|tsbs-scale100_high-cpu-1|perf:record|perf report top self-cpu (dso=/home/fco/redislabs/RedisTimeSeries/bin/linux-x64-release-profile/redistimeseries.so)|/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out.perf-report.top-cpu.dso.txt| |
144+
|tsbs-scale100_high-cpu-1|perf:record|Top entries in text form |/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out.pprof.txt | |
145+
|tsbs-scale100_high-cpu-1|perf:record|Output graph image in PNG format |/home/fco/redislabs/RedisTimeSeries/tests/benchmarks/profile_oss-standalone__primary-1-of-1__tsbs-scale100_high-cpu-1_perf:record_2021-07-14-10-04-38.out.pprof.png | |
146+
```
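
If you want to post-process the artifact table (for example, to collect the local file paths), it can be parsed as a plain pipe-delimited table. The snippet below is a minimal sketch; `parse_artifact_table` is a hypothetical helper and not part of the benchmark automation, and it assumes the table layout shown above.

```python
def parse_artifact_table(text):
    """Parse a pipe-delimited artifact table into a list of dicts.

    Hypothetical helper: assumes the first row starting with "|" is the
    header and that cells never contain a "|" character.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the title line and blank lines
        line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        cells = [cell.strip() for cell in line.split("|")]
        if all(cell and set(cell) <= {"-", ":"} for cell in cells):
            continue  # separator row such as |----|----|
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```

Each returned dict is keyed by the header names, so `row["Local file"]` gives the path of one artifact.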
147+
148+
##
149+
150+
### Further notes on using perf (a.k.a. perf_events) as a non-root user
151+
152+
If running as a non-root user, please confirm that you have:
153+
- **access to Kernel address maps**.
154+
155+
Check that `0` (disabled) is printed by `cat /proc/sys/kernel/kptr_restrict`
156+
157+
If not then fix via: `sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"`
158+
159+
160+
- **permission to collect stats**.
161+
162+
Check that `-1` is printed by `cat /proc/sys/kernel/perf_event_paranoid`
163+
164+
If not then fix via: `sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'`
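
The two checks above can also be scripted. The following is a minimal sketch, not part of the benchmark automation: `check_perf_sysctls` and `EXPECTED_SYSCTLS` are hypothetical names, and the expected values are simply the ones listed above. It only reads the settings; changing them still requires the `sudo` commands above.

```python
from pathlib import Path

# Expected values, taken from the two checks described above.
EXPECTED_SYSCTLS = {
    "kernel/kptr_restrict": "0",
    "kernel/perf_event_paranoid": "-1",
}


def check_perf_sysctls(proc_sys_root="/proc/sys"):
    """Return {sysctl name: (current value or None, matches expected)}."""
    results = {}
    for name, expected in EXPECTED_SYSCTLS.items():
        path = Path(proc_sys_root) / name
        try:
            current = path.read_text().strip()
        except OSError:
            current = None  # file missing or unreadable (e.g. not on Linux)
        results[name] = (current, current == expected)
    return results


if __name__ == "__main__":
    for name, (current, ok) in check_perf_sysctls().items():
        status = "ok" if ok else "needs fixing"
        print(f"{name}: {current!r} ({status})")
```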
165+
166+
### Further note on profiling in Rust
167+
168+
Due to Rust symbol (de)mangling still being unstable we're not able to properly
169+
demangle symbols if we use perf's default `fp` mode (frame-pointer-based stack walking to build the call chain for each sample).
170+
171+
Therefore, when profiling Rust you need to set the env variable `PERF_CALLGRAPH_MODE` to `dwarf`. Further notes on the different perf
172+
`call-graph` modes [here](https://stackoverflow.com/a/57432063).
173+
174+
Here's an example of a RedisJSON profile run:
175+
```bash
176+
make build benchmark PROFILE=1 BENCHMARK=json_get_array_of_docs[1]sclr_pass_100_json.yml PERF_CALLGRAPH_MODE=dwarf
177+
```
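
To make the effect of `PERF_CALLGRAPH_MODE` more concrete, here is a rough sketch of how such a variable could be turned into a `perf record` command line. This is an assumption for illustration only: the automation's actual invocation is not shown in this guide, and `build_perf_record_cmd` and `CALLGRAPH_MODE_DEFAULT` are hypothetical names. The `perf record` flags used (`--call-graph`, `-F`, `-p`, `-o`) are standard perf options, and the sampling frequency matches `PROFILE_FREQ_DEFAULT` in `profilers.py`.

```python
import os

PROFILE_FREQ_DEFAULT = "99"    # sampling frequency, as in profilers.py
CALLGRAPH_MODE_DEFAULT = "fp"  # assumed fallback when the env variable is unset


def build_perf_record_cmd(pid, output, env=None):
    """Build a `perf record` argv for an already-running process."""
    env = os.environ if env is None else env
    mode = env.get("PERF_CALLGRAPH_MODE", CALLGRAPH_MODE_DEFAULT)
    if mode not in ("fp", "dwarf", "lbr"):
        raise ValueError(f"unsupported call-graph mode: {mode!r}")
    return [
        "perf", "record",
        "--call-graph", mode,        # how stacks are unwound for each sample
        "-F", PROFILE_FREQ_DEFAULT,  # samples per second
        "-p", str(pid),              # attach to the running process
        "-o", output,                # where to write the perf data file
    ]
```

The returned list can be passed to `subprocess.Popen` as-is; with `PERF_CALLGRAPH_MODE=dwarf` in the environment, the command records DWARF-unwound stacks.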

docs/call-graph-sample.png

195 KB

docs/flame-graph-sample.png

99.4 KB

docs/top-entries-table.png

344 KB

redisbench_admin/profilers/profilers.py

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,8 @@
66

77
import pkg_resources
88

9-
ALLOWED_PROFILERS = "perf:record,ebpf:oncpu,ebpf:offcpu,vtune"
9+
# ALLOWED_PROFILERS = "perf:record,ebpf:oncpu,ebpf:offcpu,vtune"
10+
ALLOWED_PROFILERS = "perf:record,vtune"
1011
PROFILERS_DEFAULT = "perf:record"
1112
PROFILE_FREQ_DEFAULT = "99"
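
With `ALLOWED_PROFILERS` narrowed to `perf:record,vtune`, a request for an `ebpf:*` profiler should no longer pass validation. The sketch below shows one way a comma-separated allow-list like this could be checked; `parse_profilers` is a hypothetical helper written for illustration and is not taken from `redisbench_admin`.

```python
ALLOWED_PROFILERS = "perf:record,vtune"
PROFILERS_DEFAULT = "perf:record"


def parse_profilers(requested=None):
    """Split a comma-separated profiler list and reject unknown keys."""
    allowed = ALLOWED_PROFILERS.split(",")
    raw = requested or PROFILERS_DEFAULT  # fall back to the default profiler
    keys = [key.strip() for key in raw.split(",") if key.strip()]
    unknown = [key for key in keys if key not in allowed]
    if unknown:
        raise ValueError(
            f"unsupported profiler(s): {', '.join(unknown)}; "
            f"allowed: {ALLOWED_PROFILERS}"
        )
    return keys
```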
1213
