Commit a6af641

[libc++] Update utilities to compare benchmarks (llvm#157556)
This patch replaces the previous `libcxx-compare-benchmarks` wrapper with a new `compare-benchmarks` script that works with LNT-compatible data. This makes it possible to compare benchmark results across libc++ microbenchmarks, SPEC, and anything else that produces LNT-compatible data.

It also adds a simple script to consolidate LNT benchmark output into a single file, simplifying the process of doing A/B runs locally. After this patch, the simplest workflow no longer requires creating two build directories. Finally, it adds the ability to produce either a standalone HTML chart or plain-text output for diffing results locally when prototyping changes.

Example text output of the new tool:

```
Benchmark                              Baseline   Candidate    Difference    % Difference
-----------------------------------  ----------  ----------  ------------  --------------
BM_join_view_deques/0                      8.11        8.16          0.05            0.63
BM_join_view_deques/1                     13.56       13.79          0.23            1.69
BM_join_view_deques/1024                6606.51     7011.34        404.83            6.13
BM_join_view_deques/2                     17.99       19.92          1.93           10.72
BM_join_view_deques/4000               27655.58    29864.72       2209.14            7.99
BM_join_view_deques/4096               26218.07    30520.13       4302.05           16.41
BM_join_view_deques/512                 3231.66     2832.47       -399.19          -12.35
BM_join_view_deques/5500               47144.82    42207.41      -4937.42          -10.47
BM_join_view_deques/64                   247.23      262.66         15.43            6.24
BM_join_view_deques/64000             756221.63   511247.48    -244974.15          -32.39
BM_join_view_deques/65536             537110.91   560241.61      23130.70            4.31
BM_join_view_deques/70000             815739.07   616181.34    -199557.73          -24.46
BM_join_view_out_vectors/0                 0.93        0.93          0.00            0.07
BM_join_view_out_vectors/1                 3.11        3.14          0.03            0.82
BM_join_view_out_vectors/1024           3090.92     3563.29        472.37           15.28
BM_join_view_out_vectors/2                 5.52        5.56          0.04            0.64
BM_join_view_out_vectors/4000           9887.21     9774.40       -112.82           -1.14
BM_join_view_out_vectors/4096          10158.78    10190.44         31.66            0.31
BM_join_view_out_vectors/512            1218.68     1209.59         -9.09           -0.75
BM_join_view_out_vectors/5500          13559.23    13676.06        116.84            0.86
BM_join_view_out_vectors/64              158.95      157.91         -1.04           -0.65
BM_join_view_out_vectors/64000        178514.73   226520.97      48006.24           26.89
BM_join_view_out_vectors/65536        184639.37   207180.35      22540.98           12.21
BM_join_view_out_vectors/70000        235006.69   213886.93     -21119.77           -8.99
```
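The LNT data consumed by these tools is line-oriented: each line has the shape `<benchmark>.<metric> <value>`, and a metric may appear several times when a benchmark was sampled repeatedly. The sketch below is a hypothetical, minimal re-implementation of that parsing (it is not the committed script), fed with two invented samples:

```python
import statistics
from collections import defaultdict

def parse_lnt(lines):
    # Each non-blank line is "<benchmark>.<metric> <value>". Repeated
    # samples for the same metric accumulate into a list of floats.
    results = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        identifier, value = line.split(' ')
        name, metric = identifier.split('.')
        results[name][metric].append(float(value))
    return results

# Two invented samples of the same benchmark/metric pair.
samples = [
    "BM_join_view_deques/0.execution_time 8.11",
    "BM_join_view_deques/0.execution_time 8.16",
]
parsed = parse_lnt(samples)
median = statistics.median(parsed["BM_join_view_deques/0"]["execution_time"])
print(round(median, 3))
```

Aggregating repeated samples with the median is what the comparison tool does by default before tabulating or charting.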
1 parent f9e5d39 commit a6af641

File tree

6 files changed (+191, -141 lines)


libcxx/docs/TestingLibcxx.rst

Lines changed: 30 additions & 11 deletions
```diff
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
 Benchmarks
 ==========
 
-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
 library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
 Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.
 
@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
 the instructions for running individual tests.
 
 If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:
 
 .. code-block:: bash
 
-   $ cmake -S runtimes -B <build1> [...]
-   $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+   $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+   $ pip install -r libcxx/utils/requirements.txt
 
-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:
 
 .. code-block:: bash
 
-   $ cmake -S runtimes -B <build2> [...]
-   $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+   $ cmake -S runtimes -B <build> [...]
+   $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
 
-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:
 
 .. code-block:: bash
 
-   $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+   $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+
+.. code-block:: bash
+
+   $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+
+Finally, use ``compare-benchmarks`` to compare both:
+
+.. code-block:: bash
+
+   $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+
+   # Useful one-liner when iterating locally:
+   $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.
 
 .. _`Google Benchmark`: https://github.com/google/benchmark
```

libcxx/utils/compare-benchmarks

Lines changed: 123 additions & 0 deletions
```python
#!/usr/bin/env python3

import argparse
import re
import statistics
import sys

import plotly
import tabulate

def parse_lnt(lines):
    """
    Parse lines in LNT format and return a dictionary of the form:

    {
        'benchmark1': {
            'metric1': [float],
            'metric2': [float],
            ...
        },
        'benchmark2': {
            'metric1': [float],
            'metric2': [float],
            ...
        },
        ...
    }

    Each metric may have multiple values.
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue

        (identifier, value) = line.split(' ')
        (name, metric) = identifier.split('.')
        if name not in results:
            results[name] = {}
        if metric not in results[name]:
            results[name][metric] = []
        results[name][metric].append(float(value))
    return results

def plain_text_comparison(benchmarks, baseline, candidate):
    """
    Create a tabulated comparison of the baseline and the candidate.
    """
    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
    fmt = (None, '.2f', '.2f', '.2f', '.2f')
    table = []
    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
        diff = (cand - base) if base and cand else None
        percent = 100 * (diff / base) if base and cand else None
        row = [bm, base, cand, diff, percent]
        table.append(row)
    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')

def create_chart(benchmarks, baseline, candidate):
    """
    Create a bar chart comparing 'baseline' and 'candidate'.
    """
    figure = plotly.graph_objects.Figure()
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
    return figure

def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
    """
    Prepare the data for being formatted or displayed as a chart.

    Metrics that have more than one value are aggregated using the given aggregation function.
    """
    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
    baseline_series = []
    candidate_series = []
    for bm in all_benchmarks:
        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
    return (all_benchmarks, baseline_series, candidate_series)

def main(argv):
    parser = argparse.ArgumentParser(
        prog='compare-benchmarks',
        description='Compare the results of two sets of benchmarks in LNT format.',
        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
    parser.add_argument('baseline', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the baseline.')
    parser.add_argument('candidate', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the candidate.')
    parser.add_argument('--metric', type=str, default='execution_time',
        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Path of a file where to output the resulting comparison. Defaults to stdout.')
    parser.add_argument('--filter', type=str, required=False,
        help='An optional regular expression used to filter the benchmarks included in the comparison. '
             'Only benchmarks whose names match the regular expression will be included.')
    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
             'generates a self-contained HTML graph that can be opened in a browser. The default is "text".')
    args = parser.parse_args(argv)

    baseline = parse_lnt(args.baseline.readlines())
    candidate = parse_lnt(args.candidate.readlines())

    if args.filter is not None:
        regex = re.compile(args.filter)
        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}

    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)

    if args.format == 'chart':
        figure = create_chart(benchmarks, baseline_series, candidate_series)
        plotly.io.write_html(figure, file=args.output)
    else:
        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
        args.output.write(diff)

if __name__ == '__main__':
    main(sys.argv[1:])
```
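To see how the Difference and % Difference columns of the text output arise, the sketch below mirrors the aggregation in `prepare_series` (median over repeated samples) and the difference math in `plain_text_comparison`, using two tiny invented result sets; benchmarks present on only one side come out with no difference:

```python
import statistics

# Invented results: each metric maps to a list of samples, as parse_lnt produces.
baseline = {"BM_a": {"execution_time": [10.0, 12.0]},
            "BM_b": {"execution_time": [5.0]}}
candidate = {"BM_a": {"execution_time": [11.0]},
             "BM_c": {"execution_time": [3.0]}}

rows = []
for bm in sorted(set(baseline) | set(candidate)):
    # Aggregate repeated samples with the median, like prepare_series does.
    base = statistics.median(baseline[bm]["execution_time"]) if bm in baseline else None
    cand = statistics.median(candidate[bm]["execution_time"]) if bm in candidate else None
    # Difference math from plain_text_comparison: a value missing on either
    # side yields no difference and no percentage.
    diff = (cand - base) if base and cand else None
    percent = 100 * (diff / base) if base and cand else None
    rows.append((bm, base, cand, diff, percent))

for row in rows:
    print(row)
```

One quirk worth knowing: because the guard is `if base and cand`, a measurement of exactly `0` is treated like a missing value, so no difference is reported for it (which also avoids dividing by zero).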
libcxx/utils/consolidate-benchmarks

Lines changed: 36 additions & 0 deletions

```python
#!/usr/bin/env python3

import argparse
import pathlib
import sys

def main(argv):
    parser = argparse.ArgumentParser(
        prog='consolidate-benchmarks',
        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
    parser.add_argument('files_or_directories', type=str, nargs='+',
        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
             'recursively for files with a .lnt extension.')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Where to output the result. Defaults to stdout.')
    args = parser.parse_args(argv)

    files = []
    for arg in args.files_or_directories:
        path = pathlib.Path(arg)
        if path.is_dir():
            for p in path.rglob('*.lnt'):
                files.append(p)
        else:
            files.append(path)

    for file in files:
        for line in file.open().readlines():
            line = line.strip()
            if not line:
                continue
            args.output.write(line)
            args.output.write('\n')

if __name__ == '__main__':
    main(sys.argv[1:])
```
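The consolidation step is essentially a concatenation of the non-blank LNT lines from every input file. The sketch below reproduces that logic on a throwaway directory (with a `sorted()` added for deterministic file order, which the committed script does not guarantee):

```python
import pathlib
import tempfile

def consolidate(paths):
    # Concatenate non-blank lines from each input file, one per output line,
    # mirroring the loop at the heart of consolidate-benchmarks.
    out = []
    for path in paths:
        for line in path.read_text().splitlines():
            line = line.strip()
            if line:
                out.append(line)
    return "\n".join(out) + "\n"

# Build a throwaway directory with two small .lnt files.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "a.lnt").write_text("bm1.execution_time 1.0\n\n")
(tmp / "b.lnt").write_text("bm2.execution_time 2.0\n")

merged = consolidate(sorted(tmp.rglob("*.lnt")))
print(merged, end="")
```

Blank lines are dropped, so the merged file is directly consumable by `compare-benchmarks`.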

libcxx/utils/libcxx-benchmark-json

Lines changed: 0 additions & 57 deletions
This file was deleted.

libcxx/utils/libcxx-compare-benchmarks

Lines changed: 0 additions & 73 deletions
This file was deleted.

libcxx/utils/requirements.txt

Lines changed: 2 additions & 0 deletions
```
plotly
tabulate
```
