Commit a6af641

[libc++] Update utilities to compare benchmarks (llvm#157556)
This patch replaces the previous `libcxx-compare-benchmarks` wrapper with a new `compare-benchmarks` script that works with LNT-compatible data. This makes it possible to compare benchmark results across libc++ microbenchmarks, SPEC, and anything else that produces LNT-compatible data.

It also adds a simple script to consolidate LNT benchmark output into a single file, simplifying the process of doing A/B runs locally. After this patch, the simplest workflow no longer requires creating two build directories. Finally, it adds the ability to produce either a standalone HTML chart or plain-text output for diffing results locally when prototyping changes.

Example text output of the new tool:

```
Benchmark                              Baseline   Candidate    Difference    % Difference
-----------------------------------  ----------  ----------  ------------  --------------
BM_join_view_deques/0                      8.11        8.16          0.05            0.63
BM_join_view_deques/1                     13.56       13.79          0.23            1.69
BM_join_view_deques/1024                6606.51     7011.34        404.83            6.13
BM_join_view_deques/2                     17.99       19.92          1.93           10.72
BM_join_view_deques/4000               27655.58    29864.72       2209.14            7.99
BM_join_view_deques/4096               26218.07    30520.13       4302.05           16.41
BM_join_view_deques/512                 3231.66     2832.47       -399.19          -12.35
BM_join_view_deques/5500               47144.82    42207.41      -4937.42          -10.47
BM_join_view_deques/64                   247.23      262.66         15.43            6.24
BM_join_view_deques/64000             756221.63   511247.48    -244974.15          -32.39
BM_join_view_deques/65536             537110.91   560241.61      23130.70            4.31
BM_join_view_deques/70000             815739.07   616181.34    -199557.73          -24.46
BM_join_view_out_vectors/0                 0.93        0.93          0.00            0.07
BM_join_view_out_vectors/1                 3.11        3.14          0.03            0.82
BM_join_view_out_vectors/1024           3090.92     3563.29        472.37           15.28
BM_join_view_out_vectors/2                 5.52        5.56          0.04            0.64
BM_join_view_out_vectors/4000           9887.21     9774.40       -112.82           -1.14
BM_join_view_out_vectors/4096          10158.78    10190.44         31.66            0.31
BM_join_view_out_vectors/512            1218.68     1209.59         -9.09           -0.75
BM_join_view_out_vectors/5500          13559.23    13676.06        116.84            0.86
BM_join_view_out_vectors/64              158.95      157.91         -1.04           -0.65
BM_join_view_out_vectors/64000        178514.73   226520.97      48006.24           26.89
BM_join_view_out_vectors/65536        184639.37   207180.35      22540.98           12.21
BM_join_view_out_vectors/70000        235006.69   213886.93     -21119.77           -8.99
```
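The LNT data consumed by these tools is line-oriented: each line has the shape `<benchmark>.<metric> <value>`, and a metric may appear several times when a benchmark was sampled repeatedly. The sketch below is a hypothetical, minimal re-implementation of that parsing (it is not the committed script), fed with two invented samples:

```python
import statistics
from collections import defaultdict

def parse_lnt(lines):
    # Each non-blank line is "<benchmark>.<metric> <value>". Repeated
    # samples for the same metric accumulate into a list of floats.
    results = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        identifier, value = line.split(' ')
        name, metric = identifier.split('.')
        results[name][metric].append(float(value))
    return results

# Two invented samples of the same benchmark/metric pair.
samples = [
    "BM_join_view_deques/0.execution_time 8.11",
    "BM_join_view_deques/0.execution_time 8.16",
]
parsed = parse_lnt(samples)
median = statistics.median(parsed["BM_join_view_deques/0"]["execution_time"])
print(round(median, 3))
```

Aggregating repeated samples with the median is what the comparison tool does by default before tabulating or charting.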
1 parent f9e5d39 commit a6af641

File tree

6 files changed (+191, -141 lines)


libcxx/docs/TestingLibcxx.rst

Lines changed: 30 additions & 11 deletions
```diff
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
 Benchmarks
 ==========
 
-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
 library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
 Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.
 
@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
 the instructions for running individual tests.
 
 If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:
 
 .. code-block:: bash
 
-   $ cmake -S runtimes -B <build1> [...]
-   $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+   $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+   $ pip install -r libcxx/utils/requirements.txt
 
-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:
 
 .. code-block:: bash
 
-   $ cmake -S runtimes -B <build2> [...]
-   $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+   $ cmake -S runtimes -B <build> [...]
+   $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
 
-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:
 
 .. code-block:: bash
 
-   $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+   $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+
+.. code-block:: bash
+
+   $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+
+Finally, use ``compare-benchmarks`` to compare both:
+
+.. code-block:: bash
+
+   $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+
+   # Useful one-liner when iterating locally:
+   $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.
 
 .. _`Google Benchmark`: https://github.com/google/benchmark
```

libcxx/utils/compare-benchmarks

Lines changed: 123 additions & 0 deletions
```python
#!/usr/bin/env python3

import argparse
import re
import statistics
import sys

import plotly
import tabulate

def parse_lnt(lines):
    """
    Parse lines in LNT format and return a dictionary of the form:

    {
        'benchmark1': {
            'metric1': [float],
            'metric2': [float],
            ...
        },
        'benchmark2': {
            'metric1': [float],
            'metric2': [float],
            ...
        },
        ...
    }

    Each metric may have multiple values.
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue

        (identifier, value) = line.split(' ')
        (name, metric) = identifier.split('.')
        if name not in results:
            results[name] = {}
        if metric not in results[name]:
            results[name][metric] = []
        results[name][metric].append(float(value))
    return results

def plain_text_comparison(benchmarks, baseline, candidate):
    """
    Create a tabulated comparison of the baseline and the candidate.
    """
    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
    fmt = (None, '.2f', '.2f', '.2f', '.2f')
    table = []
    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
        diff = (cand - base) if base and cand else None
        percent = 100 * (diff / base) if base and cand else None
        row = [bm, base, cand, diff, percent]
        table.append(row)
    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')

def create_chart(benchmarks, baseline, candidate):
    """
    Create a bar chart comparing 'baseline' and 'candidate'.
    """
    figure = plotly.graph_objects.Figure()
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
    return figure

def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
    """
    Prepare the data for being formatted or displayed as a chart.

    Metrics that have more than one value are aggregated using the given aggregation function.
    """
    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
    baseline_series = []
    candidate_series = []
    for bm in all_benchmarks:
        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
    return (all_benchmarks, baseline_series, candidate_series)

def main(argv):
    parser = argparse.ArgumentParser(
        prog='compare-benchmarks',
        description='Compare the results of two sets of benchmarks in LNT format.',
        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
    parser.add_argument('baseline', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the baseline.')
    parser.add_argument('candidate', type=argparse.FileType('r'),
        help='Path to a LNT format file containing the benchmark results for the candidate.')
    parser.add_argument('--metric', type=str, default='execution_time',
        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Path of a file where to output the resulting comparison. Defaults to stdout.')
    parser.add_argument('--filter', type=str, required=False,
        help='An optional regular expression used to filter the benchmarks included in the comparison. '
             'Only benchmarks whose names match the regular expression will be included.')
    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
             'generates a self-contained HTML graph that can be opened in a browser. The default is "text".')
    args = parser.parse_args(argv)

    baseline = parse_lnt(args.baseline.readlines())
    candidate = parse_lnt(args.candidate.readlines())

    if args.filter is not None:
        regex = re.compile(args.filter)
        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}

    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)

    if args.format == 'chart':
        figure = create_chart(benchmarks, baseline_series, candidate_series)
        plotly.io.write_html(figure, file=args.output)
    else:
        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
        args.output.write(diff)

if __name__ == '__main__':
    main(sys.argv[1:])
```
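To see how the Difference and % Difference columns of the text output arise, the sketch below mirrors the aggregation in `prepare_series` (median over repeated samples) and the difference math in `plain_text_comparison`, using two tiny invented result sets; benchmarks present on only one side come out with no difference:

```python
import statistics

# Invented results: each metric maps to a list of samples, as parse_lnt produces.
baseline = {"BM_a": {"execution_time": [10.0, 12.0]},
            "BM_b": {"execution_time": [5.0]}}
candidate = {"BM_a": {"execution_time": [11.0]},
             "BM_c": {"execution_time": [3.0]}}

rows = []
for bm in sorted(set(baseline) | set(candidate)):
    # Aggregate repeated samples with the median, like prepare_series does.
    base = statistics.median(baseline[bm]["execution_time"]) if bm in baseline else None
    cand = statistics.median(candidate[bm]["execution_time"]) if bm in candidate else None
    # Difference math from plain_text_comparison: a value missing on either
    # side yields no difference and no percentage.
    diff = (cand - base) if base and cand else None
    percent = 100 * (diff / base) if base and cand else None
    rows.append((bm, base, cand, diff, percent))

for row in rows:
    print(row)
```

One quirk worth knowing: because the guard is `if base and cand`, a measurement of exactly `0` is treated like a missing value, so no difference is reported for it (which also avoids dividing by zero).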
libcxx/utils/consolidate-benchmarks

Lines changed: 36 additions & 0 deletions

```python
#!/usr/bin/env python3

import argparse
import pathlib
import sys

def main(argv):
    parser = argparse.ArgumentParser(
        prog='consolidate-benchmarks',
        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
    parser.add_argument('files_or_directories', type=str, nargs='+',
        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
             'recursively for files with a .lnt extension.')
    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
        help='Where to output the result. Defaults to stdout.')
    args = parser.parse_args(argv)

    files = []
    for arg in args.files_or_directories:
        path = pathlib.Path(arg)
        if path.is_dir():
            for p in path.rglob('*.lnt'):
                files.append(p)
        else:
            files.append(path)

    for file in files:
        for line in file.open().readlines():
            line = line.strip()
            if not line:
                continue
            args.output.write(line)
            args.output.write('\n')

if __name__ == '__main__':
    main(sys.argv[1:])
```
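The consolidation step is essentially a concatenation of the non-blank LNT lines from every input file. The sketch below reproduces that logic on a throwaway directory (with a `sorted()` added for deterministic file order, which the committed script does not guarantee):

```python
import pathlib
import tempfile

def consolidate(paths):
    # Concatenate non-blank lines from each input file, one per output line,
    # mirroring the loop at the heart of consolidate-benchmarks.
    out = []
    for path in paths:
        for line in path.read_text().splitlines():
            line = line.strip()
            if line:
                out.append(line)
    return "\n".join(out) + "\n"

# Build a throwaway directory with two small .lnt files.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "a.lnt").write_text("bm1.execution_time 1.0\n\n")
(tmp / "b.lnt").write_text("bm2.execution_time 2.0\n")

merged = consolidate(sorted(tmp.rglob("*.lnt")))
print(merged, end="")
```

Blank lines are dropped, so the merged file is directly consumable by `compare-benchmarks`.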

libcxx/utils/libcxx-benchmark-json

Lines changed: 0 additions & 57 deletions
This file was deleted.

libcxx/utils/libcxx-compare-benchmarks

Lines changed: 0 additions & 73 deletions
This file was deleted.

libcxx/utils/requirements.txt

Lines changed: 2 additions & 0 deletions
```
plotly
tabulate
```
