
Commit 837f89a

feat: add Radar setup (#346)
This PR adds a Radar setup so we can track the performance of doc-gen4 builds. The benchmarks are:

* `own-docs` measures the time and maxrss for building the docs for doc-gen itself. Here, it is a convenient stand-in for a moderately-sized Lean project with a moderate number of dependencies.
* `mathlib-docs` measures the time and maxrss for building Mathlib's docs. We can use it to ensure that from-scratch Mathlib documentation builds don't get slower.

1 parent 5b5f81c commit 837f89a

File tree

9 files changed, +425 -0 lines changed


scripts/bench/README.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# doc-gen4 benchmark suite

This directory contains the `doc-gen4` benchmark suite.
It is built around [radar](https://github.com/leanprover/radar),
and benchmark results can be viewed
on the [Lean FRO radar instance](https://radar.lean-lang.org/repos/doc-gen4).

To execute the entire suite, run `scripts/bench/run` in the repo root.
To execute an individual benchmark, run `scripts/bench/<benchmark>/run` in the repo root.
All scripts output their measurements into the file `measurements.jsonl`.

Radar sums any duplicated measurements with matching metrics.
To post-process the `measurements.jsonl` file this way in-place,
run `scripts/bench/combine.py` in the repo root after executing the benchmark suite.

The `*.py` symlinks exist only so the Python files are a bit nicer to edit
in text editors that rely on the file extension.

## Adding a benchmark

To add a benchmark to the suite, follow these steps:

1. Create a new folder containing a `run` script and a `README.md` file describing the benchmark,
   as well as any other files required for the benchmark.
2. Edit `scripts/bench/run` to call the `run` script of your new benchmark.

## How radar executes the benchmark suite

Radar requires a _bench repo_ to be configured for each repo.
The bench repo contains scripts that execute benchmarks and present the results to radar,
following the
[bench repo specification](https://github.com/leanprover/radar/blob/62bffab39025a1c2039499ae7a85b1ad446286d9/README.md#bench-repo-specification).

The bench repo for `doc-gen4` is
[leanprover/radar-bench-doc-gen4](https://github.com/leanprover/radar-bench-doc-gen4).
It calls the bench suite inside the doc-gen4 repository and passes the results on to radar.
It expects all measurements to be presented through the `measurements.jsonl` file,
_not_ through stdout/stderr (even though this would be allowed by the bench script specification).
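
As an aside, not part of the README above: a minimal Python sketch of the `measurements.jsonl` record shape the suite works with. The topic and metric names follow the flags used by the run scripts and the `topic//category` naming in `measure.py`; the values are made up.

```python
import json

# Illustrative records only: the "topic//category" metric naming and the optional
# "unit" field follow measure.py; the values themselves are made up.
records = [
    {"metric": "own-docs//wall-clock", "value": 123.4, "unit": "s"},
    {"metric": "own-docs//maxrss", "value": 2.5e9, "unit": "B"},
    {"metric": "own-docs//instructions", "value": 1.2e12},  # "unit" is optional
]

# One JSON object per line (JSON Lines), appended to measurements.jsonl.
with open("measurements.jsonl", "a") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```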

scripts/bench/combine.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
#!/usr/bin/env python3

import argparse
import json
from pathlib import Path

OUTFILE = Path() / "measurements.jsonl"

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=f"Combine duplicated measurements in {OUTFILE.name} the way radar does, by summing their values."
    )
    args = parser.parse_args()

    values: dict[str, float] = {}
    units: dict[str, str | None] = {}

    with open(OUTFILE, "r") as f:
        for line in f:
            data = json.loads(line)
            metric = data["metric"]
            values[metric] = values.get(metric, 0) + data["value"]
            units[metric] = data.get("unit")

    with open(OUTFILE, "w") as f:
        for metric, value in values.items():
            unit = units.get(metric)
            data = {"metric": metric, "value": value}
            if unit is not None:
                data["unit"] = unit
            f.write(f"{json.dumps(data)}\n")

scripts/bench/mathlib-docs/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# The `mathlib-docs` benchmark

This benchmark measures the time taken for `doc-gen4` to generate
documentation for Mathlib.

scripts/bench/mathlib-docs/run

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
set -euxo pipefail

BENCH="scripts/bench"
REPO_ROOT="$(pwd)"

# Look up the Lean toolchain version from lean-toolchain
TOOLCHAIN_FULL=$(cat "$REPO_ROOT/lean-toolchain")
TOOLCHAIN=${TOOLCHAIN_FULL#*:}

# Validate that it's a release or RC version (e.g., 4.27.0, v4.27.0, 4.27.0-rc1, v4.27.0-rc1)
if [[ ! "$TOOLCHAIN" =~ ^v?[0-9]+\.[0-9]+\.[0-9]+(-rc[0-9]+)?$ ]]; then
    echo "Error: Toolchain '$TOOLCHAIN' is not a Lean release or RC version" >&2
    exit 1
fi

# Create a temp directory and set up cleanup
TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT

pushd "$TMPDIR"

# Create a new Mathlib project using the math-lax template
lake +"$TOOLCHAIN" new mathproject math-lax

cd mathproject

# Add a dependency to the doc-gen4 checkout
cat >> lakefile.toml <<EOF

[[require]]
name = "doc-gen4"
path = "$REPO_ROOT"
EOF

# Update doc-gen4 dependency
MATHLIB_NO_CACHE_ON_UPDATE=1 lake update doc-gen4

# Get Mathlib cache
lake exe cache get

# Build DocGen4 and its executable first (we want to measure docs generation, not tool building)
lake build DocGen4
lake build doc-gen4

popd

# Benchmark documentation generation
env DOCGEN_SRC="file" "$REPO_ROOT/$BENCH/measure.py" -t mathlib-docs -m instructions -m maxrss -m task-clock -m wall-clock -- \
    lake --dir "$TMPDIR/mathproject" build Mathlib:docs
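
The toolchain check in the script above accepts only release or release-candidate versions. An equivalent check sketched in Python, with illustrative version strings:

```python
import re

# Same shape as the bash pattern above: optional "v", three numeric components,
# optional "-rcN" suffix. Anything else (e.g. a nightly toolchain) is rejected.
RELEASE_OR_RC = re.compile(r"v?\d+\.\d+\.\d+(-rc\d+)?")

for toolchain in ["4.27.0", "v4.27.0-rc1", "nightly-2025-10-01"]:
    print(toolchain, RELEASE_OR_RC.fullmatch(toolchain) is not None)
# 4.27.0 True
# v4.27.0-rc1 True
# nightly-2025-10-01 False
```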

scripts/bench/measure.py

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
#!/usr/bin/env python3

import argparse
import json
import os
import resource
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path

OUTFILE = Path() / "measurements.jsonl"


@dataclass
class PerfMetric:
    event: str
    factor: float = 1
    unit: str | None = None


@dataclass
class RusageMetric:
    name: str
    factor: float = 1
    unit: str | None = None


PERF_METRICS = {
    "task-clock": PerfMetric("task-clock", factor=1e-9, unit="s"),
    "wall-clock": PerfMetric("duration_time", factor=1e-9, unit="s"),
    "instructions": PerfMetric("instructions"),
}

PERF_UNITS = {
    "msec": 1e-3,
    "ns": 1e-9,
}

RUSAGE_METRICS = {
    "maxrss": RusageMetric("ru_maxrss", factor=1000, unit="B"),  # KiB on linux
}

ALL_METRICS = {**PERF_METRICS, **RUSAGE_METRICS}


def measure_perf(cmd: list[str], events: list[str]) -> dict[str, tuple[float, str]]:
    with tempfile.NamedTemporaryFile() as tmp:
        cmd = [
            *["perf", "stat", "-j", "-o", tmp.name],
            *[arg for event in events for arg in ["-e", event]],
            *["--", *cmd],
        ]

        # Execute command
        env = os.environ.copy()
        env["LC_ALL"] = "C"  # or else perf may output syntactically invalid json
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            sys.exit(result.returncode)

        # Collect results
        perf = {}
        for line in tmp:
            data = json.loads(line)
            if "event" in data and "counter-value" in data:
                perf[data["event"]] = float(data["counter-value"]), data["unit"]

        return perf


@dataclass
class Result:
    category: str
    value: float
    unit: str | None

    def fmt(self, topic: str) -> str:
        metric = f"{topic}//{self.category}"
        if self.unit is None:
            return json.dumps({"metric": metric, "value": self.value})
        return json.dumps({"metric": metric, "value": self.value, "unit": self.unit})


def measure(cmd: list[str], metrics: list[str]) -> list[Result]:
    # Check args
    unknown_metrics = []
    for metric in metrics:
        if metric not in RUSAGE_METRICS and metric not in PERF_METRICS:
            unknown_metrics.append(metric)
    if unknown_metrics:
        raise Exception(f"unknown metrics: {', '.join(unknown_metrics)}")

    # Prepare perf events
    events: list[str] = []
    for metric in metrics:
        if info := PERF_METRICS.get(metric):
            events.append(info.event)

    # Measure
    perf = measure_perf(cmd, events)
    rusage = resource.getrusage(resource.RUSAGE_CHILDREN)

    # Extract results
    results = []
    for metric in metrics:
        if info := PERF_METRICS.get(metric):
            if info.event in perf:
                value, unit = perf[info.event]
            else:
                # Without the corresponding permissions,
                # we only get access to the userspace versions of the counters.
                value, unit = perf[f"{info.event}:u"]

            value *= PERF_UNITS.get(unit, info.factor)
            results.append(Result(metric, value, info.unit))

        if info := RUSAGE_METRICS.get(metric):
            value = getattr(rusage, info.name) * info.factor
            results.append(Result(metric, value, info.unit))

    return results


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=f"Measure resource usage of a command using perf and rusage. The results are appended to {OUTFILE.name}.",
    )
    parser.add_argument(
        "-t",
        "--topic",
        action="append",
        default=[],
        help="topic prefix for the metrics",
    )
    parser.add_argument(
        "-m",
        "--metric",
        action="append",
        default=[],
        help=f"metrics to measure. Can be specified multiple times. Available metrics: {', '.join(sorted(ALL_METRICS))}",
    )
    parser.add_argument(
        "cmd",
        nargs="*",
        help="command to measure the resource usage of",
    )
    args = parser.parse_args()

    topics: list[str] = args.topic
    metrics: list[str] = args.metric
    cmd: list[str] = args.cmd

    results = measure(cmd, metrics)

    with open(OUTFILE, "a+") as f:
        for result in results:
            for topic in topics:
                f.write(f"{result.fmt(topic)}\n")

scripts/bench/own-docs/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# The `own-docs` benchmark

This benchmark measures the time taken for `doc-gen4` to generate its
own documentation. This is a convenient stand-in for moderately-sized
Lean projects without too many dependencies. When this benchmark was
created, the time was dominated by building documentation for core
Lean, but this may change over time and it's good to track it.

scripts/bench/own-docs/run

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
#!/usr/bin/env bash
set -euxo pipefail

BENCH="scripts/bench"

# Prepare build

lake clean

# We want to measure the time taken to generate documentation, not to
# build the tool, so we build it and the library first:
lake build DocGen4
lake build doc-gen4

env DOCGEN_SRC="file" "$BENCH/measure.py" -t own-docs -m instructions -m maxrss -m task-clock -m wall-clock -- \
    lake build DocGen4:docs
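
Tying the run scripts back to the "Adding a benchmark" steps in the README: a new benchmark gets its own folder with a `run` script and a `README.md`, and is hooked into `scripts/bench/run`. Below is a hypothetical sketch of such a run script; the benchmark name `example-docs` and the `SomeLibrary:docs` target are invented, and it is written in Python purely for illustration (the run scripts in this PR are bash).

```python
#!/usr/bin/env python3
# Hypothetical scripts/bench/example-docs/run -- illustration only; the real run
# scripts in this PR are bash, and "example-docs" and its build target are made up.
import os
import subprocess

BENCH = "scripts/bench"

# Build the doc-gen4 executable first so that tool building is not measured.
subprocess.run(["lake", "build", "doc-gen4"], check=True)

# Measure only docs generation and append the results to measurements.jsonl
# under the (hypothetical) topic "example-docs".
env = {**os.environ, "DOCGEN_SRC": "file"}
subprocess.run(
    [f"{BENCH}/measure.py", "-t", "example-docs",
     "-m", "wall-clock", "-m", "maxrss",
     "--", "lake", "build", "SomeLibrary:docs"],  # placeholder target
    check=True,
    env=env,
)
```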
