Commit dddab8a
perf: record hash in benchmark log and ability to diff against it
Say I try three things in sequence, with these hashes:
- deadbeef
- feedface
- abcd0123
Normally, running the benchmark compares against the most recent run
only. But say I'm working on "abcd0123" and want to know the delta
against "deadbeef": now I can use `BASE=deadbeef bin/benchmarks/matcher.lua`.
If there is no such hash in the log, it reports an error.
If the hash is present but "dirty", it prints a warning that it's using
a dirty hash (unless you explicitly asked for the dirty hash).
Note: examples above are 8-character hashes. In reality, this system
always stores 40-character hashes. If your `BASE` value is shorter, the
code does a prefix match.
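The lookup rules above fit in a few lines. This sketch is written in Python rather than the benchmark's own Lua, and it rests on assumptions this commit doesn't state:

- Log entries are modeled as a hypothetical `LogEntry` holding a full hash and a `dirty` flag.
- Newer runs are assumed to be appended to the end of the log.
- "Explicitly asking for the dirty hash" is modeled as appending a hypothetical `-dirty` suffix to `BASE`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DIRTY_SUFFIX = "-dirty"  # hypothetical way of explicitly asking for a dirty run


@dataclass
class LogEntry:
    # Hypothetical benchmark-log record: the full 40-character commit hash and
    # whether the working tree had uncommitted changes during the run.
    hash: str
    dirty: bool


def find_base(log: List[LogEntry], base: str) -> Tuple[LogEntry, Optional[str]]:
    """Pick the log entry to diff against for a given BASE value.

    Returns (entry, warning); warning is None unless a dirty run was chosen
    without being explicitly requested. Raises LookupError if nothing matches.
    """
    explicit_dirty = base.endswith(DIRTY_SUFFIX)
    prefix = base[: -len(DIRTY_SUFFIX)] if explicit_dirty else base
    if not prefix:
        raise LookupError("BASE must not be empty")
    # Prefix match against stored full hashes, newest run first (assumed order).
    matches = [e for e in reversed(log) if e.hash.startswith(prefix)]
    if not matches:
        raise LookupError(f"no benchmark run found for BASE={base}")
    if explicit_dirty:
        dirty = [e for e in matches if e.dirty]
        if not dirty:
            raise LookupError(f"no dirty benchmark run found for BASE={base}")
        return dirty[0], None
    clean = [e for e in matches if not e.dirty]
    if clean:
        return clean[0], None
    # Only dirty runs match and none was explicitly requested: warn.
    return matches[0], f"warning: using dirty run {matches[0].hash}"


if __name__ == "__main__":
    log = [LogEntry("deadbeef" + "0" * 32, False), LogEntry("feedface" + "1" * 32, True)]
    entry, warning = find_base(log, "feed")
    print(warning)  # prints the dirty-run warning for the feedface... entry
```

In this sketch the newest matching run wins. A real implementation might instead reject prefixes that match more than one distinct hash.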
So, let's show this in action. First, I looked back through the changes
to `lua/wincent/commandt/private/benchmark.lua` to see how far back I
could easily go without running into significant changes to the
benchmarking script. The last meaningful change was in
70f5c20 ("refactor(lua): rationalize our external symbols",
2022-07-29), meaning we can go back around three years without having
to manually edit the file. With a bit more work we could make those
edits anyway, but we can't go much further back without running into a
major change that would be painful to deal with: e91f67c
("refactor(lua): extract wincent.commandt.private.benchmark",
2022-07-16).
With this in mind, let's find the perf-related commits in that time
interval:
git oneline --grep perf --reverse --since 2022-07-29
Here are some of the possibly interesting ones to look at:
a91c298 perf: speed up workers by processing chunks of consecutive haystacks (12 months ago)
7e6f158 perf: try compiling with `-Ofast` instead of `-O3` (12 months ago)
48b5ee5 perf: add more compiler flags (11 months ago)
6033ca8 perf: avoid redundant merge (5 months ago)
3cd40d7 perf: limit based on available height as opposed to configured height (6 weeks ago)
4b98334 perf: use voodoo coding to eek out minor performance increase (6 weeks ago)
906385e perf: use faster downcasing (6 weeks ago)
19eae40 perf: avoid some pointer indirection (6 weeks ago)
817208d perf: avoid more pointer indirection (6 weeks ago)
9715dbc refactor: do case conversions consistently (6 weeks ago)
bc8fe12 perf: avoid repeated case conversions (5 weeks ago)
8338f98 perf: avoid repeated pointer traversals (5 weeks ago)
8dfda23 perf: avoid some more pointer traversals (5 weeks ago)
56954c5 perf: record hash in benchmark log and ability to diff against it (HEAD -> main) (42 minutes ago)
plus of course the oldest one that is accessible to us before the first
"breaking" change to the benchmark, which we'll use as a baseline:
75cc367 feat(lua): teach the help finder to open in vertical splits, tabs etc (3 years ago)
There might be other intermediate commits that impact perf, but if so,
they don't contain the word "perf", so they don't show up in the list
above and we'll be skipping them for the purposes of this exercise.
Next, we go back and run the benchmarks at a selection of these commits,
starting with the oldest and working forward. Note that for the two
oldest commits I had to pass `-C` to `make`, because back then there was
no suitable top-level `Makefile` for us to use.
(git co 75cc367 && make -C lua/wincent/commandt/lib clean && make -C lua/wincent/commandt/lib && git co main -- lua/wincent/commandt/private/benchmark.lua && TIMES=5 bin/benchmarks/matcher.lua && git reset HEAD && git co .)
(git co a91c298 && make -C lua/wincent/commandt/lib clean && make -C lua/wincent/commandt/lib && git co main -- lua/wincent/commandt/private/benchmark.lua && TIMES=5 bin/benchmarks/matcher.lua && git reset HEAD && git co .)
(git co 4b98334 && make clean && make && git co main -- lua/wincent/commandt/private/benchmark.lua && TIMES=5 bin/benchmarks/matcher.lua && git reset HEAD && git co .)
(git co 8dfda23 && make clean && make && git co main -- lua/wincent/commandt/private/benchmark.lua && TIMES=5 bin/benchmarks/matcher.lua && git reset HEAD && git co .)
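All four command lines above follow the same pattern. They differ only in the commit and in how `make` is invoked. The Python sketch below generates them, which makes each step explicit. It assumes `git co` is a local alias for `git checkout`. The helper name `benchmark_command` is hypothetical.

```python
BENCHMARK_LUA = "lua/wincent/commandt/private/benchmark.lua"
LIB_DIR = "lua/wincent/commandt/lib"


def benchmark_command(commit: str, needs_lib_dir: bool, times: int = 5) -> str:
    """Build the one-line shell command used to benchmark an old commit."""
    # Older commits lack a usable top-level Makefile, so build in LIB_DIR.
    make = f"make -C {LIB_DIR}" if needs_lib_dir else "make"
    steps = [
        f"git co {commit}",                           # check out the historical commit
        f"{make} clean",                              # drop build products from other commits
        make,                                         # rebuild the C library at that commit
        f"git co main -- {BENCHMARK_LUA}",            # overlay today's benchmark script
        f"TIMES={times} bin/benchmarks/matcher.lua",  # run and log the benchmark
        "git reset HEAD",                             # unstage the overlaid file
        "git co .",                                   # restore the commit's version of it
    ]
    return "(" + " && ".join(steps) + ")"


if __name__ == "__main__":
    for commit, needs_lib_dir in [
        ("75cc367", True),
        ("a91c298", True),
        ("4b98334", False),
        ("8dfda23", False),
    ]:
        print(benchmark_command(commit, needs_lib_dir))
```

Overlaying only `benchmark.lua` from `main` is what lets every historical build be measured with the same benchmark code. The `git reset HEAD && git co .` tail then leaves the working tree clean for the next checkout.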
With the historical benchmark data now seeded, we can try some different
`BASE` values and observe how the performance delta grows as we go
further back in time.
First, we start with the current `HEAD` and no `BASE`. We should see
current performance and no meaningful delta (ie. no `p` value), and
indeed that is exactly what we see when we run it twice in a row:
TIMES=5 bin/benchmarks/matcher.lua # ie. no BASE
Summary of cpu time and (wall time):
                      best     avg      sd      +/-    p    (best)     (avg)      (sd)      +/-    p
pathological       0.20211 0.21416 0.03242  [-0.1%]      (0.20212) (0.21416) (0.03242)  [-0.1%]
command-t          0.14975 0.15900 0.03848  [+0.4%]      (0.14975) (0.15900) (0.03849)  [+0.4%]
chromium (subset)  1.17113 1.17970 0.01456  [-0.4%]      (0.26319) (0.26443) (0.00197)  [-1.0%]
chromium (whole)   0.90269 0.90448 0.00444  [-2.3%]      (0.10272) (0.10495) (0.00279)  [-8.9%]
big (400k)         1.35537 1.35903 0.00908  [-0.4%]      (0.14771) (0.15109) (0.00506)  [+0.4%]
total              3.78496 3.81636 0.05603  [-0.8%]      (0.86743) (0.89363) (0.06559)  [-1.2%]
Now we compare against 2025-07-03 and again see no significant difference
because there has been no perf-related work since then:
BASE=8dfda23 TIMES=5 bin/benchmarks/matcher.lua
Summary of cpu time and (wall time):
                      best     avg      sd      +/-    p    (best)     (avg)      (sd)      +/-    p
pathological       0.20217 0.21286 0.02449  [-0.4%]      (0.20217) (0.21286) (0.02448)  [-0.4%]
command-t          0.14968 0.15818 0.03143  [-1.3%]      (0.14968) (0.15818) (0.03144)  [-1.3%]
chromium (subset)  1.18562 1.19020 0.01601  [+0.2%]      (0.26808) (0.27069) (0.00804)  [+1.6%]
chromium (whole)   0.89829 0.90328 0.01043  [+0.3%]      (0.10158) (0.10262) (0.00231)  [-0.3%]
big (400k)         1.35498 1.36317 0.01556  [+0.1%]      (0.14805) (0.14954) (0.00344)  [-1.6%]
total              3.80532 3.82769 0.04708  [+0.1%]      (0.87382) (0.89390) (0.04786)  [-0.1%]
Now we go back to 2025-06-24 and see our first significant perf change:
BASE=4b98334 TIMES=5 bin/benchmarks/matcher.lua
Summary of cpu time and (wall time):
                      best     avg      sd      +/-    p    (best)     (avg)      (sd)      +/-    p
pathological       0.20301 0.21354 0.02899  [-6.2%] 0.05 (0.20301) (0.21354) (0.02898)  [-6.2%] 0.05
command-t          0.15010 0.15802 0.03212 [-10.7%] 0.05 (0.15010) (0.15802) (0.03212) [-10.7%] 0.05
chromium (subset)  1.17620 1.18015 0.01067 [-11.0%] 0.05 (0.25964) (0.26566) (0.00807)  [-6.2%] 0.05
chromium (whole)   0.90050 0.90376 0.00371 [-17.7%] 0.05 (0.10576) (0.10661) (0.00270) [-13.3%] 0.05
big (400k)         1.35552 1.35942 0.00706 [-17.6%] 0.05 (0.14950) (0.15033) (0.00155) [-16.5%] 0.05
total              3.79207 3.81489 0.06748 [-14.7%] 0.05 (0.86998) (0.89416) (0.06213)  [-9.6%] 0.05
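The `p` columns only ever show 0.05. That value presumably marks a delta judged significant at that level rather than an exact p-value. This commit doesn't say which test the benchmark script applies.

As an illustrative assumption, here is a Python sketch of Welch's t-test, a common choice for comparing two sets of timings whose variances may differ. It uses a hard-coded table of standard two-tailed critical values. The function name `significant_at_05` is hypothetical.

```python
import math
from statistics import mean, variance

# Two-tailed critical values of Student's t for alpha = 0.05, indexed by
# integer degrees of freedom 1..30 (index 0 is a placeholder).
T_CRIT_05 = [
    math.inf, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306,
    2.262, 2.228, 2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101,
    2.093, 2.086, 2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048,
    2.045, 2.042,
]


def significant_at_05(current, base):
    """Welch's t-test: True if the sample means differ at p < 0.05 (two-tailed).

    Both arguments are lists of timings (e.g. TIMES=5 runs each); each side
    needs at least two samples so that a sample variance exists.
    """
    if len(current) < 2 or len(base) < 2:
        raise ValueError("need at least two samples per side")
    # Squared standard error of each mean.
    se_cur = variance(current) / len(current)
    se_base = variance(base) / len(base)
    se2 = se_cur + se_base
    diff = abs(mean(current) - mean(base))
    if se2 == 0:
        # No spread at all: any difference in means counts as significant.
        return diff > 0
    t = diff / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (
        se_cur ** 2 / (len(current) - 1) + se_base ** 2 / (len(base) - 1)
    )
    # Rounding df down (and capping at 30) selects a critical value that is
    # at least as large as the exact one, so the test errs on the safe side.
    k = max(1, min(30, math.floor(df)))
    return t > T_CRIT_05[k]


if __name__ == "__main__":
    base = [1.0, 1.1, 0.9, 1.0, 1.05]  # made-up timings, not from this log
    print(significant_at_05([x * 0.8 for x in base], base))  # True
```

With only five samples per side, the degrees of freedom stay small. Quite large deltas can therefore fail the test when the run-to-run spread is wide, which may explain the `command-t` row against a91c298 that follows.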
Going back to 2024-08-13, we see a bigger change:
BASE=a91c298 TIMES=5 bin/benchmarks/matcher.lua
Summary of cpu time and (wall time):
                      best     avg      sd      +/-    p    (best)     (avg)      (sd)      +/-    p
pathological       0.20270 0.21348 0.02811  [-9.7%] 0.05 (0.20270) (0.21348) (0.02810)  [-9.7%] 0.05
command-t          0.15069 0.15926 0.03167  [-8.7%]      (0.15069) (0.15931) (0.03165)  [-8.6%]
chromium (subset)  1.13067 1.16924 0.04352 [-17.2%] 0.05 (0.25553) (0.26593) (0.01224)  [-9.5%] 0.05
chromium (whole)   0.89872 0.90251 0.00634 [-29.0%] 0.05 (0.10030) (0.10405) (0.00689) [-25.0%] 0.05
big (400k)         1.35960 1.36199 0.00417 [-27.8%] 0.05 (0.15053) (0.15393) (0.00614) [-23.3%] 0.05
total              3.75276 3.80648 0.08109 [-23.0%] 0.05 (0.87232) (0.89669) (0.06312) [-13.6%] 0.05
Finally, we go all the way back to 2022-07-29 and see an even bigger
change, as expected:
BASE=75cc367 TIMES=5 bin/benchmarks/matcher.lua
Summary of cpu time and (wall time):
                      best     avg      sd      +/-    p    (best)     (avg)      (sd)      +/-    p
pathological       0.20291 0.22017 0.05556  [-2.7%]      (0.20291) (0.22017) (0.05555)  [-2.7%]
command-t          0.14949 0.15046 0.00143 [-20.9%] 0.05 (0.14949) (0.15046) (0.00143) [-20.9%] 0.05
chromium (subset)  1.17581 1.17860 0.00856 [-32.5%] 0.05 (0.26507) (0.26827) (0.00483) [-19.7%] 0.05
chromium (whole)   0.90010 0.90297 0.00423 [-61.9%] 0.05 (0.10188) (0.10636) (0.00576) [-52.5%] 0.05
big (400k)         1.35858 1.36409 0.01234 [-65.3%] 0.05 (0.14769) (0.15255) (0.00860) [-59.0%] 0.05
total              3.79522 3.81629 0.05523 [-49.0%] 0.05 (0.87816) (0.89781) (0.05850) [-26.3%] 0.05
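Read as "percent faster", a percentage reduction in time understates the change. The sketch below assumes the `+/-` columns are computed as (current - base) / base over the timings. This commit doesn't state the formula, and the helper names are hypothetical.

Under that assumption, a -49.0% delta means the current code takes 0.51 times as long as at 75cc367. In other words, it runs roughly 1.96 times as fast.

```python
def pct_delta(current: float, base: float) -> float:
    """Percentage change of `current` relative to `base`; negative means faster."""
    return (current - base) / base * 100.0


def speedup(delta_pct: float) -> float:
    """Convert a percentage time delta into an "N times as fast" factor."""
    return 1.0 / (1.0 + delta_pct / 100.0)


if __name__ == "__main__":
    # "total" row deltas from the BASE=75cc367 run above.
    print(f"CPU:  {speedup(-49.0):.2f}x")  # CPU:  1.96x
    print(f"wall: {speedup(-26.3):.2f}x")  # wall: 1.36x
```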
Note that when reading these tables, it's important to focus on the %
changes rather than the absolute values: every run above was made at the
current `HEAD`, so the absolute values just show current performance over
and over. In summary, the time deltas that we see are:
- Baseline (2025-08-08): n/a
- From 2025-08-08 to 2022-07-29: -49.0% (CPU time) and -26.3% (wall time)
- From 2025-08-08 to 2024-08-13: -23.0% (CPU time) and -13.6% (wall time)
- From 2025-08-08 to 2025-06-24: -14.7% (CPU time) and -9.6% (wall time)
- From 2025-08-08 to 2025-07-03: +0.1% (CPU time) and -0.1% (wall time)
(but as noted above, that last row is just noise, with no significant `p`
value.)
1 parent 0a40e75
1 file changed, +105 -2 lines changed