Commit dddab8a

perf: record hash in benchmark log and ability to diff against it
Just say I try three things in sequence with these hashes:

- deadbeef
- feedface
- abcd0123

Normally, running the benchmark compares against the latest run only. But say I'm working on "abcd0123" and want to know the delta against "deadbeef"; now I can use `BASE=deadbeef bin/benchmarks/matcher.lua`.

If there is no such hash in the log, it reports an error. If there is a hash, but it's "dirty", it prints a warning that it's using a dirty hash (unless you explicitly asked for the dirty hash).

Note: the examples above are 8-character hashes. In reality, this system always stores 40-character hashes. If your `BASE` value is shorter, the code does a prefix match.

So, let's show this in action.

First of all, I went back looking at changes to `lua/wincent/commandt/private/benchmark.lua` to see how far back I could easily go without running into significant changes to the benchmarking script. The last meaningful change was in 70f5c20 ("refactor(lua): rationalize our external symbols", 2022-07-29), meaning we can go back around three years without having to manually edit the file. With a bit more work, I could do the edits anyway, but we can't go much further back without running into a major change that would be painful to deal with: e91f67c ("refactor(lua): extract wincent.commandt.private.benchmark", 2022-07-16).

With this info in mind, go back and find perf-related commits in the time interval:

    git oneline --grep perf --reverse --since 2022-07-29

Here are some of the possibly interesting ones to look at:

    a91c298 perf: speed up workers by processing chunks of consecutive haystacks (12 months ago)
    7e6f158 perf: try compiling with `-Ofast` instead of `-O3` (12 months ago)
    48b5ee5 perf: add more compiler flags (11 months ago)
    6033ca8 perf: avoid redundant merge (5 months ago)
    3cd40d7 perf: limit based on available height as opposed to configured height (6 weeks ago)
    4b98334 perf: use voodoo coding to eek out minor performance increase (6 weeks ago)
    906385e perf: use faster downcasing (6 weeks ago)
    19eae40 perf: avoid some pointer indirection (6 weeks ago)
    817208d perf: avoid more pointer indirection (6 weeks ago)
    9715dbc refactor: do case conversions consistently (6 weeks ago)
    bc8fe12 perf: avoid repeated case conversions (5 weeks ago)
    8338f98 perf: avoid repeated pointer traversals (5 weeks ago)
    8dfda23 perf: avoid some more pointer traversals (5 weeks ago)
    56954c5 perf: record hash in benchmark log and ability to diff against it (HEAD -> main) (42 minutes ago)

plus of course the oldest one that is accessible to us before the first "breaking" change to the benchmark, which we'll use as a baseline:

    75cc367 feat(lua): teach the help finder to open in vertical splits, tabs etc (3 years ago)

There might be other intermediate commits that impact perf, but if so, they don't contain the word "perf", so they don't show up in the list above and we'll be skipping them for the purposes of this exercise.

Anyway, we go back and run the benchmarks at these commits, starting with the oldest one and working forward. Note that I had to pass `-C` to `make` at the beginning because initially there was no suitable top-level `Makefile` for us to use.

    (git co 75cc367 &&
      make -C lua/wincent/commandt/lib clean &&
      make -C lua/wincent/commandt/lib &&
      git co main -- lua/wincent/commandt/private/benchmark.lua &&
      TIMES=5 bin/benchmarks/matcher.lua &&
      git reset HEAD &&
      git co .)

    (git co a91c298 &&
      make -C lua/wincent/commandt/lib clean &&
      make -C lua/wincent/commandt/lib &&
      git co main -- lua/wincent/commandt/private/benchmark.lua &&
      TIMES=5 bin/benchmarks/matcher.lua &&
      git reset HEAD &&
      git co .)

    (git co 4b98334 &&
      make clean &&
      make &&
      git co main -- lua/wincent/commandt/private/benchmark.lua &&
      TIMES=5 bin/benchmarks/matcher.lua &&
      git reset HEAD &&
      git co .)

    (git co 8dfda23 &&
      make clean &&
      make &&
      git co main -- lua/wincent/commandt/private/benchmark.lua &&
      TIMES=5 bin/benchmarks/matcher.lua &&
      git reset HEAD &&
      git co .)

With the historical benchmark data now seeded, we can try some different `BASE` values and observe how the performance delta grows as we go further back in time.

First up, start with the current `HEAD` and no `BASE`. That is, we should see current performance, and no meaningful delta (i.e. no `p` value). Indeed, we see exactly that when we run it twice in a row:

    TIMES=5 bin/benchmarks/matcher.lua # ie. no BASE

    Summary of cpu time and (wall time):

                          best      avg       sd       +/-  p        (best)      (avg)       (sd)       +/-  p
    pathological       0.20211  0.21416  0.03242   [-0.1%]        (0.20212)  (0.21416)  (0.03242)   [-0.1%]
    command-t          0.14975  0.15900  0.03848   [+0.4%]        (0.14975)  (0.15900)  (0.03849)   [+0.4%]
    chromium (subset)  1.17113  1.17970  0.01456   [-0.4%]        (0.26319)  (0.26443)  (0.00197)   [-1.0%]
    chromium (whole)   0.90269  0.90448  0.00444   [-2.3%]        (0.10272)  (0.10495)  (0.00279)   [-8.9%]
    big (400k)         1.35537  1.35903  0.00908   [-0.4%]        (0.14771)  (0.15109)  (0.00506)   [+0.4%]
    total              3.78496  3.81636  0.05603   [-0.8%]        (0.86743)  (0.89363)  (0.06559)   [-1.2%]

Now we compare against 2025-07-03 and again see no significant difference because there has been no perf-related work since then:

    BASE=8dfda23 TIMES=5 bin/benchmarks/matcher.lua

    Summary of cpu time and (wall time):

                          best      avg       sd       +/-  p        (best)      (avg)       (sd)       +/-  p
    pathological       0.20217  0.21286  0.02449   [-0.4%]        (0.20217)  (0.21286)  (0.02448)   [-0.4%]
    command-t          0.14968  0.15818  0.03143   [-1.3%]        (0.14968)  (0.15818)  (0.03144)   [-1.3%]
    chromium (subset)  1.18562  1.19020  0.01601   [+0.2%]        (0.26808)  (0.27069)  (0.00804)   [+1.6%]
    chromium (whole)   0.89829  0.90328  0.01043   [+0.3%]        (0.10158)  (0.10262)  (0.00231)   [-0.3%]
    big (400k)         1.35498  1.36317  0.01556   [+0.1%]        (0.14805)  (0.14954)  (0.00344)   [-1.6%]
    total              3.80532  3.82769  0.04708   [+0.1%]        (0.87382)  (0.89390)  (0.04786)   [-0.1%]

Now we go back to 2025-06-24 and see our first significant perf change:

    BASE=4b98334 TIMES=5 bin/benchmarks/matcher.lua

    Summary of cpu time and (wall time):

                          best      avg       sd       +/-  p        (best)      (avg)       (sd)       +/-  p
    pathological       0.20301  0.21354  0.02899   [-6.2%]  0.05  (0.20301)  (0.21354)  (0.02898)   [-6.2%]  0.05
    command-t          0.15010  0.15802  0.03212  [-10.7%]  0.05  (0.15010)  (0.15802)  (0.03212)  [-10.7%]  0.05
    chromium (subset)  1.17620  1.18015  0.01067  [-11.0%]  0.05  (0.25964)  (0.26566)  (0.00807)   [-6.2%]  0.05
    chromium (whole)   0.90050  0.90376  0.00371  [-17.7%]  0.05  (0.10576)  (0.10661)  (0.00270)  [-13.3%]  0.05
    big (400k)         1.35552  1.35942  0.00706  [-17.6%]  0.05  (0.14950)  (0.15033)  (0.00155)  [-16.5%]  0.05
    total              3.79207  3.81489  0.06748  [-14.7%]  0.05  (0.86998)  (0.89416)  (0.06213)   [-9.6%]  0.05

Back to 2024-08-13 we see a bigger change:

    BASE=a91c298 TIMES=5 bin/benchmarks/matcher.lua

    Summary of cpu time and (wall time):

                          best      avg       sd       +/-  p        (best)      (avg)       (sd)       +/-  p
    pathological       0.20270  0.21348  0.02811   [-9.7%]  0.05  (0.20270)  (0.21348)  (0.02810)   [-9.7%]  0.05
    command-t          0.15069  0.15926  0.03167   [-8.7%]        (0.15069)  (0.15931)  (0.03165)   [-8.6%]
    chromium (subset)  1.13067  1.16924  0.04352  [-17.2%]  0.05  (0.25553)  (0.26593)  (0.01224)   [-9.5%]  0.05
    chromium (whole)   0.89872  0.90251  0.00634  [-29.0%]  0.05  (0.10030)  (0.10405)  (0.00689)  [-25.0%]  0.05
    big (400k)         1.35960  1.36199  0.00417  [-27.8%]  0.05  (0.15053)  (0.15393)  (0.00614)  [-23.3%]  0.05
    total              3.75276  3.80648  0.08109  [-23.0%]  0.05  (0.87232)  (0.89669)  (0.06312)  [-13.6%]  0.05

Finally, we go all the way back to 2022-07-29 and see an even bigger change, as expected:

    BASE=75cc367 TIMES=5 bin/benchmarks/matcher.lua

    Summary of cpu time and (wall time):

                          best      avg       sd       +/-  p        (best)      (avg)       (sd)       +/-  p
    pathological       0.20291  0.22017  0.05556   [-2.7%]        (0.20291)  (0.22017)  (0.05555)   [-2.7%]
    command-t          0.14949  0.15046  0.00143  [-20.9%]  0.05  (0.14949)  (0.15046)  (0.00143)  [-20.9%]  0.05
    chromium (subset)  1.17581  1.17860  0.00856  [-32.5%]  0.05  (0.26507)  (0.26827)  (0.00483)  [-19.7%]  0.05
    chromium (whole)   0.90010  0.90297  0.00423  [-61.9%]  0.05  (0.10188)  (0.10636)  (0.00576)  [-52.5%]  0.05
    big (400k)         1.35858  1.36409  0.01234  [-65.3%]  0.05  (0.14769)  (0.15255)  (0.00860)  [-59.0%]  0.05
    total              3.79522  3.81629  0.05523  [-49.0%]  0.05  (0.87816)  (0.89781)  (0.05850)  [-26.3%]  0.05

Note that when reading these tables it's important to focus on the % changes, not the absolute values, because all of the latter are just repeatedly showing the current performance values.

In summary, the time deltas that we see are:

- Baseline (2025-08-08): n/a
- From 2025-08-08 to 2022-07-29: -49.0% (CPU time) and -26.3% (wall time)
- From 2025-08-08 to 2024-08-13: -23.0% (CPU time) and -13.6% (wall time)
- From 2025-08-08 to 2025-06-24: -14.7% (CPU time) and -9.6% (wall time)
- From 2025-08-08 to 2025-07-03: -0.8% (CPU time) and -1.2% (wall time)

(but as noted above, that last row is noise only, with no `p` value significance.)

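To make these percentages easier to interpret, here is a small sketch that converts a time delta into a speedup factor. It assumes the `+/-` column reports the relative change in time, that is (current - baseline) / baseline, which this message does not spell out. The `speedup` helper and the `deltas` table are hypothetical names used only for illustration; the numbers are copied from the summary above.

```lua
-- Assumption: delta_percent = (current - baseline) / baseline * 100.
-- Under that reading, a delta of -50% means the current code needs half
-- the time of the baseline, i.e. it is 2x as fast.
local function speedup(delta_percent)
  local ratio = 1 + delta_percent / 100 -- current time as a fraction of baseline time
  return 1 / ratio
end

-- "total" rows from the summary, keyed by the date of the baseline commit.
local deltas = {
  { base = '2022-07-29', cpu = -49.0, wall = -26.3 },
  { base = '2024-08-13', cpu = -23.0, wall = -13.6 },
  { base = '2025-06-24', cpu = -14.7, wall = -9.6 },
}

for _, row in ipairs(deltas) do
  print(string.format(
    '%s  cpu x%.2f  wall x%.2f',
    row.base,
    speedup(row.cpu),
    speedup(row.wall)
  ))
end
```

If the benchmark instead defines the delta relative to the current run, the factors would differ, so treat the output as a reading aid rather than a measured result.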
1 parent 0a40e75 commit dddab8a

1 file changed: 105 additions and 2 deletions

lua/wincent/commandt/private/benchmark.lua

@@ -11,6 +11,97 @@ local lib = require('wincent.commandt.private.lib')
 
 lib.epoch() -- Force eager loading of C library.
 
+local function get_git_hash()
+  local handle = io.popen('command git rev-parse HEAD 2>/dev/null')
+  if not handle then
+    return nil
+  end
+  local hash = handle:read('*line')
+  handle:close()
+
+  if not hash or hash == '' then
+    return nil
+  end
+
+  -- Check if work tree is dirty
+  handle = io.popen('command git status --porcelain 2>/dev/null')
+  if not handle then
+    return hash
+  end
+
+  local status = handle:read('*all')
+  handle:close()
+
+  if status and status ~= '' then
+    return hash .. '-dirty'
+  else
+    return hash
+  end
+end
+
+local function is_dirty(hash)
+  return string.sub(hash, -string.len('-dirty')) == '-dirty'
+end
+
+local function find_baseline(log, base_input)
+  if not base_input then
+    return nil
+  end
+
+  -- Parse BASE (eg. "deadbeef" or "deadbeef-dirty").
+  local base_hash = base_input
+  local dirty_base = false
+  if base_input:sub(-6) == '-dirty' then
+    base_hash = base_input:sub(1, -7)
+    dirty_base = true
+  end
+
+  -- Scan backwards through log entries.
+  for i = #log, 1, -1 do
+    local entry = log[i]
+    if entry.hash then
+      -- Look for exact match.
+      if entry.hash == base_input then
+        return entry
+      end
+
+      -- Look for prefix match.
+      local dirty_entry = is_dirty(entry.hash)
+      local entry_hash = dirty_entry and entry.hash:sub(1, -7) or entry.hash
+
+      if entry_hash:sub(1, #base_hash) == base_hash then
+        if dirty_base and dirty_entry then
+          -- User wants dirty match and we found one.
+          return entry
+        elseif not dirty_base and not dirty_entry then
+          -- User wants clean match and we found one.
+          return entry
+        end
+      end
+    end
+  end
+
+  -- Scan again looking for fallback dirty match.
+  if not dirty_base then
+    for i = #log, 1, -1 do
+      local entry = log[i]
+      if entry.hash and is_dirty(entry.hash) then
+        local entry_hash = entry.hash:sub(1, -7)
+        if entry_hash:sub(1, #base_hash) == base_hash then
+          print('Warning: Using dirty version of hash ' .. base_hash)
+          return entry
+        end
+      end
+    end
+  end
+
+  return nil
+end
+
+local function red(text)
+  return '\027[31m' .. text .. '\027[0m'
+end
+
 local reduce = function(list, initial, cb)
   local acc = initial
   for i, value in ipairs(list) do
@@ -240,8 +331,22 @@ local function benchmark(options)
   local ok, log = pcall(require, options.log)
   log = ok and log or {}
 
+  local base_hash = os.getenv('BASE')
+  local previous
+
+  if base_hash then
+    previous = find_baseline(log, base_hash)
+    if not previous then
+      print(red('Error: Could not find baseline hash "' .. base_hash .. '" in benchmark logs'))
+      os.exit(1)
+    end
+  else
+    previous = log[#log]
+  end
+
   local results = {
     when = os.date(),
+    hash = get_git_hash(),
     timings = {},
   }
 
@@ -306,8 +411,6 @@ local function benchmark(options)
     end
   end
 
-  local previous = log[#log]
-
   for label, metrics in pairs(results.timings) do
     metrics['cpu (best)'] = math.min(unpack(metrics.cpu))
     metrics['wall (best)'] = math.min(unpack(metrics.wall))
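
The lookup rules described in the commit message (prefix match, explicit `-dirty` requests, and the fallback to a dirty entry) can be exercised on a toy log. The sketch below is a condensed single-pass restatement of those rules, not the code from the diff: `split`, `lookup`, the `label` field and the log contents are all hypothetical, and instead of printing a warning it returns a second value saying whether the dirty fallback was used.

```lua
local SUFFIX = '-dirty'

-- Returns the bare hash and whether it carried the "-dirty" suffix.
local function split(hash)
  if hash:sub(-#SUFFIX) == SUFFIX then
    return hash:sub(1, -#SUFFIX - 1), true
  end
  return hash, false
end

-- Newest-first scan. Prefers an entry whose dirtiness matches the request;
-- a clean request may fall back to the newest dirty entry with that prefix.
local function lookup(log, base)
  local want, want_dirty = split(base)
  local fallback
  for i = #log, 1, -1 do
    local entry = log[i]
    if entry.hash then
      local have, have_dirty = split(entry.hash)
      if have:sub(1, #want) == want then
        if have_dirty == want_dirty then
          return entry, false
        elseif have_dirty and not fallback then
          fallback = entry
        end
      end
    end
  end
  return fallback, fallback ~= nil
end

-- Hypothetical log: 40-character hashes, one of them recorded as dirty.
local log = {
  { hash = 'deadbeef' .. string.rep('0', 32), label = 'first' },
  { hash = 'feedface' .. string.rep('0', 32) .. SUFFIX, label = 'second' },
  { hash = 'abcd0123' .. string.rep('0', 32), label = 'third' },
}

local entry, used_fallback = lookup(log, 'deadbeef')
print(entry.label, used_fallback) -- clean prefix match

entry, used_fallback = lookup(log, 'feedface')
print(entry.label, used_fallback) -- only a dirty entry exists, so it is the fallback

entry, used_fallback = lookup(log, 'feedface-dirty')
print(entry.label, used_fallback) -- dirty entry requested explicitly

print(lookup(log, 'cafebabe')) -- no such hash in the log
```

Returning a flag rather than printing keeps the sketch easy to test; in the benchmark itself the equivalent situations surface as the warning and the error message shown in the diff.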
