I couldn't find an open issue on this, and it's more of an open-ended call for ideas.
@trws said we should measure regression in the testsuite. Articles I've read disagree on whether this should be done on a nightly cadence or on a per-PR basis. A few things I've found:
hyperfine is a Rust-based tool that measures the timing of commands and comes with some extra features that could be useful for a GH action, like easily exporting results to a Markdown table.
auk108 ~ $ hyperfine --runs 30 'flux start exit'
Benchmark 1: flux start exit
Time (mean ± σ): 967.5 ms ± 36.0 ms [User: 436.2 ms, System: 420.5 ms]
Range (min … max): 875.4 ms … 1022.4 ms 30 runs
auk108 ~ $ hyperfine --runs 30 'flux start flux submit hostname'
Benchmark 1: flux start flux submit hostname
Time (mean ± σ): 1.142 s ± 0.039 s [User: 0.588 s, System: 0.459 s]
Range (min … max): 1.078 s … 1.232 s 30 runs
auk108 ~ $ flux start hyperfine "flux ping --count=1000 --interval=0 broker"
Benchmark 1: flux ping --count=1000 --interval=0 broker
Time (mean ± σ): 14.0 ms ± 2.5 ms [User: 5.5 ms, System: 8.2 ms]
Range (min … max): 12.8 ms … 35.2 ms 78 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (35.2 ms).
This could be caused by (filesystem) caches that were not filled until after the first run. You should
consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively,
use the '--prepare' option to clear the caches before each timing run.
The coloring doesn't show up in GitHub, but the terminal output is colored really nicely, which I like.
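For wiring this into a GH action, hyperfine's Markdown export seems like the natural hook. A minimal sketch using the commands above (--warmup and --export-markdown are real hyperfine options; results.md is just an example file name):

```sh
# Warm up filesystem caches, then export the results as a Markdown
# table that could be dropped into a PR comment or job summary.
hyperfine --warmup 3 --runs 30 \
    --export-markdown results.md \
    'flux start exit' \
    'flux start flux submit hostname'
```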
bencher is a tool that uses hyperfine under the hood to run a testsuite and stick a comment in the PR about how the PR has improved/regressed performance, much like CodeCov, but for performance timing.
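Even without bencher, a bare-bones version of the PR-comment idea could probably be cobbled together from that Markdown export plus the gh CLI (a hypothetical sketch, not how bencher actually does it; $PR_NUMBER is a placeholder the workflow would have to supply):

```sh
# Post the hyperfine Markdown table as a comment on the PR under test.
# Assumes GH_TOKEN is available in the CI environment.
gh pr comment "$PR_NUMBER" --body-file results.md
```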
A couple of considerations for how this might work with Flux:
We'd probably want to test flux start, and then a series of tests within a Flux instance (submitting a job, pinging the broker, loading modules?)
I was originally thinking maybe we could wrap the sharness test_* functions with a timer, or provide a macro to time certain tests in the testsuite and just report on those, so we wouldn't need a lot of extra tooling (rough sketch after this list). Bad idea?
Perhaps this should go in test-collective so we can measure things like sched + core regression, but I like the idea of having a comment on PRs.
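To make the timer-wrapper idea a bit more concrete, here's a rough sketch of what it might look like layered on sharness's test_expect_success (the test_expect_success_timed name and the timings.log file are made up, and it assumes GNU date for nanosecond resolution):

```sh
# Hypothetical sharness helper: run a test normally, but also record
# its wall-clock duration for a later reporting/comparison step.
test_expect_success_timed() {
    start=$(date +%s%N)              # GNU date, nanosecond resolution
    test_expect_success "$1" "$2"
    end=$(date +%s%N)
    # append "<test name>|<milliseconds>" to a per-script timing log
    echo "$1|$(( (end - start) / 1000000 ))" >>timings.log
}

# used exactly like test_expect_success:
test_expect_success_timed 'flux submit hostname works' '
    flux submit hostname
'
```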
Anyway, I'm curious to hear what others think or if anyone has tried this.