I couldn't find an open issue on this, and it's more of an open-ended call for ideas.
@trws said we should measure regression in the testsuite. Articles I've read disagree on whether this should be done on a nightly cadence or on a per-PR basis. A few things I've found:
hyperfine is a Rust-based tool that measures the timing of commands and comes with some extra features that could be useful for a GH action, like easily exporting results to a Markdown table.
auk108 ~ $ hyperfine --runs 30 'flux start exit'
Benchmark 1: flux start exit
Time (mean ± σ): 967.5 ms ± 36.0 ms [User: 436.2 ms, System: 420.5 ms]
Range (min … max): 875.4 ms … 1022.4 ms 30 runs
auk108 ~ $ hyperfine --runs 30 'flux start flux submit hostname'
Benchmark 1: flux start flux submit hostname
Time (mean ± σ): 1.142 s ± 0.039 s [User: 0.588 s, System: 0.459 s]
Range (min … max): 1.078 s … 1.232 s 30 runs
auk108 ~ $ flux start hyperfine "flux ping --count=1000 --interval=0 broker"
Benchmark 1: flux ping --count=1000 --interval=0 broker
Time (mean ± σ): 14.0 ms ± 2.5 ms [User: 5.5 ms, System: 8.2 ms]
Range (min … max): 12.8 ms … 35.2 ms 78 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (35.2 ms).
This could be caused by (filesystem) caches that were not filled until after the first run. You should
consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively,
use the '--prepare' option to clear the caches before each timing run.
The coloring doesn't show up in GitHub, but the terminal output is colored really nicely, which I like.
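For wiring this into a GH action, hyperfine's Markdown export seems like the natural hook. A minimal sketch using the commands above (--warmup and --export-markdown are real hyperfine options; results.md is just an example file name):

```sh
# Warm up filesystem caches, then export the results as a Markdown
# table that could be dropped into a PR comment or job summary.
hyperfine --warmup 3 --runs 30 \
    --export-markdown results.md \
    'flux start exit' \
    'flux start flux submit hostname'
```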
bencher is a tool that uses hyperfine under the hood to run a testsuite and stick a comment in the PR about how the PR has improved/regressed performance, much like CodeCov, but for performance timing.
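Even without bencher, a bare-bones version of the PR-comment idea could probably be cobbled together from that Markdown export plus the gh CLI (a hypothetical sketch, not how bencher actually does it; $PR_NUMBER is a placeholder the workflow would have to supply):

```sh
# Post the hyperfine Markdown table as a comment on the PR under test.
# Assumes GH_TOKEN is available in the CI environment.
gh pr comment "$PR_NUMBER" --body-file results.md
```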
A couple of considerations for how this might work with Flux:
We'd probably want to test flux start, and then a series of tests within a Flux instance (submitting a job, pinging the broker, loading modules?)
I was originally thinking maybe we could wrap the sharness test_* functions with a timer, or provide a macro to time certain tests in the testsuite and just report on those, so we wouldn't need a lot of extra tooling (rough sketch after this list). Bad idea?
Perhaps this should go in test-collective so we can measure things like sched + core regression, but I like the idea of having a comment on PRs.
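To make the timer-wrapper idea a bit more concrete, here's a rough sketch of what it might look like layered on sharness's test_expect_success (the test_expect_success_timed name and the timings.log file are made up, and it assumes GNU date for nanosecond resolution):

```sh
# Hypothetical sharness helper: run a test normally, but also record
# its wall-clock duration for a later reporting/comparison step.
test_expect_success_timed() {
    start=$(date +%s%N)              # GNU date, nanosecond resolution
    test_expect_success "$1" "$2"
    end=$(date +%s%N)
    # append "<test name>|<milliseconds>" to a per-script timing log
    echo "$1|$(( (end - start) / 1000000 ))" >>timings.log
}

# used exactly like test_expect_success:
test_expect_success_timed 'flux submit hostname works' '
    flux submit hostname
'
```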
Anyway, I'm curious to hear what others think or if anyone has tried this.