Benchmarking Support #7850
Replies: 6 comments 6 replies

---
These are some great proposals! Sharing my initial thoughts below.

**General concerns**

When it comes to benchmarking in tests, it tends to get really flaky. Benchmark results depend heavily on CPU capabilities and load, which may vary both across different machines (think local and CI) and on the same machine under low/high load. It would be great if Vitest accounted for that, the same way it facilitates a lot of little quality-of-life features. One thing I found problematic is including any fixed expectation values. That said, I am not advocating for dropping expectations.

**Expectations**

It is fair to say that most benchmark tests are modeled as regression tests: you don't want your software to get slower than itself, or slower than some other software. I can hardly think of an expectation for a benchmark to be *faster* than something, given that's rather wishful thinking (a performance improvement can also be measured by the increased delta). That being said, it does make sense to have both.

**Literal values**

I don't believe asserting against literal values makes sense. I share your concern about running multiple benchmarks in the same process as well. I feel like snapshots are justified in benchmark testing, but I can't say I like the proposed snapshot solution. What is the value of comparing the current performance of A to the past performance of A (the snapshot) and expecting it to be faster or the same? Maybe I'm missing a use case, that's all.

This is also not a sign of a good API: `allowEmpty: true // set this on the first run, remove afterwards`. I understand the intention here is not to fail the test if this is the first benchmark run, but at that point Vitest can take the initiative and report the test as (1) ran; (2) benchmark saved into a snapshot; (3) benchmark assertions skipped (nothing to compare to).

**Practicality**

It all comes down to how benchmark tests are going to be used in the wild. I believe you get the most value out of them if they are, indeed, a separate test suite that you run and whose reports you monitor. This is where we are treading similar ground to code coverage. I am unlikely to want my bench test to fail if something got slower. Instead, I want to know when something got slower, and then take appropriate action. Maybe I intend it to get slower. Maybe it getting slower is an inevitable side effect of the changes I'm introducing. This is where I see the most value from Vitest.

This stems from a simple truth: performance cannot be stated as an expectation. You can expect your software to do what you intend it to do, but stating "X should be faster than Y" is equivalent to stating "X should be good™". All you care about is "how fast is X?" and "did X get slower?". These questions are expectation-less and are answered in reports, not test runs. I know you are trying to address different use cases and concerns, and I hope mine will be of some help. Benchmarking as you described it is useful in library development, sure, but we have to think about more use cases than that.

---
Thanks for starting a discussion on this here! We began to use Vitest's benchmarking functionality in some of our suites, but we had to step down to using tinybench directly because Vitest's hooks are not run at all in benchmark mode (open issue here: #5075). Roughly, our Vitest benchmarks look something like this, which have a few ergonomic issues:

```ts
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { Bench } from "tinybench";

describe("benchmark", () => {
  beforeEach(async () => {
    // Setup before each test
  });

  afterEach(async () => {
    // Cleanup after each test
  });

  it("should run a benchmark for doing something", async () => {
    const bench = new Bench({});

    bench.add("code being benchmarked", async () => {
      // do some stuff, maybe assert something
      const res = await fetch("https://example.com/api/reference");
      expect(res.status).toBe(200);
    });

    await bench.run();

    const task = bench.getTask("code being benchmarked")!;
    const result = task.result!;

    expect(result.hz).toBeGreaterThan(70);
    expect(result.p99).toBeLessThan(75);

    // Only output the table during development
    if (!process.env.CI) {
      console.table({
        name: task.name,
        "ops/sec": parseInt(result.hz.toString(), 10).toLocaleString(),
        min: result.min,
        max: result.max,
        avg: result.mean,
        p99: result.p99,
        "p99.9": result.p999,
        iterations: result.samples.length,
      });
    }
  });
});
```

This has the same issues that have already been spoken about above.
Would love to get back to using the built-in benchmarking. Anyway, I don't have much more to say; I just wanted to send a note about our issues with this today, and know that I'll be following this closely! Thanks for all of the great work you do.

---
Seems very cool! Is there any way to make a bench test fail currently? I tried to play with the `teardown` option, but the assertion never fails:

```ts
bench(
  'Benchmark',
  () => {
    // ...
  },
  {
    teardown: (task) => {
      for (const result of task.bench.results) {
        expect(result?.mean).toBeLessThan(0); // never fails
      }
    },
  },
);
```

---
(Expanded from #8703) From my experience, benchmarking in Vitest right now feels more like an additional module that has been tacked on rather than something really integrated into Vitest. While it has been a great help for trying out and comparing different algorithmic implementations, using it as a general "app performance" monitor (e.g. did a dependency update make things slower, did a code change make things load longer, etc.) sadly falls a bit flat. Since benchmarks are not actual tests, there's only the native functionality to work with. I am also somehow running into cleanup issues, with promises not being flushed after each run, out-of-memory errors, etc. When using regular tests, these cases seem to be handled better. Doing any larger benchmarks has been very risky so far, so we've mostly been doing micro-benchmarking of certain features/converters/etc.

---
One thing I'd love to see in this redesign is a clear story for running benchmarks in Browser Mode as well. Vitest 4 ships Browser Mode as stable, and it's already used to catch differences between simulated environments (jsdom/happy-dom) and real browser engines (Chromium, WebKit, etc.). However, the proposal here seems focused purely on the Node + Vite SSR side of things, and doesn't really mention how `bench` is expected to behave when tests are running in a real browser. In practice you can already combine `vitest bench` with Browser Mode, and for some use cases this is critical. Concretely, it would be great if benchmarks were treated as first-class citizens in Browser Mode as part of this redesign.

That would make the new benchmark API a great fit not only for library authors doing micro-benchmarks in Node, but also for real-world web apps that need to track performance across multiple browsers. Is Browser Mode support for bench something you're considering as part of this redesign, or would you prefer it to be tackled as a follow-up?
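For context, here is a minimal sketch of what combining benchmarks with Browser Mode can look like, assuming the Playwright provider; the file pattern and browser list are illustrative choices, and the exact config shape depends on your Vitest version:

```ts
// vitest.config.ts (sketch)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // Pick up benchmark files when running `vitest bench`.
    benchmark: {
      include: ['**/*.bench.ts'],
    },
    // Run in real browser engines instead of jsdom/happy-dom.
    browser: {
      enabled: true,
      provider: 'playwright',
      instances: [
        { browser: 'chromium' },
        { browser: 'webkit' },
      ],
    },
  },
})
```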

---
Hello, just in case it is not too late: a convenient feature would be for bench to be able to run on `it()` tests. TL;DR: we already have 150 Vitest tests that we want to use as benchmarks too. While not optimal (we would be including the `expect`… logic in the benchmark), it would be really convenient to be able to run `vitest bench` on the same files as `vitest run` without modifications. It could even ignore the `expect` lines by default. Thanks for considering this!

---
Vitest has provided experimental support for benchmarking for quite a while now. The feature is still kept experimental because we identified several issues that we think should be resolved before it is considered stable.

- The `bench` function disables `test`. This makes it impossible to run both in a single Vitest process.

To fix these issues, we propose several breaking changes to the API:
**`bench` is not a `test`**

The `bench` function will return the benchmark result instead of collecting all benchmarks and executing them later. This means that it needs to be called inside the `test` function. If it's called outside, it will throw an error.

The `bench` function cannot be called in `concurrent` tests. Concurrent benchmark tests would be very flaky; they should be executed in isolation from each other.

The `bench` result is automatically attached to the test case.
Benchmarks are reported in new `onTestCaseBenchmarkStart` and `onTestCaseBenchmarkResult` events. They receive a test case and a benchmark result:
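As a rough sketch of how a custom reporter might consume these events (the event names come from the proposal above, but the payload shape, including the `hz` field, is an assumption):

```ts
// vitest.config.ts (sketch) — an inline custom reporter using the proposed events.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    reporters: [
      'default',
      {
        onTestCaseBenchmarkStart(testCase) {
          console.log(`benchmarking ${testCase.name}…`)
        },
        onTestCaseBenchmarkResult(testCase, result) {
          // Assumed result shape: ops/sec exposed as `hz`.
          console.log(`${testCase.name}: ${Math.round(result.hz)} ops/sec`)
        },
      },
    ],
  },
})
```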
This will also remove the first parameter from the `startVitest` API:

```diff
import { startVitest } from 'vitest/node'

- await startVitest('test')
+ await startVitest()
```

To help with the Vite SSR issue, Vitest will print a warning if an export is accessed many times inside the benchmark. It is recommended to save the imported value into a separate variable. The warning can be disabled.
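To illustrate that recommendation, a small sketch of reading an export once before the hot path rather than on every iteration; the module name and the bench-inside-test usage are assumptions, not the final API:

```ts
import * as lib from './my-lib' // hypothetical module under test
import { bench, test } from 'vitest'

test('serialize benchmark', async () => {
  // Read the (possibly proxied) SSR export once, outside the benchmarked function…
  const serialize = lib.serialize

  await bench('serialize', () => {
    // …so the hot loop calls a plain local variable instead of `lib.serialize`.
    serialize({ hello: 'world' })
  })
})
```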
The recommended setup for benchmark tests would look like this:
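A minimal sketch of what that setup could look like under the proposed model, with `bench` called inside a regular `test` and returning its result; the exact signatures and the assertion on `hz` are assumptions:

```ts
import { bench, expect, test } from 'vitest'
import { parse } from './parser' // hypothetical code under test

test('parser does not regress', async () => {
  // Proposed model: `bench` runs immediately and returns the result.
  const result = await bench('parse a small document', () => {
    parse('<html><body>hello</body></html>')
  })

  // Placeholder assertion; in practice you'd compare against another
  // benchmark or a snapshot rather than a literal value.
  expect(result.hz).toBeGreaterThan(0)
})
```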
Benchmarks will always run after tests.
**Comparing different results**

Vitest will introduce new `expect` matchers to compare different benchmarks:
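A sketch of how such matchers might read; the matcher name `toBeFasterThan` and the imported module are illustrative, not the confirmed API:

```ts
import { bench, expect, test } from 'vitest'
import { join } from 'node:path'
import { myJoin } from './my-join' // hypothetical implementation being compared

test('myJoin keeps up with node:path', async () => {
  const ours = await bench('myJoin', () => {
    myJoin('a', 'b', 'c')
  })
  const reference = await bench('node:path join', () => {
    join('a', 'b', 'c')
  })

  // Hypothetical comparison matcher between two benchmark results.
  expect(ours).toBeFasterThan(reference)
})
```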
It might not be very useful to compare benchmarks done in the same process, so Vitest will also expose a `bench.withSnapshot` method that returns the current result and the previous one (similar to how the `--compare` flag works right now). If there is no previous result, it will create one.
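A sketch of how `bench.withSnapshot` could be used for regression checks, assuming it resolves to the current and previous results; the call signature, property names, and the tolerance logic below are assumptions:

```ts
import { bench, expect, test } from 'vitest'
import { render } from './renderer' // hypothetical code under test

test('render does not get slower over time', async () => {
  const { current, previous } = await bench.withSnapshot('render', () => {
    render({ items: 1000 })
  })

  if (previous) {
    // Allow 10% of noise before flagging a regression.
    expect(current.hz).toBeGreaterThan(previous.hz * 0.9)
  }
  // On the first run there is nothing to compare to; the snapshot is created.
})
```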