Benchmarking Support #7850
Replies: 6 comments 6 replies

---
These are some great proposals! Sharing my initial thoughts below.

**General concerns**

When it comes to benchmarking in tests, it tends to get really flaky. Benchmark results depend heavily on CPU capabilities and load, which may vary both across different machines (think local and CI) and on the same machine under low/high load. It would be great if Vitest accounted for that, the same way it facilitates a lot of little quality-of-life features. One thing I found problematic is including any fixed expectation values. That said, I am not advocating for dropping expectations.

**Expectations**

It is fair to say that most benchmark tests are modeled as regression tests: you don't want your software to get slower than itself, or slower than some other software. I can hardly think of an expectation for a benchmark to be *faster* than something, given that's rather wishful thinking (a performance improvement can also be measured by the increased delta). That being said, it does make sense to have both.

**Literal values**

I don't believe asserting against literal values makes sense. I share your concern about running multiple benchmarks in the same process as well. I feel like snapshots are justified in benchmark testing, but I can't say I like the proposed snapshot solution. What is the value of comparing the current performance of A to the past performance of A (the snapshot) and expecting it to be faster or the same? Maybe I'm missing a use case, that's all.

This is also not a sign of a good API: `allowEmpty: true // set this on the first run, remove afterwards`. I understand the intention here is not to fail the test if this is the first benchmark run, but at that point Vitest can take the initiative and report the test as (1) ran; (2) benchmark saved into a snapshot; (3) benchmark assertions skipped (nothing to compare to).

**Practicality**

It all comes down to how benchmark tests are going to be used in the wild. I believe you get the most value out of them if they are, indeed, a separate test suite that you run and whose reports you monitor. This is where we are treading similar ground to code coverage. I am unlikely to want my bench test to fail if something got slower. Instead, I want to know when something got slower, and then take appropriate action. Maybe I intend it to get slower. Maybe it getting slower is an inevitable side effect of the changes I'm introducing. This is where I see the most value from Vitest.

This stems from a simple truth: performance cannot be stated as an expectation. You can expect your software to do what you intend it to do, but stating "X should be faster than Y" is equivalent to stating "X should be good™". All you care about is "how fast is X?" and "did X get slower?". These questions are expectation-less and are answered in reports, not test runs. I know you are trying to address different use cases and concerns, and I hope mine will be of some help. Benchmarking as you described it is useful in library development, sure, but we have to think about more use cases than that.

---
Thanks for starting a discussion on this here! We began to use Vitest's benchmarking functionality in some of our suites, but we had to step down to using tinybench directly because Vitest's hooks are not run at all in benchmark mode (open issue here: #5075). Roughly, our Vitest benchmarks look something like this, which have a few ergonomic issues:

```ts
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { Bench } from "tinybench";

describe("benchmark", () => {
  beforeEach(async () => {
    // Setup before each test
  });

  afterEach(async () => {
    // Cleanup after each test
  });

  it("should run a benchmark for doing something", async () => {
    const bench = new Bench({});

    bench.add("code being benchmarked", async () => {
      // do some stuff, maybe assert something
      const res = await fetch("https://example.com/api/reference");
      expect(res.status).toBe(200);
    });

    await bench.run();

    const task = bench.getTask("code being benchmarked")!;
    const result = task.result!;

    expect(result.hz).toBeGreaterThan(70);
    expect(result.p99).toBeLessThan(75);

    // Only output the table during development
    if (!process.env.CI) {
      console.table({
        name: task.name,
        "ops/sec": parseInt(result.hz.toString(), 10).toLocaleString(),
        min: result.min,
        max: result.max,
        avg: result.mean,
        p99: result.p99,
        "p99.9": result.p999,
        iterations: result.samples.length,
      });
    }
  });
});
```

This has the same issues that have already been spoken about above.
Would love to get back to using the built-in benchmarking. Anyway, I don't have much more to say; I just wanted to send a note about our issues with this today, and know that I'll be following this closely! Thanks for all of the great work you do.

---
Seems very cool! Is there any way to make a bench test fail currently? I tried to play with the `teardown` option, but the assertion never fails:

```ts
bench(
  'Benchmark',
  () => {
    // ...
  },
  {
    teardown: (task) => {
      for (const result of task.bench.results) {
        expect(result?.mean).toBeLessThan(0); // never fails
      }
    },
  },
);
```

---
(Expanded from #8703) From my experience, benchmarking in Vitest right now feels more like an additional module that has been tacked on rather than something really integrated into Vitest. While it has been a great help for trying out and comparing different algorithmic implementations, using it as a general "app performance" monitor (e.g. did a dependency update make things slower, did a code change make things load longer, etc.) sadly falls a bit flat. Since benchmarks are not actual tests, there's only the native functionality to work with. I am also somehow running into cleanup issues, with promises not being flushed after each run, out-of-memory errors, etc. When using regular tests, these cases seem to be handled better. Doing any larger benchmarks has been very risky so far, so we've mostly been doing micro-benchmarking of certain features/converters/etc.

---
One thing I'd love to see in this redesign is a clear story for running benchmarks in Browser Mode as well. Vitest 4 ships Browser Mode as stable, and it's already used to catch differences between simulated environments (jsdom/happy-dom) and real browser engines (Chromium, WebKit, etc.). However, the proposal here seems focused purely on the Node + Vite SSR side of things, and doesn't really mention how `bench` is expected to behave when tests are running in a real browser. In practice you can already combine `vitest bench` with Browser Mode, and for some use cases this is critical. Concretely, it would be great if benchmarks were treated as first-class citizens in Browser Mode as part of this redesign.

That would make the new benchmark API a great fit not only for library authors doing micro-benchmarks in Node, but also for real-world web apps that need to track performance across multiple browsers. Is Browser Mode support for bench something you're considering as part of this redesign, or would you prefer it to be tackled as a follow-up?
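For context, here is a minimal sketch of what combining benchmarks with Browser Mode can look like, assuming the Playwright provider; the file pattern and browser list are illustrative choices, and the exact config shape depends on your Vitest version:

```ts
// vitest.config.ts (sketch)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // Pick up benchmark files when running `vitest bench`.
    benchmark: {
      include: ['**/*.bench.ts'],
    },
    // Run in real browser engines instead of jsdom/happy-dom.
    browser: {
      enabled: true,
      provider: 'playwright',
      instances: [
        { browser: 'chromium' },
        { browser: 'webkit' },
      ],
    },
  },
})
```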

---
Hello, just in case it is not too late: a convenient feature would be for bench to be able to run on `it()` tests. TL;DR: we already have 150 Vitest tests that we want to use as benchmarks too. While not optimal (we would be including the `expect`… logic in the benchmark), it would be really convenient to be able to run `vitest bench` on the same files as `vitest run` without modifications. It could even ignore the `expect` lines by default. Thanks for considering this!

---
Vitest has provided experimental support for benchmarking for quite a while now. The feature is still kept experimental because we identified several issues that we think should be resolved before it is considered stable.

- The `bench` function disables `test`. This makes it impossible to run both in a single Vitest process.

To fix these issues, we propose several breaking changes to the API:
**`bench` is not a `test`**

The `bench` function will return the benchmark result instead of collecting all benchmarks and executing them later. This means that it needs to be called inside the `test` function. If it's called outside, it will throw an error.

The `bench` function cannot be called in `concurrent` tests. Concurrent benchmark tests would be very flaky; they should be executed in isolation from each other.

The `bench` result is automatically attached to the test case.
Benchmarks are reported in new `onTestCaseBenchmarkStart` and `onTestCaseBenchmarkResult` events. They receive a test case and a benchmark result:
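As a rough sketch of how a custom reporter might consume these events (the event names come from the proposal above, but the payload shape, including the `hz` field, is an assumption):

```ts
// vitest.config.ts (sketch) — an inline custom reporter using the proposed events.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    reporters: [
      'default',
      {
        onTestCaseBenchmarkStart(testCase) {
          console.log(`benchmarking ${testCase.name}…`)
        },
        onTestCaseBenchmarkResult(testCase, result) {
          // Assumed result shape: ops/sec exposed as `hz`.
          console.log(`${testCase.name}: ${Math.round(result.hz)} ops/sec`)
        },
      },
    ],
  },
})
```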
This will also remove the first parameter from the `startVitest` API:

```diff
import { startVitest } from 'vitest/node'

- await startVitest('test')
+ await startVitest()
```

To help with the Vite SSR issue, Vitest will print a warning if an export is accessed many times inside the benchmark. It is recommended to save the imported value into a separate variable. The warning can be disabled.
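To illustrate that recommendation, a small sketch of reading an export once before the hot path rather than on every iteration; the module name and the bench-inside-test usage are assumptions, not the final API:

```ts
import * as lib from './my-lib' // hypothetical module under test
import { bench, test } from 'vitest'

test('serialize benchmark', async () => {
  // Read the (possibly proxied) SSR export once, outside the benchmarked function…
  const serialize = lib.serialize

  await bench('serialize', () => {
    // …so the hot loop calls a plain local variable instead of `lib.serialize`.
    serialize({ hello: 'world' })
  })
})
```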
The recommended setup for benchmark tests would look like this:
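A minimal sketch of what that setup could look like under the proposed model, with `bench` called inside a regular `test` and returning its result; the exact signatures and the assertion on `hz` are assumptions:

```ts
import { bench, expect, test } from 'vitest'
import { parse } from './parser' // hypothetical code under test

test('parser does not regress', async () => {
  // Proposed model: `bench` runs immediately and returns the result.
  const result = await bench('parse a small document', () => {
    parse('<html><body>hello</body></html>')
  })

  // Placeholder assertion; in practice you'd compare against another
  // benchmark or a snapshot rather than a literal value.
  expect(result.hz).toBeGreaterThan(0)
})
```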
Benchmarks will always run after tests.
**Comparing different results**

Vitest will introduce new `expect` matchers to compare different benchmarks:
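A sketch of how such matchers might read; the matcher name `toBeFasterThan` and the imported module are illustrative, not the confirmed API:

```ts
import { bench, expect, test } from 'vitest'
import { join } from 'node:path'
import { myJoin } from './my-join' // hypothetical implementation being compared

test('myJoin keeps up with node:path', async () => {
  const ours = await bench('myJoin', () => {
    myJoin('a', 'b', 'c')
  })
  const reference = await bench('node:path join', () => {
    join('a', 'b', 'c')
  })

  // Hypothetical comparison matcher between two benchmark results.
  expect(ours).toBeFasterThan(reference)
})
```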
It might not be very useful to compare benchmarks done in the same process, so Vitest will also expose a `bench.withSnapshot` method that returns the current result and the previous one (similar to how the `--compare` flag works right now). If there is no previous result, it will create one.
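A sketch of how `bench.withSnapshot` could be used for regression checks, assuming it resolves to the current and previous results; the call signature, property names, and the tolerance logic below are assumptions:

```ts
import { bench, expect, test } from 'vitest'
import { render } from './renderer' // hypothetical code under test

test('render does not get slower over time', async () => {
  const { current, previous } = await bench.withSnapshot('render', () => {
    render({ items: 1000 })
  })

  if (previous) {
    // Allow 10% of noise before flagging a regression.
    expect(current.hz).toBeGreaterThan(previous.hz * 0.9)
  }
  // On the first run there is nothing to compare to; the snapshot is created.
})
```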