Conversation

@Shigoto-dev19 (Contributor) commented Oct 14, 2025

Closes #2545.

How it works

The performance regression workflow in CI ensures that compile, prove, and verify times remain consistent across commits. It uses a single composite action, ./.github/actions/perf-regression, which handles both creating new performance baselines and checking current results against previous runs.

When a pull request is labeled with dump-performance, the workflow enters dump mode, generating a new perf-regression.json file that contains updated performance metrics for all registered benchmarks. This file is then uploaded as an artifact and stored for inspection (not as the official baseline). If the label is absent, the action automatically switches to check mode. In this case, it locates the most recent completed run from the branch main, downloads the latest baseline artifact, normalizes its path, and compares it against freshly collected results to detect regressions.
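A minimal sketch of how that mode decision could look inside the composite action; the input name, step id, and exact layout here are illustrative, not the literal contents of `./.github/actions/perf-regression`:

```yaml
# .github/actions/perf-regression/action.yml (illustrative sketch only)
name: perf-regression
description: Dump or check performance baselines
inputs:
  mode:
    description: "auto | dump | check"
    default: auto
runs:
  using: composite
  steps:
    - id: resolve-mode
      shell: bash
      env:
        # true when the PR carries the dump-performance label
        HAS_LABEL: ${{ contains(github.event.pull_request.labels.*.name, 'dump-performance') }}
      run: |
        # In auto mode the label decides; on push events there is no PR payload,
        # so the label has to be resolved differently (see the post-merge sketch below).
        if [[ "${{ inputs.mode }}" == "auto" ]]; then
          if [[ "$HAS_LABEL" == "true" ]]; then
            echo "mode=dump" >> "$GITHUB_OUTPUT"
          else
            echo "mode=check" >> "$GITHUB_OUTPUT"
          fi
        else
          echo "mode=${{ inputs.mode }}" >> "$GITHUB_OUTPUT"
        fi
```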

This process allows the baseline to evolve only when explicitly requested by maintainers while ensuring that every other PR run validates its performance against an established reference from main. All of this logic is handled internally by the composite action, which determines whether to dump or check based on the PR label, making it transparent for contributors.

Within the main checks.yml workflow, the performance regression suite is simply one entry in the test matrix. All standard tests are run as usual, while the performance regression entry invokes the composite action with mode: auto. The action runs the same run-ci-tests.sh script used by other tests but injects the correct PERF_MODE environment variable (--dump or --check) based on the mode. This keeps the testing interface unified while allowing the underlying script to branch internally.
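For context, the wiring in `checks.yml` might look roughly like the excerpt below; the job name, matrix values, and `TEST_TYPE` variable are assumptions used for illustration:

```yaml
# checks.yml (illustrative excerpt)
on:
  pull_request:
  push:
    branches: [main]
jobs:
  Build-And-Test-Server:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        test_type:
          - 'Simple integration tests'
          - 'Performance regression'
    steps:
      - uses: actions/checkout@v4
      - name: Run standard CI tests
        if: matrix.test_type != 'Performance regression'
        env:
          TEST_TYPE: ${{ matrix.test_type }}
        run: ./run-ci-tests.sh
      - name: Run performance regression suite
        if: matrix.test_type == 'Performance regression'
        uses: ./.github/actions/perf-regression
        with:
          mode: auto
        # internally the action reuses the same script with the mode injected,
        # e.g. PERF_MODE=--check ./run-ci-tests.sh (or --dump in dump mode)
```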

The artifact name (perf-regression-json) and storage path (tests/perf-regression/perf-regression.json) are consistent across runs. When in dump mode, the artifact is uploaded with a 30-day retention window; when in check mode, it is fetched from the last available successful run from main.
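In dump mode, the upload step could be as simple as the following; the `actions/upload-artifact` version and the `resolve-mode` step id are assumptions carried over from the sketches above:

```yaml
- name: Upload perf baseline artifact (dump mode)
  if: steps.resolve-mode.outputs.mode == 'dump'
  uses: actions/upload-artifact@v4
  with:
    name: perf-regression-json
    path: tests/perf-regression/perf-regression.json
    retention-days: 30
```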


How to use

1) Open a PR without the dump-performance label

  • The workflow runs the Performance Regression job.
  • The composite action picks check mode.
  • It searches main for the latest completed run that contains the perf-regression-json artifact.
  • The artifact is downloaded, normalized to tests/perf-regression/perf-regression.json, and the performance tests are run in --check mode (a minimal sketch follows this list).
  • Outcome: The PR is validated against the current main baseline. No new baseline is produced.
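A hedged sketch of that check-mode step, assuming the GitHub CLI is available on the runner and `GH_TOKEN` is provided; the real action may select the baseline run more carefully (see Notes below):

```yaml
- name: Fetch baseline from main and check (check mode)
  if: steps.resolve-mode.outputs.mode == 'check'
  shell: bash
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # newest successful run on main (assumed to carry the baseline artifact)
    run_id=$(gh run list --branch main --status success --limit 1 \
      --json databaseId --jq '.[0].databaseId')
    # download the baseline and normalize it to the expected path
    gh run download "$run_id" --name perf-regression-json --dir tests/perf-regression
    # run the shared test script in check mode against the downloaded baseline
    # (the real invocation presumably also selects the perf-regression test case)
    PERF_MODE=--check ./run-ci-tests.sh
```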

2) Open a PR with the dump-performance label

  • The workflow runs the Performance Regression job.
  • The action switches to dump mode.
  • It runs the performance tests in --dump mode and uploads a new perf-regression.json artifact attached to this PR run (not to main).
  • Outcome: Reviewers can inspect the PR’s freshly dumped results, but PR artifacts are not used as the source of truth for future checks (checks always look at main).

3) Merge the PR into main

  • After merge, a push to main triggers the same performance action in mode: auto.
  • The action decides (one plausible mechanism is sketched after this list):
    • If the merged PR had the dump-performance label → dump mode on main.
      • It regenerates and uploads a new canonical baseline on main.
      • All subsequent PRs will check against this updated baseline.
    • If the merged PR did not have the label → check mode on main.
      • No new baseline is created; the existing main baseline remains the reference.
  • Outcome: Only labeled merges update the official baseline, keeping updates deliberate and controlled.
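On a push to main there is no pull_request payload to read the label from, so the action needs another way to recover it. One plausible mechanism (an assumption, not necessarily what the action does) is to look up the pull request associated with the pushed commit:

```yaml
- name: Resolve mode after merge (push to main)
  if: github.event_name == 'push'
  shell: bash
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # list the PR(s) associated with the pushed commit and collect their labels
    labels=$(gh api "repos/${GITHUB_REPOSITORY}/commits/${GITHUB_SHA}/pulls" \
      --jq '[.[].labels[].name] | join(",")')
    if [[ "$labels" == *dump-performance* ]]; then
      echo "mode=dump" >> "$GITHUB_OUTPUT"
    else
      echo "mode=check" >> "$GITHUB_OUTPUT"
    fi
```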

Notes

  • When the workflow runs in check mode:
    • It always pulls the most recent successful baseline artifact on main.
    • It will not use older baselines unless newer ones don’t exist or have expired.
    • It doesn't pick arbitrarily among past baselines; it always takes the newest valid one (see the sketch below).
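A minimal sketch of that "newest valid baseline" selection, falling back to older runs only when a newer artifact is missing or has expired (the run limit of 20 is an arbitrary illustrative cap):

```yaml
- name: Locate newest valid baseline on main
  shell: bash
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # walk recent successful runs on main, newest first, and keep the first one
    # whose perf-regression-json artifact can still be downloaded
    for run_id in $(gh run list --branch main --status success --limit 20 \
                      --json databaseId --jq '.[].databaseId'); do
      if gh run download "$run_id" --name perf-regression-json \
           --dir tests/perf-regression 2>/dev/null; then
        echo "Using baseline from run $run_id"
        break
      fi
    done
```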

@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch from c0aeb4a to b0568fc on October 14, 2025 09:55
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch from 955e6f5 to ee33eaa on October 14, 2025 16:50
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch 3 times, most recently from 51b3fdb to f5f1909 on October 15, 2025 07:52
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch 2 times, most recently from 17774b6 to c83ae06 on October 15, 2025 08:15
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch from c83ae06 to e512d76 on October 15, 2025 08:57
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch from b2ff575 to 437f588 on October 15, 2025 11:20
@Shigoto-dev19 force-pushed the shigoto/performance-regression-ci-tests branch from 437f588 to 4702ace on October 15, 2025 11:38
@Shigoto-dev19 marked this pull request as ready for review on October 15, 2025 14:41
@Shigoto-dev19 requested review from a team as code owners on October 15, 2025 14:41
@bleepbloopsify (Contributor) left a comment

This actions workflow suffers from something I call "doing too much"

Actions philosophy usually goes something like this:
If you want to change how the action behaves, just use git to change what the underlying repository looks like. The action should follow the "open-closed" principle, so most of the logic should live in your perf-regression.ts, rather than out here.

Adding a case to Build-and-Test-Server is a bit redundant if you introduce a separate workflow, but makes sense if you want to keep them under the same umbrella.

The pattern I expect to see here usually goes something like:

  1. Add test case to Build-and-Test-Server and run-ci-tests.sh
  2. run-ci-tests.sh calls your test or runs a bash script (check the cache-regression case for an example)

Your bash script should:

  1. do the download and/or be responsible for the artifact location (looks like it's tests/perf-regression/perf-regression.json).
  2. run the specific tests.

This way we don't have an action responsible for "Dump or Check"; instead, we have two locations where we simply "call" your script to allow it to be dumped properly.

suggestions:

  1. drop: this workflow. do not look for other JSON blobs that may or may not exist. only this copy of the repo exists as far as this script is concerned
  2. drop: a workflow that runs the perf-regression dumping (we have it in npm already). You might add one that runs the dump without actually dumping anything, just to make sure the flow works, but not required.
  3. procedure: if you want to update perf-regression, just run the script locally and check it into git
  4. procedure: run-ci-tests should just run the perf regression test, without having to introduce another GitHub Actions step

question: how large are the perf regression JSONs? we might be able to upload them to GCP if they're quite large?

@Shigoto-dev19 (Contributor, Author)

question: how large are the perf regression JSONs? we might be able to upload them to GCP if they're quite large?

The final size is 1826 bytes.

@bleepbloopsify (Contributor)

question: how large are the perf regression JSONs? we might be able to upload them to GCP if they're quite large?

The final size is 1826 bytes.

well, never mind then, that is quite small

@Trivo25 (Member) commented Oct 16, 2025

agree with the exception of

procedure: if you want to update perf-regression, just run the script locally and check it into git

because I think we will have to dump the data on the runners where we will also check the tests; otherwise we might get too big of a variance

@Shigoto-dev19 (Contributor, Author)

agree with the exception of

procedure: if you want to update perf-regression, just run the script locally and check it into git

because I think we will have to dump the data on the runners where we will also check the tests; otherwise we might get too big of a variance

I also agree with Florian; that's why I took the approach of doing both dump and check in CI, to eliminate notable variance from performance results produced on different machines.

@bleepbloopsify (Contributor)

because I think we will have to dump the data on the runners where we will also check the tests; otherwise we might get too big of a variance

ah of course, this makes sense
