
ci: add regression gate to CLI startup benchmarks #20678

Draft
zzstoatzz wants to merge 1 commit into main from cli-bench/regression-gate


@zzstoatzz
Collaborator

Summary

  • The CLI benchmarks job was report-only: a startup regression of any magnitude would pass silently, with results visible only in the step summary
  • Adds a Welch's t-test regression check that fails the job when any startup command regresses by more than 15% and the difference is statistically significant (p < 0.05)
  • Shows commands that exist on head but not on base as "(new)" instead of silently dropping them
  • Uploads artifacts even on failure (if: always()) so benchmark data is always inspectable
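
For reviewers who want to see the shape of the gate described above, here is a minimal sketch of a 15% / p < 0.05 check using Welch's t-test. It is not the code in this PR: the function names (`welch_t`, `t_two_sided_p`, `is_regression`), the two-sided test, and the stdlib-only p-value computation are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(base, head):
    """Return (t, df) for Welch's unequal-variance t-test, head vs. base.

    Assumes each sample has at least two timings and non-zero variance.
    """
    vb = variance(base) / len(base)
    vh = variance(head) / len(head)
    t = (mean(head) - mean(base)) / math.sqrt(vb + vh)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vb + vh) ** 2 / (vb ** 2 / (len(base) - 1) + vh ** 2 / (len(head) - 1))
    return t, df


def t_two_sided_p(t, df, steps=4000):
    """Two-sided p-value from Simpson integration of the Student t density."""
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def is_regression(base, head, threshold=0.15, alpha=0.05):
    """True when head is slower than base by more than `threshold`
    and the difference is significant at level `alpha`."""
    slowdown = mean(head) / mean(base) - 1.0
    if slowdown <= threshold:
        return False
    t, df = welch_t(base, head)
    return t_two_sided_p(t, df) < alpha


# Hypothetical timings in milliseconds, not measured data.
base_ms = [100, 102, 98, 101, 99]
head_ms = [130, 128, 132, 129, 131]
print(is_regression(base_ms, head_ms))
```

Requiring both conditions means a large but noisy difference does not fail the job, and a statistically significant but small one does not either. If SciPy is available, `scipy.stats.ttest_ind(head, base, equal_var=False)` computes the same statistic and two-sided p-value.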

Test plan

  • Verify the benchmark job runs on this PR (it triggers on changes to benchmarks.yaml)
  • Check the step summary shows the results table
  • Confirm the job passes (no real regressions since this is just a CI change)

🤖 Generated with Claude Code

The CLI benchmarks job was report-only — a 50% startup regression
would pass silently. Add a Welch's t-test regression check that
fails the job when any command regresses more than 15% with
statistical significance (p < 0.05).

Also: upload artifacts even on failure so data is always inspectable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>