Proposal: Add benchmarking via TerminalBench to spec-kit #159
adam-paterson started this conversation in Ideas
Replies: 1 comment 1 reply
This would be amazing!! It would be good to understand how much practical improvement spec-kit gives.
Summary
Add first-class benchmarking to spec-kit by integrating TerminalBench. This enables contributors and maintainers to measure performance of key commands and workflows over time, locally and in CI, and to spot regressions early.
Motivation
Goals (initial)
spec-kit bench subcommand that runs a curated suite.
bench.config.(yml|json) for advanced control.
Non‑Goals (initial)
Proposed Approach
CLI Sketch
spec-kit bench → run default suite
spec-kit bench --suite core → named suite
spec-kit bench --filter <pattern> → subset
spec-kit bench --output results/ --format json,md → outputs
spec-kit bench --compare results/baseline.json → local diff
Example Config (optional)
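As an illustration only, here is a minimal Python sketch of how an optional bench.config.json might be loaded and validated. The key names (suite, filter, output, formats, compare) are assumptions that simply mirror the flags in the CLI sketch above; the proposal does not define a config schema, and BenchConfig and load_config are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical config contents. The keys mirror the CLI flags from the
# sketch above; they are NOT a schema defined by the proposal.
EXAMPLE_CONFIG = """
{
  "suite": "core",
  "filter": "init*",
  "output": "results/",
  "formats": ["json", "md"],
  "compare": "results/baseline.json"
}
"""


@dataclass
class BenchConfig:
    """Assumed shape of bench.config.json, with defaults for every key."""
    suite: str = "default"
    filter: Optional[str] = None
    output: str = "results/"
    formats: List[str] = field(default_factory=lambda: ["json"])
    compare: Optional[str] = None


def load_config(text: str) -> BenchConfig:
    """Parse JSON text into a BenchConfig, rejecting unknown keys."""
    raw = json.loads(text)
    unknown = set(raw) - set(BenchConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return BenchConfig(**raw)


config = load_config(EXAMPLE_CONFIG)
print(config)
```

Rejecting unknown keys up front is one possible design choice: a typo in the config then fails loudly instead of silently falling back to a default.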
Outputs & Signals
bench.json: raw metrics (durations, variance, environment).
bench.md: human-readable summary with deltas vs. baseline.
CI Integration
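Both the deltas in bench.md and a CI check against a baseline come down to comparing a fresh result file with an earlier one. A minimal Python sketch of that comparison follows. It assumes a hypothetical bench.json shape (a "benchmarks" map with a "mean_s" duration per entry) and a hypothetical 10% regression threshold; neither is specified in this proposal, and compare is an illustrative name rather than an existing spec-kit function.

```python
from typing import Dict, List

# Hypothetical result shape: {"benchmarks": {"<name>": {"mean_s": <float>}}}
# The proposal does not define the bench.json layout; this is an assumption.


def compare(current: Dict, baseline: Dict, threshold: float = 0.10) -> List[Dict]:
    """Return one row per benchmark with its relative delta vs. the baseline.

    A benchmark counts as a regression when its mean duration grew by more
    than `threshold` (a fraction, so 0.10 means 10%).
    """
    rows = []
    for name, cur in sorted(current["benchmarks"].items()):
        base = baseline["benchmarks"].get(name)
        if base is None:
            # No baseline entry yet, so there is nothing to diff against.
            rows.append({"name": name, "delta": None, "status": "new"})
            continue
        delta = (cur["mean_s"] - base["mean_s"]) / base["mean_s"]
        status = "regression" if delta > threshold else "ok"
        rows.append({"name": name, "delta": delta, "status": status})
    return rows


# Made-up numbers, purely to exercise the function.
baseline = {"benchmarks": {"init": {"mean_s": 1.0}, "plan": {"mean_s": 2.0}}}
current = {
    "benchmarks": {
        "init": {"mean_s": 1.25},
        "plan": {"mean_s": 2.02},
        "check": {"mean_s": 0.5},
    }
}

for row in compare(current, baseline):
    delta = "n/a" if row["delta"] is None else f"{row['delta']:+.1%}"
    print(f"{row['name']:<8} {delta:>8}  {row['status']}")
```

In CI, a non-empty set of "regression" rows could fail the job or only be reported in a PR comment; which of the two is appropriate depends on how noisy the runners are.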
spec-kit bench --format json,md --output artifacts/
artifacts/bench.*; optionally comment a summary on PRs.
main publishes/updates a moving baseline (bench-baseline.json).
Alternatives Considered
Risks & Mitigations
Rollout Plan
spec-kit bench with a minimal core suite.
Future Work
Request for Feedback