Description
Extend benchmarking so results can be compared across multiple runs and commits.
Report regressions automatically.
Goals
- Store baseline benchmark results.
- Compare new runs against baseline.
- Highlight performance regressions.
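The three goals above can be sketched as a minimal store/compare/report loop. This is an illustration under stated assumptions, not a prescribed design. It assumes that results are a mapping from benchmark name to mean time in seconds (lower is better) and that the baseline is kept as a JSON file. It also assumes a regression means a slowdown of more than 10% relative to the baseline. The function names (`save_baseline`, `load_baseline`, `compare`, `report`) and the threshold are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical default: flag a benchmark that is more than 10% slower than baseline.
DEFAULT_THRESHOLD = 0.10


def save_baseline(results: dict[str, float], path: Path) -> None:
    """Store results (name -> mean time in seconds) as the baseline JSON file."""
    path.write_text(json.dumps(results, indent=2, sort_keys=True))


def load_baseline(path: Path) -> dict[str, float]:
    """Load a previously stored baseline."""
    return json.loads(path.read_text())


def compare(baseline: dict[str, float], current: dict[str, float],
            threshold: float = DEFAULT_THRESHOLD) -> list[tuple[str, float, float, float]]:
    """Return (name, baseline_time, current_time, relative_change) per regression.

    Only benchmarks present in both runs are compared; a regression is a
    relative slowdown greater than `threshold` (assumes lower time is better).
    """
    regressions = []
    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        if old <= 0:
            continue  # relative change is undefined for a non-positive baseline
        change = (new - old) / old
        if change > threshold:
            regressions.append((name, old, new, change))
    return regressions


def report(regressions: list[tuple[str, float, float, float]]) -> str:
    """Format regressions as a human-readable summary."""
    if not regressions:
        return "No performance regressions detected."
    lines = ["Performance regressions:"]
    for name, old, new, change in regressions:
        lines.append(f"  {name}: {old:.4f}s -> {new:.4f}s ({change:+.1%})")
    return "\n".join(lines)


if __name__ == "__main__":
    baseline = {"parse": 0.100, "render": 0.200}
    current = {"parse": 0.125, "render": 0.205}
    print(report(compare(baseline, current)))
```

With these sample numbers, only `parse` is reported (+25.0%). `render` is 2.5% slower, which is below the assumed threshold. Benchmarks that appear in only one run are skipped. A real implementation would likely also need to account for run-to-run noise, for example by comparing over repeated measurements.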
Acceptance Criteria
- Results from a new benchmark run can be compared against the stored baseline from an earlier run.
- Regressions are clearly reported.