Add performance budget and canary SLO checks to CI #105
Conversation
Important: Review skipped due to path filters.

⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.

Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds CI stages and tooling for automated performance budgets and canary metric validation: new CI jobs, Playwright perf-budget tests, a Python canary checker with baselines, sample fixtures, a perf-budget YAML journey, and release checklist/docs updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant CI as CI Pipeline
participant FrontendTests as Frontend Tests
participant PerfJob as perf-budget Job
participant Docker as Docker Stack
participant App as App/API
participant Playwright as Playwright Tests
participant Artifacts as Artifact Store
CI->>FrontendTests: run frontend-tests
FrontendTests-->>CI: pass
CI->>PerfJob: start perf-budget
PerfJob->>Docker: start stack
Docker->>App: launch services
PerfJob->>App: wait for readiness
App-->>PerfJob: ready
PerfJob->>Playwright: execute journeys (iterations)
Playwright->>App: navigate & interact
Playwright->>Playwright: collect timings & LCP, compute percentiles
Playwright->>Artifacts: upload perf artifacts (JSON, JUnit)
PerfJob->>Docker: teardown
PerfJob-->>CI: pass/fail
```

```mermaid
sequenceDiagram
participant CI as CI Pipeline
participant CanaryJob as canary-metrics Job
participant Checker as check_canary_metrics.py
participant Prometheus as Prometheus/Tempo
participant Baseline as canary-baseline.json
participant Output as Artifacts/Logs
CI->>CanaryJob: start canary-metrics
CanaryJob->>Checker: run with URLs or fixtures
Checker->>Prometheus: fetch metric payloads (optional)
Prometheus-->>Checker: metric payloads
Checker->>Checker: extract values, compare to thresholds & baseline
Checker->>Output: emit JSON summary & human-readable report
alt Validation fails
Checker->>CI: print annotations & exit non-zero
else Validation passes
Checker-->>CI: exit 0
end
```
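As a rough illustration of the gate behaviour in the second diagram, here is a minimal sketch in Python; the function name, metric keys, and numbers are illustrative assumptions, not taken from tools/check_canary_metrics.py (whose real invocation appears under Testing at the end of this page).

```python
import json
import sys
from pathlib import Path


def run_canary_gate(latency_ms: float, error_rate: float,
                    latency_threshold_ms: float, error_threshold: float,
                    output_path: Path) -> int:
    """Compare extracted metrics to thresholds, write a JSON summary, and return an exit code."""
    failures = []
    if latency_ms > latency_threshold_ms:
        failures.append(f"P95 latency {latency_ms:.0f} ms exceeds budget {latency_threshold_ms:.0f} ms")
    if error_rate > error_threshold:
        failures.append(f"error rate {error_rate:.3f} exceeds budget {error_threshold:.3f}")

    # Emit the machine-readable summary alongside the human-readable report.
    output_path.write_text(json.dumps({
        "p95_latency_ms": latency_ms,
        "error_rate": error_rate,
        "failures": failures,
    }, indent=2))

    for message in failures:  # surfaced as CI annotations in the real job
        print(f"::error::{message}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(run_canary_gate(2500.0, 0.01, 3000.0, 0.05, Path("canary-summary.json")))
```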
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting
.github/workflows/ci.yml
Outdated
```yaml
      - name: Start application stack
        run: |
          docker compose --env-file .env.development -f docker-compose.dev.yml up -d --build
```
Remove hard-coded .env file from perf-budget job
The new perf-budget workflow invokes docker compose --env-file .env.development -f docker-compose.dev.yml up -d --build, but the repository does not contain .env.development, nor does any earlier step generate it. On a clean GitHub runner this command exits with “no such file or directory” before the stack even starts, so the whole job (and its dependents) will fail regardless of the code under test. Either commit the required env file or drop the --env-file flag and rely on defaults.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`frontend/package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (11)
- .github/workflows/ci.yml (1 hunks)
- docs/release-checklist.md (1 hunks)
- frontend/package.json (2 hunks)
- frontend/tests/e2e/perf-budget.spec.ts (1 hunks)
- mkdocs.yml (1 hunks)
- perf-budget.yml (1 hunks)
- tools/canary-baseline.json (1 hunks)
- tools/check_canary_metrics.py (1 hunks)
- tools/sample-data/canary/latest-results.json (1 hunks)
- tools/sample-data/canary/prometheus-latency.json (1 hunks)
- tools/sample-data/canary/tempo-error-rate.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/release-checklist.md
[grammar] ~1-~1: Use correct spacing
Context: # Release Checklist This checklist links continuous integrat...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~3-~3: Use correct spacing
Context: ...ll responders receive actionable alerts. ## 1. Verify CI Performance Budgets - Revi...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~5-~5: Use correct spacing
Context: ...ts. ## 1. Verify CI Performance Budgets - Review the perf-budget job in the CI w...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~7-~7: There might be a mistake here.
Context: ... the homepage ➔ configurator navigation exceeds: - P90 navigation time of 3 seconds. ...
(QB_NEW_EN_OTHER)
[grammar] ~8-~8: There might be a mistake here.
Context: ... - P90 navigation time of 3 seconds. - P95 LCP of 4 seconds. - Inspect the ...
(QB_NEW_EN)
[grammar] ~11-~11: Use correct spacing
Context: ... the offending change before proceeding. ## 2. Correlate Canary Metrics With Baselin...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~13-~13: Use correct spacing
Context: .... Correlate Canary Metrics With Baseline - Check the canary-metrics job output. I...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~16-~16: There might be a mistake here.
Context: ...orted P95 latency and error rate remain below: - 3,000 ms latency budget. - 5% er...
(QB_NEW_EN_OTHER)
[grammar] ~17-~17: There might be a mistake here.
Context: ... below: - 3,000 ms latency budget. - 5% error budget. - Review regression d...
(QB_NEW_EN)
[grammar] ~20-~20: Use correct spacing
Context: ...vability Slack channel for traceability. ## 3. Align Grafana Alerts With On-call - ...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~25-~25: There might be a mistake here.
Context: ...ards directly in the on-call runbook. - Confirm alert rules page the correct rotation t...
(QB_NEW_EN)
[grammar] ~25-~25: Insert the missing word
Context: ... on-call runbook. - Confirm alert rules page the correct rotation through PagerDuty/...
(QB_NEW_EN_OTHER_ERROR_IDS_32)
[grammar] ~29-~29: Use correct spacing
Context: ...se day by triggering a synthetic breach. ## 4. Enforce Preview Environment Gating -...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~31-~31: Use correct spacing
Context: ...## 4. Enforce Preview Environment Gating - Validate that the preview environment (s...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~40-~40: Use correct spacing
Context: ... owner (approves preview functionality). ## 5. Post-release Monitoring - Monitor Gr...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~42-~42: Use correct spacing
Context: ...onality). ## 5. Post-release Monitoring - Monitor Grafana for at least one full tr...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
🪛 Ruff (0.14.0)
tools/check_canary_metrics.py
15-15: typing.Dict is deprecated, use dict instead
(UP035)
18-18: Dynamically typed expressions (typing.Any) are disallowed in _load_json_from_url
(ANN401)
19-19: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
22-22: Avoid specifying long messages outside the exception class
(TRY003)
27-27: Dynamically typed expressions (typing.Any) are disallowed in _load_json_from_path
(ANN401)
44-44: Avoid specifying long messages outside the exception class
(TRY003)
73-73: Avoid specifying long messages outside the exception class
(TRY003)
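For context on these findings, a minimal sketch of one way the flagged loaders could be restructured so the rules pass; only the function names `_load_json_from_url` and `_load_json_from_path` come from the findings above, and the exception class and scheme check are illustrative assumptions, not the PR's actual implementation.

```python
from __future__ import annotations

import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


class MetricsFetchError(RuntimeError):
    """Keeps the message inside the exception class rather than at the raise site (TRY003)."""

    def __init__(self, source: str) -> None:
        super().__init__(f"could not load metrics payload from {source}")


def _load_json_from_url(url: str) -> dict[str, object]:  # builtin generics, no typing.Dict/Any (UP035, ANN401)
    if urlparse(url).scheme not in {"http", "https"}:  # reject file:// and custom schemes (S310)
        raise MetricsFetchError(url)
    with urlopen(url, timeout=10) as response:  # noqa: S310 - scheme validated above
        return json.loads(response.read())


def _load_json_from_path(path: Path) -> dict[str, object]:
    return json.loads(path.read_text(encoding="utf-8"))
```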
🔇 Additional comments (5)
tools/sample-data/canary/tempo-error-rate.json (1)
1-17: Confirm consumer handles Prometheus value typing and timestamp units
The fixture uses Prometheus’ instant-vector shape, where value[1] is a string and the timestamp is in seconds. Ensure `tools/check_canary_metrics.py` parses the string to float and doesn’t assume milliseconds.
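A minimal sketch of the parsing this comment asks the checker to do, assuming the fixture follows the standard Prometheus instant-vector response shape; the payload below is illustrative, not the actual fixture contents.

```python
def extract_instant_value(payload: dict) -> float:
    """Pull the first sample out of a Prometheus instant-vector response and parse it as a float."""
    results = payload["data"]["result"]
    if not results:
        raise ValueError("empty result set")
    _timestamp_s, raw_value = results[0]["value"]  # [unix_seconds, "value-as-string"]
    return float(raw_value)  # value[1] arrives as a string and must be parsed


# Illustrative payload in the same shape as the fixture.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "api"}, "value": [1735689600, "0.0123"]}],
    },
}
assert extract_instant_value(sample) == 0.0123
```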
tools/sample-data/canary/latest-results.json (1)
1-12: LGTM; verify contract alignment with checker
Structure and units look consistent (ms, proportions). Please confirm `tools/check_canary_metrics.py` reads these exact keys and compares against baseline/regression_tolerance as expected.
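For illustration, a small sketch of the baseline comparison this comment refers to; `regression_tolerance` is the key named above, while the other key names and numbers are assumptions rather than the checker's actual contract.

```python
def exceeds_baseline(current_ms: float, baseline_ms: float, regression_tolerance: float) -> bool:
    """True when current P95 latency regresses beyond the tolerated increase over baseline.

    regression_tolerance is a proportion, e.g. 0.10 allows a 10% increase over baseline.
    """
    return current_ms > baseline_ms * (1.0 + regression_tolerance)


baseline = {"p95_latency_ms": 2400.0, "regression_tolerance": 0.10}  # illustrative values
print(exceeds_baseline(2500.0, baseline["p95_latency_ms"], baseline["regression_tolerance"]))  # False: within tolerance
```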
perf-budget.yml (1)
44-61: Journey config looks good; ensure tests apply throttling
Config is clear and thresholds are reasonable. Note: the Playwright spec must apply throttling (network/CPU) from this file; otherwise results will differ from intent. See the test file comment for a fix.
frontend/package.json (1)
56-59: No import change needed: `esModuleInterop` is enabled
`frontend/tsconfig.json` sets `"esModuleInterop": true`, so the default import from `js-yaml` is safe.

frontend/tests/e2e/perf-budget.spec.ts (1)
4-4: js-yaml import: ensure compatibility
Default import only works if `esModuleInterop` or `allowSyntheticDefaultImports` is enabled in your tsconfig. If not, switch to: `import * as yaml from 'js-yaml';`
.github/workflows/ci.yml
Outdated
```yaml
      - name: Wait for API
        run: |
          for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done

      - name: Wait for Frontend
        run: |
          for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done

      - name: Run performance budget checks
        working-directory: ./frontend
        env:
          BASE_URL: http://localhost:3000
        run: npm run perf:budget

      - name: Collect performance artifacts
        if: always()
        run: |
          mkdir -p frontend/test-results/perf
          if [ -f frontend/playwright-report/perf-budget-results.xml ]; then
            mv frontend/playwright-report/perf-budget-results.xml frontend/test-results/perf/perf-budget-results.xml
          fi
          if [ -d frontend/test-results/perf/artifacts ]; then
            find frontend/test-results/perf -maxdepth 2 -type f
          fi

      - name: Upload performance results
        if: always()
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
        with:
          name: perf-budget-results
          path: frontend/test-results/perf

      - name: Shutdown stack
        if: always()
        run: |
          docker compose --env-file .env.development -f docker-compose.dev.yml down
```
Fail fast when readiness loops exhaust retries.
The curl && break || sleep pattern leaves the step succeeding even when the service never becomes healthy—the final sleep returns 0 under the default -e shell, so the job keeps going against an unready stack and masks the real failure mode. Add an explicit failure path once all retries are consumed so CI surfaces the readiness regression immediately.
```diff
       - name: Wait for API
         run: |
-          for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done
+          for i in {1..60}; do
+            if curl -sf http://localhost:8000/healthcheck; then
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "API did not become ready within 120s" >&2
+          exit 1
       - name: Wait for Frontend
         run: |
-          for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done
+          for i in {1..60}; do
+            if curl -sf http://localhost:3000/models/manifest.json; then
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "Frontend did not become ready within 120s" >&2
+          exit 1
```

🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 132 to 168: the readiness loops for API
and Frontend use `curl && break || sleep` which can exit successfully even if
the service never becomes healthy; replace each loop with a retry loop that sets
a success flag (or checks the final exit status) and, after the retries,
explicitly echo a failure message and exit 1 if the service never responded so
the step fails fast and CI surfaces the readiness regression.
frontend/package.json
Outdated
| "start": "next start", | ||
| "lint": "next lint", | ||
| "test:e2e": "playwright test", | ||
| "perf:budget": "cross-env PLAYWRIGHT_JUNIT_OUTPUT_NAME=perf-budget-results.xml playwright test tests/e2e/perf-budget.spec.ts --reporter=junit --output=./test-results/perf/artifacts", |
JUnit file name likely ignored; set reporter outputFile in Playwright config
PLAYWRIGHT_JUNIT_OUTPUT_NAME isn’t a standard Playwright variable. Configure reporter outputFile in playwright.config.{ts,js} and drop the env var to ensure a stable artifact path.
Outside this file, set reporter:
```ts
// frontend/playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [
    ['list'],
    ['junit', { outputFile: 'test-results/perf/perf-budget-results.xml' }],
  ],
  outputDir: 'test-results/perf/artifacts',
});
```

🤖 Prompt for AI Agents
In frontend/package.json around line 11, the npm script uses a non-standard
PLAYWRIGHT_JUNIT_OUTPUT_NAME env var which Playwright ignores; remove that env
var from the "perf:budget" script and instead configure the junit reporter and
output directory in frontend/playwright.config.{ts,js}: add a junit reporter
with outputFile set to "test-results/perf/perf-budget-results.xml" and set
outputDir to "test-results/perf/artifacts" so the reporter writes the stable
artifact path Playwright will actually use.
```ts
type PerfBudget = {
  run_count: number;
  pages?: PageBudget[];
  journeys?: JourneyBudget[];
};
```
🛠️ Refactor suggestion | 🟠 Major
Type doesn’t include throttling; add it to reflect perf-budget.yml
Include throttling in the schema so it can be applied.
```diff
 type PerfBudget = {
   run_count: number;
   pages?: PageBudget[];
   journeys?: JourneyBudget[];
+  throttling?: {
+    cpuSlowdownMultiplier?: number;
+    downloadThroughputKbps?: number;
+    uploadThroughputKbps?: number;
+    requestLatencyMs?: number;
+  };
 };
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
type PerfBudget = {
  run_count: number;
  pages?: PageBudget[];
  journeys?: JourneyBudget[];
  throttling?: {
    cpuSlowdownMultiplier?: number;
    downloadThroughputKbps?: number;
    uploadThroughputKbps?: number;
    requestLatencyMs?: number;
  };
};
```
🤖 Prompt for AI Agents
In frontend/tests/e2e/perf-budget.spec.ts around lines 35 to 39, the PerfBudget
type is missing the throttling field defined in perf-budget.yml; add an optional
throttling property to the PerfBudget type that matches the YAML shape (for
example throttling?: { rtt?: number; throughput?: number;
cpuSlowdownMultiplier?: number } or a named Throttling interface) so tests can
apply network/CPU throttling when evaluating budgets.
```ts
test.describe('Performance budgets', () => {
  test.describe.configure({ mode: 'serial' });
```
Throttling from perf-budget.yml is not applied (network/CPU); budgets won’t reflect slow-4g
The test ignores configured throttling, leading to non-representative timings. Apply DevTools throttling (Chromium) and CPU slowdown.
```diff
 import { test, expect, Page } from '@playwright/test';
@@
 type JourneyRunMetrics = {
   navigation: number[];
   'largest-contentful-paint': number[];
 };
+const kbpsToBps = (kbps: number) => Math.max(1, Math.floor((kbps * 1024) / 8));
+
+async function applyThrottlingIfSupported(page: Page, t?: PerfBudget['throttling']) {
+  if (!t) return;
+  // CDP only on Chromium
+  const client = await page.context().newCDPSession(page);
+  await client.send('Network.enable');
+  const hasNet = t.downloadThroughputKbps || t.uploadThroughputKbps || t.requestLatencyMs;
+  if (hasNet) {
+    await client.send('Network.emulateNetworkConditions', {
+      offline: false,
+      downloadThroughput: t.downloadThroughputKbps ? kbpsToBps(t.downloadThroughputKbps) : -1,
+      uploadThroughput: t.uploadThroughputKbps ? kbpsToBps(t.uploadThroughputKbps) : -1,
+      latency: t.requestLatencyMs ?? 0,
+    });
+  }
+  if (t.cpuSlowdownMultiplier && t.cpuSlowdownMultiplier > 1) {
+    await client.send('Emulation.setCPUThrottlingRate', { rate: t.cpuSlowdownMultiplier });
+  }
+}
@@
   for (const journey of budget.journeys ?? []) {
     test(journey.id, async ({ page }, testInfo) => {
@@
-      for (let run = 0; run < budget.run_count; run += 1) {
+      for (let run = 0; run < budget.run_count; run += 1) {
         await page.context().clearCookies();
         await page.goto('about:blank');
+        await applyThrottlingIfSupported(page, budget.throttling);
```

Also applies to: 135-145
🤖 Prompt for AI Agents
In frontend/tests/e2e/perf-budget.spec.ts around lines 101-103 (and also apply
same change for lines 135-145), the test suite does not apply network/CPU
throttling so measured timings aren't using slow-4g; update the Playwright test
setup to enable Chromium DevTools throttling and CPU slowdown before running the
performance budget checks: obtain the test browser context/page for Chromium,
call page.context().grantPermissions if needed, use
page.emulateNetworkConditions or CDP session to set network conditions to
slow-4g and send DevTools command to set CPU throttling rate (e.g., 4x
slowdown), ensure these throttles are enabled in a beforeAll or test.before hook
and removed in afterAll or test.after to keep other tests unaffected.
```ts
await page.addInitScript(() => {
  (window as unknown as { __lcp?: number }).__lcp = 0;
  try {
    new PerformanceObserver((entryList) => {
      const entries = entryList.getEntries();
      if (!entries.length) {
        return;
      }
      const lastEntry = entries[entries.length - 1];
      const current = (window as unknown as { __lcp?: number }).__lcp ?? 0;
      if (typeof lastEntry.startTime === 'number') {
        (window as unknown as { __lcp?: number }).__lcp = Math.max(current, lastEntry.startTime);
      }
    }).observe({ type: 'largest-contentful-paint', buffered: true });
  } catch (error) {
    console.warn('Failed to initialise LCP observer', error);
  }
});
```
🧹 Nitpick | 🔵 Trivial
LCP unsupported path silently returns 0
On non-Chromium, observer may fail and yield 0, potentially masking regressions. Consider skipping LCP checks when unsupported or falling back to paint timing.
```diff
-  } catch (error) {
-    console.warn('Failed to initialise LCP observer', error);
-  }
+  } catch {
+    // Flag unsupported instead of returning zero
+    (window as unknown as { __lcp_unsupported?: boolean }).__lcp_unsupported = true;
+  }
```

Then, when validating:
```ts
// before using lcp
const lcpUnsupported = await page.evaluate(() => (window as any).__lcp_unsupported === true);
if (lcpUnsupported && journey.metrics.some(m => m.id === 'largest-contentful-paint')) {
  test.skip(true, 'LCP not supported in this browser');
}
```
Summary
- `perf-budget` GitHub Actions job and supporting npm script that installs browsers, runs the Playwright checks, and publishes the results alongside documentation for the release checklist
- `tools/check_canary_metrics.py`, baseline/fixture data, and JSON artifacts so CI compares P95 latency and error-rate budgets against previous builds

Testing
- `npm run lint`
- `python tools/check_canary_metrics.py --prometheus-fixture tools/sample-data/canary/prometheus-latency.json --tempo-fixture tools/sample-data/canary/tempo-error-rate.json --latency-threshold-ms 3000 --error-threshold 0.05 --baseline tools/canary-baseline.json --output tools/sample-data/canary/latest-results.json`

https://chatgpt.com/codex/tasks/task_e_68f12f59d3148330a34b5245fa2feac2
Summary by CodeRabbit
New Features
Documentation
Tests
Chores