Add CI performance budgets and canary metrics checks #123

Closed

shayancoin wants to merge 3 commits into main from codex/extend-ci-pipeline-with-performance-thresholds

Conversation

@shayancoin shayancoin commented Oct 16, 2025

Summary

  • replace the root perf budget file with a richer .perf-budget.yml that covers the homepage-to-configurator journey and direct configurator load metrics
  • add a Playwright-based runner plus CI job to execute the budgets, emit JUnit/JSON artifacts, and document the release checklist updates in MkDocs
  • introduce a canary metrics validation script with a CI job that compares Prometheus/Tempo data (or fixtures) against latency and error thresholds

Testing

  • python scripts/ci/check_canary_metrics.py

https://chatgpt.com/codex/tasks/task_e_68f135015b3083308d5c389dd9177dbb
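
For reviewers who want to reproduce the canary check locally, here is a minimal sketch (not code from this PR) that invokes the script against the committed fixture with the same thresholds the canary-metrics CI job sets. It assumes it is run from the repository root; the environment variable names are taken from that job definition:

import os
import subprocess
import sys

# Mirror the canary-metrics job: point the script at the fixture and set budgets.
env = {
    **os.environ,
    "CANARY_METRICS_FIXTURE": "tests/perf/canary-metrics.fixture.json",
    "P95_THRESHOLD_MS": "3000",
    "ERROR_RATE_THRESHOLD": "0.02",
    "REGRESSION_TOLERANCE_PCT": "0.1",
}

result = subprocess.run(
    [sys.executable, "scripts/ci/check_canary_metrics.py"],
    env=env,
    check=False,  # inspect the exit code rather than raising
)
print(f"canary check exit code: {result.returncode}")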

Summary by CodeRabbit

  • New Features

    • CI now runs automated performance budgets and canary metrics checks across key user flows.
  • Documentation

    • Added a Release Checklist and site navigation entry outlining performance validation and incident response steps.
  • Tests

    • Introduced an end-to-end performance test runner and a canary metrics fixture for CI reporting.
  • Chores

    • Added perf results to ignore rules and updated frontend tooling/dependencies for performance testing.

coderabbitai bot commented Oct 16, 2025

Warning

Rate limit exceeded

@shayancoin has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 9ba2fc4 and 9c26b9b.

📒 Files selected for processing (2)
  • frontend/tests/perf/run-perf-budget.ts (1 hunks)
  • scripts/ci/check_canary_metrics.py (1 hunks)

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds CI jobs for performance budgeting and canary metrics, a Playwright-based frontend perf runner with a new .perf-budget.yml, a Python canary metrics checker with fixture fallback, docs for release checks, and related packaging and ignore updates.
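
To make the fixture-fallback behaviour concrete, here is a hedged sketch of the control flow described above; the real helpers live in scripts/ci/check_canary_metrics.py, and everything here other than the environment variable names is illustrative:

import json
import os
from typing import Optional


def query_prometheus(url: str) -> Optional[dict]:
    """Stand-in for the script's Prometheus helpers; returns None when no data arrives."""
    return None  # pretend the query produced no samples


def load_metrics_or_fixture() -> dict:
    # Prefer live telemetry when PROMETHEUS_URL is configured and returns data.
    prom_url = os.environ.get("PROMETHEUS_URL")
    if prom_url:
        metrics = query_prometheus(prom_url)
        if metrics is not None:
            return metrics
    # Otherwise fall back to the committed fixture (assumes repository-root cwd).
    fixture = os.environ.get(
        "CANARY_METRICS_FIXTURE", "tests/perf/canary-metrics.fixture.json"
    )
    with open(fixture, encoding="utf-8") as handle:
        return json.load(handle)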

Changes

  • CI Workflow (.github/workflows/ci.yml): Added two jobs: perf-budgets (depends on backend/frontend tests; sets up Node, installs deps & Playwright browsers, starts the stack, waits for readiness, seeds the backend, runs the perf-budget runner, uploads artifacts, publishes results, shuts down) and canary-metrics (depends on perf-budgets, runs the canary metrics check script, uploads the summary).
  • Perf config (.perf-budget.yml, perf-budget.yml): Added new .perf-budget.yml (version 2) with default throttling and two scenarios (configurator-load, homepage-to-configurator) including steps, waits, selectors, and metric thresholds; removed the previous perf-budget.yml content.
  • Frontend package & runner (frontend/package.json, frontend/tests/perf/run-perf-budget.ts): Added perf:budget npm script; added dependencies (@react-three/drei, @react-three/fiber, yaml, ts-node), removed gltfpack and meshoptimizer. Added a TypeScript perf runner that loads the YAML config, applies CDP network/CPU throttling, injects PerformanceObserver, runs scenarios, collects metrics (FCP/LCP/TBT/CLS/navigation duration), aggregates percentiles (see the aggregation sketch after this list), produces JUnit + JSON reports, and fails CI on budget violations.
  • Backend canary check & fixture (scripts/ci/check_canary_metrics.py, tests/perf/canary-metrics.fixture.json): Added a Python script to query Prometheus/Tempo (with network error handling) or fall back to a fixture, evaluate P95 latency and error rate against thresholds and regressions, and write perf-results/canary-metrics-summary.json. Added a fixture with sample current/previous metrics.
  • Docs & site nav (docs/release-checklist.md, mkdocs.yml): Added a Release Checklist detailing CI perf validation, canary checks, Grafana/Tempo correlation, and the sign-off workflow; added a nav entry in mkdocs.yml.
  • Git ignore (.gitignore): Added perf-results/ to the ignore list.
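
As a companion to the runner row above, here is a minimal Python sketch of the percentile aggregation and budget comparison the TypeScript runner performs; the samples and threshold are illustrative, not values from .perf-budget.yml:

import math


def percentile(samples: list[float], q: float) -> float:
    """Linear-interpolated percentile, mirroring the runner's aggregate() helper."""
    ordered = sorted(samples)
    index = (len(ordered) - 1) * q
    lower, upper = math.floor(index), math.ceil(index)
    if lower == upper:
        return ordered[lower]
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (index - lower)


# Hypothetical LCP samples (ms) collected over repeated scenario runs.
samples = [1980.0, 2100.0, 2300.0, 2450.0, 2600.0]
p95 = percentile(samples, 0.95)
threshold_ms = 2500.0  # illustrative budget only
print(f"p95={p95:.1f}ms, passed={p95 <= threshold_ms}")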

Sequence Diagram(s)

sequenceDiagram
    actor CI
    participant PerfJob as perf-budgets Job
    participant Stack as Stack (API & Frontend)
    participant Runner as run-perf-budget.ts
    participant Artifacts as Artifact Storage

    CI->>PerfJob: Trigger (after backend/frontend tests)
    PerfJob->>Stack: Start services
    PerfJob->>Stack: Seed backend data
    PerfJob->>PerfJob: Wait for readiness

    note right of Runner#lightblue: Load .perf-budget.yml
    loop scenarios
        PerfJob->>Runner: Execute scenario
        activate Runner
        Runner->>Stack: Navigate / perform steps
        note right of Runner#lightgreen: Apply slow-4g throttling\nInject PerformanceObserver
        Runner->>Runner: Collect metrics & aggregate
        deactivate Runner
    end

    alt budgets pass
        Runner->>Artifacts: Write JUnit + JSON (passed)
        Artifacts-->>CI: Success
    else violation
        Runner->>Artifacts: Write JUnit + JSON (failed)
        Artifacts-->>CI: Failure (non-zero)
    end

    PerfJob->>Stack: Shutdown
sequenceDiagram
    actor CI
    participant CanaryJob as canary-metrics Job
    participant Script as check_canary_metrics.py
    participant Prom as Prometheus
    participant Tempo as Tempo
    participant Artifacts as Artifact Storage

    CI->>CanaryJob: Trigger (after perf-budgets)
    CanaryJob->>Script: Invoke (env vars)

    alt Prom/Tempo reachable
        Script->>Prom: Query P95 latency & error rate
        Script->>Tempo: Query trace latency P95
        Prom-->>Script: metrics
        Tempo-->>Script: trace metrics
    else fallback
        Script->>Script: Load fixture JSON
    end

    Script->>Script: Evaluate thresholds & regressions
    alt checks pass
        Script->>Artifacts: Write summary (passed=true)
        Artifacts-->>CI: Success
    else any failure
        Script->>Artifacts: Write summary (passed=false)
        Artifacts-->>CI: Failure (non-zero)
    end
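
The "Evaluate thresholds & regressions" step in the diagram above can be pictured with a short sketch. The environment variable names come from the canary-metrics job, while the function body and the interpretation of the regression tolerance as a fractional allowance over the previous build are assumptions, not the script's actual code:

import json
import os
from typing import Optional


def evaluate(current_p95_ms: float, current_error_rate: float,
             previous_p95_ms: Optional[float] = None) -> bool:
    p95_budget = float(os.environ.get("P95_THRESHOLD_MS", "3000"))
    error_budget = float(os.environ.get("ERROR_RATE_THRESHOLD", "0.02"))
    tolerance = float(os.environ.get("REGRESSION_TOLERANCE_PCT", "0.1"))

    passed = current_p95_ms <= p95_budget and current_error_rate <= error_budget
    if previous_p95_ms is not None:
        # Assumed semantics: fail if latency regressed by more than the tolerance.
        passed = passed and current_p95_ms <= previous_p95_ms * (1 + tolerance)
    return passed


ok = evaluate(current_p95_ms=2400.0, current_error_rate=0.01, previous_p95_ms=2300.0)
print(json.dumps({"latency_p95_ms": 2400.0, "error_rate": 0.01, "passed": ok}, indent=2))
raise SystemExit(0 if ok else 1)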

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Add performance budget configuration #21: Related perf-budget configuration changes for the configurator page (overlapping selectors, throttling and thresholds), likely touching the same performance test intent and config.

Poem

🐰 I hopped through CI with a perf-test map,

slow-4g dungarees and a tracing cap,
I watched FCP, LCP, TBT in a row,
Prometheus hummed while Tempo did show,
Perf results tucked safe in perf-results' lap.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning: The description does not follow the repository's required template; it omits the PR Type, Short Description, and Tests Added sections and instead uses custom headings, leaving out key structured information expected by the template. Resolution: update the description to match the repository template by adding a "PR Type" field, a concise "Short Description", and a "Tests Added" section so that all required template sections are present and properly formatted.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed: The title succinctly captures the primary additions of CI performance budgets and canary metrics checks and clearly reflects the main purpose of the pull request without extra noise.

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines 92 to 133
def _collect_metrics_from_services() -> Optional[CanaryMetrics]:
    prom_url = os.environ.get("PROMETHEUS_URL")
    latency_query = os.environ.get("PROMETHEUS_LATENCY_QUERY")
    error_query = os.environ.get("PROMETHEUS_ERROR_QUERY")
    previous_latency_query = os.environ.get("PROMETHEUS_PREVIOUS_LATENCY_QUERY")
    previous_error_query = os.environ.get("PROMETHEUS_PREVIOUS_ERROR_QUERY")

    if not (prom_url and latency_query and error_query):
        return None

    latency = _query_prometheus(prom_url, latency_query)
    error_rate = _query_prometheus(prom_url, error_query)
    previous_latency = (
        _query_prometheus(prom_url, previous_latency_query)
        if previous_latency_query
        else None
    )
    previous_error = (
        _query_prometheus(prom_url, previous_error_query)
        if previous_error_query
        else None
    )

    tempo_url = os.environ.get("TEMPO_URL")
    trace_latency_query = os.environ.get("TEMPO_TRACE_QUERY")
    trace_latency = None
    if tempo_url and trace_latency_query:
        try:
            payload = _http_get_json(
                f"{tempo_url.rstrip('/')}/api/search", {"q": trace_latency_query}
            )
            trace_latency = _extract_prom_value(payload)
        except urllib.error.URLError as exc:
            print(f"Failed to query Tempo: {exc}", file=sys.stderr)

    build_tag = os.environ.get("BUILD_TAG") or os.environ.get("GITHUB_SHA")
    previous_build = os.environ.get("PREVIOUS_BUILD_TAG")

    return CanaryMetrics(
        latency_p95_ms=float(latency) if latency is not None else 0.0,
        error_rate=float(error_rate) if error_rate is not None else 0.0,
        trace_latency_p95_ms=trace_latency,


P1: Fail canary check when Prometheus data is unavailable

When a Prometheus/Tempo query fails or returns no samples, _query_prometheus returns None, yet _collect_metrics_from_services converts those None values to 0.0 (latency_p95_ms and error_rate). _evaluate treats those zeros as being within budget, so network outages or misconfigured queries will silently mark the canary job as passing without validating real metrics. The check should raise an error or fall back to the fixture when no data is retrieved instead of assuming zero latency/error.

Useful? React with 👍 / 👎.
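
For illustration, a minimal sketch of the guard this comment asks for, assuming a helper that returns None when telemetry is missing (the fix actually committed in later commits may differ):

from typing import Optional


class CanaryCheckError(RuntimeError):
    """Raised when telemetry needed for the canary check is missing."""


def require_sample(name: str, value: Optional[float]) -> float:
    # Refuse to coerce a missing sample to 0.0, which would silently pass the budget.
    if value is None:
        raise CanaryCheckError(f"Prometheus returned no data for {name}")
    return float(value)


try:
    latency_p95_ms = require_sample("latency_p95_ms", None)  # simulate a failed query
except CanaryCheckError as exc:
    print(f"canary check failed: {exc}")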

Comment on lines 219 to 230
    print(
        "Canary metrics within thresholds. "
        f"latency_p95_ms={metrics.latency_p95_ms:.2f}, "
        f"error_rate={metrics.error_rate:.4f}"
    )
    if metrics.trace_latency_p95_ms is not None:
        print(f"Tempo trace P95 latency: {metrics.trace_latency_p95_ms:.2f}ms")
    if metrics.previous_latency_p95_ms is not None:
        print(
            "Compared to previous build: "
            f"latency_p95_ms={metrics.previous_latency_p95_ms:.2f}, "
            f"error_rate={metrics.previous_error_rate:.4f}"


P1: Guard printing previous error rate when it is absent

The success log formats metrics.previous_error_rate with :.4f whenever previous_latency_p95_ms is set, but it does not first verify that the error rate is also present. If only PROMETHEUS_PREVIOUS_LATENCY_QUERY is configured (common when tracking latency regressions but not errors), previous_error_rate remains None and this formatting raises TypeError, causing the script to exit after the metrics check already passed. Check both values before formatting or default the missing one.

Useful? React with 👍 / 👎.
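
A tiny sketch of the guarded logging this comment suggests, with illustrative values:

from typing import Optional


def log_previous(previous_latency_p95_ms: Optional[float],
                 previous_error_rate: Optional[float]) -> None:
    # Format each optional value only if it is actually present.
    parts = []
    if previous_latency_p95_ms is not None:
        parts.append(f"latency_p95_ms={previous_latency_p95_ms:.2f}")
    if previous_error_rate is not None:
        parts.append(f"error_rate={previous_error_rate:.4f}")
    if parts:
        print("Compared to previous build: " + ", ".join(parts))


log_previous(2300.0, None)  # latency-only comparison; no TypeError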

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2a35c4 and 9aee543.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • .github/workflows/ci.yml (1 hunks)
  • .gitignore (1 hunks)
  • .perf-budget.yml (1 hunks)
  • docs/release-checklist.md (1 hunks)
  • frontend/package.json (3 hunks)
  • frontend/tests/perf/run-perf-budget.ts (1 hunks)
  • mkdocs.yml (1 hunks)
  • perf-budget.yml (0 hunks)
  • scripts/ci/check_canary_metrics.py (1 hunks)
  • tests/perf/canary-metrics.fixture.json (1 hunks)
💤 Files with no reviewable changes (1)
  • perf-budget.yml
🧰 Additional context used
🧬 Code graph analysis (3)
.github/workflows/ci.yml (1)
tests/perf/k6-quote-cnc.js (1)
  • data (85-128)
scripts/ci/check_canary_metrics.py (1)
backend/api/main.py (1)
  • metrics (105-107)
frontend/tests/perf/run-perf-budget.ts (2)
tests/perf/k6-quote-cnc.js (2)
  • file (107-107)
  • data (85-128)
backend/api/main.py (1)
  • metrics (105-107)
🪛 actionlint (1.7.8)
.github/workflows/ci.yml

146-146: could not parse as YAML: could not find expected ':'

(syntax-check)

🪛 LanguageTool
docs/release-checklist.md

[grammar] ~1-~1: Use correct spacing
Context: # Release Checklist This checklist connects the CI performan...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~4-~4: There might be a mistake here.
Context: ...ion releases. It is intended for release managers and on-call engineers who need ...

(QB_NEW_EN)


[grammar] ~5-~5: There might be a mistake here.
Context: ...ble flow that ties Grafana alerts, Tempo traces, and CI preview environments toge...

(QB_NEW_EN)


[grammar] ~6-~6: Use correct spacing
Context: ...s, and CI preview environments together. ## 1. Validate CI performance budgets 1. C...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~8-~8: There might be a mistake here.
Context: .... ## 1. Validate CI performance budgets 1. Confirm the Performance Budgets job ...

(QB_NEW_EN_OTHER)


[grammar] ~12-~12: Use correct spacing
Context: ...son` for the concrete values captured by Playwright (P75 configurator load, P90 n...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~13-~13: There might be a mistake here.
Context: ...load, P90 navigation duration, P95 LCP). 2. If any threshold failed, investigate bef...

(QB_NEW_EN)


[grammar] ~14-~14: There might be a mistake here.
Context: ... before promoting the release candidate: * Re-run the job against the preview envir...

(QB_NEW_EN)


[grammar] ~15-~15: Use correct spacing
Context: ...ment to determine whether the regression is deterministic or environment-specific...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~16-~16: There might be a mistake here.
Context: ...s deterministic or environment-specific. * Capture a Grafana dashboard snapshot sho...

(QB_NEW_EN)


[grammar] ~17-~17: Use correct spacing
Context: ...g the relevant Web Vitals panel and link it in the incident tracker. ## 2. Check...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~18-~18: Use correct spacing
Context: ...nd link it in the incident tracker. ## 2. Check canary latency and error regres...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~20-~20: There might be a mistake here.
Context: ...eck canary latency and error regressions 1. Verify the Canary Metrics job comple...

(QB_NEW_EN_OTHER)


[grammar] ~23-~23: Use correct spacing
Context: ...json` to compare the current build's P95 latency and error rate with the previous...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~24-~24: There might be a mistake here.
Context: ...evious build and the configured budgets. 3. If the job failed: * Acknowledge or s...

(QB_NEW_EN)


[grammar] ~25-~25: Use colons correctly
Context: ...d the configured budgets. 3. If the job failed: * Acknowledge or silence the corresponding Grafana al...

(QB_NEW_EN_OTHER_ERROR_IDS_30)


[grammar] ~26-~26: Use correct spacing
Context: ...ding Grafana alert for the service-level objective (SLO). * Escalate to the on...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~27-~27: There might be a mistake here.
Context: ... the service-level objective (SLO). * Escalate to the on-call engineer through...

(QB_NEW_EN)


[grammar] ~28-~28: Use correct spacing
Context: ...he paging integration configured for the Grafana alert rule (OpsGenie, PagerDuty,...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~29-~29: There might be a mistake here.
Context: ... alert rule (OpsGenie, PagerDuty, etc.). * Use the Tempo trace search link emitted ...

(QB_NEW_EN)


[grammar] ~30-~30: Use correct spacing
Context: ... the job (or Grafana Explore) to confirm whether elevated latency correlates with...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~31-~31: Use correct spacing
Context: ... latency correlates with specific spans. ## 3. Correlate CI results with Grafana das...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~33-~33: There might be a mistake here.
Context: ...elate CI results with Grafana dashboards 1. Publish the artifacts from the CI run (`...

(QB_NEW_EN_OTHER)


[grammar] ~35-~35: Use correct spacing
Context: ...e CI run (perf-budget-summary.json and canary-metrics-summary.json) to the sh...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~36-~36: There might be a mistake here.
Context: ...to the shared release channel or ticket. 2. Update the Grafana dashboard annotations...

(QB_NEW_EN)


[grammar] ~37-~37: Use correct spacing
Context: ...the build ID and a link to the CI run so that on-call engineers can quickly corre...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~38-~38: There might be a mistake here.
Context: ...elate spikes with the release candidate. 3. Ensure the Grafana dashboard panels for ...

(QB_NEW_EN)


[grammar] ~39-~39: Use correct spacing
Context: ...d Budgets" and "Runtime Latency" use the Pushgateway/JUnit exports for side-by-si...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~40-~40: Use correct spacing
Context: ...it exports for side-by-side comparisons. ## 4. Preview gating and release sign-off ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~42-~42: Use correct spacing
Context: ...# 4. Preview gating and release sign-off 1. Block the promotion of the preview envir...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~44-~44: Use correct spacing
Context: ...ronment to staging/production until both performance-related jobs have succeeded ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~45-~45: There might be a mistake here.
Context: ...succeeded for the release branch commit. 2. Confirm no active Grafana alerts remain ...

(QB_NEW_EN)


[grammar] ~46-~46: Use correct spacing
Context: ...for the release window. If alerts exist, document mitigations and obtain sign-off...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~47-~47: There might be a mistake here.
Context: ...he incident commander before proceeding. 3. Log the final decision in the release re...

(QB_NEW_EN)


[grammar] ~48-~48: There might be a mistake here.
Context: ...decision in the release record, linking to: * The CI run with passing performance jobs. ...

(QB_NEW_EN_OTHER)


[grammar] ~49-~49: There might be a mistake here.
Context: ...he CI run with passing performance jobs. * Grafana alert history or dashboards show...

(QB_NEW_EN)


[grammar] ~51-~51: Use correct spacing
Context: ...at justify the decision when applicable. Following this checklist guarantees that...

(QB_NEW_EN_OTHER_ERROR_IDS_5)

🪛 Ruff (0.14.0)
scripts/ci/check_canary_metrics.py

1-1: Shebang is present but file is not executable

(EXE001)


13-13: typing.Dict is deprecated, use dict instead

(UP035)


32-32: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


41-41: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


74-74: Unnecessary mode argument

Remove mode argument

(UP015)


152-152: Boolean-typed positional argument in function definition

(FBT001)

🪛 YAMLlint (1.37.1)
.github/workflows/ci.yml

[error] 147-147: syntax error: could not find expected ':'

(syntax)

🔇 Additional comments (2)
.perf-budget.yml (1)

1-66: Perf budget config looks consistent with the runner

Defaults, throttling, waits, steps, and metric IDs match the runner. Thresholds and aggregations are valid.

Optional:

  • If you want per-scenario throttling overrides, extend ScenarioConfig to include throttling and pass it to applyThrottling.
  • Consider adding FCP to homepage-to-configurator for additional coverage.
docs/release-checklist.md (1)

1-54: Clear, actionable checklist aligned with CI artifacts

Matches job names and artifact paths. No blockers.

Comment on lines +115 to +131
      - name: Set up Node.js
        uses: actions/setup-node@0a44ba78451273a1ed8ac2fee4e347c72dfd377f
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: ./frontend/package-lock.json

      - name: Install dependencies
        working-directory: ./frontend
        run: npm ci

      - name: Install Playwright browsers
        working-directory: ./frontend
        run: npx playwright install --with-deps chromium

      - name: Start stack
        run: docker compose --env-file .env.development -f docker-compose.dev.yml up -d --build

⚠️ Potential issue | 🔴 Critical

Install Python before using it in perf-budgets job

The job invokes python to seed the backend but never sets up Python. On hosted runners, python may not be available as python (often python3 only), causing a hard failure.

Apply this diff to add setup-python (mirrors earlier jobs):

       - name: Set up Node.js
         uses: actions/setup-node@0a44ba78451273a1ed8ac2fee4e347c72dfd377f
         with:
           node-version: '20'
           cache: 'npm'
           cache-dependency-path: ./frontend/package-lock.json
+
+      - name: Set up Python
+        uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c
+        with:
+          python-version: '3.12'
🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 115 to 131: the perf-budgets job runs
Python to seed the backend but never installs or configures Python, which can
fail on hosted runners where only python3 is available; add a step before any
python invocation that uses actions/setup-python (e.g. actions/setup-python@v4)
with python-version: '3.x' (or a specific 3.11/3.10 as used elsewhere) so the
runner exposes the python binary, and mirror any cache/config from earlier jobs
if needed.

Comment on lines +133 to +140
      - name: Wait for API
        run: |
          for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done

      - name: Wait for Frontend
        run: |
          for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done


⚠️ Potential issue | 🔴 Critical

Make readiness loops fail if services never become ready

Current for-loops can exit successfully after the final sleep even if curl never succeeds, letting the job proceed in a bad state.

Apply this diff to both API and Frontend waits:

-      - name: Wait for API
-        run: |
-          for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done
+      - name: Wait for API
+        run: |
+          for i in {1..60}; do
+            if curl -sf http://localhost:8000/healthcheck; then
+              echo "API ready"
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "API failed to become ready in time" >&2
+          exit 1
@@
-      - name: Wait for Frontend
-        run: |
-          for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done
+      - name: Wait for Frontend
+        run: |
+          for i in {1..60}; do
+            if curl -sf http://localhost:3000/models/manifest.json; then
+              echo "Frontend ready"
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "Frontend failed to become ready in time" >&2
+          exit 1
🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 133 to 140, the readiness for-loops
currently can complete quietly even if curl never succeeded; change each block
so the workflow fails when the service never becomes ready by adding an explicit
failure after the loop (e.g., run the curl check once more and if it still fails
echo an error and exit 1, or set a flag inside the loop and after the loop test
it and exit 1 on failure), applying this to both the API and Frontend wait
steps.

Comment on lines +145 to +185
python - <<'PY'
import json
import os
import urllib.error
import urllib.request

BASE_URL = os.environ.get("BASE_URL", "http://localhost:8000")

def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", "ignore")
        raise SystemExit(f"Seed request failed ({exc.code}): {detail}")

material = post(
    "/api/materials/",
    {"name": "Walnut", "texture_url": None, "cost_per_sq_ft": 12.5},
)
material_id = material.get("id")
if not material_id:
    raise SystemExit("Material creation failed; missing id")

post(
    "/api/modules/",
    {
        "name": "Base600",
        "width": 600.0,
        "height": 720.0,
        "depth": 580.0,
        "base_price": 100.0,
        "material_id": material_id,
    },
)
PY

⚠️ Potential issue | 🔴 Critical

Fix YAML block: indent heredoc body consistently (current YAML parse error)

Lines inside the run: | block after “python - <<'PY'” are not indented, so YAML treats them as new keys. Indent the entire heredoc body consistently to satisfy YAML while preserving the script.

Apply this diff:

       - name: Seed backend for perf
         env:
           BASE_URL: http://localhost:8000
         run: |
-          python - <<'PY'
-import json
-import os
-import urllib.error
-import urllib.request
-
-BASE_URL = os.environ.get("BASE_URL", "http://localhost:8000")
-
-def post(path: str, payload: dict) -> dict:
-    req = urllib.request.Request(
-        f"{BASE_URL}{path}",
-        data=json.dumps(payload).encode("utf-8"),
-        headers={"Content-Type": "application/json"},
-    )
-    try:
-        with urllib.request.urlopen(req, timeout=10) as resp:
-            return json.loads(resp.read().decode("utf-8"))
-    except urllib.error.HTTPError as exc:
-        detail = exc.read().decode("utf-8", "ignore")
-        raise SystemExit(f"Seed request failed ({exc.code}): {detail}")
-
-material = post(
-    "/api/materials/",
-    {"name": "Walnut", "texture_url": None, "cost_per_sq_ft": 12.5},
-)
-material_id = material.get("id")
-if not material_id:
-    raise SystemExit("Material creation failed; missing id")
-
-post(
-    "/api/modules/",
-    {
-        "name": "Base600",
-        "width": 600.0,
-        "height": 720.0,
-        "depth": 580.0,
-        "base_price": 100.0,
-        "material_id": material_id,
-    },
-)
-PY
+          python - <<'PY'
+          import json
+          import os
+          import urllib.error
+          import urllib.request
+
+          BASE_URL = os.environ.get("BASE_URL", "http://localhost:8000")
+
+          def post(path: str, payload: dict) -> dict:
+              req = urllib.request.Request(
+                  f"{BASE_URL}{path}",
+                  data=json.dumps(payload).encode("utf-8"),
+                  headers={"Content-Type": "application/json"},
+              )
+              try:
+                  with urllib.request.urlopen(req, timeout=10) as resp:
+                      return json.loads(resp.read().decode("utf-8"))
+              except urllib.error.HTTPError as exc:
+                  detail = exc.read().decode("utf-8", "ignore")
+                  raise SystemExit(f"Seed request failed ({exc.code}): {detail}")
+
+          material = post(
+              "/api/materials/",
+              {"name": "Walnut", "texture_url": None, "cost_per_sq_ft": 12.5},
+          )
+          material_id = material.get("id")
+          if not material_id:
+              raise SystemExit("Material creation failed; missing id")
+
+          post(
+              "/api/modules/",
+              {
+                  "name": "Base600",
+                  "width": 600.0,
+                  "height": 720.0,
+                  "depth": 580.0,
+                  "base_price": 100.0,
+                  "material_id": material_id,
+              },
+          )
+          PY

Note: YAML will strip the common indent from the block scalar, so the shell sees the heredoc exactly as intended. The current error matches actionlint/YAMLlint output.

🧰 Tools
🪛 actionlint (1.7.8)

146-146: could not parse as YAML: could not find expected ':'

(syntax-check)

🪛 YAMLlint (1.37.1)

[error] 147-147: syntax error: could not find expected ':'

(syntax)

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 145 to 185 the python heredoc started
with "python - <<'PY'" is not indented, causing YAML to treat the script lines
as top-level keys; indent the entire heredoc body (every line between the
"python - <<'PY'" line and the closing "PY") to the same indentation level as
the run block content (e.g., two extra spaces) so YAML parses the block scalar
correctly, and keep the closing "PY" delimiter indented to that same level; do
not change the script content otherwise (YAML will strip common indentation when
passing the heredoc to the shell).

Comment on lines +209 to +223
  canary-metrics:
    runs-on: ubuntu-latest
    needs:
      - perf-budgets
    steps:
      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8

      - name: Evaluate canary metrics
        env:
          CANARY_METRICS_FIXTURE: tests/perf/canary-metrics.fixture.json
          P95_THRESHOLD_MS: '3000'
          ERROR_RATE_THRESHOLD: '0.02'
          REGRESSION_TOLERANCE_PCT: '0.1'
        run: python scripts/ci/check_canary_metrics.py


⚠️ Potential issue | 🔴 Critical

Install Python before running canary metrics script

This job runs python scripts/ci/check_canary_metrics.py without provisioning Python. Add setup-python or invoke python3 only if guaranteed to exist.

Apply this diff:

   canary-metrics:
     runs-on: ubuntu-latest
     needs:
       - perf-budgets
     steps:
       - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8
+
+      - name: Set up Python
+        uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c
+        with:
+          python-version: '3.12'
@@
-      - name: Evaluate canary metrics
+      - name: Evaluate canary metrics
         env:
           CANARY_METRICS_FIXTURE: tests/perf/canary-metrics.fixture.json
           P95_THRESHOLD_MS: '3000'
           ERROR_RATE_THRESHOLD: '0.02'
           REGRESSION_TOLERANCE_PCT: '0.1'
         run: python scripts/ci/check_canary_metrics.py
🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 209 to 223, the canary-metrics job
invokes python scripts/ci/check_canary_metrics.py without ensuring Python is
available; add a step before the run that sets up Python (e.g.
actions/setup-python@v4 with a python-version like '3.x' or a specific '3.11')
so the runner has a known python binary on PATH, then keep the existing run step
(or change the run to invoke python3 explicitly if you prefer not to add the
setup action).

Comment on lines 100 to 466
const rootDir = path.resolve(__dirname, '../../..');
const configPath = path.join(rootDir, '.perf-budget.yml');
const resultsDir = path.join(rootDir, 'perf-results');

function readConfig(): PerfBudgetConfig {
const file = fs.readFileSync(configPath, 'utf-8');
const parsed = yaml.parse(file) as PerfBudgetConfig;
if (!parsed || !Array.isArray(parsed.scenarios)) {
throw new Error('Invalid perf budget configuration: missing scenarios');
}
return parsed;
}

async function applyThrottling(context: BrowserContext, page: Page, throttling?: ThrottlingConfig) {
if (!throttling) {
return;
}
const session = await context.newCDPSession(page);
if (typeof throttling.cpu_slowdown_multiplier === 'number' && throttling.cpu_slowdown_multiplier > 0) {
await session.send('Emulation.setCPUThrottlingRate', { rate: throttling.cpu_slowdown_multiplier });
}
const download = throttling.download_throughput_kbps;
const upload = throttling.upload_throughput_kbps;
const latency = throttling.request_latency_ms;
if (download || upload || latency) {
await session.send('Network.enable');
await session.send('Network.emulateNetworkConditions', {
offline: false,
latency: latency ?? 0,
downloadThroughput: download ? (download * 1024) / 8 : -1,
uploadThroughput: upload ? (upload * 1024) / 8 : -1,
});
}
}

async function setupPerformanceObservers(page: Page) {
await page.addInitScript(() => {
const globalAny = globalThis as any;
globalAny.__perfBudget = {
lcpEntries: [] as PerformanceEntry[],
cls: 0,
tbt: 0,
};

if (globalAny.PerformanceObserver) {
try {
const lcpObserver = new PerformanceObserver((entryList) => {
const entries = entryList.getEntries();
const store = globalAny.__perfBudget;
store.lcpEntries.push(...entries);
});
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
} catch (error) {
console.warn('LCP observer failed', error);
}

try {
const clsObserver = new PerformanceObserver((entryList) => {
const store = globalAny.__perfBudget;
for (const entry of entryList.getEntries() as any[]) {
if (!entry.hadRecentInput) {
store.cls += entry.value;
}
}
});
clsObserver.observe({ type: 'layout-shift', buffered: true });
} catch (error) {
console.warn('CLS observer failed', error);
}

try {
const longTaskObserver = new PerformanceObserver((entryList) => {
const store = globalAny.__perfBudget;
for (const entry of entryList.getEntries()) {
const blockingTime = entry.duration - 50;
if (blockingTime > 0) {
store.tbt += blockingTime;
}
}
});
longTaskObserver.observe({ type: 'longtask' });
} catch (error) {
console.warn('Long task observer failed', error);
}
}
});
}

async function performWait(page: Page, wait: WaitStep) {
if (wait.type === 'selector') {
await page.waitForSelector(wait.selector, { timeout: wait.timeout_ms ?? 30000 });
} else if (wait.type === 'networkidle') {
await page.waitForLoadState('networkidle', { timeout: wait.timeout_ms ?? 60000 });
if (wait.idle_ms && wait.idle_ms > 0) {
await page.waitForTimeout(wait.idle_ms);
}
}
}

async function performScenarioStep(page: Page, step: ScenarioStep) {
if (step.type === 'goto') {
await page.goto(step.url, { waitUntil: step.wait_until ?? 'load', timeout: 60000 });
} else if (step.type === 'wait_for_selector') {
await page.waitForSelector(step.selector, { timeout: step.timeout_ms ?? 30000 });
} else if (step.type === 'wait_for_timeout') {
await page.waitForTimeout(step.timeout_ms);
}
}

async function executeScenarioRun(
scenario: ScenarioConfig,
browser: Browser,
defaults: PerfBudgetConfig['defaults'],
): Promise<RunMetrics> {
const context = await browser.newContext();
const page = await context.newPage();
await setupPerformanceObservers(page);
await applyThrottling(context, page, defaults?.throttling);

if (scenario.steps && scenario.steps.length > 0) {
for (const step of scenario.steps) {
await performScenarioStep(page, step);
}
} else if (scenario.url) {
await page.goto(scenario.url, { waitUntil: 'load', timeout: 60000 });
if (scenario.waits) {
for (const wait of scenario.waits) {
await performWait(page, wait);
}
}
} else {
throw new Error(`Scenario ${scenario.id} missing url or steps`);
}

// Give observers time to flush buffered events
await page.waitForTimeout(1000);

const metrics = await page.evaluate(() => {
const navEntries = performance.getEntriesByType('navigation');
const paints = performance.getEntriesByName('first-contentful-paint');
const globalAny = globalThis as any;
const store = globalAny.__perfBudget || { lcpEntries: [], cls: 0, tbt: 0 };
const lcpEntries = Array.isArray(store.lcpEntries) ? store.lcpEntries : [];
const lastNav = navEntries[navEntries.length - 1] as PerformanceNavigationTiming | undefined;
const fcp = paints.length > 0 ? paints[paints.length - 1].startTime : null;
const lcpEntry = lcpEntries.length > 0 ? lcpEntries[lcpEntries.length - 1] : null;
const lcp = lcpEntry ? (lcpEntry as any).startTime ?? (lcpEntry as any).renderTime ?? (lcpEntry as any).loadTime ?? null : null;
const cls = typeof store.cls === 'number' ? store.cls : null;
const tbt = typeof store.tbt === 'number' ? store.tbt : null;

return {
'navigation-duration': lastNav ? lastNav.duration : null,
'first-contentful-paint': fcp,
'largest-contentful-paint': lcp,
'total-blocking-time': tbt,
'cumulative-layout-shift': cls,
};
});

await context.close();
return metrics;
}

function collectScenarioMetrics(metrics: RunMetrics[]): ScenarioMetrics {
const result: ScenarioMetrics = {};
for (const run of metrics) {
for (const [metricId, value] of Object.entries(run)) {
if (typeof value === 'number' && Number.isFinite(value)) {
if (!result[metricId]) {
result[metricId] = [];
}
result[metricId].push(value);
}
}
}
return result;
}

function percentile(sorted: number[], percentileValue: number): number {
if (sorted.length === 0) {
return NaN;
}
const index = (sorted.length - 1) * percentileValue;
const lower = Math.floor(index);
const upper = Math.ceil(index);
if (lower === upper) {
return sorted[lower];
}
return sorted[lower] + (sorted[upper] - sorted[lower]) * (index - lower);
}

function aggregate(values: number[], aggregation: Aggregation | string): number {
if (values.length === 0) {
return NaN;
}
const sorted = [...values].sort((a, b) => a - b);
switch (aggregation) {
case 'mean':
return values.reduce((sum, value) => sum + value, 0) / values.length;
case 'median':
case 'p50':
return percentile(sorted, 0.5);
case 'p75':
return percentile(sorted, 0.75);
case 'p90':
return percentile(sorted, 0.9);
case 'p95':
return percentile(sorted, 0.95);
default:
throw new Error(`Unsupported aggregation: ${aggregation}`);
}
}

function createJUnitReport(results: ScenarioResult[]): JUnitReport {
const suites: JUnitSuite[] = [];
let totalTests = 0;
let totalFailures = 0;

for (const scenario of results) {
const testcases: JUnitTestCase[] = [];
for (const metric of scenario.metrics) {
const testcase: JUnitTestCase = {
classname: `perf.${scenario.id}`,
name: `${scenario.id} ${metric.id}`,
time: '0',
};
if (!metric.passed) {
testcase.failure = {
message: `Budget exceeded for ${metric.id}`,
details: `Expected ${metric.aggregation} <= ${metric.threshold}${metric.unit}, observed ${metric.value.toFixed(2)}${metric.unit}`,
};
totalFailures += 1;
}
testcases.push(testcase);
totalTests += 1;
}
suites.push({
name: scenario.id,
tests: scenario.metrics.length,
failures: scenario.metrics.filter((metric) => !metric.passed).length,
time: '0',
testcases,
});
}

return {
name: 'perf-budget',
tests: totalTests,
failures: totalFailures,
time: '0',
suites,
};
}

function writeJUnitReport(report: JUnitReport) {
const xmlLines: string[] = [];
xmlLines.push('<?xml version="1.0" encoding="UTF-8"?>');
xmlLines.push(
`<testsuites name="${report.name}" tests="${report.tests}" failures="${report.failures}" time="${report.time}">`,
);
for (const suite of report.suites) {
xmlLines.push(
` <testsuite name="${suite.name}" tests="${suite.tests}" failures="${suite.failures}" time="${suite.time}">`,
);
for (const testcase of suite.testcases) {
xmlLines.push(
` <testcase classname="${testcase.classname}" name="${testcase.name}" time="${testcase.time}">`,
);
if (testcase.failure) {
xmlLines.push(
` <failure message="${escapeXml(testcase.failure.message)}">${escapeXml(testcase.failure.details)}</failure>`,
);
}
xmlLines.push(' </testcase>');
}
xmlLines.push(' </testsuite>');
}
xmlLines.push('</testsuites>');

fs.mkdirSync(resultsDir, { recursive: true });
fs.writeFileSync(path.join(resultsDir, 'perf-budget-junit.xml'), xmlLines.join('\n'), 'utf-8');
}

function escapeXml(value: string): string {
return value
.replace(/&/g, '&amp;')
.replace(/"/g, '&quot;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;');
}

function writeSummary(summary: Summary) {
fs.mkdirSync(resultsDir, { recursive: true });
fs.writeFileSync(path.join(resultsDir, 'perf-budget-summary.json'), JSON.stringify(summary, null, 2), 'utf-8');
}

async function run() {
const config = readConfig();
const defaults = config.defaults ?? {};
const browser = await chromium.launch({ headless: true });
const scenarioResults: ScenarioResult[] = [];
let overallPassed = true;

try {
for (const scenario of config.scenarios) {
const runCount = scenario.run_count ?? defaults.run_count ?? 1;
const runMetrics: RunMetrics[] = [];
for (let i = 0; i < runCount; i += 1) {
console.log(`Running scenario ${scenario.id} (${i + 1}/${runCount})`);
const metrics = await executeScenarioRun(scenario, browser, defaults);
runMetrics.push(metrics);
}
const scenarioValues = collectScenarioMetrics(runMetrics);
const aggregatedMetrics: AggregatedMetric[] = [];
for (const metricConfig of scenario.metrics) {
const values = scenarioValues[metricConfig.id];
if (!values || values.length === 0) {
console.warn(`No samples collected for ${scenario.id} metric ${metricConfig.id}`);
continue;
}
const value = aggregate(values, metricConfig.aggregation);
const passed = value <= metricConfig.threshold;
if (!passed) {
overallPassed = false;
}
aggregatedMetrics.push({
id: metricConfig.id,
aggregation: metricConfig.aggregation,
threshold: metricConfig.threshold,
unit: metricConfig.unit,
value,
passed,
samples: values,
});
console.log(
`${scenario.id} ${metricConfig.id} ${metricConfig.aggregation}: ${value.toFixed(2)}${metricConfig.unit} (threshold ${metricConfig.threshold}${metricConfig.unit})`,
);
}
scenarioResults.push({
id: scenario.id,
description: scenario.description,
metrics: aggregatedMetrics,
});
}
} finally {
await browser.close();
}

const summary: Summary = {
scenarios: scenarioResults,
passed: overallPassed,
};
writeSummary(summary);
writeJUnitReport(createJUnitReport(scenarioResults));

if (!overallPassed) {
console.error('Performance budgets failed. See perf-results/perf-budget-summary.json for details.');
process.exitCode = 1;
} else {
console.log('Performance budgets satisfied.');
}
}

run().catch((error) => {
console.error('Failed to execute performance budgets', error);
process.exitCode = 1;
});

🧹 Nitpick | 🔵 Trivial

Solid, deterministic perf runner; consider small improvements

  • Add optional diagnostics to aid failures (console, page errors, recent requests) before evaluating metrics. Playwright 1.56 exposes page.consoleMessages(), page.pageErrors(), page.requests() you can log on failure.
  • Allow per-scenario throttling override (extend ScenarioConfig) to model mixed flows.
  • Minor type nit: MetricConfig.aggregation already covers p75/p90/p95 via Aggregation; the union is redundant.

Based on learnings

🤖 Prompt for AI Agents
In frontend/tests/perf/run-perf-budget.ts around lines 100-466, implement three
small changes: (1) add optional diagnostics on failing scenarios by capturing
and logging page.consoleMessages(), page.pageErrors(), and page.requests() just
before evaluating metrics (or when overallPassed flips false) — collect them
from the page/context and include their recent entries in console.error so
failures include context; (2) support per-scenario throttling by extending the
runner to prefer scenario.throttling over defaults (pass scenario.throttling
into applyThrottling when executing a scenario run, and update types if needed);
(3) remove the redundant union usage for metric aggregation by ensuring
functions accept the Aggregation type (not Aggregation | string) — adjust
aggregate() signature/usage and any type annotations so aggregation is typed as
Aggregation only. Ensure no new runtime exceptions are introduced and keep
logging guarded so diagnostics only run when available.

Comment on lines 130 to 139
    return CanaryMetrics(
        latency_p95_ms=float(latency) if latency is not None else 0.0,
        error_rate=float(error_rate) if error_rate is not None else 0.0,
        trace_latency_p95_ms=trace_latency,
        previous_latency_p95_ms=_float_or_none(previous_latency),
        previous_error_rate=_float_or_none(previous_error),
        build=build_tag,
        previous_build=previous_build,
        generated_at=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    )

⚠️ Potential issue | 🔴 Critical

Missing metrics silently pass the check

When Prometheus queries fail or return no data, latency/error_rate stay None, we coerce them to 0.0, and _evaluate happily passes the run. That masks real outages/regressions whenever the scraper is down or a query is misconfigured. Please surface this as a failure (raise, or at least refuse to build a metrics object) instead of defaulting to zeroes so CI actually fails when telemetry is missing.

Apply this change:

-    return CanaryMetrics(
-        latency_p95_ms=float(latency) if latency is not None else 0.0,
-        error_rate=float(error_rate) if error_rate is not None else 0.0,
+    if latency is None or error_rate is None:
+        raise CanaryCheckError("Prometheus queries returned no data for latency/error rate")
+
+    return CanaryMetrics(
+        latency_p95_ms=float(latency),
+        error_rate=float(error_rate),
         trace_latency_p95_ms=trace_latency,
         previous_latency_p95_ms=_float_or_none(previous_latency),
         previous_error_rate=_float_or_none(previous_error),
         build=build_tag,
         previous_build=previous_build,
         generated_at=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
     )

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9aee543 and 9ba2fc4.

📒 Files selected for processing (1)
  • scripts/ci/check_canary_metrics.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
scripts/ci/check_canary_metrics.py

1-1: Shebang is present but file is not executable

(EXE001)


13-13: typing.Dict is deprecated, use dict instead

(UP035)


32-32: Dynamically typed expressions (typing.Any) are disallowed in value

(ANN401)


41-41: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


74-74: Unnecessary mode argument

Remove mode argument

(UP015)


160-160: Boolean-typed positional argument in function definition

(FBT001)

🔇 Additional comments (4)
scripts/ci/check_canary_metrics.py (4)

16-30: LGTM!

The data structures are well-defined and appropriate for the canary metrics validation use case.


32-71: LGTM!

The helper functions properly handle error cases and return None on failures, which enables the fallback logic. The URL open security warning (S310) is low risk here since URLs come from CI-controlled environment variables.


92-148: LGTM! Previous critical issue resolved.

The function now properly returns None when Prometheus queries fail (lines 105-111), which triggers the fixture fallback in _load_metrics. This addresses the critical issue from previous reviews where missing data would be coerced to 0.0 and incorrectly pass validation.


178-245: LGTM! Previous formatting issue resolved.

The evaluation logic correctly checks metrics against budgets and regression thresholds. The comparison printing (lines 234-242) now properly checks each optional value independently before formatting, resolving the TypeError issue from previous reviews.

@@ -0,0 +1,248 @@
#!/usr/bin/env python3

🧹 Nitpick | 🔵 Trivial

Make the script executable.

The shebang is present but the file lacks execute permissions. Run chmod +x scripts/ci/check_canary_metrics.py to make it executable, which is standard practice for scripts with shebangs.

🧰 Tools
🪛 Ruff (0.14.0)

1-1: Shebang is present but file is not executable

(EXE001)

🤖 Prompt for AI Agents
In scripts/ci/check_canary_metrics.py at line 1 the file has a shebang but is
not executable; fix by setting the execute bit on the file (e.g. run chmod +x
scripts/ci/check_canary_metrics.py) so it can be executed directly in CI and
locally.

import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, Optional

🧹 Nitpick | 🔵 Trivial

Use modern type hint syntax.

typing.Dict is deprecated in Python 3.9+. Use the built-in dict type instead.

Apply this diff:

-from typing import Any, Dict, Optional
+from typing import Any, Optional

Then update line 39:

-def _http_get_json(url: str, params: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
+def _http_get_json(url: str, params: Optional[dict[str, str]] = None) -> dict[str, Any]:
🧰 Tools
🪛 Ruff (0.14.0)

13-13: typing.Dict is deprecated, use dict instead

(UP035)

🤖 Prompt for AI Agents
In scripts/ci/check_canary_metrics.py around line 13, replace the deprecated
typing.Dict import by removing Dict from the typing import (e.g., from typing
import Any, Optional). Then on line 39 update any type annotation using
Dict[...] to use the built-in generic syntax dict[...] (for example Dict[str,
Any] -> dict[str, Any]) ensuring you keep Any/Optional from typing as needed.

Comment on lines 150 to 158
def _load_metrics() -> CanaryMetrics:
    metrics = _collect_metrics_from_services()
    if metrics:
        return metrics

    fixture_path = os.environ.get("CANARY_METRICS_FIXTURE", "tests/perf/canary-metrics.fixture.json")
    print(f"Using canary metrics fixture: {fixture_path}")
    return _load_fixture(fixture_path)


⚠️ Potential issue | 🟠 Major

Add error handling for missing fixture file.

If the fixture file doesn't exist, _load_fixture raises FileNotFoundError with an unclear message. Add a try-except to provide a helpful error message indicating both the service connection failure and the missing fixture.

Apply this diff:

 def _load_metrics() -> CanaryMetrics:
     metrics = _collect_metrics_from_services()
     if metrics:
         return metrics
 
     fixture_path = os.environ.get("CANARY_METRICS_FIXTURE", "tests/perf/canary-metrics.fixture.json")
     print(f"Using canary metrics fixture: {fixture_path}")
-    return _load_fixture(fixture_path)
+    try:
+        return _load_fixture(fixture_path)
+    except FileNotFoundError:
+        raise CanaryCheckError(
+            f"Cannot load metrics: Prometheus queries failed and fixture not found at {fixture_path}"
+        )
🤖 Prompt for AI Agents
In scripts/ci/check_canary_metrics.py around lines 150 to 158, _load_metrics
falls back to loading a fixture but doesn't handle the case where the fixture
file is missing; wrap the call to _load_fixture(fixture_path) in a try/except
that catches FileNotFoundError and re-raises a clearer error (or exit) that
states the service metrics collection failed and the fixture file at
fixture_path was not found, including the fixture_path in the message and an
actionable hint to create or point CANARY_METRICS_FIXTURE to a valid file.

Comment on lines 160 to 176
def _write_summary(metrics: CanaryMetrics, passed: bool) -> None:
    summary_path = os.path.join("perf-results", "canary-metrics-summary.json")
    os.makedirs(os.path.dirname(summary_path), exist_ok=True)
    payload = {
        "latency_p95_ms": metrics.latency_p95_ms,
        "error_rate": metrics.error_rate,
        "trace_latency_p95_ms": metrics.trace_latency_p95_ms,
        "previous_latency_p95_ms": metrics.previous_latency_p95_ms,
        "previous_error_rate": metrics.previous_error_rate,
        "build": metrics.build,
        "previous_build": metrics.previous_build,
        "generated_at": metrics.generated_at,
        "passed": passed,
    }
    with open(summary_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


🧹 Nitpick | 🔵 Trivial

Consider using a keyword argument for the boolean parameter.

The passed parameter is a boolean positional argument, which can reduce readability at call sites. Consider making it keyword-only or using an enum for clarity.

Apply this diff to make it keyword-only:

-def _write_summary(metrics: CanaryMetrics, passed: bool) -> None:
+def _write_summary(metrics: CanaryMetrics, *, passed: bool) -> None:
🧰 Tools
🪛 Ruff (0.14.0)

160-160: Boolean-typed positional argument in function definition

(FBT001)

🤖 Prompt for AI Agents
In scripts/ci/check_canary_metrics.py around lines 160 to 176, the boolean
parameter passed is currently positional which harms readability at call sites;
change the function signature to make passed a keyword-only argument (e.g., def
_write_summary(metrics: CanaryMetrics, *, passed: bool) -> None) and update
every call site to call _write_summary(..., passed=some_bool). Ensure
imports/types remain valid and run tests/lint to catch any missed call sites.


coderabbitai bot commented Oct 18, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #282

coderabbitai bot added a commit that referenced this pull request Oct 18, 2025
…olds`

Docstrings generation was requested by @shayancoin.

* #123 (comment)

The following files were modified:

* `frontend/tests/perf/run-perf-budget.ts`
* `scripts/ci/check_canary_metrics.py`
…olds` (#282)

Docstrings generation was requested by @shayancoin.

* #123 (comment)

The following files were modified:

* `frontend/tests/perf/run-perf-budget.ts`
* `scripts/ci/check_canary_metrics.py`

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@shayancoin shayancoin closed this Oct 20, 2025
