Merged
73 changes: 73 additions & 0 deletions .github/workflows/ci.yml
@@ -102,3 +102,76 @@ jobs:

- name: Build
run: npm run build

performance-budget:
runs-on: ubuntu-latest
needs:
- frontend-tests
env:
PERF_BUDGET_HEADLESS: 'true'
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8

- name: Set up Node.js
uses: actions/setup-node@0a44ba78451273a1ed8ac2fee4e347c72dfd377f
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: ./frontend/package-lock.json

- name: Install dependencies
working-directory: ./frontend
run: npm ci

- name: Start application stack
run: |
docker compose -f docker-compose.dev.yml up -d --build

- name: Wait for API
run: |
for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done

- name: Wait for Frontend
run: |
for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done

- name: Run performance budget checks
working-directory: ./frontend
env:
PERF_BUDGET_OUTPUT_DIR: ../test-results/perf
run: npm run perf:budget

- name: Upload performance budget report
if: always()
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02
with:
name: perf-budget
path: test-results/perf

- name: Shutdown stack
if: always()
run: |
docker compose -f docker-compose.dev.yml down
Comment on lines +132 to +154
⚠️ Potential issue | 🟠 Major

Make readiness waits fail when services never come up

Both wait loops (Wait for API and Wait for Frontend) end with `curl … && break || sleep 2`. If the service never becomes ready, the loop's final command is `sleep 2`, which exits 0, so the whole step passes and the pipeline continues against an unhealthy stack. Please fail fast once the retries are exhausted.

       - name: Wait for API
         run: |
-          for i in {1..60}; do curl -sf http://localhost:8000/healthcheck && break || sleep 2; done
+          for i in {1..60}; do
+            if curl -sf http://localhost:8000/healthcheck; then
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "API did not become ready in time" >&2
+          exit 1
@@
       - name: Wait for Frontend
         run: |
-          for i in {1..60}; do curl -sf http://localhost:3000/models/manifest.json && break || sleep 2; done
+          for i in {1..60}; do
+            if curl -sf http://localhost:3000/models/manifest.json; then
+              exit 0
+            fi
+            sleep 2
+          done
+          echo "Frontend did not become ready in time" >&2
+          exit 1


observability-budgets:
runs-on: ubuntu-latest
needs:
- performance-budget
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8

- name: Set up Python
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c
with:
python-version: '3.12'

- name: Install dependencies
run: pip install pyyaml

- name: Check observability budgets
env:
PROMETHEUS_URL: ${{ secrets.PROMETHEUS_URL }}
PROMETHEUS_BEARER_TOKEN: ${{ secrets.PROMETHEUS_BEARER_TOKEN }}
TEMPO_URL: ${{ secrets.TEMPO_URL }}
TEMPO_BEARER_TOKEN: ${{ secrets.TEMPO_BEARER_TOKEN }}
run: python tools/ci/check_observability_budgets.py --config observability-budgets.yml
39 changes: 39 additions & 0 deletions docs/release-checklist.md
@@ -0,0 +1,39 @@
# Release Checklist

This checklist ties together continuous integration signal, Grafana alerting, and the on-call rotation so that preview deployments are gated on healthy performance and reliability metrics.

## 1. Verify CI Observability Gates

- Check the **performance-budget** job in GitHub Actions CI. This job runs the Playwright-based budget defined in `perf-budget.yml` and publishes a JUnit report that Grafana can ingest. If it fails, fix the regression before proceeding.
- Confirm that the **observability-budgets** job has passed. It queries Prometheus and Tempo spanmetrics using `observability-budgets.yml` and fails when P95 latency or error-rate thresholds are exceeded compared to the previous day.
- Export any new failure signatures into the on-call runbook.
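
The budget comparison performed by the **observability-budgets** job can be sketched as follows. This is a hedged illustration only: the real `tools/ci/check_observability_budgets.py` fetches values from the Prometheus and Tempo HTTP APIs (authenticated with the bearer tokens passed in CI) and reads its thresholds from `observability-budgets.yml`; the function name, metric names, and thresholds below are hypothetical.

```python
def evaluate_budgets(metrics, budgets):
    """Return a list of human-readable budget violations.

    metrics: mapping of metric name -> observed value (in the real script,
             the result of Prometheus /api/v1/query calls).
    budgets: mapping of metric name -> maximum allowed value.
    """
    violations = []
    for name, limit in budgets.items():
        observed = metrics.get(name)
        if observed is None:
            # Missing data is itself a failure: the gate must not pass silently.
            violations.append(f"{name}: no data returned")
        elif observed > limit:
            violations.append(f"{name}: {observed} exceeds budget {limit}")
    return violations


# Hypothetical thresholds and sample query results:
budgets = {"p95_latency_seconds": 0.8, "error_rate": 0.01}
metrics = {"p95_latency_seconds": 0.95, "error_rate": 0.004}

for line in evaluate_budgets(metrics, budgets):
    print(line)  # a CI wrapper would exit non-zero when any violation exists
```

The key design point mirrored from the checklist: absence of data fails the gate rather than passing it, for the same reason the readiness loops in CI must fail when retries are exhausted.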

## 2. Review Grafana Dashboards

- Open the "Configurator Experience" dashboard and confirm the panels for:
- `ci_perf_budget_value` vs `ci_perf_budget_threshold` (pushed from the Playwright budget run).
- Prometheus latency and error-rate panels that use the same queries as the CI job.
- Ensure alert rules are configured to page the on-call engineer whenever the CI metrics breach thresholds for two consecutive runs or when runtime metrics cross the defined budgets.
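
The "two consecutive runs" paging condition above can be expressed as a small predicate. A hedged sketch only: the actual decision lives in the Grafana alert rule, and `should_page` and its inputs are hypothetical.

```python
def should_page(history, threshold, runs=2):
    """True when the most recent `runs` measurements all breach the threshold."""
    recent = history[-runs:]
    return len(recent) == runs and all(value > threshold for value in recent)


# Example with a hypothetical 0.8s P95 latency budget:
print(should_page([0.7, 0.9, 0.95], threshold=0.8))  # last two runs breached -> True
print(should_page([0.9, 0.7, 0.95], threshold=0.8))  # only the latest breached -> False
```

Requiring two consecutive breaches trades a few minutes of detection latency for far fewer pages on a single flaky CI run.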

## 3. Coordinate On-call Notifications

- Tag the current on-call engineer in the release Slack channel with a summary of CI and Grafana status.
- Verify PagerDuty (or the configured paging tool) has matching alerts for the Grafana rules referenced above.
- Record the acknowledgement in the release ticket.

## 4. Gate Preview Environments

- Do not promote a preview environment until:
- All CI jobs, including `performance-budget` and `observability-budgets`, pass.
- Grafana dashboards show no active alerts for the release window.
- The on-call engineer confirms readiness.
- If any alert is firing, pause the release and create an incident in the on-call tracking tool.

## 5. Final Release Sign-off

- Update the release ticket with links to:
- The successful CI run.
- Grafana dashboard screenshots showing green status.
- PagerDuty acknowledgement (or equivalent) from the on-call engineer.
- Archive the Grafana dashboard snapshot for auditability.
- Communicate the release completion to stakeholders.
20 changes: 19 additions & 1 deletion frontend/package-lock.json

Some generated files are not rendered by default.

8 changes: 6 additions & 2 deletions frontend/package.json
@@ -13,7 +13,8 @@
"assets:validate": "python ../scripts/glb_validate.py public/models/*.glb --fail-on-warning",
"assets:manifest": "python ../scripts/gen_glb_manifest.py > public/models/manifest.json",
"assets:all": "npm run assets:gen && npm run assets:pack && npm run assets:validate && npm run assets:manifest",
"test:manifest": "vitest run --reporter=dot"
"test:manifest": "vitest run --reporter=dot",
"perf:budget": "node ./tools/perf/run-perf-budget.js"
},
"dependencies": {
"@chakra-ui/icons": "^2.1.1",
@@ -52,7 +53,10 @@
"ts-jest": "^29.2.5",
"ts-node": "^10.9.2",
"typescript": "^5",
"vitest": "^1.6.0"
"vitest": "^1.6.0",
"ts-node": "^10.9.2",
"js-yaml": "^4.1.0",
"xmlbuilder2": "^4.0.0"
},
"jest": {
"setupFilesAfterEnv": [