Add observability dashboards and alerting config#76

Merged
shayancoin merged 3 commits into main from codex/create-json-dashboards-and-configure-alerts
Oct 16, 2025
Conversation

@shayancoin (Owner) commented Oct 16, 2025

Summary

  • add Grafana dashboards for backend golden signals, sync pipeline KPIs, frontend UX, and tracing exploration
  • wire dashboards into Grafana provisioning and configure alert contact points
  • define Prometheus alert rules for error rate, slow p95 latency, and degraded frontend LCP

Testing

  • not run (not applicable)

https://chatgpt.com/codex/tasks/task_e_68f12adca0c88330a336cb42934b4279

Summary by CodeRabbit

  • New Features
    • Added multiple Grafana dashboards and provisioning for automatic updates
    • Added alerting notifications (Slack, Email) and Prometheus alert rules
    • Added backend request instrumentation and an observability ingest endpoint
    • Added frontend LCP reporting that sends web-vitals to the backend
  • Tests
    • Added tests validating observability ingestion and HTTP metrics
  • Chores
    • Updated frontend dependencies and dev tooling configuration

coderabbitai bot (Contributor) commented Oct 16, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Grafana provisioning (dashboards + alerting notifications), four Grafana dashboards, Prometheus alerting rules, Prometheus client metrics + middleware in the backend, a backend observability ingest route, frontend web-vitals reporting, and tests validating metrics ingestion and HTTP instrumentation.

Changes

Cohort / File(s) Summary
Grafana provisioning & alerting config
dashboards/dashboards.yaml, ops/grafana/provisioning/alerting/notifications.yaml
Added a Grafana dashboards provider config and alerting notification/contact-point policy (Slack and Email receivers) with routing rules and grouping.
Grafana dashboard JSONs
ops/grafana/provisioning/dashboards/backend_golden_signals.json, ops/grafana/provisioning/dashboards/frontend_ux_metrics.json, ops/grafana/provisioning/dashboards/sync_pipeline_kpis.json, ops/grafana/provisioning/dashboards/tracing_explorer.json
Added four dashboard definitions (backend golden signals, frontend UX metrics, sync pipeline KPIs, tracing explorer) targeting Prometheus/Tempo/Loki with templating, panels, and metadata.
Prometheus alert rules
ops/prometheus/alerts.yml
Added alert groups and rules for backend (HighErrorRate, SlowP95API) and frontend (FrontendLCPDegraded) with labels, annotations and runbook URLs.
Backend metrics, middleware, and routes
backend/api/metrics.py, backend/api/main.py, backend/api/routes_observability.py
Introduced Prometheus metrics (counter, histogram with buckets, web_vitals_lcp), observe_* APIs, a PrometheusInstrumentationMiddleware that records per-request metrics (skips /metrics), route label resolver, and an /observability POST endpoint for ingesting LCP web-vitals.
Backend tests
backend/tests/test_observability_metrics.py
Added tests exercising LCP ingestion, rejection of unsupported metrics, and http_requests_total increment for an example request.
Frontend web-vitals reporting & deps
frontend/src/app/reportWebVitals.ts, frontend/package.json
Added client-side reportWebVitals that sends LCP to backend (beacon -> fetch fallback) with env-driven endpoint resolution; updated frontend dependencies/devDependencies (added @react-three/*, ts-node adjustments).

Sequence Diagram(s)

sequenceDiagram
  participant Browser
  participant Frontend as Next.js app
  participant Backend as FastAPI app
  participant PromRegistry as Prometheus REGISTRY
  Note over Browser,Frontend #D3E4CD: LCP observed in browser
  Browser->>Frontend: reportWebVitals(LCP) (sendBeacon / fetch)
  Frontend->>Backend: POST /observability/web-vitals {name:"LCP", value, app}
  Backend->>PromRegistry: observe_lcp(app, seconds)
  PromRegistry-->>Backend: updated histogram
  Note over Browser,Backend #F7E9D9: HTTP requests instrumented
  Browser->>Backend: GET /healthcheck
  Backend->>Backend: PrometheusInstrumentationMiddleware measures duration/status
  Backend->>PromRegistry: observe_http_request(service, route, method, status, duration)
  PromRegistry-->>Backend: updated counter & histogram
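The ingest step in the diagram — POST /observability/web-vitals with a {name, value, app} payload — amounts to a validation-and-conversion function. A hypothetical sketch: the field names follow the diagram, while treating value as milliseconds (web-vitals reports LCP in ms) and converting to seconds for the histogram is an assumption not confirmed by the PR.

```python
SUPPORTED_METRICS = {"LCP"}


def ingest_web_vital(payload: dict) -> float:
    """Validate a web-vitals payload and return the value in seconds.

    Raises ValueError for unsupported metric names, mirroring the test
    coverage described above (rejection of unsupported metrics).
    """
    name = payload.get("name")
    if name not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {name!r}")
    value_ms = float(payload["value"])
    if value_ms < 0:
        raise ValueError("metric value must be non-negative")
    return value_ms / 1000.0
```

The returned seconds value is what would then be passed to observe_lcp(app, seconds) in the diagram's next step.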

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Reason: Multiple heterogeneous changes across infra (Grafana, Prometheus), backend instrumentation/middleware, HTTP route + metrics API, frontend reporting, and tests. Review requires validating config semantics, metric names/buckets, middleware correctness (skipping /metrics, error handling), and test robustness.

Possibly related PRs

  • Add ts-node to frontend dev dependencies #20 — touches frontend/package.json and reintroduces ts-node in devDependencies; likely related to the frontend dependency adjustments here.
  • Paform #7 — introduces backend observability changes (metrics and instrumentation) similar to the backend additions in this PR.

Poem

🐰 I watched dashboards sprout where carrots once lay,

Grafana gardens hum, and metrics dance all day,
Alerts nudge my whiskers when thresholds hop high,
LCP sent by beacon beneath a blue sky,
A rabbit cheers—observability’s pie! 🥕📈

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check — ⚠️ Warning: The pull request description does not follow the repository's required template; it lacks the mandated "PR Type," "Short Description," and "Tests Added" sections and instead uses the custom headings "Summary" and "Testing." Although it outlines the changes and notes testing status, the missing template headings make it harder to identify the type of change and test coverage at a glance. Resolution: update the PR description to match the repository template — add a "PR Type" section (e.g., Feature), a "Short Description" summarizing the change in one sentence, and a "Tests Added" section listing the tests added or updated, using the exact headings from the template.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check — ✅ Passed: The title clearly and concisely summarizes the primary changes (adding observability dashboards and configuring alerting), matching the main objectives of the pull request without unnecessary detail or noise.
✨ Finishing touches
  • 📝 Docstrings were successfully generated.
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/create-json-dashboards-and-configure-alerts

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines 5 to 24
  - alert: HighErrorRate
    expr: |
      sum(rate(http_requests_total{service="backend",status=~"5.."}[5m]))
        /
      sum(rate(http_requests_total{service="backend"}[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
      service: backend
      team: platform
    annotations:
      summary: "Backend error rate is above 5%"
      description: "The backend service is returning 5xx responses at >5% of total traffic over the last 5 minutes. Investigate recent deploys or upstream dependencies."
      runbook_url: "https://runbooks.example.com/backend/high-error-rate"
  - alert: SlowP95API
    expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="backend",route!=""}[5m])) by (route, le)) > 0.8
    for: 10m
    labels:
      severity: warning
      service: backend
      team: platform


P1 — Alert queries target nonexistent backend metric labels

Both backend alert expressions rely on http_requests_total and http_request_duration_seconds_bucket being labelled with service and route, but the backend only registers a few custom Counter metrics and never initializes the prometheus-fastapi-instrumentator (or any middleware that would emit these series). As written, the PromQL filters will always return an empty vector and neither alert will ever fire, so operators receive no signal even if the API is failing. Consider emitting these metrics or adjusting the queries to match the labels that actually exist (e.g. job, handler).

Useful? React with 👍 / 👎.

Comment on lines 32 to 34
  - alert: FrontendLCPDegraded
    expr: histogram_quantile(0.95, sum(rate(web_vitals_lcp_bucket{app="frontend"}[5m])) by (le)) > 4
    for: 15m


P1 — Frontend LCP alert references metrics never produced

The FrontendLCPDegraded rule assumes a Prometheus histogram named web_vitals_lcp_bucket with an app label, but the repo contains no instrumentation or frontend code that exports Web Vitals metrics. Because the source metric does not exist, the alert expression evaluates to an empty result and user experience regressions will go undetected. Either add instrumentation that publishes web_vitals_lcp_bucket or update the alert to target an existing metric.

Useful? React with 👍 / 👎.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2a35c4 and c3fb7df.

📒 Files selected for processing (7)
  • dashboards/dashboards.yaml (1 hunks)
  • ops/grafana/provisioning/alerting/notifications.yaml (1 hunks)
  • ops/grafana/provisioning/dashboards/backend_golden_signals.json (1 hunks)
  • ops/grafana/provisioning/dashboards/frontend_ux_metrics.json (1 hunks)
  • ops/grafana/provisioning/dashboards/sync_pipeline_kpis.json (1 hunks)
  • ops/grafana/provisioning/dashboards/tracing_explorer.json (1 hunks)
  • ops/prometheus/alerts.yml (1 hunks)

Comment on lines +25 to +37
    groupBy:
      - alertname
      - service
    matchers:
      - severity =~ ".*"
    routes:
      - receiver: email_sre
        groupBy:
          - alertname
        objectMatchers:
          - - severity
            - '='
            - warning

⚠️ Potential issue | 🟠 Major

Use Grafana’s expected snake_case keys in notification policies

Grafana provisioning only understands group_by and object_matchers. Using the camelCase variants (groupBy, objectMatchers) means your policy won’t group alerts or fan out warnings to email.

Update the block to:

-    groupBy:
+    group_by:
       - alertname
       - service
...
-      - receiver: email_sre
-        groupBy:
+      - receiver: email_sre
+        group_by:
           - alertname
-        objectMatchers:
+        object_matchers:
           - - severity
             - '='
             - warning
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

    groupBy:
      - alertname
      - service
    matchers:
      - severity =~ ".*"
    routes:
      - receiver: email_sre
        groupBy:
          - alertname
        objectMatchers:
          - - severity
            - '='
            - warning

After:

    group_by:
      - alertname
      - service
    matchers:
      - severity =~ ".*"
    routes:
      - receiver: email_sre
        group_by:
          - alertname
        object_matchers:
          - - severity
            - '='
            - warning
🤖 Prompt for AI Agents
In ops/grafana/provisioning/alerting/notifications.yaml around lines 25 to 37,
the route uses CamelCase keys that Grafana doesn't recognize; change "groupBy"
to "group_by" and "objectMatchers" to "object_matchers" under the route
(preserving the nested list structure of the object matchers), so the routing
will correctly group alerts and fan out warnings to the email_sre receiver.

Comment on lines +187 to +225
"datasource": {
"type": "prometheus",
"uid": "$prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 4,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"expr": "sum(rate(frontend_api_requests_total{outcome=\"success\"}[5m])) by (route)",
"legendFormat": "{{route}}",
"refId": "A"
}
],
"title": "Frontend API Success Rate",
"type": "timeseries"
},

⚠️ Potential issue | 🟠 Major

Fix the “Frontend API Success Rate” query

This panel is titled as a success rate but the query only counts successful requests per second, so we never notice when the success percentage drops. Please turn it into a true ratio and report it as a percentage.

Apply:

-          "unit": "reqps"
+          "unit": "percentunit"
...
-          "expr": "sum(rate(frontend_api_requests_total{outcome=\"success\"}[5m])) by (route)",
+          "expr": "sum(rate(frontend_api_requests_total{outcome=\"success\"}[5m])) by (route)\n            /\n            clamp_min(sum(rate(frontend_api_requests_total[5m])) by (route), 1e-6)",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"datasource": {
"type": "prometheus",
"uid": "$prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 4,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"expr": "sum(rate(frontend_api_requests_total{outcome=\"success\"}[5m])) by (route)",
"legendFormat": "{{route}}",
"refId": "A"
}
],
"title": "Frontend API Success Rate",
"type": "timeseries"
},
"datasource": {
"type": "prometheus",
"uid": "$prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"unit": "percentunit"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 4,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"expr": "sum(rate(frontend_api_requests_total{outcome=\"success\"}[5m])) by (route)\n /\n clamp_min(sum(rate(frontend_api_requests_total[5m])) by (route), 1e-6)",
"legendFormat": "{{route}}",
"refId": "A"
}
],
"title": "Frontend API Success Rate",
"type": "timeseries"
},
🤖 Prompt for AI Agents
In ops/grafana/provisioning/dashboards/frontend_ux_metrics.json around lines 187
to 225, the panel titled "Frontend API Success Rate" currently uses a query that
only counts successful requests/sec; replace the expr with a PromQL ratio that
divides successful requests by total requests over the same window (e.g.
sum(rate(frontend_api_requests_total{outcome="success"}[5m])) /
sum(rate(frontend_api_requests_total[5m])) ), multiply by 100 or set the panel
unit to percent so the panel shows percentage values, and ensure
legend/legendFormat remains appropriate (e.g. {{route}}) while keeping the
datasource/type unchanged.
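The clamp_min guard in the suggested expression prevents a divide-by-zero when a route has no traffic in the window. In plain arithmetic the behavior looks like this (an illustrative sketch only — the real computation happens in PromQL; the function name and floor value here are assumptions):

```python
def success_ratio(success_rate: float, total_rate: float, floor: float = 1e-6) -> float:
    # Mirrors: success / clamp_min(total, 1e-6). With zero traffic the
    # denominator is clamped to the floor instead of dividing by zero,
    # so the ratio collapses to 0 rather than producing NaN.
    return success_rate / max(total_rate, floor)
```

With mixed traffic this yields the expected fraction; with no traffic at all it degrades gracefully to zero.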

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3fb7df and 2ec0890.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (6)
  • backend/api/main.py (3 hunks)
  • backend/api/metrics.py (2 hunks)
  • backend/api/routes_observability.py (1 hunks)
  • backend/tests/test_observability_metrics.py (1 hunks)
  • frontend/package.json (2 hunks)
  • frontend/src/app/reportWebVitals.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/api/routes_observability.py (2)
backend/api/main.py (1)
  • metrics (150-152)
backend/api/metrics.py (1)
  • observe_lcp (77-80)
backend/api/main.py (1)
backend/api/metrics.py (1)
  • observe_http_request (59-74)
🔇 Additional comments (1)
frontend/package.json (1)

23-32: Dependency additions align with the new observability + 3D work

@react-three/drei and @react-three/fiber land with compatible versions alongside three, and web-vitals ^4.2.4 is the right pick for piping LCP into the new reporting hook. No issues spotted here.

Comment on lines +66 to +93
class PrometheusInstrumentationMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):  # type: ignore[override]
        if request.url.path == "/metrics":
            return await call_next(request)

        start = perf_counter()
        try:
            response = await call_next(request)
        except Exception:
            duration = perf_counter() - start
            observe_http_request(
                service="backend",
                route=_resolve_route_label(request),
                method=request.method,
                status="500",
                duration_seconds=duration,
            )
            raise

        duration = perf_counter() - start
        observe_http_request(
            service="backend",
            route=_resolve_route_label(request),
            method=request.method,
            status=str(response.status_code),
            duration_seconds=duration,
        )
        return response

⚠️ Potential issue | 🟠 Major

Record the real status code for HTTPException responses.

call_next raises HTTPException for anticipated errors (e.g. 404, 422), but our broad except Exception block records every one of them as status 500. That skews the metrics for client errors and validation failures. Handle HTTPException separately (use its status_code) before catching generic exceptions.

Apply this diff to record accurate status codes:

-from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.middleware.base import BaseHTTPMiddleware
+from fastapi import HTTPException
@@
-        try:
-            response = await call_next(request)
-        except Exception:
+        try:
+            response = await call_next(request)
+        except HTTPException as exc:
+            duration = perf_counter() - start
+            observe_http_request(
+                service="backend",
+                route=_resolve_route_label(request),
+                method=request.method,
+                status=str(exc.status_code),
+                duration_seconds=duration,
+            )
+            raise
+        except Exception:
             duration = perf_counter() - start
             observe_http_request(
                 service="backend",
                 route=_resolve_route_label(request),
                 method=request.method,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

class PrometheusInstrumentationMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):  # type: ignore[override]
        if request.url.path == "/metrics":
            return await call_next(request)

        start = perf_counter()
        try:
            response = await call_next(request)
        except Exception:
            duration = perf_counter() - start
            observe_http_request(
                service="backend",
                route=_resolve_route_label(request),
                method=request.method,
                status="500",
                duration_seconds=duration,
            )
            raise

        duration = perf_counter() - start
        observe_http_request(
            service="backend",
            route=_resolve_route_label(request),
            method=request.method,
            status=str(response.status_code),
            duration_seconds=duration,
        )
        return response

After:

from starlette.middleware.base import BaseHTTPMiddleware
from fastapi import HTTPException


class PrometheusInstrumentationMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):  # type: ignore[override]
        if request.url.path == "/metrics":
            return await call_next(request)

        start = perf_counter()
        try:
            response = await call_next(request)
        except HTTPException as exc:
            duration = perf_counter() - start
            observe_http_request(
                service="backend",
                route=_resolve_route_label(request),
                method=request.method,
                status=str(exc.status_code),
                duration_seconds=duration,
            )
            raise
        except Exception:
            duration = perf_counter() - start
            observe_http_request(
                service="backend",
                route=_resolve_route_label(request),
                method=request.method,
                status="500",
                duration_seconds=duration,
            )
            raise

        duration = perf_counter() - start
        observe_http_request(
            service="backend",
            route=_resolve_route_label(request),
            method=request.method,
            status=str(response.status_code),
            duration_seconds=duration,
        )
        return response
🤖 Prompt for AI Agents
In backend/api/main.py around lines 66 to 93, the middleware’s broad except
block treats all raised HTTPException instances as 500s; update the error
handling to catch fastapi.exceptions.HTTPException (or
starlette.exceptions.HTTPException) separately, call observe_http_request using
the exception’s status_code, then re-raise it, and keep a separate generic
except Exception block that records status "500" and re-raises; ensure the
/metrics bypass remains unchanged and that duration is measured before recording
in both exception branches.
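The branching the suggestion introduces reduces to a small status-resolution rule: anticipated errors carry their own status code, everything else is a 500. A hypothetical stdlib sketch (HTTPExceptionLike stands in for FastAPI's HTTPException, which carries a status_code attribute — the class name here is invented for illustration):

```python
class HTTPExceptionLike(Exception):
    """Stand-in for FastAPI's HTTPException: an anticipated error with a status code."""

    def __init__(self, status_code: int):
        super().__init__(status_code)
        self.status_code = status_code


def status_label(exc: BaseException) -> str:
    # Anticipated errors (404, 422, ...) keep their real status code in the
    # metric label; any other exception is recorded as a server-side 500.
    if isinstance(exc, HTTPExceptionLike):
        return str(exc.status_code)
    return "500"
```

With this rule in the middleware, client errors and validation failures no longer inflate the 5xx series that the HighErrorRate alert divides by.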

Comment on lines +11 to +18
  const protocol = process.env.NEXT_PUBLIC_BACKEND_PROTOCOL ?? window.location.protocol ?? 'http:';
  const host = process.env.NEXT_PUBLIC_BACKEND_HOST ?? window.location.hostname;
  const port = process.env.NEXT_PUBLIC_BACKEND_PORT ?? '';
  const endpoint = process.env.NEXT_PUBLIC_WEB_VITALS_ENDPOINT ?? DEFAULT_ENDPOINT;

  const portSegment = port ? `:${port}` : '';
  return `${protocol}//${host}${portSegment}${endpoint}`;
}

⚠️ Potential issue | 🟠 Major

Normalize protocol when building the reporting URL.

If NEXT_PUBLIC_BACKEND_PROTOCOL is set to common values like "https", the computed endpoint becomes https//… (missing the colon) because we directly interpolate the raw string. That breaks both sendBeacon and the fetch fallback. Please normalize the protocol (ensure it ends with :) or build the URL via new URL(...) so env-provided values work.

Apply this diff to normalize the protocol before interpolation:

-  const protocol = process.env.NEXT_PUBLIC_BACKEND_PROTOCOL ?? window.location.protocol ?? 'http:';
+  const protocol = process.env.NEXT_PUBLIC_BACKEND_PROTOCOL ?? window.location.protocol ?? 'http:';
+  const normalizedProtocol = protocol.endsWith(':') ? protocol : `${protocol}:`;
@@
-  return `${protocol}//${host}${portSegment}${endpoint}`;
+  return `${normalizedProtocol}//${host}${portSegment}${endpoint}`;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

  const protocol = process.env.NEXT_PUBLIC_BACKEND_PROTOCOL ?? window.location.protocol ?? 'http:';
  const host = process.env.NEXT_PUBLIC_BACKEND_HOST ?? window.location.hostname;
  const port = process.env.NEXT_PUBLIC_BACKEND_PORT ?? '';
  const endpoint = process.env.NEXT_PUBLIC_WEB_VITALS_ENDPOINT ?? DEFAULT_ENDPOINT;

  const portSegment = port ? `:${port}` : '';
  return `${protocol}//${host}${portSegment}${endpoint}`;
}

After:

  const protocol = process.env.NEXT_PUBLIC_BACKEND_PROTOCOL ?? window.location.protocol ?? 'http:';
  const normalizedProtocol = protocol.endsWith(':') ? protocol : `${protocol}:`;
  const host = process.env.NEXT_PUBLIC_BACKEND_HOST ?? window.location.hostname;
  const port = process.env.NEXT_PUBLIC_BACKEND_PORT ?? '';
  const endpoint = process.env.NEXT_PUBLIC_WEB_VITALS_ENDPOINT ?? DEFAULT_ENDPOINT;

  const portSegment = port ? `:${port}` : '';
  return `${normalizedProtocol}//${host}${portSegment}${endpoint}`;
}
🤖 Prompt for AI Agents
In frontend/src/app/reportWebVitals.ts around lines 11 to 18, the protocol value
from NEXT_PUBLIC_BACKEND_PROTOCOL may be missing the trailing colon (e.g.
"https") which produces URLs like "https//..."; normalize the protocol before
building the URL by trimming whitespace, checking if it is non-empty and
endsWith(':') and if not appending ':' (or alternately construct the full URL
with new URL by combining host/port/endpoint), then use that normalized protocol
when interpolating so the final URL is valid for sendBeacon/fetch.

coderabbitai bot (Contributor) commented Oct 16, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #133

coderabbitai bot added a commit that referenced this pull request Oct 16, 2025
Docstrings generation was requested by @shayancoin.

* #76 (comment)

The following files were modified:

* `backend/api/main.py`
* `backend/api/metrics.py`
* `backend/api/routes_observability.py`
* `backend/tests/test_observability_metrics.py`
* `frontend/src/app/reportWebVitals.ts`
@shayancoin shayancoin merged commit f6d661f into main Oct 16, 2025
2 of 7 checks passed
shayancoin pushed a commit that referenced this pull request Oct 18, 2025
…s` (#132)

Docstrings generation was requested by @shayancoin.

* #76 (comment)

The following files were modified:

* `backend/api/main.py`
* `backend/api/metrics.py`
* `backend/api/routes_observability.py`
* `backend/tests/test_observability_metrics.py`
* `frontend/src/app/reportWebVitals.ts`

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
shayancoin added a commit that referenced this pull request Oct 18, 2025
…s` (#133)

Docstrings generation was requested by @shayancoin.

* #76 (comment)

The following files were modified:

* `backend/api/main.py`
* `backend/api/metrics.py`
* `backend/api/routes_observability.py`
* `backend/tests/test_observability_metrics.py`
* `frontend/src/app/reportWebVitals.ts`

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Shayan <shayan@coin.link>