observability: kps + otel-collector + p95 budgets + labels#569
shayancoin merged 1 commit into main
Caution: Review failed. The pull request is closed.
Walkthrough
This PR introduces OpenTelemetry observability across the full stack: a GitHub Actions validation workflow, backend request tracing middleware, frontend page load tracing, Kubernetes Helm charts for OpenTelemetry Collector deployment, Prometheus/Grafana monitoring stack configuration, and observability SLO budgets targeting API p95 latency metrics.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client / Browser
participant Frontend as Frontend App
participant Backend as Backend API
participant Collector as OT Collector
participant Tempo as Tempo
participant Prometheus as Prometheus
participant Grafana as Grafana
Note over Frontend,Collector: Initialization & Request Phase
Frontend->>Frontend: initOtelRoute() on mount
activate Frontend
Frontend->>Frontend: Start page_load span<br/>(service, route, tenant_id)
Frontend->>Frontend: End page_load span
deactivate Frontend
Client->>Frontend: HTTP Request
Frontend->>Backend: HTTP Request + Trace Headers
Note over Backend,Collector: Backend Request Tracing
Backend->>Backend: ObservabilityMiddleware<br/>Extract trace context
activate Backend
Backend->>Backend: Create/link span<br/>(service, http.route, tenant_id)
Backend->>Backend: Process request
Backend->>Backend: End span
deactivate Backend
Backend->>Client: HTTP Response
Note over Collector,Prometheus: Telemetry Pipeline
Backend->>Collector: Send traces (OTLP)
Collector->>Collector: spanmetrics connector<br/>(extract service, route, tenant_id)
Collector->>Tempo: Export traces
Collector->>Prometheus: Export metrics<br/>(p95 latency histogram)
Note over Prometheus,Grafana: Alerting & Visualization
Prometheus->>Prometheus: Recording rule:<br/>paform:api_p95_5m
Prometheus->>Grafana: Scrape metrics
Grafana->>Grafana: Display latency dashboard<br/>Alert on threshold breach
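The "Extract trace context" and "Create/link span" steps in the diagram can be illustrated with a small standard-library sketch. It is not the PR's `ObservabilityMiddleware`; it assumes the frontend sends a W3C `traceparent` header, and the names `Span`, `extract_parent`, and `start_request_span` are hypothetical. The attribute keys (`service`, `http.route`, `tenant_id`) are taken from the diagram.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional, Tuple

# W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a real tracing span."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    sampled: bool
    attributes: Dict[str, str] = field(default_factory=dict)


def extract_parent(headers: Mapping[str, str]) -> Optional[Tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if absent/invalid."""
    lowered = {key.lower(): value for key, value in headers.items()}
    match = _TRACEPARENT.match(lowered.get("traceparent", "").strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def start_request_span(
    headers: Mapping[str, str],
    route: str,
    tenant_id: str,
    service: str = "paform-api",
) -> Span:
    """Link to the caller's trace when possible, otherwise start a new trace."""
    parent = extract_parent(headers)
    if parent is None:
        # Assumption: root spans are always sampled in this sketch.
        trace_id, parent_span_id, sampled = secrets.token_hex(16), None, True
    else:
        trace_id, parent_span_id, sampled = parent
    return Span(
        trace_id=trace_id,
        span_id=secrets.token_hex(8),
        parent_span_id=parent_span_id,
        sampled=sampled,
        attributes={"service": service, "http.route": route, "tenant_id": tenant_id},
    )
```

A request carrying a valid header keeps the caller's trace ID, so the frontend `page_load` span and the backend span end up in the same trace; a request without one starts a fresh trace.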
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
Rationale: The PR spans heterogeneous components (backend middleware, frontend hooks, Kubernetes manifests, Helm charts, GitHub Actions, YAML configs) with varying logic density. The backend middleware introduces context propagation logic requiring careful review of OpenTelemetry semantics; the Helm chart templates involve multiple interdependent resources (Deployment, ConfigMap, Service, ServiceAccount, ServiceMonitor) with conditional rendering and port/secret wiring; the monitoring stack chains Prometheus rules, Grafana dashboards, and Tempo integration; observability budgets introduce SLO thresholds and query logic. While individual sections are not overly complex, the breadth and interdependencies demand multi-pass reasoning across domains.
Note: Docstrings generation - SUCCESS
Docstrings generation was requested by @shayancoin.
* #569 (comment)
The following files were modified:
* `backend/api/main.py`
* `backend/api/middleware/observability.py`
* `frontend/src/app/configurator/page.tsx`
* `frontend/src/lib/otel-route.ts`
…570)
Docstrings generation was requested by @shayancoin.
* #569 (comment)
The following files were modified:
* `backend/api/main.py`
* `backend/api/middleware/observability.py`
* `frontend/src/app/configurator/page.tsx`
* `frontend/src/lib/otel-route.ts`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
budgets:
  - name: api_p95_route_configurator
    target: 0.300
    query: >
      histogram_quantile(0.95, sum by (le, service, route, tenant_id)(
        rate(traces_spanmetrics_latency_bucket{service="paform-api", route="/configurator"}[5m])
      ))
    window: 5m
    action_on_violation: fail
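To make the budget's query concrete, here is a rough sketch of how a p95 estimate falls out of cumulative histogram buckets and is compared with the `target` of 0.300. It mirrors the linear interpolation that Prometheus documents for `histogram_quantile`, but it is a simplified illustration, not Prometheus's implementation; the bucket bounds and counts are invented for the example.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes classic buckets: counts are cumulative and the highest bucket
    has an upper bound of +Inf. The lowest bucket is assumed to start at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("histogram needs a +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies in the +Inf bucket: report the last finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Invented example data: cumulative request counts per latency bound (seconds).
example_buckets = [(0.1, 50.0), (0.25, 90.0), (0.5, 98.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, example_buckets)
within_budget = p95 <= 0.300  # target from the budget entry above
```

With these invented counts the 95th request falls in the 0.25 to 0.5 second bucket, so the estimate exceeds the 0.300 target and the budget would be violated.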
Config update disables observability budget checks
The new observability budget file now defines a budgets array without the prometheus/tempo sections or the global window/baseline keys that tools/ci/check_observability_budgets.py consumes. The Python check still expects those keys and iterates over providers named prometheus and tempo; when run against this file it prints "No observability providers configured." and exits successfully, so no thresholds are ever evaluated. This effectively turns off the SLO gate for deployments. Either keep the previous schema or update the script and workflow to read the new structure.
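One way to address this, sketched below under stated assumptions, is for the check to read the new `budgets` list and to treat an empty configuration as an error rather than a pass. This is not the code in `tools/ci/check_observability_budgets.py`; `check_budgets`, `BudgetConfigError`, and the `run_query` callback (standing in for a Prometheus query that returns a float or `None`) are hypothetical names, and the config is shown as an already-parsed dict.

```python
from typing import Any, Callable, Dict, List, Optional


class BudgetConfigError(Exception):
    """Raised when the config contains nothing to evaluate."""


def check_budgets(
    config: Dict[str, Any],
    run_query: Callable[[str], Optional[float]],
) -> List[str]:
    """Return one message per violated budget whose action is "fail"."""
    budgets = config.get("budgets")
    if not budgets:
        # Refuse to pass vacuously: an empty gate should be loud, not green.
        raise BudgetConfigError("no budgets configured")

    failures: List[str] = []
    for budget in budgets:
        value = run_query(budget["query"])
        # Assumption: missing data (None or NaN) counts as a violation.
        missing = value is None or value != value
        if missing or value > budget["target"]:
            message = f"{budget['name']}: observed {value}, target {budget['target']}"
            if budget.get("action_on_violation") == "fail":
                failures.append(message)
            else:
                # Any other action value is hypothetical; only report it.
                print(f"warning: {message}")
    return failures


example_config = {
    "budgets": [
        {
            "name": "api_p95_route_configurator",
            "target": 0.300,
            "query": "histogram_quantile(0.95, ...)",  # elided for brevity
            "window": "5m",
            "action_on_violation": "fail",
        }
    ]
}
```

A CI wrapper would then exit non-zero when the returned list is non-empty or when `BudgetConfigError` is raised, so a schema mismatch can no longer disable the gate silently.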
Summary
Testing
pnpm --dir frontend lint
https://chatgpt.com/codex/tasks/task_e_68f8aea896f483309c40ed2ebbba75c3