[codex] add /metrics endpoint for operational alerting #38
Conversation
Pull request overview
Adds a new operational /metrics endpoint to expose Prometheus-scrapable gauges that reflect guide freshness, scheduler failure state, and cron support—supporting alerting/monitoring alongside existing /health and /status endpoints.
Changes:
- Introduces `/metrics` endpoint returning Prometheus text exposition format gauges.
- Adds integration coverage to validate expected metric names/values are present.
- Documents `/metrics` usage and its role in observability.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| app/server.py | Adds /metrics endpoint and emits key runtime gauges in Prometheus text format. |
| test_integration.py | Adds an integration test that scrapes /metrics and validates gauge presence/values. |
| README.md | Documents /metrics endpoint and observability guidance. |
```python
now = datetime.now(UTC)
state = read_state()
freshness = _guide_freshness_snapshot(now)
cron_supported, _ = _cron_support_status(os.environ.get("CRON_SCHEDULE", CRON_SCHEDULE))
```
metrics() reads state via read_state() and then calls _guide_freshness_snapshot(now), which calls read_state() again. This doubles file I/O and can produce internally inconsistent metric values if state changes between reads. Consider passing the already-read state into the freshness calculation (e.g., allow _guide_freshness_snapshot to accept an optional state dict) or compute freshness inputs in metrics() without re-reading state.
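One way to address this, sketched below with assumed shapes for `read_state()` and the freshness fields (the real helpers in `app/server.py` may differ): let `_guide_freshness_snapshot` accept an optional pre-read state dict, so `metrics()` reads the state file exactly once and both values come from the same snapshot.

```python
from datetime import datetime, timezone

def read_state():
    # Stand-in for the app's read_state(); the real function reads the
    # persisted scheduler state from disk. The field name and value here
    # are illustrative.
    return {"last_update": "2024-01-01T00:00:00+00:00"}

def _guide_freshness_snapshot(now, state=None):
    # Accept an optional pre-read state so callers that already hold the
    # dict (like metrics()) avoid a second file read; fall back to
    # read_state() for existing call sites.
    if state is None:
        state = read_state()
    last_update = datetime.fromisoformat(state["last_update"])
    age_seconds = (now - last_update).total_seconds()
    return {"age_seconds": age_seconds, "stale": age_seconds > 86400}

# In metrics(): one read, internally consistent values.
now = datetime.now(timezone.utc)
state = read_state()
snapshot = _guide_freshness_snapshot(now, state=state)
```

Because `state=None` preserves the old behavior, no other call site needs to change.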
```python
with TestClient(app) as client:
    response = client.get("/metrics")
    assert response.status_code == 200
    assert "text/plain" in response.headers.get("content-type", "")
```
This test claims to validate Prometheus text format (version=0.0.4), but it only checks that the content-type contains text/plain. To prevent regressions, assert the response Content-Type includes version=0.0.4 (and/or the exact media type parameters you intend to support).
```diff
-assert "text/plain" in response.headers.get("content-type", "")
+content_type = response.headers.get("content-type", "")
+assert "text/plain" in content_type
+assert "version=0.0.4" in content_type
```
```bash
# Scrape Prometheus-style metrics
curl http://localhost:7004/metrics
```

```text
{"status":"ok","version":"1.0.0","artifacts":"m3u xml"}
```
The README now shows a /metrics curl example immediately followed by a JSON output block. Since /metrics returns Prometheus text format (not JSON), this placement is misleading. Either add a representative /metrics text output example, or move/label the JSON snippet so it’s clearly associated with the correct endpoint.
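A representative text output block would resolve the ambiguity. The gauge names below come from this PR's summary; the values and `HELP` string are illustrative only:

```text
# HELP downlink_last_update_age_seconds Seconds since the guide was last updated.
# TYPE downlink_last_update_age_seconds gauge
downlink_last_update_age_seconds 3600
# TYPE downlink_guide_stale gauge
downlink_guide_stale 0
# TYPE downlink_scheduler_consecutive_failures gauge
downlink_scheduler_consecutive_failures 0
# TYPE downlink_cron_supported gauge
downlink_cron_supported 1
```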
Summary
- `/metrics` endpoint with Prometheus text format (`text/plain; version=0.0.4`)
- Gauges: `downlink_last_update_age_seconds`, `downlink_guide_stale`, `downlink_scheduler_consecutive_failures`, `downlink_cron_supported`
- Documents `/metrics` usage and observability role in README

Validation
- `python3 -m black app/server.py test_integration.py`
- `python3 -m py_compile app/server.py test_integration.py`
- `npm run -s lint:js`
- `npm run -s lint:md`

Notes