@@ -90,3 +90,52 @@ Register it in your runtime (or fork and extend `AdapterMap` in `StreamingGatewa
 - Prometheus metrics (`prometheus_client`) track message counts, connection latency, and active connections.
 - Optional background health checks ping pooled connections (interval configured via YAML).
 
+### Advanced Health Metrics
+
+- **Message loss detection** surfaces gaps in sequence numbers and reports per-provider drop rates.
+- **Queue depth gauges** expose backlog in internal processing buffers.
+- **Bandwidth and throughput** statistics track messages per second and bytes processed.
+- **Data freshness** timers flag stale feeds when updates stop arriving.
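To make the loss-detection and freshness bullets concrete, here is a minimal sketch of how sequence gaps, a drop rate, and a staleness check could be computed for one provider. The `FeedHealth` class, its field names, and the 5-second freshness window are hypothetical and are not taken from the project's code.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedHealth:
    """Per-provider gap and staleness tracker (hypothetical helper)."""

    stale_after_s: float = 5.0  # assumed freshness window, in seconds
    received: int = 0
    dropped: int = 0
    last_seq: Optional[int] = None
    last_seen: Optional[float] = None

    def observe(self, seq: int, now: Optional[float] = None) -> int:
        """Record one message and return how many sequence numbers were skipped."""
        now = time.monotonic() if now is None else now
        gap = 0
        if self.last_seq is not None and seq > self.last_seq + 1:
            gap = seq - self.last_seq - 1
        self.dropped += gap
        self.received += 1
        # Late or duplicate messages never move the high-water mark backwards.
        self.last_seq = seq if self.last_seq is None else max(self.last_seq, seq)
        self.last_seen = now
        return gap

    @property
    def drop_rate(self) -> float:
        """Fraction of expected messages that never arrived."""
        total = self.received + self.dropped
        return self.dropped / total if total else 0.0

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True when no message has arrived within the freshness window."""
        if self.last_seen is None:
            return False  # nothing received yet, so there is nothing to go stale
        now = time.monotonic() if now is None else now
        return now - self.last_seen > self.stale_after_s
```

This sketch counts a late-arriving message as received without subtracting it from `dropped`, so a feed that reorders heavily would overstate its drop rate.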
+
+### Alerting & Incident History
+
+- Configurable thresholds escalate repeated warnings to errors after a defined count.
+- Incident history is retained in memory for post-mortem analysis and optional export.
+- Alert channels include structured logs and Prometheus-compatible metrics.
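The escalation and history behaviour described above might look roughly like the following. `AlertEscalator`, `Incident`, and the bounded `deque` are illustrative names and choices, and the rule that the N-th consecutive warning becomes an error is an assumption about how the "defined count" is interpreted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass(frozen=True)
class Incident:
    check: str
    severity: str  # "warning" or "error"
    message: str


class AlertEscalator:
    """Escalates repeated warnings and keeps a bounded in-memory history."""

    def __init__(self, escalation_threshold: int = 3, history_size: int = 100):
        self.escalation_threshold = escalation_threshold
        # A bounded deque discards the oldest incident once history_size is reached.
        self.history: Deque[Incident] = deque(maxlen=history_size)
        self._streaks: Dict[str, int] = {}

    def warn(self, check: str, message: str) -> Incident:
        """Record a warning; escalate once the streak reaches the threshold."""
        streak = self._streaks.get(check, 0) + 1
        self._streaks[check] = streak
        severity = "error" if streak >= self.escalation_threshold else "warning"
        incident = Incident(check, severity, message)
        self.history.append(incident)
        return incident

    def clear(self, check: str) -> None:
        """Reset the streak after the check passes again."""
        self._streaks.pop(check, None)
```

Keeping the history in a fixed-size `deque` bounds memory use, at the cost of losing older incidents unless they are exported first.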
+
+### Recovery & Circuit Breaking
+
+- Automatic retries use exponential backoff with jitter and respect circuit-breaker timeouts.
+- Fallback connectors can be configured for provider outages.
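As an illustration of the retry behaviour, the sketch below combines full-jitter exponential backoff with a simple circuit breaker. The function and class names, the default base delay and cap, and the failure threshold are all assumptions; the project's actual retry code may differ.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, min(cap, base * (2 ** attempt)))


class CircuitBreaker:
    """Opens after repeated failures and allows a trial call after timeout_s."""

    def __init__(self, failure_threshold: int = 3, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_s = timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the timeout has elapsed, let a trial call through (half-open).
        return self.clock() - self.opened_at >= self.timeout_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_retries(fn: Callable[[], T], breaker: CircuitBreaker,
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, backing off between failures and stopping if the breaker opens."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open") from last_error
        try:
            result = fn()
        except Exception as exc:
            breaker.record_failure()
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
        else:
            breaker.record_success()
            return result
    raise RuntimeError("retries exhausted") from last_error
```

A caller could catch the final `RuntimeError` and switch to a fallback connector at that point.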
+
+### Health API
+
+When enabled, an embedded REST server exposes:
+
+- `GET /health` – lightweight readiness probe.
+- `GET /status` – detailed status including recent incidents.
+- `GET /metrics` – Prometheus scrape endpoint.
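For orientation, the three endpoints could be served with nothing but the standard library, as in the sketch below. The response shapes, the 503 status for an unhealthy probe, and the `state` dictionary layout are assumptions made for illustration; the embedded server in this project may return different payloads.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, Tuple
from urllib.parse import urlsplit


def route(path: str, state: Dict[str, Any]) -> Tuple[int, str, bytes]:
    """Map a request path to (status code, content type, body)."""
    if path == "/health":
        ok = bool(state.get("healthy", False))
        body = json.dumps({"status": "ok" if ok else "unhealthy"}).encode()
        return (200 if ok else 503), "application/json", body
    if path == "/status":
        payload = {
            "healthy": bool(state.get("healthy", False)),
            "incidents": state.get("incidents", []),
        }
        return 200, "application/json", json.dumps(payload).encode()
    if path == "/metrics":
        # Prometheus text exposition format: one "name value" sample per line.
        metrics = state.get("metrics", {})
        lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
        return 200, "text/plain; version=0.0.4", ("\n".join(lines) + "\n").encode()
    return 404, "application/json", json.dumps({"error": "not found"}).encode()


def make_handler(state: Dict[str, Any]):
    """Build a request handler class bound to a shared state dictionary."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Drop any query string before routing.
            status, ctype, body = route(urlsplit(self.path).path, state)
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args: Any) -> None:
            pass  # keep probe traffic out of stderr

    return Handler


def build_server(state: Dict[str, Any], host: str = "127.0.0.1",
                 port: int = 8000) -> ThreadingHTTPServer:
    """Create the server; call serve_forever() on the result to start it."""
    return ThreadingHTTPServer((host, port), make_handler(state))
```

Keeping `route` free of socket code means each endpoint's behaviour can be checked without starting a server.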
+
+### Configuration Example
+
+```yaml
+streaming_health:
+  monitoring:
+    enabled: true
+    check_interval: 5
+  thresholds:
+    max_latency_ms: 100
+    min_throughput_msg_per_sec: 50
+    max_queue_depth: 5000
+    circuit_breaker_timeout: 60
+  alerts:
+    enabled: true
+    channels: ["log", "metrics"]
+    escalation_threshold: 3
+  api:
+    enabled: true
+    host: "0.0.0.0"
+    port: 8000
+```
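One way to consume this configuration is to parse it into a typed settings object and fail fast on invalid values. The sketch below works on an already-loaded mapping (for example, the result of `yaml.safe_load` from PyYAML). It assumes that `monitoring`, `thresholds`, `alerts`, and `api` are sibling sections under `streaming_health`; the `HealthSettings` class, the fallback defaults, and the validation rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class HealthSettings:
    check_interval: float
    max_latency_ms: float
    max_queue_depth: int
    escalation_threshold: int
    channels: List[str]
    api_host: str
    api_port: int


def parse_health_config(raw: Mapping[str, Any]) -> HealthSettings:
    """Build settings from a loaded config mapping, validating basic ranges."""
    root = raw.get("streaming_health", {})
    monitoring = root.get("monitoring", {})
    thresholds = root.get("thresholds", {})
    alerts = root.get("alerts", {})
    api = root.get("api", {})

    # Fallback defaults mirror the example values; they are assumptions.
    settings = HealthSettings(
        check_interval=float(monitoring.get("check_interval", 5)),
        max_latency_ms=float(thresholds.get("max_latency_ms", 100)),
        max_queue_depth=int(thresholds.get("max_queue_depth", 5000)),
        escalation_threshold=int(alerts.get("escalation_threshold", 3)),
        channels=list(alerts.get("channels", ["log"])),
        api_host=str(api.get("host", "127.0.0.1")),
        api_port=int(api.get("port", 8000)),
    )
    if settings.check_interval <= 0:
        raise ValueError("check_interval must be positive")
    if settings.escalation_threshold < 1:
        raise ValueError("escalation_threshold must be at least 1")
    if not 0 < settings.api_port < 65536:
        raise ValueError("api.port must be between 1 and 65535")
    return settings
```

Note that binding the API to `0.0.0.0` listens on every network interface, so `127.0.0.1` may be the safer choice when the endpoints are only needed locally.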
+
+