feat: implement consensus-based health check for lagging validators #15

Merged
fantaclaw-ai merged 1 commit into main from feat/consensus-health-check on Feb 17, 2026
@fantaclaw-ai
Contributor

Description

The current health check only verifies that a backend returns HTTP 200 OK, so lagging validators (e.g., during startup or sync issues) remain in the healthy pool even when they are thousands of slots behind.

This PR introduces a consensus-based health check mechanism:

  1. Response Parsing: Modifies perform_health_check to parse the JSON-RPC response body and extract the slot number (for getSlot or getBlockHeight).
  2. Consensus Tip: Calculates the maximum slot returned by any healthy backend during a check cycle.
  3. Lag Detection: Compares each backend's slot against the max slot. If a backend lags by more than max_slot_lag (configurable, default 50), it is marked as unhealthy, even if it responds successfully.
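
The consensus tip and lag detection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CheckResult`, `evaluate_cycle`, and the field names are hypothetical, and each backend's slot is assumed to have already been parsed from its JSON-RPC response (`None` when the request failed or the body could not be parsed).

```rust
/// Hypothetical per-backend result of one health check cycle.
#[derive(Debug, Clone, PartialEq)]
struct CheckResult {
    label: String,
    /// Slot parsed from the JSON-RPC response; `None` if the request
    /// failed or the body could not be parsed.
    slot: Option<u64>,
}

/// Returns `(label, healthy)` for every backend checked in this cycle.
fn evaluate_cycle(results: &[CheckResult], max_slot_lag: u64) -> Vec<(String, bool)> {
    // Consensus tip: the highest slot reported by any responding backend.
    let tip = results.iter().filter_map(|r| r.slot).max();

    results
        .iter()
        .map(|r| {
            let healthy = match (r.slot, tip) {
                // Healthy only if the backend is at most `max_slot_lag` slots
                // behind the tip. `saturating_sub` is defensive; slot <= tip here.
                (Some(slot), Some(tip)) => tip.saturating_sub(slot) <= max_slot_lag,
                // No parsed slot means the check itself failed.
                _ => false,
            };
            (r.label.clone(), healthy)
        })
        .collect()
}
```

With the default threshold of 50 and reported slots of 1000, 960, and 900, the tip is 1000: the first two backends stay healthy (lag 0 and 40) and the third is marked unhealthy (lag 100). A backend lagging by exactly 50 slots remains healthy in this sketch, since the description says "more than" `max_slot_lag`.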

Configuration

Added max_slot_lag to HealthCheckConfig. Default is 50 slots.
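
For illustration, the new option might look like the sketch below. Only the names `HealthCheckConfig` and `max_slot_lag` and the default of 50 come from this PR; the field type is an assumption, and the real struct presumably carries other fields that are omitted here.

```rust
/// Hypothetical, reduced shape of the config struct. Only `max_slot_lag`
/// and its default of 50 are taken from the PR; the `u64` type is assumed.
#[derive(Debug, Clone, PartialEq)]
struct HealthCheckConfig {
    /// Maximum number of slots a backend may trail the highest observed
    /// slot before it is marked unhealthy.
    max_slot_lag: u64,
}

impl Default for HealthCheckConfig {
    fn default() -> Self {
        Self { max_slot_lag: 50 }
    }
}
```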

Benefits

  • Prevents routing traffic to stale nodes.
  • Automatically detects and quarantines lagging validators until they catch up.

- Add max_slot_lag config (default 50)
- Parse JSON-RPC response in health checks to extract slot number
- Calculate max slot across all backends
- Mark backends unhealthy if they lag behind max slot by > threshold
@fantaclaw-ai
Contributor Author

🚀 Performance Benchmark (Consensus Health Check)

Ran cargo run --release --bin benchmark on the feat/consensus-health-check branch.

Hardware: Local VM environment
Concurrency: 50 clients
Duration: 10s

| Metric | Result | Baseline (PR #14) |
| --- | --- | --- |
| RPS | 31,257.87 | 31,799.80 |
| P50 Latency | 1.58ms | 1.56ms |
| P99 Latency | 2.30ms | 2.22ms |
| Errors | 0 | 0 |

Impact Analysis:
The JSON parsing overhead in the background health check loop is negligible: RPS is within 2% of the baseline, and P50 and P99 latency each rose by less than 0.1ms. The router maintains high throughput.
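
The RPS figure can be checked against the table with a small helper. `relative_change_pct` is a hypothetical name used only for this illustration; the inputs are the values reported above.

```rust
/// Percentage change of `result` relative to `baseline`.
/// Negative values mean the result is lower than the baseline.
fn relative_change_pct(result: f64, baseline: f64) -> f64 {
    (result - baseline) / baseline * 100.0
}
```

For RPS, `relative_change_pct(31_257.87, 31_799.80)` comes out at roughly -1.70%, which is inside the 2% bound quoted in the analysis.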

@fantaclaw-ai merged commit b836e63 into main on Feb 17, 2026
1 check passed
@fantaclaw-ai deleted the feat/consensus-health-check branch on February 17, 2026 07:49