## Features

- **Extensible Check System**: Modular check types (ping, with more planned) via a Registry/Factory pattern.
- **Host Status Aggregation**: Each host has an aggregate status (`up`, `down`, `degraded`, `unknown`) computed from all its checks. A check must be alive and have reported within the last 5 minutes to count as healthy.
- **Ping Monitoring**: Sends ICMP Echo Requests to check host availability.
- **Latency Logging**: Uses RRD to store latency data over time.
- **Graph Generation**: Generates historical latency graphs (15 minutes, 4 hours, 8 hours, etc.) for each host.
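
The Registry/Factory pattern mentioned in the first bullet can be sketched in Go as a map from a check type name to a constructor. This is an illustration under assumptions, not the project's code: `Check`, `Factory`, `Register`, `New`, `Types`, and `pingCheck` are hypothetical names, and the ping stand-in sends no packets.

```go
package main

import (
	"fmt"
	"sort"
)

// Check is a hypothetical interface that every check type implements.
type Check interface {
	Name() string
}

// Factory builds a Check for one host address.
type Factory func(address string) (Check, error)

// registry maps a check type name (e.g. "ping") to its factory.
var registry = map[string]Factory{}

// Register adds a factory under a unique name and panics on duplicates,
// so a naming clash is caught at startup.
func Register(name string, f Factory) {
	if _, dup := registry[name]; dup {
		panic("check type registered twice: " + name)
	}
	registry[name] = f
}

// New looks up the factory registered under name and builds a Check.
func New(name, address string) (Check, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown check type %q", name)
	}
	return f(address)
}

// Types returns the registered check type names in sorted order.
func Types() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// pingCheck is a stand-in; a real ping check would send ICMP Echo Requests.
type pingCheck struct{ address string }

func (p *pingCheck) Name() string { return "ping" }

// init registers the ping type, so New("ping", ...) works without the
// lookup code knowing about pingCheck.
func init() {
	Register("ping", func(address string) (Check, error) {
		return &pingCheck{address: address}, nil
	})
}

func main() {
	c, err := New("ping", "8.8.8.8")
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Name(), Types())
	if _, err := New("http", "example.com"); err != nil {
		fmt.Println(err)
	}
}
```

Because each check type registers itself from an `init` function, adding a new type does not require touching the lookup code; the standard library's `database/sql` drivers use the same idea via `sql.Register`.
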

…

- **Port** (`--port`): Port on which the API and front-end are served.
- **Logging Level** (`--log-level`): Set the verbosity of logs (e.g., `debug`, `info`, `warn`, `error`, `fatal`, `panic`).

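
The two options above can be parsed and validated with Go's standard `flag` package. This is a sketch that assumes a Go implementation: the flag-set name `monitor`, the defaults `8080` and `info`, and `parseOptions`/`Options` are placeholders, not the project's actual values or API.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validLevels lists the log levels named in this README.
var validLevels = map[string]bool{
	"debug": true, "info": true, "warn": true,
	"error": true, "fatal": true, "panic": true,
}

// Options holds the two settings described above.
type Options struct {
	Port     int
	LogLevel string
}

// parseOptions parses command-line arguments (without the program name).
// The defaults 8080 and "info" are placeholders, not the project's defaults.
func parseOptions(args []string) (Options, error) {
	fs := flag.NewFlagSet("monitor", flag.ContinueOnError)
	var o Options
	fs.IntVar(&o.Port, "port", 8080, "port for the API and front-end")
	fs.StringVar(&o.LogLevel, "log-level", "info", "log verbosity")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	if o.Port < 1 || o.Port > 65535 {
		return Options{}, fmt.Errorf("invalid --port %d", o.Port)
	}
	if !validLevels[o.LogLevel] {
		return Options{}, fmt.Errorf("invalid --log-level %q", o.LogLevel)
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("would serve on :%d with log level %s\n", o.Port, o.LogLevel)
}
```

For example, `--port 9000 --log-level debug` yields port 9000 at debug level. Go's `flag` package treats `-port` and `--port` identically, so both spellings work.
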
## Host Status

Each host has an aggregate status derived from all its enabled checks:

| Status       | Color  | Meaning                                                             |
| ------------ | ------ | ------------------------------------------------------------------- |
| **up**       | Green  | All checks are alive and reported within the last 5 minutes.        |
| **degraded** | Yellow | Some checks are healthy, others are down or stale.                  |
| **down**     | Red    | All checks are down or stale, but at least one has reported before. |
| **unknown**  | Gray   | No checks configured, or no check has ever reported.                |

A check result is considered **stale** if its last successful RRD update is older than 5 minutes. Stale checks are treated the same as down checks for the purpose of host status aggregation.

## API

### `GET /api`

Returns a JSON object, keyed by host name, with the status of every host:

```json
{
  "google": {
    "address": "8.8.8.8",
    "status": "up",
    "checks": {
      "ping": {
        "alive": true,
        "metrics": {
          "latency_us": 12345
        },
        "lastupdate": 1700000000
      }
    }
  },
  "router": {
    "status": "unknown",
    "checks": {}
  }
}
```

The `status` field is one of `up`, `down`, `degraded`, or `unknown` (see [Host Status](#host-status) above).

### `GET /metrics`

Exposes Prometheus-formatted metrics:

```
check_alive{host="google", address="8.8.8.8", check="ping"} 1
check_metric{host="google", address="8.8.8.8", check="ping", metric="latency_us"} 12345
```

## Data Directory Layout

RRD files and graph images are organized into per-host subdirectories:
