---
name: grafana-dashboards
description: Use when doing any work with Grafana dashboards - creating, modifying, debugging, or deploying. Covers the full architecture, file locations, naming conventions, and deployment pipeline.
---

# Grafana Dashboards

## Architecture Overview

Dashboards are Grafana JSON embedded in K8s ConfigMap YAMLs, stored as classpath resources. They are extracted from the JAR, undergo template variable substitution, and are applied to K8s. Grafana loads them via file-based provisioning from mounted ConfigMap volumes.

```
JAR resource YAML → TemplateService (substitutes __KEY__ vars) → kubectl apply → ConfigMap in K8s
                                                                                        ↓
Grafana deployment mounts ConfigMap as volume → Grafana provisioner scans /var/lib/grafana/dashboards/
```

## Key Files and Locations

| What | Path |
|------|------|
| Core dashboard YAMLs | `src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/core/` |
| ClickHouse dashboard YAMLs | `src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/clickhouse/` |
| Grafana deployment | `src/main/resources/.../k8s/core/41-grafana-deployment.yaml` |
| Dashboard provisioner config | `src/main/resources/.../k8s/core/14-grafana-dashboards-configmap.yaml` |
| Dashboard service | `src/main/kotlin/.../services/GrafanaDashboardService.kt` |
| Dashboard commands | `src/main/kotlin/.../commands/dashboards/` (`DashboardsGenerate.kt`, `DashboardsUpload.kt`) |
| Template substitution | `src/main/kotlin/.../services/TemplateService.kt` |
| Datasource config | `src/main/kotlin/.../grafana/GrafanaDatasourceConfig.kt` |
| Service test | `src/test/kotlin/.../services/GrafanaDashboardServiceTest.kt` |

## Existing Dashboards

| File | ConfigMap Name | Datasource |
|------|---------------|------------|
| `15-grafana-dashboard-system.yaml` | `grafana-dashboard-system` | VictoriaMetrics (prometheus) |
| `16-grafana-dashboard-s3.yaml` | `grafana-dashboard-s3` | CloudWatch |
| `17-grafana-dashboard-emr.yaml` | `grafana-dashboard-emr` | CloudWatch |
| `18-grafana-dashboard-opensearch.yaml` | `grafana-dashboard-opensearch` | CloudWatch |
| `19-grafana-dashboard-stress.yaml` | `grafana-dashboard-stress` | VictoriaMetrics (prometheus) |
| `14-grafana-dashboard-clickhouse.yaml` (clickhouse/) | `grafana-dashboard-clickhouse` | ClickHouse |
| `17-grafana-dashboard-clickhouse-logs.yaml` (clickhouse/) | `grafana-dashboard-clickhouse-logs` | VictoriaLogs |

## Available Datasources

| Name | `type` value | `uid` value | Port |
|------|-------------|-------------|------|
| VictoriaMetrics | `prometheus` | `VictoriaMetrics` | 8428 |
| VictoriaLogs | `victoriametrics-logs-datasource` | `victorialogs` | 9428 |
| ClickHouse | `grafana-clickhouse-datasource` | (auto) | 9000 |
| Tempo | `tempo` | `tempo` | 3200 |
| CloudWatch | `cloudwatch` | `cloudwatch` | N/A |

Datasources are created at runtime by `GrafanaDatasourceConfig.create(region)` and applied as a ConfigMap by `GrafanaDashboardService.createDatasourcesConfigMap()`.

---

## Creating a New Dashboard

### Step 1: Create the ConfigMap YAML

**File naming:** `{ordinal}-grafana-dashboard-{name}.yaml`
- Ordinal is two digits (next available after existing dashboards)
- Name is kebab-case: `stress`, `emr`, `opensearch`
- The filename MUST contain `grafana-dashboard` for auto-discovery

**ConfigMap naming:** `grafana-dashboard-{name}` (must match the volume reference in Step 2)

**Required label:** `app.kubernetes.io/name: grafana`
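
The naming rules above are mechanical enough to encode in a helper. The following is an illustrative Python sketch only. `dashboard_names` and the example name `cassandra-latency` are hypothetical. The volume name and mount path follow the conventions used in Step 2.

```python
import re

# Hypothetical helper (not part of easy-db-lab): encodes the naming rules
# from Step 1 plus the volume/mount conventions from Step 2.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def dashboard_names(ordinal: int, name: str) -> dict:
    """Return the file, ConfigMap, volume, and mount names for one dashboard."""
    if not 0 <= ordinal <= 99:
        raise ValueError("ordinal must fit in two digits")
    if not KEBAB.match(name):
        raise ValueError(f"name must be kebab-case: {name!r}")
    return {
        # Always contains "grafana-dashboard", which auto-discovery relies on.
        "file": f"{ordinal:02d}-grafana-dashboard-{name}.yaml",
        "configmap": f"grafana-dashboard-{name}",
        "volume": f"dashboard-{name}",
        "mount_path": f"/var/lib/grafana/dashboards/{name}",
        "labels": {"app.kubernetes.io/name": "grafana"},
    }
```

For example, `dashboard_names(20, "cassandra-latency")["file"]` returns `"20-grafana-dashboard-cassandra-latency.yaml"`. Check the target directory for an existing file with the same ordinal before using it.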

**Template** (replace `{name}` and `{Display Name}`; leave `__CLUSTER_NAME__` as-is, it is substituted at extraction time):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-{name}
  namespace: default
  labels:
    app.kubernetes.io/name: grafana
data:
  {name}-overview.json: |
    {
      "annotations": { "list": [] },
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 1,
      "id": null,
      "links": [],
      "liveNow": false,
      "panels": [],
      "refresh": "5m",
      "schemaVersion": 38,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": [
          {
            "current": { "selected": true, "text": "VictoriaMetrics", "value": "VictoriaMetrics" },
            "hide": 0,
            "includeAll": false,
            "label": "Datasource",
            "multi": false,
            "name": "datasource",
            "options": [],
            "query": "prometheus",
            "queryValue": "",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          }
        ]
      },
      "time": { "from": "now-1h", "to": "now" },
      "timepicker": {},
      "timezone": "browser",
      "title": "__CLUSTER_NAME__ - {Display Name}",
      "uid": "{name}-overview",
      "version": 1,
      "weekStart": ""
    }
```

For CloudWatch dashboards, change `"query": "prometheus"` to `"query": "cloudwatch"` in the datasource variable. Also update its `current` entry so it references the CloudWatch datasource instead of VictoriaMetrics.
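
Hand-maintaining the 4-space indentation is error-prone, so one option is to generate the YAML from a dict. The following minimal Python sketch assumes the template layout above. `render_dashboard_configmap` is a hypothetical helper, not part of the project. It leaves `__CLUSTER_NAME__` untouched so that `TemplateService` can substitute it later.

```python
import json
import textwrap

# Hypothetical generator (not part of easy-db-lab): emits a dashboard
# ConfigMap whose JSON is always valid and sits 4 spaces deep inside the
# `|` block scalar, matching the template above.


def render_dashboard_configmap(name: str, dashboard: dict) -> str:
    body = json.dumps(dashboard, indent=2)
    # Prefix every JSON line with 4 spaces so it stays inside the block scalar.
    indented = textwrap.indent(body, " " * 4)
    header = textwrap.dedent(f"""\
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: grafana-dashboard-{name}
          namespace: default
          labels:
            app.kubernetes.io/name: grafana
        data:
          {name}-overview.json: |
        """)
    return header + indented + "\n"


if __name__ == "__main__":
    example = {"title": "__CLUSTER_NAME__ - Example", "uid": "example-overview", "panels": []}
    print(render_dashboard_configmap("example", example))
```

The output can be saved as a new `{ordinal}-grafana-dashboard-{name}.yaml` file and then reviewed by hand.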

### Step 2: Register in Grafana Deployment

**File:** `src/main/resources/.../k8s/core/41-grafana-deployment.yaml`

Add **both** a volumeMount and a volume, indented to match the surrounding entries. Use `optional: true` for non-core dashboards so the Grafana pod still starts if that ConfigMap has not been applied.

**volumeMount** (add after the last dashboard mount, before the `data` mount):
```yaml
- name: dashboard-{name}
  mountPath: /var/lib/grafana/dashboards/{name}
  readOnly: true
```

**volume** (add after the last dashboard volume, before the `data` volume):
```yaml
- name: dashboard-{name}
  configMap:
    name: grafana-dashboard-{name}
    optional: true
```

The volume `name` must match in both places. The `configMap.name` must match the ConfigMap `metadata.name` from Step 1.

**CRITICAL:** Without this step, the dashboard will NOT appear in Grafana. Grafana only reads dashboard files from disk and does not discover ConfigMaps on its own, so each dashboard must be explicitly mounted.
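
A check like the following can catch a missing or mismatched entry before deploying. It is a Python sketch that assumes the Deployment is already parsed into a dict, for example from the JSON output of `kubectl get deployment <name> -o json`. `check_dashboard_wiring` is a hypothetical name.

```python
# Hypothetical checker (not part of easy-db-lab). For each dashboard
# ConfigMap name, it reports whether no volume references the ConfigMap,
# or whether the referencing volume is never mounted by any container.


def check_dashboard_wiring(deployment: dict, configmap_names: list[str]) -> list[str]:
    pod = deployment["spec"]["template"]["spec"]
    # ConfigMap name -> volume name, for configMap-backed volumes only.
    volume_for = {
        v["configMap"]["name"]: v["name"]
        for v in pod.get("volumes", [])
        if "configMap" in v
    }
    # Every volume name mounted by any container in the pod.
    mounted = {
        m["name"]
        for c in pod.get("containers", [])
        for m in c.get("volumeMounts", [])
    }
    problems = []
    for cm in configmap_names:
        vol = volume_for.get(cm)
        if vol is None:
            problems.append(f"{cm}: no volume references this ConfigMap")
        elif vol not in mounted:
            problems.append(f"{cm}: volume {vol!r} is never mounted")
    return problems
```

An empty list means every named ConfigMap has both a volume and a mount. The sketch does not verify that the `mountPath` sits under `/var/lib/grafana/dashboards/`.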

### Step 3: Verify and Deploy

```bash
./gradlew :test
# Then, on a running cluster: dashboards upload
```

---

## Modifying an Existing Dashboard

1. Edit the JSON inside the ConfigMap YAML directly.
2. Run `./gradlew :test` to confirm the build and tests still pass.
3. Deploy with `dashboards upload`.

The dashboard JSON is indented 4 spaces inside the YAML `|` block. Be careful with indentation: YAML block scalars are whitespace-sensitive, and a line indented less than the first JSON line breaks the block.

---

## Deployment Pipeline

### `dashboards upload` command

`GrafanaDashboardService.uploadDashboards()` does the following:
1. Creates the Grafana datasources ConfigMap (with the runtime AWS region)
2. Extracts all classpath resources matching `"grafana-dashboard"` (dashboard ConfigMaps)
3. Extracts the classpath resource matching `"grafana-deployment"` (Grafana Deployment)
4. Applies all of them to K8s via `kubectl apply`

Because step 3 reapplies the Grafana deployment, `dashboards upload` also picks up any new volume mounts. If the pod template changed, Kubernetes rolls out a new Grafana pod. No separate `k8 apply` is needed.
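
The resource selection in steps 2 and 3 can be sketched as a plain substring filter. This approximates the description above and is not the code in `GrafanaDashboardService`. `select_upload_resources` is hypothetical. The datasources ConfigMap from step 1 is built at runtime, so it is not part of the filter.

```python
# Hypothetical sketch of steps 2-3: choose which classpath resources get
# applied, using the substring rules described above.


def select_upload_resources(resource_names: list[str]) -> list[str]:
    dashboards = [r for r in resource_names if "grafana-dashboard" in r]
    deployment = [r for r in resource_names if "grafana-deployment" in r]
    # Dashboards first, then the Deployment that mounts them.
    return dashboards + deployment
```

Under this rule, a file named `20-my-dashboard.yaml` is silently skipped, which is why the filename must contain `grafana-dashboard`. A plain substring match also accepts `14-grafana-dashboards-configmap.yaml`.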

### `dashboards generate` command

Extracts dashboard YAMLs to the local `k8s/` directory with template substitution but does NOT apply them. Useful for inspecting the substituted output.

### `k8 apply` command

Applies ALL `core/` resources, including the Grafana deployment and dashboards. Used during initial cluster setup.

---

## Template Variable Systems

There are two layers of template variables, resolved at different times:

### Layer 1: Cluster Context (`__KEY__` syntax)

Built by `TemplateService.buildContextVariables()` and substituted when resources are extracted, so Grafana never sees these tokens:

| Variable | Source |
|----------|--------|
| `__CLUSTER_NAME__` | `state.initConfig?.name ?: "cluster"` |
| `__BUCKET_NAME__` | `state.s3Bucket ?: ""` |
| `__AWS_REGION__` | `user.region` |
| `__CONTROL_NODE_IP__` | `controlHost?.privateIp ?: ""` |
| `__METRICS_FILTER_ID__` | Built from cluster state |
| `__CLUSTER_S3_PREFIX__` | Built from cluster state |
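
Layer 1 substitution amounts to token replacement over the raw YAML text. The following Python sketch is an assumption about that behavior, not `TemplateService`'s actual code. It matches only upper-case `__KEY__` tokens and leaves unknown tokens in place.

```python
import re

# Hypothetical Layer 1 substitution. Only tokens of the form __UPPER_CASE__
# match, so Grafana's ${var}/$var references pass through untouched.
TOKEN = re.compile(r"__[A-Z][A-Z0-9_]*?__")


def substitute(text: str, context: dict[str, str]) -> str:
    def replace(match):
        # Unknown tokens are kept verbatim rather than blanked out.
        return context.get(match.group(0), match.group(0))

    return TOKEN.sub(replace, text)
```

Because only upper-case keys match, Prometheus' lower-case `__name__` label and Grafana's `${datasource}` survive substitution. Confirm that `TemplateService` behaves the same way before relying on this.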

### Layer 2: Grafana Variables (`${var}` syntax)

Defined in the dashboard JSON `templating.list` array and resolved by Grafana when panels are queried. These create dropdowns in the Grafana UI.

**Custom dropdown example** (quantile selector):
```json
{
  "current": { "selected": true, "text": "p99", "value": "0.99" },
  "hide": 0,
  "includeAll": false,
  "label": "Quantile",
  "multi": false,
  "name": "quantile",
  "options": [
    { "selected": false, "text": "p50", "value": "0.5" },
    { "selected": false, "text": "p75", "value": "0.75" },
    { "selected": false, "text": "p95", "value": "0.95" },
    { "selected": true, "text": "p99", "value": "0.99" }
  ],
  "query": "p50 : 0.5, p75 : 0.75, p95 : 0.95, p99 : 0.99",
  "skipUrlSync": false,
  "type": "custom"
}
```

Reference the variable in panel expressions as `$quantile` or `${quantile}`.
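
Keeping `options`, `current`, and `query` consistent by hand is easy to get wrong. The following Python sketch shows a hypothetical `custom_variable` builder that derives all three from one list of `(text, value)` pairs.

```python
import json

# Hypothetical builder for a Grafana "custom" variable: one list of
# (text, value) pairs drives `query`, `options`, and `current`.


def custom_variable(name: str, label: str, choices: list[tuple[str, str]], default: str) -> dict:
    if default not in {value for _, value in choices}:
        raise ValueError(f"default {default!r} is not one of the choices")
    options = [
        {"selected": value == default, "text": text, "value": value}
        for text, value in choices
    ]
    current = next(o for o in options if o["selected"])
    return {
        "current": {"selected": True, "text": current["text"], "value": current["value"]},
        "hide": 0,
        "includeAll": False,
        "label": label,
        "multi": False,
        "name": name,
        "options": options,
        # Grafana's "text : value" syntax for custom variables.
        "query": ", ".join(f"{text} : {value}" for text, value in choices),
        "skipUrlSync": False,
        "type": "custom",
    }


quantile = custom_variable(
    "quantile", "Quantile",
    [("p50", "0.5"), ("p75", "0.75"), ("p95", "0.95"), ("p99", "0.99")],
    default="0.99",
)
print(json.dumps(quantile, indent=2))
```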

---

## Panel Patterns

**Grid layout:** `gridPos` uses a 24-column grid. `x` is the column offset (0 to 23) and `w` is the width in columns, so `w: 12` = half width and `w: 24` = full width. `h: 8` is the standard panel height. `y` increases downward.

**Row separator:**
```json
{ "collapsed": false, "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 }, "id": 100, "title": "Section Name", "type": "row" }
```
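
Computing `gridPos` by hand gets tedious once a dashboard has several sections. The following hypothetical Python `layout` helper applies the rules above. Each section gets a full-width row separator (`h: 1`), followed by half-width panels (`w: 12`, `h: 8`) placed two per line. One shared id counter ensures that row and panel ids never collide.

```python
# Hypothetical layout helper: places row separators and half-width
# timeseries panels on Grafana's 24-column grid, top to bottom.


def layout(sections: dict[str, list[str]], start_id: int = 100) -> list[dict]:
    panels, y, next_id = [], 0, start_id
    for title, panel_titles in sections.items():
        panels.append({"collapsed": False, "gridPos": {"h": 1, "w": 24, "x": 0, "y": y},
                       "id": next_id, "title": title, "type": "row"})
        next_id += 1
        y += 1
        for i, panel_title in enumerate(panel_titles):
            x = 0 if i % 2 == 0 else 12  # left half, then right half
            panels.append({"type": "timeseries", "title": panel_title, "id": next_id,
                           "gridPos": {"h": 8, "w": 12, "x": x, "y": y}})
            next_id += 1
            if x == 12:
                y += 8  # line is full, move down one panel height
        if len(panel_titles) % 2 == 1:
            y += 8  # odd count left the right half empty; move past it
    return panels
```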

**VictoriaMetrics (Prometheus) timeseries panel:**
```json
{
  "type": "timeseries",
  "title": "Panel Title",
  "id": 1,
  "datasource": { "type": "prometheus", "uid": "${datasource}" },
  "gridPos": { "h": 8, "w": 12, "x": 0, "y": 1 },
  "targets": [
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "expr": "rate(my_metric{job=\"my-job\"}[1m])",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "ops",
      "color": { "mode": "palette-classic" },
      "custom": {
        "drawStyle": "line",
        "fillOpacity": 10,
        "lineWidth": 1,
        "pointSize": 5,
        "showPoints": "never",
        "spanNulls": false
      }
    },
    "overrides": []
  },
  "options": {
    "legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
    "tooltip": { "mode": "multi", "sort": "desc" }
  }
}
```
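
Escaping the quotes inside `expr` by hand (`\"my-job\"`) is a common source of broken JSON. The following hypothetical Python `timeseries_panel` builder reproduces the panel above as a dict and lets `json.dumps` handle the escaping.

```python
import json

# Hypothetical builder for the timeseries panel above. Serializing the dict
# with json.dumps escapes the quotes in PromQL label matchers automatically.
DATASOURCE = {"type": "prometheus", "uid": "${datasource}"}


def timeseries_panel(panel_id, title, expr, grid_pos, unit="short", legend="{{instance}}"):
    return {
        "type": "timeseries",
        "title": title,
        "id": panel_id,
        "datasource": DATASOURCE,
        "gridPos": grid_pos,
        "targets": [{"datasource": DATASOURCE, "expr": expr, "legendFormat": legend, "refId": "A"}],
        "fieldConfig": {
            "defaults": {
                "unit": unit,
                "color": {"mode": "palette-classic"},
                "custom": {"drawStyle": "line", "fillOpacity": 10, "lineWidth": 1,
                           "pointSize": 5, "showPoints": "never", "spanNulls": False},
            },
            "overrides": [],
        },
        "options": {
            "legend": {"displayMode": "table", "placement": "bottom", "showLegend": True},
            "tooltip": {"mode": "multi", "sort": "desc"},
        },
    }


panel = timeseries_panel(1, "Panel Title", 'rate(my_metric{job="my-job"}[1m])',
                         {"h": 8, "w": 12, "x": 0, "y": 1}, unit="ops")
print(json.dumps(panel, indent=2))
```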

**Common PromQL patterns:**
```
rate(counter_total{job="my-job"}[1m])                 # Rate of a counter
summary_metric{job="my-job", quantile="$quantile"}    # Summary quantile with variable
sum(rate(counter{job="my-job"}[1m])) by (instance)    # Aggregation
```

**Common units:** `ops` (operations/sec), `s` (seconds), `bytes`, `percent` (expects 0-100; use `percentunit` for 0-1 ratios), `short` (plain number)

---

## Debugging Dashboards

### Dashboard not appearing in Grafana

1. **Check volume mount:** Is the dashboard registered in `41-grafana-deployment.yaml` with both a `volumeMount` and a `volume`? This is the most common cause.
2. **Check deployment was applied:** `dashboards upload` reapplies the deployment automatically. If you applied the ConfigMap manually without the deployment, the new volume mount won't take effect.
3. **Check ConfigMap name matches:** The `configMap.name` in the volume must exactly match the ConfigMap `metadata.name`.
4. **Check filename pattern:** The YAML filename must contain `grafana-dashboard` for auto-discovery by `GrafanaDashboardService`.

### Dashboard appears but shows no data

1. **Check datasource:** Verify that the panel's datasource `uid` matches an available datasource (see the table above). For `${datasource}`, check the value selected in the dropdown.
2. **Check metric names:** Query the VictoriaMetrics API: `curl http://<control-ip>:8428/api/v1/label/__name__/values`
3. **Check job label:** Verify that `{job="..."}` matches what OTel is scraping. Check the OTel collector config for the `job_name`.
4. **Check scrape interval:** If a workload finishes in less than one scrape interval, its metrics may never be collected.
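
For the first check, a scan like the following Python sketch can list datasource references that point nowhere. `unknown_datasource_uids` is hypothetical. It treats any uid beginning with `$` as a Grafana variable. It cannot check ClickHouse, whose uid is auto-assigned.

```python
# Hypothetical checker: walks a dashboard dict (panels, nested row panels,
# targets, templating) and reports datasource uids that are neither a
# Grafana variable nor a fixed uid from the Available Datasources table.
KNOWN_UIDS = {"VictoriaMetrics", "victorialogs", "tempo", "cloudwatch"}


def unknown_datasource_uids(dashboard: dict) -> set[str]:
    found = set()

    def visit(node):
        if isinstance(node, dict):
            ref = node.get("datasource")
            # Only object-form references {"type": ..., "uid": ...} are checked.
            if isinstance(ref, dict) and isinstance(ref.get("uid"), str):
                found.add(ref["uid"])
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(dashboard)
    # "${datasource}" and "$datasource" are resolved by Grafana, so skip them.
    return {uid for uid in found if not uid.startswith("$") and uid not in KNOWN_UIDS}
```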

### JSON syntax errors

The dashboard JSON is embedded in a YAML `|` block. Common issues:
- Missing comma after a JSON object/array
- Trailing comma before a closing `}` or `]` (not allowed in JSON)
- Mismatched braces/brackets
- Wrong indentation (all JSON lines must be indented at least 4 spaces in the YAML, never less than the first JSON line; a shallower line breaks the block scalar)

Run `dashboards generate` to extract the substituted YAML, then validate the JSON portion.
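
One way to validate the JSON portion without a YAML library is sketched below in Python. `validate_dashboard_yaml` is hypothetical. It assumes the `key.json: |` form used by the templates in this guide and reports problems with line numbers from the YAML file.

```python
import json

# Hypothetical validator for a generated dashboard ConfigMap YAML. For every
# `<key>.json: |` line it collects the block scalar beneath it, checks the
# indentation, and parses the block with json.loads.


def validate_dashboard_yaml(text: str) -> list[str]:
    lines = text.splitlines()
    problems = []
    for i, line in enumerate(lines):
        if not line.strip().endswith(".json: |"):
            continue
        key_indent = len(line) - len(line.lstrip())
        block = []
        for candidate in lines[i + 1:]:
            indent = len(candidate) - len(candidate.lstrip())
            # A non-blank line at or left of the key's column ends the block.
            if candidate.strip() and indent <= key_indent:
                break
            block.append(candidate)
        non_blank = [b for b in block if b.strip()]
        if not non_blank:
            problems.append(f"line {i + 1}: empty JSON block")
            continue
        base = len(non_blank[0]) - len(non_blank[0].lstrip())
        for offset, b in enumerate(block):
            if b.strip() and len(b) - len(b.lstrip()) < base:
                problems.append(f"line {i + 2 + offset}: indented less than the block's first line")
        try:
            json.loads("\n".join(b[base:] for b in block))
        except json.JSONDecodeError as e:
            # JSON line 1 is YAML line i + 2, so JSON line n is YAML line i + 1 + n.
            problems.append(f"line {i + 1 + e.lineno}: {e.msg}")
    return problems
```

An empty result means every embedded dashboard parsed. A missing comma, for example, is reported against the YAML line where the next key starts.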