
Commit dc48c09

Fix dashboard upload to reapply Grafana deployment for new volume mounts
`dashboards upload` applied only the dashboard ConfigMaps, not the Grafana deployment, so a new dashboard whose volume mount was added after the initial `k8 apply` never appeared in Grafana. The command now also extracts and applies the `grafana-deployment` resource. This commit also adds the stress dashboard volume mount to the Grafana deployment and creates a local skill for dashboard work.
1 parent 5a25b67 commit dc48c09

4 files changed

+354
-5
lines changed

Lines changed: 317 additions & 0 deletions
@@ -0,0 +1,317 @@
---
name: grafana-dashboards
description: Use when doing any work with Grafana dashboards - creating, modifying, debugging, or deploying. Covers the full architecture, file locations, naming conventions, and deployment pipeline.
---

# Grafana Dashboards

## Architecture Overview

Dashboards are Grafana JSON embedded in K8s ConfigMap YAMLs, stored as classpath resources. They are extracted from the JAR, undergo template variable substitution, and are applied to K8s. Grafana loads them via file-based provisioning from mounted ConfigMap volumes.

```
JAR resource YAML → TemplateService (substitutes __KEY__ vars) → kubectl apply → ConfigMap in K8s

Grafana deployment mounts ConfigMap as volume → Grafana provisioner scans /var/lib/grafana/dashboards/
```

## Key Files and Locations

| What | Path |
|------|------|
| Core dashboard YAMLs | `src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/core/` |
| ClickHouse dashboard YAMLs | `src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/clickhouse/` |
| Grafana deployment | `src/main/resources/.../k8s/core/41-grafana-deployment.yaml` |
| Dashboard provisioner config | `src/main/resources/.../k8s/core/14-grafana-dashboards-configmap.yaml` |
| Dashboard service | `src/main/kotlin/.../services/GrafanaDashboardService.kt` |
| Dashboard commands | `src/main/kotlin/.../commands/dashboards/` (`DashboardsGenerate.kt`, `DashboardsUpload.kt`) |
| Template substitution | `src/main/kotlin/.../services/TemplateService.kt` |
| Datasource config | `src/main/kotlin/.../grafana/GrafanaDatasourceConfig.kt` |
| Service test | `src/test/kotlin/.../services/GrafanaDashboardServiceTest.kt` |

## Existing Dashboards

| File | ConfigMap Name | Datasource |
|------|---------------|------------|
| `15-grafana-dashboard-system.yaml` | `grafana-dashboard-system` | VictoriaMetrics (prometheus) |
| `16-grafana-dashboard-s3.yaml` | `grafana-dashboard-s3` | CloudWatch |
| `17-grafana-dashboard-emr.yaml` | `grafana-dashboard-emr` | CloudWatch |
| `18-grafana-dashboard-opensearch.yaml` | `grafana-dashboard-opensearch` | CloudWatch |
| `19-grafana-dashboard-stress.yaml` | `grafana-dashboard-stress` | VictoriaMetrics (prometheus) |
| `14-grafana-dashboard-clickhouse.yaml` (clickhouse/) | `grafana-dashboard-clickhouse` | ClickHouse |
| `17-grafana-dashboard-clickhouse-logs.yaml` (clickhouse/) | `grafana-dashboard-clickhouse-logs` | VictoriaLogs |

## Available Datasources

| Name | `type` value | `uid` value | Port |
|------|-------------|-------------|------|
| VictoriaMetrics | `prometheus` | `VictoriaMetrics` | 8428 |
| VictoriaLogs | `victoriametrics-logs-datasource` | `victorialogs` | 9428 |
| ClickHouse | `grafana-clickhouse-datasource` | (auto) | 9000 |
| Tempo | `tempo` | `tempo` | 3200 |
| CloudWatch | `cloudwatch` | `cloudwatch` | N/A |

Datasources are created at runtime by `GrafanaDatasourceConfig.create(region)` and applied as a ConfigMap by `GrafanaDashboardService.createDatasourcesConfigMap()`.
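
A wrong datasource `uid` is a common reason for empty panels, so the table above can double as a lookup when reviewing dashboard JSON. The following is a hedged sketch: `KnownDatasource`, `knownDatasources`, and `findUnknownDatasourceUids` are hypothetical names, not part of the codebase, and the regex assumes each `"datasource"` object is written on a single line, as in the examples in this skill. ClickHouse has no fixed uid, so a panel pointing at it by a concrete uid will be reported and needs a manual look.

```kotlin
// Hypothetical mirror of the "Available Datasources" table; not part of the codebase.
data class KnownDatasource(val name: String, val type: String, val uid: String?)

val knownDatasources = listOf(
    KnownDatasource("VictoriaMetrics", "prometheus", "VictoriaMetrics"),
    KnownDatasource("VictoriaLogs", "victoriametrics-logs-datasource", "victorialogs"),
    KnownDatasource("ClickHouse", "grafana-clickhouse-datasource", null), // uid is assigned automatically
    KnownDatasource("Tempo", "tempo", "tempo"),
    KnownDatasource("CloudWatch", "cloudwatch", "cloudwatch"),
)

// Captures the uid from objects like: "datasource": { "type": "prometheus", "uid": "VictoriaMetrics" }
val datasourceUidRegex =
    Regex("\"datasource\"\\s*:\\s*\\{[^}]*?\"uid\"\\s*:\\s*\"([^\"]*)\"")

// Returns uids that are neither a known datasource uid nor a Grafana variable reference.
fun findUnknownDatasourceUids(dashboardJson: String): Set<String> {
    val knownUids = knownDatasources.mapNotNull { it.uid }.toSet()
    return datasourceUidRegex.findAll(dashboardJson)
        .map { it.groupValues[1] }
        .filterNot { it.startsWith('$') } // "${datasource}" is resolved by Grafana at runtime
        .filterNot { it in knownUids }
        .toSet()
}
```

An empty result does not prove the panel is correct. It only rules out a uid that matches none of the datasources listed above.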

---

## Creating a New Dashboard

### Step 1: Create the ConfigMap YAML

**File naming:** `{ordinal}-grafana-dashboard-{name}.yaml`
- Ordinal is two digits (next available after existing dashboards)
- Name is kebab-case: `stress`, `emr`, `opensearch`
- The filename MUST contain `grafana-dashboard` for auto-discovery

**ConfigMap naming:** `grafana-dashboard-{name}` (must match the volume reference in Step 2)

**Required label:** `app.kubernetes.io/name: grafana`
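
The naming rules can be expressed as a small parser that derives the ConfigMap name from the filename. This is a hypothetical sketch (`DashboardNaming` and `parseDashboardFileName` are illustrative names). It is deliberately stricter than the service's discovery filter, which only checks that the resource name contains `grafana-dashboard` and therefore also matches `14-grafana-dashboards-configmap.yaml`.

```kotlin
// Hypothetical helper illustrating the Step 1 naming rules; not part of the codebase.
data class DashboardNaming(val ordinal: Int, val name: String) {
    // Must equal the ConfigMap metadata.name and the configMap.name used in Step 2.
    val configMapName: String get() = "grafana-dashboard-$name"
}

// {ordinal}-grafana-dashboard-{name}.yaml with a two-digit ordinal and a kebab-case name.
val dashboardFileRegex = Regex("""(\d{2})-grafana-dashboard-([a-z0-9]+(?:-[a-z0-9]+)*)\.yaml""")

// Returns null when the filename does not follow the convention.
fun parseDashboardFileName(fileName: String): DashboardNaming? {
    val match = dashboardFileRegex.matchEntire(fileName) ?: return null
    val (ordinal, name) = match.destructured
    return DashboardNaming(ordinal.toInt(), name)
}
```

For example, `parseDashboardFileName("19-grafana-dashboard-stress.yaml")?.configMapName` gives `grafana-dashboard-stress`, which matches the existing dashboards table.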

**Template:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-{name}
  namespace: default
  labels:
    app.kubernetes.io/name: grafana
data:
  {name}-overview.json: |
    {
      "annotations": { "list": [] },
      "editable": true,
      "fiscalYearStartMonth": 0,
      "graphTooltip": 1,
      "id": null,
      "links": [],
      "liveNow": false,
      "panels": [],
      "refresh": "5m",
      "schemaVersion": 38,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": [
          {
            "current": { "selected": true, "text": "VictoriaMetrics", "value": "VictoriaMetrics" },
            "hide": 0,
            "includeAll": false,
            "label": "Datasource",
            "multi": false,
            "name": "datasource",
            "options": [],
            "query": "prometheus",
            "queryValue": "",
            "refresh": 1,
            "regex": "",
            "skipUrlSync": false,
            "type": "datasource"
          }
        ]
      },
      "time": { "from": "now-1h", "to": "now" },
      "timepicker": {},
      "timezone": "browser",
      "title": "__CLUSTER_NAME__ - {Display Name}",
      "uid": "{name}-overview",
      "version": 1,
      "weekStart": ""
    }
```

For CloudWatch dashboards, change `"query": "prometheus"` to `"query": "cloudwatch"` in the datasource variable.

### Step 2: Register in Grafana Deployment

**File:** `src/main/resources/.../k8s/core/41-grafana-deployment.yaml`

Add **both** a volumeMount and a volume. Use `optional: true` for non-core dashboards.

**volumeMount** (add after the last dashboard mount, before `data`):
```yaml
- name: dashboard-{name}
  mountPath: /var/lib/grafana/dashboards/{name}
  readOnly: true
```

**volume** (add after the last dashboard volume, before `data`):
```yaml
- name: dashboard-{name}
  configMap:
    name: grafana-dashboard-{name}
    optional: true
```

The volume `name` must match in both places. The `configMap.name` must match the ConfigMap metadata name from Step 1.

**CRITICAL:** Without this step, the dashboard will NOT appear in Grafana. Grafana does not auto-discover ConfigMaps. Each dashboard must be explicitly mounted.
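
Because the volume name, the ConfigMap name, and the mount path are all derived from `{name}`, generating both fragments from a single value avoids the mismatches Step 2 warns about. The sketch below is hypothetical (`DashboardVolumeFragments` and `dashboardVolumeFragments` are not in the codebase). It emits fragments that start at column 0, so they still need to be indented to match the existing lists in `41-grafana-deployment.yaml`.

```kotlin
// Hypothetical generator for the two Step 2 fragments; not part of the codebase.
data class DashboardVolumeFragments(val volumeMount: String, val volume: String)

fun dashboardVolumeFragments(name: String, optional: Boolean = true): DashboardVolumeFragments {
    val volumeName = "dashboard-$name"            // must be identical under volumeMounts and volumes
    val configMapName = "grafana-dashboard-$name" // must equal the ConfigMap metadata.name from Step 1
    val volumeMount = """
        - name: $volumeName
          mountPath: /var/lib/grafana/dashboards/$name
          readOnly: true
    """.trimIndent()
    val volume = buildString {
        appendLine("- name: $volumeName")
        appendLine("  configMap:")
        appendLine("    name: $configMapName")
        if (optional) appendLine("    optional: true")
    }.trimEnd() // drop the trailing newline so both fragments end the same way
    return DashboardVolumeFragments(volumeMount, volume)
}
```

For `stress`, the output matches the entries this commit adds to `41-grafana-deployment.yaml`.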

### Step 3: Verify and Deploy

```bash
./gradlew :test
# On running cluster: dashboards upload
```

---

## Modifying an Existing Dashboard

1. Edit the JSON inside the ConfigMap YAML directly
2. Run `./gradlew :test` to verify compilation
3. Deploy with `dashboards upload`

The dashboard JSON is indented 4 spaces inside the YAML `|` block. Be careful with indentation — YAML block scalars are whitespace-sensitive.
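
One way to avoid hand-indentation mistakes is to generate the ConfigMap wrapper and indent the JSON programmatically. The helper below is hypothetical and not part of the codebase. It follows the Step 1 template (namespace `default`, label `app.kubernetes.io/name: grafana`, data key `{name}-overview.json`) and prefixes every non-blank JSON line with 4 spaces.

```kotlin
// Hypothetical helper; not part of the codebase. Wraps dashboard JSON in the Step 1 ConfigMap layout.
fun renderDashboardConfigMap(name: String, dashboardJson: String): String {
    // Indent every JSON line by 4 spaces so it sits inside the `|` block scalar.
    // Blank lines are left empty, which YAML allows inside a literal block.
    val indentedJson = dashboardJson.trimEnd().lines().joinToString("\n") { line ->
        if (line.isBlank()) "" else "    $line"
    }
    return buildString {
        appendLine("apiVersion: v1")
        appendLine("kind: ConfigMap")
        appendLine("metadata:")
        appendLine("  name: grafana-dashboard-$name")
        appendLine("  namespace: default")
        appendLine("  labels:")
        appendLine("    app.kubernetes.io/name: grafana")
        appendLine("data:")
        appendLine("  $name-overview.json: |")
        appendLine(indentedJson)
    }
}
```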

---

## Deployment Pipeline

### `dashboards upload` command

`GrafanaDashboardService.uploadDashboards()` does:
1. Creates the Grafana datasources ConfigMap (with runtime AWS region)
2. Extracts all classpath resources matching `"grafana-dashboard"` (dashboard ConfigMaps)
3. Extracts the classpath resource matching `"grafana-deployment"` (Grafana Deployment)
4. Applies all of them to K8s via `kubectl apply`: the dashboard ConfigMaps in name order, then the Grafana deployment

This means `dashboards upload` automatically reapplies the Grafana deployment, picking up any new volume mounts. No separate `k8 apply` is needed. A new mount changes the pod template, so Kubernetes rolls out a new Grafana pod; if the deployment is unchanged, the apply is a no-op.
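
The ordering can be modeled in a few lines. This is a simplified, hypothetical sketch: resources are plain names, `applyResource` stands in for the real `k8sService.applyManifests` call, and step 1 (the datasources ConfigMap) is left out. Like the commit, it applies the dashboard ConfigMaps in name order, then the Grafana deployment, and stops at the first failure.

```kotlin
// Hypothetical model of the upload ordering; not the real GrafanaDashboardService.
// Returns the names that were applied, or the first failure wrapped with the resource name.
fun uploadDashboardsSketch(
    resourceNames: List<String>,
    applyResource: (String) -> Result<Unit>,
): Result<List<String>> {
    val dashboards = resourceNames.filter { it.contains("grafana-dashboard") }.sorted()
    val deployment = resourceNames.firstOrNull { it.contains("grafana-deployment") }
    val applied = mutableListOf<String>()
    // The deployment goes last so new volume mounts reference ConfigMaps that already exist.
    for (resource in dashboards + listOfNotNull(deployment)) {
        applyResource(resource).getOrElse { e ->
            return Result.failure(IllegalStateException("Failed to apply $resource: ${e.message}", e))
        }
        applied.add(resource)
    }
    return Result.success(applied)
}
```

Applying the deployment last is a design choice in this sketch that mirrors the commit; because the new volumes use `optional: true`, the pod would also start if the order were reversed, just without the dashboard until the ConfigMap appears.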

### `dashboards generate` command

Extracts dashboard YAMLs to the local `k8s/` directory with template substitution but does NOT apply them. Useful for inspecting the substituted output.

### `k8 apply` command

Applies ALL `core/` resources including the Grafana deployment and dashboards. Used during initial cluster setup.

---

## Template Variable Systems

There are two layers of template variables:

### Layer 1: Cluster Context (`__KEY__` syntax)

Replaced at extraction time by `TemplateService.buildContextVariables()`:

| Variable | Source |
|----------|--------|
| `__CLUSTER_NAME__` | `state.initConfig?.name ?: "cluster"` |
| `__BUCKET_NAME__` | `state.s3Bucket ?: ""` |
| `__AWS_REGION__` | `user.region` |
| `__CONTROL_NODE_IP__` | `controlHost?.privateIp ?: ""` |
| `__METRICS_FILTER_ID__` | Built from cluster state |
| `__CLUSTER_S3_PREFIX__` | Built from cluster state |
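
A minimal sketch of Layer 1 substitution, assuming a plain literal replacement of each known `__KEY__` token. `TemplateService`'s real implementation may differ, and `buildContextVariablesSketch` only mirrors the fallbacks in the table (the two "built from cluster state" entries are omitted). Replacing only known keys leaves unrelated double-underscore names, such as PromQL's `__name__`, untouched.

```kotlin
// Hypothetical sketch; mirrors the fallbacks in the table above, not TemplateService itself.
fun buildContextVariablesSketch(
    clusterName: String?,
    bucket: String?,
    region: String,
    controlIp: String?,
): Map<String, String> = mapOf(
    "CLUSTER_NAME" to (clusterName ?: "cluster"),
    "BUCKET_NAME" to (bucket ?: ""),
    "AWS_REGION" to region,
    "CONTROL_NODE_IP" to (controlIp ?: ""),
)

// Replaces each __KEY__ literally; tokens with no matching key are left as-is.
fun substituteContextVariables(template: String, variables: Map<String, String>): String =
    variables.entries.fold(template) { text, (key, value) -> text.replace("__${key}__", value) }
```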

### Layer 2: Grafana Variables (`${var}` syntax)

Defined in the dashboard JSON `templating.list` array. These create dropdowns in the Grafana UI.

**Custom dropdown example** (quantile selector):
```json
{
  "current": { "selected": true, "text": "p99", "value": "0.99" },
  "hide": 0,
  "includeAll": false,
  "label": "Quantile",
  "multi": false,
  "name": "quantile",
  "options": [
    { "selected": false, "text": "p50", "value": "0.5" },
    { "selected": true, "text": "p99", "value": "0.99" }
  ],
  "query": "p50 : 0.5, p75 : 0.75, p95 : 0.95, p99 : 0.99",
  "skipUrlSync": false,
  "type": "custom"
}
```

Reference the variable in panel expressions as `$quantile` or `${quantile}`.
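
The `query` of a custom variable is a comma-separated list whose entries are either bare values or `text : value` pairs, as in the example above. The hypothetical helper below (illustrative only, not Grafana's own parser) expands such a string into the `options` shape and marks the entry whose value matches the current selection.

```kotlin
// Hypothetical helper; not Grafana's parser. Expands "text : value, ..." into option entries.
data class VariableOption(val text: String, val value: String, val selected: Boolean)

fun customVariableOptions(query: String, selectedValue: String): List<VariableOption> =
    query.split(",")
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .map { entry ->
            // "p99 : 0.99" splits into text "p99" and value "0.99"; a bare entry is both.
            val parts = entry.split(" : ", limit = 2)
            val text = parts[0].trim()
            val value = if (parts.size == 2) parts[1].trim() else text
            VariableOption(text, value, selected = value == selectedValue)
        }
```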

---

## Panel Patterns

**Grid layout:** `gridPos` uses a 24-column grid. `w: 12` = half width, `w: 24` = full width. `h: 8` is standard panel height. `y` increases downward.
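
The grid arithmetic can be sketched as a small layout function. `GridPos`, `GRID_COLUMNS`, and `layoutPanels` are hypothetical names. The function places equal-width panels left to right and starts a new line every `24 / w` panels, each line `h` units below the previous one.

```kotlin
// Hypothetical layout helper for Grafana's 24-column grid; not part of the codebase.
data class GridPos(val h: Int, val w: Int, val x: Int, val y: Int)

val GRID_COLUMNS = 24

// startY is the first free row, e.g. 1 when a row separator occupies y = 0.
fun layoutPanels(count: Int, w: Int = 12, h: Int = 8, startY: Int = 0): List<GridPos> {
    require(w in 1..GRID_COLUMNS) { "width must be between 1 and $GRID_COLUMNS" }
    val perRow = GRID_COLUMNS / w
    return (0 until count).map { i ->
        GridPos(h = h, w = w, x = (i % perRow) * w, y = startY + (i / perRow) * h)
    }
}
```

With a row separator at `y = 0`, `layoutPanels(3, startY = 1)` places half-width panels at (x 0, y 1), (x 12, y 1), and (x 0, y 9); the first position matches the `gridPos` of the timeseries example.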

**Row separator:**
```json
{ "collapsed": false, "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 }, "id": 100, "title": "Section Name", "type": "row" }
```

**VictoriaMetrics (Prometheus) timeseries panel:**
```json
{
  "type": "timeseries",
  "title": "Panel Title",
  "id": 1,
  "datasource": { "type": "prometheus", "uid": "${datasource}" },
  "gridPos": { "h": 8, "w": 12, "x": 0, "y": 1 },
  "targets": [
    {
      "datasource": { "type": "prometheus", "uid": "${datasource}" },
      "expr": "rate(my_metric{job=\"my-job\"}[1m])",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "ops",
      "color": { "mode": "palette-classic" },
      "custom": {
        "drawStyle": "line",
        "fillOpacity": 10,
        "lineWidth": 1,
        "pointSize": 5,
        "showPoints": "never",
        "spanNulls": false
      }
    },
    "overrides": []
  },
  "options": {
    "legend": { "displayMode": "table", "placement": "bottom", "showLegend": true },
    "tooltip": { "mode": "multi", "sort": "desc" }
  }
}
```

**Common PromQL patterns:**
```
rate(counter_total{job="my-job"}[1m]) # Rate of a counter
summary_metric{job="my-job", quantile="$quantile"} # Summary quantile with variable
sum(rate(counter{job="my-job"}[1m])) by (instance) # Aggregation
```

**Common units:** `ops` (operations/sec), `s` (seconds), `bytes`, `percent`, `short` (plain number)

---

## Debugging Dashboards

### Dashboard not appearing in Grafana

1. **Check volume mount** — Is the dashboard registered in `41-grafana-deployment.yaml` with both a `volumeMount` and a `volume`? This is the most common cause.
2. **Check deployment was applied** — `dashboards upload` reapplies the deployment automatically. If you applied the ConfigMap manually without the deployment, the new volume mount won't take effect.
3. **Check ConfigMap name matches** — The `configMap.name` in the volume must exactly match the ConfigMap `metadata.name`.
4. **Check filename pattern** — The YAML filename must contain `grafana-dashboard` for auto-discovery by `GrafanaDashboardService`.
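
Causes 1 and 3 amount to a cross-reference check between the ConfigMaps and the deployment. The sketch below is hypothetical (`findMountProblems` is not in the codebase) and assumes the names have already been parsed out of the YAML files. It reports dashboard ConfigMaps with no volume, volumes that are not mounted, and dashboard volumes that point at a ConfigMap that does not exist. With `optional: true`, that last case does not stop the pod from starting, which is why it can go unnoticed.

```kotlin
// Hypothetical checker; inputs are names already extracted from the ConfigMap YAMLs
// and 41-grafana-deployment.yaml (YAML parsing is out of scope here).
fun findMountProblems(
    dashboardConfigMaps: Set<String>, // metadata.name of each dashboard ConfigMap
    volumes: Map<String, String>,     // volume name -> configMap.name
    volumeMountNames: Set<String>,    // names listed under volumeMounts
): List<String> {
    val problems = mutableListOf<String>()
    for (configMap in dashboardConfigMaps.sorted()) {
        val volumeName = volumes.entries.firstOrNull { it.value == configMap }?.key
        when {
            volumeName == null -> problems.add("$configMap: no volume references this ConfigMap")
            volumeName !in volumeMountNames -> problems.add("$configMap: volume '$volumeName' is not mounted")
        }
    }
    for ((volumeName, configMap) in volumes) {
        if (configMap.startsWith("grafana-dashboard-") && configMap !in dashboardConfigMaps) {
            problems.add("$volumeName: references missing ConfigMap '$configMap'")
        }
    }
    return problems
}
```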

### Dashboard appears but shows no data

1. **Check datasource** — Verify the `"uid"` in panel datasource matches an available datasource (see table above).
2. **Check metric names** — Query VictoriaMetrics API: `curl http://<control-ip>:8428/api/v1/label/__name__/values`
3. **Check job label** — Verify `{job="..."}` matches what OTel is scraping. Check the OTel collector config for the `job_name`.
4. **Check scrape interval** — If a job runs shorter than the scrape interval, metrics may never be collected.

### JSON syntax errors

The dashboard JSON is embedded in a YAML `|` block. Common issues:
- Missing comma after a JSON object/array
- Mismatched braces/brackets
- Wrong indentation (every JSON line must be indented at least 4 spaces, the block's base indentation; a less-indented line ends the `|` block)

Run `dashboards generate` to extract the substituted YAML, then validate the JSON portion.
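
Before reaching for a full JSON parser, the indentation and bracket mistakes can be caught with a simple scan. This helper is hypothetical and not part of the codebase. It takes the raw lines of the `|` block, flags lines indented less than 4 spaces, and tracks `{`/`[` nesting while skipping string contents. It does not detect missing commas; for those, run the extracted JSON through a real parser.

```kotlin
// Hypothetical pre-check for embedded dashboard JSON; not a full JSON parser.
// blockLines are the raw lines of the `|` block, including their YAML indentation.
fun checkEmbeddedJson(blockLines: List<String>, baseIndent: Int = 4): List<String> {
    val errors = mutableListOf<String>()
    val stack = ArrayDeque<Char>() // currently open '{' and '['
    var inString = false
    var escaped = false
    blockLines.forEachIndexed { index, line ->
        val lineNo = index + 1
        if (line.isNotBlank() && line.takeWhile { it == ' ' }.length < baseIndent) {
            errors.add("line $lineNo: indented less than $baseIndent spaces")
        }
        for (c in line) {
            when {
                escaped -> escaped = false                 // character after a backslash inside a string
                inString && c == '\\' -> escaped = true
                c == '"' -> inString = !inString
                inString -> {}                             // braces inside strings do not count
                c == '{' || c == '[' -> stack.addLast(c)
                c == '}' || c == ']' -> {
                    val open = stack.removeLastOrNull()
                    val expected = if (c == '}') '{' else '['
                    if (open != expected) errors.add("line $lineNo: unexpected '$c'")
                }
            }
        }
    }
    if (stack.isNotEmpty()) errors.add("unclosed ${stack.joinToString("")} at end of block")
    return errors
}
```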

src/main/kotlin/com/rustyrazorblade/easydblab/services/GrafanaDashboardService.kt

Lines changed: 20 additions & 0 deletions
@@ -60,6 +60,7 @@ class DefaultGrafanaDashboardService(
 ) : GrafanaDashboardService {
     companion object {
         private const val DASHBOARD_FILE_PATTERN = "grafana-dashboard"
+        private const val GRAFANA_DEPLOYMENT_PATTERN = "grafana-deployment"
         private const val DATASOURCES_CONFIGMAP_NAME = "grafana-datasources"
         private const val DEFAULT_NAMESPACE = "default"
     }
@@ -70,6 +71,12 @@ class DefaultGrafanaDashboardService(
                 filter = { it.contains(DASHBOARD_FILE_PATTERN) },
             ).sortedBy { it.name }

+    private fun extractGrafanaDeployment(): File? =
+        templateService
+            .extractAndSubstituteResources(
+                filter = { it.contains(GRAFANA_DEPLOYMENT_PATTERN) },
+            ).firstOrNull()
+
     override fun createDatasourcesConfigMap(
         controlHost: ClusterHost,
         region: String,
@@ -117,6 +124,19 @@ class DefaultGrafanaDashboardService(
             }
         }

+        // Reapply the Grafana deployment to pick up any new volume mounts for dashboards
+        val deploymentFile = extractGrafanaDeployment()
+        if (deploymentFile != null) {
+            outputHandler.handleMessage("Applying ${deploymentFile.name}...")
+            k8sService
+                .applyManifests(controlHost, deploymentFile.toPath())
+                .getOrElse { exception ->
+                    return Result.failure(
+                        IllegalStateException("Failed to apply ${deploymentFile.name}: ${exception.message}", exception),
+                    )
+                }
+        }
+
         outputHandler.handleMessage("All Grafana dashboards applied successfully!")
         return Result.success(Unit)
     }

src/main/resources/com/rustyrazorblade/easydblab/commands/k8s/core/41-grafana-deployment.yaml

Lines changed: 7 additions & 0 deletions
@@ -91,6 +91,9 @@ spec:
             - name: dashboard-opensearch
               mountPath: /var/lib/grafana/dashboards/opensearch
               readOnly: true
+            - name: dashboard-stress
+              mountPath: /var/lib/grafana/dashboards/stress
+              readOnly: true
             - name: data
               mountPath: /var/lib/grafana
           livenessProbe:
@@ -135,5 +138,9 @@ spec:
           configMap:
             name: grafana-dashboard-opensearch
             optional: true
+        - name: dashboard-stress
+          configMap:
+            name: grafana-dashboard-stress
+            optional: true
         - name: data
           emptyDir: {}
