Add multi-cluster dashboards: cluster variable, ad hoc filters, and cluster comparison dashboard#604
Add multi-cluster dashboards: cluster variable, ad hoc filters, and cluster comparison dashboard#604rustyrazorblade wants to merge 2 commits intomainfrom
Conversation
…luster comparison dashboard - Add cluster multi-select variable and adhocfilters to all 11 dashboards - Fix adhocfilters to include applyMode and baseFilters required by Grafana 10+ - Add VictoriaMetrics httpMethod=POST for label API compatibility - Use name-UUID format for cluster metric label to ensure uniqueness across clusters - Add new cluster-comparison.json dashboard with 8 sections showcasing all panel types: Polystat, Table, Time Series, Bar Chart, Status History, Heatmap, Percentile Ladder - Add stress job section showing client-observed throughput and latency per cluster - Register GrafanaDashboard.CLUSTER_COMPARISON as optional enum entry
PR Review: Multi-cluster dashboardsThe bulk of this PR (~6000 lines) is JSON reformatting of existing dashboards from compact to pretty-printed format — functional no-ops. The substantive changes are: new clusterLabelName() extension, new cluster comparison dashboard, ClickHouse datasource removal, VictoriaMetrics httpMethod:POST, and cluster variable support in ClickHouse dashboards. ClickHouse datasource removal: The GrafanaDatasource entry for ClickHouse is removed from GrafanaDatasourceConfig, but GrafanaDashboard.CLICKHOUSE and GrafanaDashboard.CLICKHOUSE_LOGS remain registered. ClickHouse dashboards will now have no provisioned datasource after grafana update-config runs. I understand the old config was broken (hardcoded db0 hostname), but removing without a replacement leaves ClickHouse users with broken dashboards. Was this intentional? If ClickHouse datasource configuration is being deferred to a separate mechanism, that should be documented. SetupInstance cluster name change: This changes the CLUSTER_NAME env var written to stress instances from just the cluster name to {name}-{clusterId}. This makes the label consistent with what OTel collector assigns, which is the right call for multi-cluster support. Worth calling out explicitly that re-running setup-instance on an existing stress node will change its reported cluster label, which could affect Grafana filter continuity mid-run. OpenSpec change not archived: openspec/changes/cluster-comparison-dashboard/ is present but not archived. If the implementation is complete, it should be moved to archive/ as part of this PR. httpMethod: POST for VictoriaMetrics: Good change — avoids URL length limits on large PromQL queries. Cluster comparison dashboard: Well-structured. histogram_quantile with sum by (le, cluster) is mathematically valid for per-cluster aggregation. The seven-section layout covering Polystat, Time Series, Bar Chart, Status History, Heatmap, and Percentile Ladder is a solid reference implementation. All panels correctly use cluster=~"$cluster" regex-match to support multi-select. The cluster2 variable for side-by-side heatmaps is a nice touch. JSON reformatting: The pretty-printing changes inflate this diff significantly and make the real changes hard to find. Consider separating these into a dedicated commit in future PRs. |
…on to Cassandra Cluster Comparison
PR ReviewOverall: Good multi-cluster foundation. The Kotlin changes are minimal and clean. OpenSpec:
|
No description provided.