@chi-quita-a
Contributor

@chi-quita-a chi-quita-a commented Oct 30, 2025

Warning

This is a public repository. Ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • confidential business information.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

What does this PR do / why do we need this PR?

This PR adds a curated selection of Kubernetes dashboards from the open-source project dotdc/grafana-dashboards-kubernetes to improve the operator experience in Welkin’s Grafana setup.

These dashboards provide a clear, drill-down structure for day-to-day cluster analysis (Global → Namespaces → Nodes → Pods) and introduce additional monitoring surfaces for Prometheus and Trivy Operator. They complement the existing dashboards from kube-prometheus-stack / kubernetes-mixin without duplicating their functionality.

Because the chart already auto-discovers dashboards placed under dashboards/**, no configuration changes were required. The dashboards are provisioned automatically via the existing ConfigMap template and Grafana sidecar.

Changes made

Added the following dashboards under
helmfile.d/charts/grafana-dashboards/dashboards/:

  1. k8s-views-global-dashboard.json
  2. k8s-views-namespaces-dashboard.json
  3. k8s-views-nodes-dashboard.json
  4. k8s-views-pods-dashboard.json
  5. k8s-addons-prometheus-dashboard.json
  6. k8s-addons-trivy-operator-dashboard.json
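A minimal sketch of the naming check behind the auto-discovery claim. It assumes the ConfigMap template selects files with the glob `dashboards/**dashboard.json` quoted later in this conversation, and it approximates that glob as "located under a `dashboards` directory and ending in `dashboard.json`". The helper name `would_be_discovered` is hypothetical, and this is not Helm's actual matching logic.

```python
from pathlib import PurePosixPath

# Suffix taken from the glob quoted in this thread. Treating it as a plain
# filename suffix is a simplification (assumption), not Helm's glob semantics.
DASHBOARD_SUFFIX = "dashboard.json"
BASE = "helmfile.d/charts/grafana-dashboards/dashboards"

ADDED_DASHBOARDS = [
    "k8s-views-global-dashboard.json",
    "k8s-views-namespaces-dashboard.json",
    "k8s-views-nodes-dashboard.json",
    "k8s-views-pods-dashboard.json",
    "k8s-addons-prometheus-dashboard.json",
    "k8s-addons-trivy-operator-dashboard.json",
]


def would_be_discovered(path: str) -> bool:
    """Return True if the file is under a dashboards/ directory and has the suffix."""
    p = PurePosixPath(path)
    return "dashboards" in p.parts[:-1] and p.name.endswith(DASHBOARD_SUFFIX)


# Files from the list above that the approximated pattern would NOT pick up.
missed = [name for name in ADDED_DASHBOARDS if not would_be_discovered(f"{BASE}/{name}")]
print(missed)
```

Under this approximation, a file such as `README.md` in the same directory would be skipped because it lacks the suffix.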

Deliberate exclusions

The following dashboards were intentionally not added:

  • k8s-system-api-server-dashboard.json
  • k8s-system-coredns-dashboard.json

These duplicate the canonical dashboards already provided by kube-prometheus-stack and would provide no additional value.

Validation

All added dashboards have been:

  • rendered via helmfile template,
  • applied to a local dev cluster, and
  • visually validated in Grafana to ensure they load correctly and work with our Prometheus metrics.

Existing dashboards (API server, CoreDNS, mixin dashboards) continue to function without duplication.
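One way to double-check the "without duplication" point is to look for colliding `uid` values across the dashboard JSON files. The sketch below is illustrative only: the helper name `find_duplicate_uids` and the sample `uid` values are hypothetical, and the dashboards are passed in as already-parsed dictionaries rather than read from the repository.

```python
from collections import defaultdict


def find_duplicate_uids(dashboards: dict) -> dict:
    """Map each uid used by more than one dashboard to the files that use it."""
    by_uid = defaultdict(list)
    for filename, dashboard in dashboards.items():
        uid = dashboard.get("uid")
        if uid:  # dashboards without a uid cannot collide on it
            by_uid[uid].append(filename)
    return {uid: sorted(files) for uid, files in by_uid.items() if len(files) > 1}


# Hypothetical sample data; real uids would come from the JSON files.
sample = {
    "k8s-views-global-dashboard.json": {"uid": "views-global"},
    "k8s-views-pods-dashboard.json": {"uid": "views-pods"},
    "some-existing-dashboard.json": {"uid": "views-pods"},
}
duplicates = find_duplicate_uids(sample)
print(duplicates)
```

An empty result would suggest that no two files claim the same `uid`; it says nothing about overlapping content.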

Information for reviewers

To reproduce the verification steps:

helmfile -e service_cluster -f helmfile.d -l name=grafana-dashboards template
helmfile -e service_cluster -f helmfile.d -l name=grafana-dashboards sync

Then check in Grafana:

  • Kubernetes / Views / Global
  • Kubernetes / Views / Namespaces
  • Kubernetes / Views / Nodes
  • Kubernetes / Views / Pods
  • Addons: Prometheus
  • Addons: Trivy Operator

Existing API server and CoreDNS dashboards should remain available.


Checklist

  • Proper commit message prefix
  • Transparent change
  • No alterations to existing configs
  • Dashboards auto-provisioned as intended
  • Metrics unaffected
  • Public documentation required no update

@chi-quita-a chi-quita-a requested a review from a team as a code owner October 30, 2025 11:31
@chi-quita-a chi-quita-a added the kind/feature New feature or request label Oct 30, 2025
@chi-quita-a chi-quita-a linked an issue Oct 30, 2025 that may be closed by this pull request
2 tasks
@chi-quita-a chi-quita-a force-pushed the 2396-try-out-and-add-some-new-kubernetes-dashboards branch from c464b49 to 5a5fc4b Compare October 30, 2025 12:14
@AlbinB97
Contributor

AlbinB97 commented Oct 30, 2025

I think this can be a kind/other instead of kind/feature, since it's not a new feature added to Welkin, just an addition to an existing feature (Grafana dashboards). You could alternatively put it as kind/improvement as well I suppose ☺️

> Dashboards are automatically provisioned via the existing ConfigMap template that uses .Files.Glob "dashboards/**dashboard.json"<link to Repo elastisys/compliantkubernetes-apps: helmfile.d/charts/grafana-dashboards/templates/configmap-dashboards.yaml:1-2 />

What's going on here? 😂

> Next steps:
>
>   • Test dashboards in Grafana UI to verify they work with our Prometheus metrics
>   • Evaluate which dashboards provide value vs. overlap with existing ones
>   • Make adjustments to dashboard queries if needed for compatibility

These should all be a part of the task and not next steps after this is merged. We should not merge something that hasn't been evaluated, and most importantly, tested.

@Zash
Contributor

Zash commented Oct 30, 2025

The 'Cluster' dropdown doesn't seem to work on my kind cluster, does it work in Real Clusters™ ?

@chi-quita-a chi-quita-a marked this pull request as draft October 30, 2025 14:25
@chi-quita-a
Contributor Author

@Zash I'm going to take a deeper dive. :)

@chi-quita-a chi-quita-a force-pushed the 2396-try-out-and-add-some-new-kubernetes-dashboards branch from 5a5fc4b to 1a9bfd9 Compare November 13, 2025 12:53
@chi-quita-a chi-quita-a added kind/improvement Improvement of existing features, e.g. code cleanup or optimizations. and removed kind/feature New feature or request labels Nov 24, 2025
@chi-quita-a chi-quita-a force-pushed the 2396-try-out-and-add-some-new-kubernetes-dashboards branch from 15508d5 to cfdff11 Compare November 24, 2025 13:19
@chi-quita-a chi-quita-a marked this pull request as ready for review November 24, 2025 13:23
Contributor

@elastisys-staffan elastisys-staffan left a comment

Nice work! These dashboards look gorgeous and will be a fine addition to our monitoring. Two small suggestions:

  • Add the source for the dashboards to helmfile.d/charts/grafana-dashboards/dashboards/README.md
  • It's a little hard to find the new dashboards in the list in the Grafana GUI. Maybe add some common tag to them?
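The tagging suggestion can be sketched as a small transformation on the dashboard JSON, assuming the usual top-level `tags` array of a Grafana dashboard. The helper name `with_tag` and the tag value `k8s-views` are hypothetical; the tag actually chosen in this PR is not stated in the conversation.

```python
import copy


def with_tag(dashboard: dict, tag: str) -> dict:
    """Return a copy of the dashboard whose `tags` list contains `tag` exactly once."""
    updated = copy.deepcopy(dashboard)  # leave the caller's dict untouched
    tags = list(updated.get("tags") or [])
    if tag not in tags:
        tags.append(tag)
    updated["tags"] = tags
    return updated


# Hypothetical dashboard stub and tag name, for illustration only.
original = {"title": "Kubernetes / Views / Pods", "tags": ["Kubernetes"]}
tagged = with_tag(original, "k8s-views")
print(tagged["tags"])
```

Applying the same tag to all six files would let them be filtered together in the Grafana dashboard list.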

@chi-quita-a
Contributor Author

> Nice work! These dashboards look gorgeous and will be a fine addition to our monitoring. Two small suggestions:
>
>   • Add the source for the dashboards to helmfile.d/charts/grafana-dashboards/dashboards/README.md
>   • It's a little hard to find the new dashboards in the list in the Grafana GUI. Maybe add some common tag to them?

@elastisys-staffan done and done! :)

Thank you for the input and guidance, very much appreciated.

@chi-quita-a chi-quita-a force-pushed the 2396-try-out-and-add-some-new-kubernetes-dashboards branch from 4ef8b3f to 94a9522 Compare November 25, 2025 13:59
Contributor

@rarescosma rarescosma left a comment

nit: English

@elastisys-staffan elastisys-staffan self-requested a review November 25, 2025 15:49
Contributor

@elastisys-staffan elastisys-staffan left a comment

🚀

@chi-quita-a
Contributor Author

chi-quita-a commented Nov 26, 2025

Are the @elastisys/goto-monitoring-stack folks happy with the changes? If so, I'll merge.

Contributor

@Xartos Xartos left a comment

Tested the dashboards and they look great! I really like the look and feel of them.

Have some questions though

Contributor

Question: Do we still need the "Trivy Operator Dashboard" dashboard? It seems like this one presents the same information, but in an arguably better way.

Contributor

If they are truly equivalent but one is better, I guess the only harm in removing the older one would be breaking links, bookmarks, browser history search. Can Grafana do redirects? Or can we easily reuse the same ID and replace the old one?
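The "reuse the same ID" idea could look roughly like the sketch below. It assumes that Grafana dashboard links are resolved by the dashboard's `uid` field, so a replacement that carries over the old `uid` would keep existing links pointing at it. The helper name `replace_keeping_uid` and the sample values are hypothetical, and the sketch does not address whether Grafana can do redirects.

```python
import copy


def replace_keeping_uid(old: dict, new: dict) -> dict:
    """Return a copy of `new` that reuses the `uid` of the dashboard it replaces."""
    replacement = copy.deepcopy(new)
    replacement["uid"] = old["uid"]  # assumption: links are keyed on uid
    return replacement


# Hypothetical stubs standing in for the old and new Trivy dashboards.
old_dashboard = {"uid": "old-trivy", "title": "Trivy Operator Dashboard"}
new_dashboard = {"uid": "new-trivy", "title": "Addons: Trivy Operator"}
replacement = replace_keeping_uid(old_dashboard, new_dashboard)
print(replacement)
```

Anything that matches on the dashboard title rather than the `uid`, such as browser history search, would still see a change.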

Contributor

> breaking links, bookmarks, browser history search.

..and most likely the E2E suite as well 🙃

Contributor

Question: Same here, would we want to remove some of the other "Prometheus - ..." dashboards or do we see value in having both?

@chi-quita-a chi-quita-a force-pushed the 2396-try-out-and-add-some-new-kubernetes-dashboards branch from b399807 to 3626f92 Compare November 27, 2025 12:21
Contributor

@anders-elastisys anders-elastisys left a comment

I think the new k8s-views dashboards look really nice and are a great addition. 🚀

But regarding the trivy-operator and prometheus ones, I am unsure if we should add them, or if we should instead replace the old ones we have.
Personally I think the new Prometheus dashboard is missing e.g. time series per job, and also it includes CPU and memory usage for all pods, which I do not think fits in that dashboard?

The new Trivy dashboard looks a lot better than the old one. However, it shows different numbers of vulnerabilities per severity because the queries differ: our old dashboard sums over unique images, while the new one sums over each pod, so it shows a lot more vulnerabilities. I think I prefer calculating vulnerabilities per image, as in the old one, but I can see both cases being valid.
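A small worked example of why the two aggregations disagree. The pod names, images, and counts are made up for illustration, and the two functions only mimic the per-pod and per-image summing described above; they are not the dashboards' actual queries.

```python
# Illustrative data: two pods share one image, so its findings are reported twice.
rows = [
    {"pod": "web-1", "image": "nginx:1.27", "critical": 3},
    {"pod": "web-2", "image": "nginx:1.27", "critical": 3},
    {"pod": "db-0", "image": "postgres:16", "critical": 1},
]


def total_per_pod(rows: list) -> int:
    """Sum over every pod, counting a shared image once per pod that runs it."""
    return sum(row["critical"] for row in rows)


def total_per_image(rows: list) -> int:
    """Sum over unique images, counting each image's findings once."""
    per_image = {}
    for row in rows:
        per_image[row["image"]] = row["critical"]
    return sum(per_image.values())


per_pod = total_per_pod(rows)
per_image = total_per_image(rows)
print(per_pod, per_image)
```

With this data the per-pod total is 7 and the per-image total is 4, so the gap grows with the number of replicas running the same image.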

@chi-quita-a
Contributor Author

> I think the new k8s-views dashboards look really nice and are a great addition. 🚀
>
> But regarding the trivy-operator and prometheus ones, I am unsure if we should add them, or if we should instead replace the old ones we have. Personally I think the new Prometheus dashboard is missing e.g. time series per job, and also it includes CPU and memory usage for all pods, which I do not think fits in that dashboard?
>
> The new Trivy dashboard looks a lot better than the old one. However, it shows different numbers of vulnerabilities per severity because the queries differ: our old dashboard sums over unique images, while the new one sums over each pod, so it shows a lot more vulnerabilities. I think I prefer calculating vulnerabilities per image, as in the old one, but I can see both cases being valid.

@viktor-f @anders-elastisys, so shall we do a trial of these until next sprint, to decide what to keep and what to drop?

@chi-quita-a chi-quita-a merged commit fe94761 into main Dec 4, 2025
12 checks passed
@chi-quita-a chi-quita-a deleted the 2396-try-out-and-add-some-new-kubernetes-dashboards branch December 4, 2025 13:28
Labels

kind/improvement Improvement of existing features, e.g. code cleanup or optimizations.

Development

Successfully merging this pull request may close these issues.

Try out and add some new kubernetes dashboards

8 participants