
Conversation

TheRealNoob

https://github.com/kubernetes-monitoring/kubernetes-mixin has set the standard for templating the `cluster` variable exactly as this repo has done, but the two define the variable differently: kubernetes-mixin defines it as a selector across multiple Kubernetes clusters, whereas the existing config here treats it as a selector across multiple etcd installations. That approach doesn't work if you've installed etcd into each cluster using the same set of labels, such as `job`.

My proposed change is to redefine `cluster` to follow the standard set by kubernetes-mixin, and to add an additional variable, `job`, which acts as the instance selector. There are some challenges with this, which I'll get into in the comments.
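To make the failure mode concrete: assume (hypothetical names) two Kubernetes clusters that each scrape their own etcd with the default labels, producing series like these:

```
etcd_server_has_leader{cluster="prod-us", job="etcd", instance="10.0.0.1:2379"}
etcd_server_has_leader{cluster="prod-eu", job="etcd", instance="10.1.0.1:2379"}
```

A variable built on the `job` label sees a single value, `etcd`, and cannot distinguish the two clusters; a variable built on the `cluster` label can.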

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TheRealNoob
Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @TheRealNoob. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@TheRealNoob
Author

So the challenge I'm facing, and the reason I've opened this PR as a draft, is that I can't figure out how to make this work without introducing a breaking change. I suspect the number of people it would impact is quite small and the fix would be easy (depending on the path forward), but it's still a breaking change. I'd like feedback from maintainers on how to proceed before I go any further.

Let me level-set with the two variables as I currently have them defined: `cluster` doesn't filter on anything and just grabs all values of the label, while `job` filters on `cluster` and on `config.etcd_selector` from `config.libsonnet`.

```jsonnet
  cluster:
    var.query.new('cluster')
    + var.query.generalOptions.withLabel('cluster')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // Grabs every value of the cluster label, unfiltered.
    + var.query.queryTypes.withLabelValues(
      config.clusterLabel,
      'etcd_server_has_leader'
    ),

  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // Filters on the selected cluster and on config.etcd_selector.
    + var.query.queryTypes.withLabelValues(
      'job',
      'etcd_server_has_leader{cluster="$cluster", %s}' % [config.etcd_selector]
    ),
```

The problem I'm having is with `etcd_selector`. This `job` query does work, except that the `withLabelValues()` call hardcodes the `job` label as its first argument. By default `etcd_selector` does use the label called `job`, but it doesn't have to: it can be set to a different label, or even to a filter on multiple labels, e.g. `{job="etcd", foo="bar", biz="baz"}`. So if anyone points `etcd_selector` at a label other than `job`, the variable breaks.
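For example (hypothetical selector value), if `config.etcd_selector` were changed to `app="etcd"`, the variable would render as:

```jsonnet
// Hypothetical: etcd_selector no longer uses the job label.
// config.etcd_selector: 'app="etcd"'
var.query.queryTypes.withLabelValues(
  'job',  // still hardcoded to the job label
  'etcd_server_has_leader{cluster="$cluster", app="etcd"}'
)
```

The drop-down would keep listing values of the `job` label even though the selector targets `app`, so the selection no longer lines up with what the queries filter on.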

kubernetes-mixin doesn't run into this problem because it has no equivalent of `etcd_selector`. It filters by `cluster`, and its second variable, which it calls `instance`, doesn't filter on anything in its query except `cluster`. So the second drop-down menu, `instance`, can comfortably be hardcoded to the `instance` label.

My proposal would be to delete `etcd_selector`. Given the way the dashboard is written now, I think it's fine, even preferable, to stick with the `job` label rather than the `instance` label; just don't allow users to filter this label down.
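Concretely, that would reduce the `job` variable to something like this (a sketch; only the selector is removed from the definition above):

```jsonnet
  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // No etcd_selector: list every job exposing the metric in $cluster.
    + var.query.queryTypes.withLabelValues(
      'job',
      'etcd_server_has_leader{cluster="$cluster"}'
    ),
```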

If we really want to keep this functionality, I can think of two options.

  1. We can configure the `job` variable to allow free-form text, and feed that text into a regex expression. So if a user typed `.+etcd.+` into the menu, it would be used in all panel queries as `etcd_metric{cluster="$cluster", job=~"$job"}`.
  2. We could break the selector into two pieces, one for the label name and the other for the label value: `etcd_selector: 'job=~".*etcd.*"'` becomes `etcd_selector_label: 'job'` and `etcd_selector_labelvalue: '.+etcd.+'` (sketched below).

Both options impose that the selector can no longer contain a filter on multiple labels, which is a drawback.
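A minimal sketch of option 2, assuming the hypothetical config keys named above:

```jsonnet
// config.libsonnet (hypothetical keys):
//   etcd_selector_label: 'job',
//   etcd_selector_labelvalue: '.+etcd.+',

  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    + var.query.queryTypes.withLabelValues(
      config.etcd_selector_label,  // label is no longer hardcoded
      'etcd_server_has_leader{cluster="$cluster", %s=~"%s"}' % [
        config.etcd_selector_label,
        config.etcd_selector_labelvalue,
      ]
    ),
```

The drop-down then lists values of whichever single label the config names, which is exactly why a multi-label filter stops being expressible.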

@TheRealNoob
Author

Completely unrelated proposal: would it ever be considered to migrate the management of this dashboard into kubernetes-mixin? I'm sure it would benefit from standardization work beyond just this one problem I've run into.
