
Conversation

TheRealNoob

https://github.com/kubernetes-monitoring/kubernetes-mixin has set the standard for templating the `cluster` variable exactly as this repo has done, but the two define the variable differently: kubernetes-mixin defines it as a selector across multiple Kubernetes clusters, whereas the existing config here treats it as a selector across multiple etcd installations. That approach doesn't work if you've installed etcd into each cluster using the same set of labels, such as `job`.

My proposed change is to redefine `cluster` to follow the standard set by kubernetes-mixin, and to add an additional variable, `job`, which acts as the instance selector. There are some challenges with this, which I'll get into in the comments.
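To make the failure mode concrete: assume (hypothetical names) two Kubernetes clusters that each scrape their own etcd with the default labels, producing series like these:

```
etcd_server_has_leader{cluster="prod-us", job="etcd", instance="10.0.0.1:2379"}
etcd_server_has_leader{cluster="prod-eu", job="etcd", instance="10.1.0.1:2379"}
```

A variable built on the `job` label sees a single value, `etcd`, and cannot distinguish the two clusters; a variable built on the `cluster` label can.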

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TheRealNoob
Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @TheRealNoob. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@TheRealNoob
Author

So the challenge I'm facing, and the reason I've opened this PR as a draft, is that I can't figure out how to make this work without introducing a breaking change. I suspect the number of people it would impact is quite small and the fix would be easy (depending on the path forward), but it's still a breaking change. I'd like feedback from maintainers on how to proceed before I go any further.

Let me level-set with the two variables as I currently have them defined: `cluster` doesn't filter on anything and just grabs all values of the label, while `job` filters on `cluster` and on `config.etcd_selector` from `config.libsonnet`.

```jsonnet
  cluster:
    var.query.new('cluster')
    + var.query.generalOptions.withLabel('cluster')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // Grabs every value of the cluster label, unfiltered.
    + var.query.queryTypes.withLabelValues(
      config.clusterLabel,
      'etcd_server_has_leader'
    ),

  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // Filters on the selected cluster and on config.etcd_selector.
    + var.query.queryTypes.withLabelValues(
      'job',
      'etcd_server_has_leader{cluster="$cluster", %s}' % [config.etcd_selector]
    ),
```

The problem I'm having is with `etcd_selector`. This `job` query does work, except that the `withLabelValues()` call hardcodes the `job` label as its first argument. By default `etcd_selector` does use the label called `job`, but it doesn't have to: it can be set to a different label, or even to a filter on multiple labels, e.g. `{job="etcd", foo="bar", biz="baz"}`. So if anyone points `etcd_selector` at a label other than `job`, the variable breaks.
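For example (hypothetical selector value), if `config.etcd_selector` were changed to `app="etcd"`, the variable would render as:

```jsonnet
// Hypothetical: etcd_selector no longer uses the job label.
// config.etcd_selector: 'app="etcd"'
var.query.queryTypes.withLabelValues(
  'job',  // still hardcoded to the job label
  'etcd_server_has_leader{cluster="$cluster", app="etcd"}'
)
```

The drop-down would keep listing values of the `job` label even though the selector targets `app`, so the selection no longer lines up with what the queries filter on.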

kubernetes-mixin doesn't run into this problem because it has no equivalent of `etcd_selector`. It filters by `cluster`, and its second variable, which it calls `instance`, doesn't filter on anything in its query except `cluster`. So the second drop-down menu, `instance`, can comfortably be hardcoded to the `instance` label.

My proposal would be to delete `etcd_selector`. Given the way the dashboard is written now, I think it's fine, even preferable, to stick with the `job` label rather than the `instance` label; just don't allow users to filter this label down.
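Concretely, that would reduce the `job` variable to something like this (a sketch; only the selector is removed from the definition above):

```jsonnet
  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    // No etcd_selector: list every job exposing the metric in $cluster.
    + var.query.queryTypes.withLabelValues(
      'job',
      'etcd_server_has_leader{cluster="$cluster"}'
    ),
```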

If we really want to keep this functionality, I can think of two options.

  1. We can configure the `job` variable to allow free-form text, and feed that text into a regex expression. So if a user typed `.+etcd.+` into the menu, it would be used in all panel queries as `etcd_metric{cluster="$cluster", job=~"$job"}`.
  2. We could break the selector into two pieces, one for the label name and the other for the label value: `etcd_selector: 'job=~".*etcd.*"'` becomes `etcd_selector_label: 'job'` and `etcd_selector_labelvalue: '.+etcd.+'` (sketched below).

Both options impose that the selector can no longer contain a filter on multiple labels, which is a drawback.
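A minimal sketch of option 2, assuming the hypothetical config keys named above:

```jsonnet
// config.libsonnet (hypothetical keys):
//   etcd_selector_label: 'job',
//   etcd_selector_labelvalue: '.+etcd.+',

  job:
    var.query.new('job')
    + var.query.generalOptions.withLabel('job')
    + var.query.withDatasourceFromVariable(self.datasource)
    + { refresh: config.dashboard_var_refresh }
    + var.query.queryTypes.withLabelValues(
      config.etcd_selector_label,  // label is no longer hardcoded
      'etcd_server_has_leader{cluster="$cluster", %s=~"%s"}' % [
        config.etcd_selector_label,
        config.etcd_selector_labelvalue,
      ]
    ),
```

The drop-down then lists values of whichever single label the config names, which is exactly why a multi-label filter stops being expressible.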

@TheRealNoob
Author

Completely unrelated proposal: would it ever be considered to migrate the management of this dashboard into kubernetes-mixin? I'm sure it would benefit from standardization work beyond just this one problem I've run into.
