
feat: Continue running kube-state-metrics when config file doesnt exist at startup #2703


Open
wants to merge 1 commit into
base: main
from

Conversation

rashmichandrashekar
Contributor

@rashmichandrashekar rashmichandrashekar commented Jul 8, 2025

What this PR does / why we need it:
This PR enables k-s-m to keep running when the config file doesn't exist at startup.
Currently, k-s-m expects the file passed in the config parameter to exist at startup and exits with an error if it does not. We plan to depend on the config file by mounting it as an optional ConfigMap, which users can use to override the k-s-m configuration.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Does not change cardinality

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2700

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 8, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 8, 2025
@k8s-ci-robot
Contributor

Welcome @rashmichandrashekar!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 8, 2025
@rashmichandrashekar rashmichandrashekar changed the title Continue running kube-state-metrics when config file doesnt exist at startup feat: Continue running kube-state-metrics when config file doesnt exist at startup Jul 8, 2025
@dgrisonnet
Member

/assign @rexagod
/cc
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rashmichandrashekar
Once this PR has been reviewed and has the lgtm label, please ask for approval from rexagod. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rashmichandrashekar
Contributor Author

Gentle ping to all reviewers.
@dgrisonnet @logicalhan @mrueg @rexagod - Please take a look and provide your feedback, thanks!

@rashmichandrashekar
Contributor Author

@rexagod - Could you pls point me to where the metrics need to be added? Thanks!

@rexagod
Member

rexagod commented Jul 17, 2025

I'll have to second the other maintainers as well: failing on configuration absence is the behaviour we keep for CRS at the moment too.

[kube-state-metrics] go run . --custom-resource-state-config-file="x.yaml"
E0717 23:11:29.242328    1152 wrapper.go:84] "Error reading Custom resource configuration file" err="open x.yaml: no such file or directory" file="x.yaml"
exit status 1

@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Jul 17, 2025

> I'll have to second other maintainers as well, failing on configuration absence is the behaviour we keep for CRS at the moment too.
>
> [kube-state-metrics] go run . --custom-resource-state-config-file="x.yaml"
> E0717 23:11:29.242328    1152 wrapper.go:84] "Error reading Custom resource configuration file" err="open x.yaml: no such file or directory" file="x.yaml"
> exit status 1

I think there is some confusion. The comments in the earlier section are about failing in case of an invalid config. I don't think there are concerns about continuing to run in case of an absent config. If needed, we could update the behavior for both CRs and the config file when the files are absent, and keep it backward compatible with a flag. @rexagod - pls let me know what you think.

@rashmichandrashekar
Contributor Author

Ping on this PR. Maintainers @richabanker @rexagod @mrueg @CatherineF-dev - pls let me know the next steps and changes needed to proceed.

@richabanker
Contributor

@mrueg @rexagod any other outstanding concerns here?

@richabanker
Contributor

richabanker commented Jul 25, 2025

I guess my last thoughts on this are that with this change, ksm will continue to run even if the config file is missing, which is a substantial shift in user experience (users who rely on the application's exit code to validate their deployments may find their checks no longer work as expected) and should be called out in the release notes if we do decide to go forward with this.

@rashmichandrashekar
Contributor Author

> I guess my last thoughts on this are that with this change, ksm will continue to run even if the config file is missing, which is a substantial shift in user experience (users who rely on the application's exit code to validate their deployments may find their checks no longer work as expected) and should be called out in the release notes if we do decide to go forward with this.

Thanks @richabanker. That makes sense. If it helps with backward compatibility, I could also make this configurable with a CLI parameter, so that the new behavior of continuing to run is opt-in and there is no breaking change. Please let me know your thoughts. Also tagging @rexagod.

@rashmichandrashekar
Contributor Author

> > I guess my last thoughts on this are that with this change, ksm will continue to run even if the config file is missing, which is a substantial shift in user experience (users who rely on the application's exit code to validate their deployments may find their checks no longer work as expected) and should be called out in the release notes if we do decide to go forward with this.

> Thanks @richabanker. That makes sense. If it helps with backward compatibility I could also make it configurable with a cli parameter and the new behavior of continuing to run, could be set with this so that there is no breaking change. Please let me know your thoughts. Also tagging @rexagod.

Ping on this. @richabanker - pls let me know if we want to make this backward compatible with a parameter or just documenting with the release is fine. Thanks!

@richabanker
Contributor

> > > I guess my last thoughts on this are that with this change, ksm will continue to run even if the config file is missing, which is a substantial shift in user experience (users who rely on the application's exit code to validate their deployments may find their checks no longer work as expected) and should be called out in the release notes if we do decide to go forward with this.

> > Thanks @richabanker. That makes sense. If it helps with backward compatibility I could also make it configurable with a cli parameter and the new behavior of continuing to run, could be set with this so that there is no breaking change. Please let me know your thoughts. Also tagging @rexagod.

> Ping on this. @richabanker - pls let me know if we want to make this backward compatible with a parameter or just documenting with the release is fine. Thanks!

Ideally expecting a response from the project leads @rexagod or @mrueg. Do you have an opinion here? Is this change as proposed breaking?

@rexagod
Member

rexagod commented Jul 29, 2025

> If it helps with backward compatibility I could also make it configurable with a cli parameter and the new behavior of continuing to run, could be set with this so that there is no breaking change.

I'm sorry, but why do we need this exactly? Introducing a breaking feature and putting that behind a flag, defaulting it to off, can set an unwanted precedent for the project.

I don't mean to block you here, but I'm not sure we can introduce this unless there's a general consensus that the addition goes beyond downstream applications and helps the larger community.

@rashmichandrashekar
Contributor Author

> > If it helps with backward compatibility I could also make it configurable with a cli parameter and the new behavior of continuing to run, could be set with this so that there is no breaking change.

> I'm sorry, but why do we need this exactly? Introducing a breaking feature and putting that behind a flag, defaulting it to off, can set an unwanted precedent for the project. I don't mean to block you here, but we cannot introduce something like this unless there's a general consensus that the addition goes beyond downstream applications and helps the larger community.

The scenario is that the config file can be provided at any time and is not required at startup. For example, users could configure it as a ConfigMap that is mounted into the container after the container has started running, and k-s-m would pick up the changes dynamically. For the most part this works today, except that the file is expected to exist at startup. In a managed-component scenario, this gives users the flexibility to configure k-s-m after it is deployed.

@rexagod
Member

rexagod commented Jul 29, 2025

Given how we have live-reloading for both KSM and CRS configurations, I can see the two complementing each other. However, wouldn't you need to do the same for CRS configuration as well (to deliver similar UX)? Would it make sense to have flags for both of these with the flag descriptions also highlighting the use-case for their inception?

@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Jul 29, 2025

> Given how we have live-reloading for both KSM and CRS configurations, I can see the two complementing each other. However, wouldn't you need to do the same for CRS configuration as well (to deliver similar UX)? Would it make sense to have flags for both of these with the flag descriptions also highlighting the use-case for their inception?

Yes. Currently we don't support CR configuration, so I didn't explore that. But I can make the change consistent across the two, and I can add comments about the use cases for the config. So, just double-checking before I implement: the suggestion is to add a config flag that enables this behavior, to provide backward compatibility, and to make it consistent across KSM and CR configuration, with comments. @rexagod - could you pls confirm if this sounds good?

@rexagod
Member

rexagod commented Jul 29, 2025

Sounds good.

To reiterate a bit, it'd be nice to have a condensed version of the scenario you went over in your previous comment, in that flag(s) definition (here) as well, so other users know exactly when to use it (and thus deduce any similar use cases).

@k8s-ci-robot
Contributor

@rashmichandrashekar: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rashmichandrashekar
Contributor Author

/ok-to-test

@k8s-ci-robot
Contributor

@rashmichandrashekar: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mrueg
Member

mrueg commented Aug 5, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 5, 2025
@rashmichandrashekar
Contributor Author

/ok-to-test

1 similar comment
@rashmichandrashekar
Contributor Author

/ok-to-test

@rashmichandrashekar
Contributor Author

rashmichandrashekar commented Aug 6, 2025

@rexagod - Thanks for all the feedback. Addressed all the comments, pls take a look.

@rashmichandrashekar
Contributor Author

/ok-to-test

@richabanker
Contributor

/ok-to-test

@rashmichandrashekar rashmichandrashekar force-pushed the rashmi/ksm-cfg branch 4 times, most recently from 4a74074 to 6c7f0ea Compare August 6, 2025 02:27
@rashmichandrashekar
Contributor Author

> Could you please squash the commits here as well? Thanks!

Done.

@rexagod
Member

rexagod commented Aug 6, 2025

I believe https://github.com/kubernetes/kube-state-metrics/pull/2703/files#r2251349420 is left to address.

Also, you'll need to rebase, since the other PR is in now.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 6, 2025
@rashmichandrashekar
Contributor Author

> I believe https://github.com/kubernetes/kube-state-metrics/pull/2703/files#r2251349420 is left to address.
>
> Also, you'll need to rebase, since the other PR is in now.

@rexagod - I think all of this should be addressed now, pls take a look. I did add logs to surface all errors (since I had done it only for one earlier).

@rashmichandrashekar rashmichandrashekar force-pushed the rashmi/ksm-cfg branch 2 times, most recently from 51f7380 to 9ceb49f Compare August 6, 2025 22:00
Member

@rexagod rexagod left a comment


One final nit regarding the error messages to reflect your last changes in the if-else conditionals. I believe we should be good to go after this.

@rashmichandrashekar
Contributor Author

> One final nit regarding the error messages to reflect your last changes in the if-else conditionals. I believe we should be good to go after this.

@rexagod - Sure, made the changes, please check. Thanks!

@rashmichandrashekar
Contributor Author

/ok-to-test

@rashmichandrashekar
Contributor Author

> > One final nit regarding the error messages to reflect your last changes in the if-else conditionals. I believe we should be good to go after this.

> @rexagod - Sure, made the changes, please check. Thanks!

@rexagod - Gentle ping, please let me know if this is good to merge or if there is any other feedback. Thanks!

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kube-state-metrics should support running if config file doesn't exist at startup
8 participants