feature-blog(ccm): new metric route_sync_total #54693
base: dev-1.36
---
layout: blog
title: "Kubernetes v1.36: New Metric for Route Sync in the Cloud Controller Manager"
date: 2026-02-26
slug: ccm-new-metric-route-sync-total
author: >
  [Lukas Metzner](https://github.com/lukasmetzner) (Hetzner)
---

Kubernetes v1.36 introduces a new alpha counter metric `route_controller_route_sync_total`
to the Cloud Controller Manager (CCM) route controller implementation at
[`k8s.io/cloud-provider`](https://github.com/kubernetes/cloud-provider). This metric
increments each time routes are synced with the cloud provider.

## A/B testing watch-based route reconciliation

This metric was added to help operators validate the
`CloudControllerManagerWatchBasedRoutesReconciliation` feature gate introduced in
[Kubernetes v1.35](/blog/2025/12/30/kubernetes-v1-35-watch-based-route-reconciliation-in-ccm/).
That feature gate switches the route controller from a fixed-interval loop to a watch-based
approach that only reconciles when nodes actually change. This reduces unnecessary API calls
to the infrastructure provider, lowering pressure on rate-limited APIs and allowing operators
to make more efficient use of their available quota.

To A/B test this, compare `route_controller_route_sync_total` with the feature gate
disabled (default) versus enabled. In clusters where node changes are infrequent, you should
see a significant drop in the sync rate with the feature gate turned on.

### Example: expected behavior

**With the feature gate disabled** (the default fixed-interval loop), the counter increments
steadily regardless of whether any node changes occurred:

```
# After 10 minutes with no node changes
route_controller_route_sync_total 60
# After 20 minutes, still no node changes
route_controller_route_sync_total 120
```

**With the feature gate enabled** (watch-based reconciliation), the counter only increments
when nodes are actually added, removed, or updated:

```
# After 10 minutes with no node changes
route_controller_route_sync_total 1
# After 20 minutes, still no node changes: counter unchanged
route_controller_route_sync_total 1
# A new node joins the cluster: counter increments
route_controller_route_sync_total 2
```

The difference is especially visible in stable clusters where nodes rarely change.
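
To read the counter from a scraped `/metrics` payload without a full Prometheus setup, a small parser for the text exposition format is enough. How you reach the CCM's metrics endpoint depends on your deployment, and the endpoint may require authentication, so this sketch works on a payload you have already fetched. The function name and the sample payload are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// readCounter returns the value of the named metric from a Prometheus text
// exposition payload, summing all series if the metric carries labels.
// Simplified: it skips comment lines (# HELP, # TYPE) and does not handle
// label values that contain a closing brace.
func readCounter(payload, name string) (float64, bool) {
	var total float64
	found := false
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		switch {
		case strings.HasPrefix(rest, "{"):
			end := strings.Index(rest, "}")
			if end < 0 {
				continue // malformed label set
			}
			rest = rest[end+1:]
		case strings.HasPrefix(rest, " "), strings.HasPrefix(rest, "\t"):
			// unlabeled sample: the value follows directly
		default:
			continue // a different metric that merely shares this prefix
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		// The first field is the value; an optional timestamp may follow.
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		total += v
		found = true
	}
	return total, found
}

func main() {
	// Hypothetical payload shaped like the samples in this post.
	payload := "# TYPE route_controller_route_sync_total counter\n" +
		"route_controller_route_sync_total 120\n"
	if v, ok := readCounter(payload, "route_controller_route_sync_total"); ok {
		fmt.Println(v) // 120
	}
}
```

Taking two such readings a known interval apart is enough to compute the sync rate for each side of the A/B test.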

## Where can I give feedback?

If you have feedback, feel free to reach out through any of the following channels:

- The [#sig-cloud-provider](https://kubernetes.slack.com/messages/sig-cloud-provider) channel on [Kubernetes Slack](https://slack.k8s.io/)
- The [KEP-5237 issue](https://kep.k8s.io/5237) on GitHub
- The [SIG Cloud Provider community page](https://github.com/kubernetes/community/tree/05223ecbd2d6f960edb40684dc83d053d49f8b68/sig-cloud-provider) for other communication channels

## How can I learn more?

For more details, refer to [KEP-5237](https://kep.k8s.io/5237).