Commit 2d53973

Merge pull request #49612 from elmiko/add-ccm-chicken-egg-blog
add blog post 2025-02-14 on cloud controller managers and the chicken and egg problem
2 parents 9500093 + 2f59dcf commit 2d53973

1 file changed: +249 -0 lines changed

---
layout: blog
title: "The Cloud Controller Manager Chicken and Egg Problem"
date: 2025-02-14
slug: cloud-controller-manager-chicken-egg-problem
author: >
  Antonio Ojea,
  Michael McCune
---

Kubernetes 1.31
[completed the largest migration in Kubernetes history][migration-blog], removing the in-tree
cloud provider. While the component migration is now done, this leaves some additional
complexity for users and installer projects (for example, kOps or Cluster API). We will go
over those additional steps and failure points and make recommendations for cluster owners.
This migration was complex and some logic had to be extracted from the core components,
building four new subsystems.

1. **Cloud controller manager** ([KEP-2392][kep2392])
2. **API server network proxy** ([KEP-1281][kep1281])
3. **kubelet credential provider plugins** ([KEP-2133][kep2133])
4. **Storage migration to use [CSI][csi]** ([KEP-625][kep625])

The [cloud controller manager is part of the control plane][ccm]. It is a critical component
that replaces some functionality that existed previously in the kube-controller-manager and the
kubelet.

![Components of Kubernetes](https://kubernetes.io/images/docs/components-of-kubernetes.svg)

One of the most critical functionalities of the cloud controller manager is the node controller,
which is responsible for the initialization of the nodes.

As you can see in the following diagram, when the **kubelet** starts, it registers the `Node`
object with the apiserver, tainting the node so it can be processed first by the
cloud-controller-manager. The initial `Node` is missing the cloud-provider specific information,
such as the node addresses and the labels carrying provider details like the zone, region, and
instance type.

```mermaid
sequenceDiagram
    autonumber
    rect rgb(191, 223, 255)
        Kubelet->>+Kube-apiserver: Create Node
        Note over Kubelet: Taint:<br/> node.cloudprovider.kubernetes.io
        Kube-apiserver->>-Kubelet: Node Created
    end
    Note over Kube-apiserver: Node is Not Ready<br/> Tainted, Missing Node Addresses*, ...
    Note over Kube-apiserver: Send Updates
    rect rgb(200, 150, 255)
        Kube-apiserver->>+Cloud-controller-manager: Watch: New Node Created
        Note over Cloud-controller-manager: Initialize Node:<br/>Cloud Provider Labels, Node Addresses, ...
        Cloud-controller-manager->>-Kube-apiserver: Update Node
    end
    Note over Kube-apiserver: Node is Ready
```
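
For illustration, this is a rough, hypothetical sketch of what the `Node` object looks like right
after the kubelet registers it and before the cloud-controller-manager has initialized it. The
node name and field values here are made up, and real objects carry many more fields.

```yaml
# Hypothetical, abbreviated Node object right after kubelet registration with
# --cloud-provider=external, before the cloud-controller-manager processes it.
apiVersion: v1
kind: Node
metadata:
  name: worker-01                # illustrative node name
  labels:
    kubernetes.io/hostname: worker-01
    # provider labels such as topology.kubernetes.io/zone,
    # topology.kubernetes.io/region and node.kubernetes.io/instance-type
    # are added later by the cloud-controller-manager
spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule
status:
  addresses: []                  # node addresses are filled in by the cloud-controller-manager
  conditions:
  - type: Ready
    status: "False"              # not ready until the node is initialized and the network is configured
```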

This new initialization process adds some latency to the node readiness. Previously, the kubelet
was able to initialize the node at the same time it created the node. Since the logic has moved
to the cloud-controller-manager, this can cause a [chicken and egg problem][chicken-and-egg]
during the cluster bootstrapping for those Kubernetes architectures that do not deploy the
controller manager in the same way as the other components of the control plane, commonly as
static pods, standalone binaries, or daemonsets/deployments with tolerations to the taints and
using `hostNetwork` (more on this below).

## Examples of the dependency problem

As noted above, it is possible during bootstrapping for the cloud-controller-manager to be
unschedulable and as such the cluster will not initialize properly. The following are a few
concrete examples of how this problem can manifest and the root causes for why they might
occur.

These examples assume you are running your cloud-controller-manager using a Kubernetes resource
(e.g. Deployment, DaemonSet, or similar) to control its lifecycle. Because these methods
rely on Kubernetes to schedule the cloud-controller-manager, care must be taken to ensure it
will schedule properly.

### Example: Cloud controller manager not scheduling due to uninitialized taint

As [noted in the Kubernetes documentation][kubedocs0], when the kubelet is started with the command line
flag `--cloud-provider=external`, its corresponding `Node` object will have a no schedule taint
named `node.cloudprovider.kubernetes.io/uninitialized` added. Because the cloud-controller-manager
is responsible for removing the no schedule taint, this can create a situation where a
cloud-controller-manager that is being managed by a Kubernetes resource, such as a `Deployment`
or `DaemonSet`, may not be able to schedule.

If the cloud-controller-manager is not able to be scheduled during the initialization of the
control plane, then the resulting `Node` objects will all have the
`node.cloudprovider.kubernetes.io/uninitialized` no schedule taint. It also means that this taint
will not be removed as the cloud-controller-manager is responsible for its removal. If the no
schedule taint is not removed, then critical workloads, such as the container network interface
controllers, will not be able to schedule, and the cluster will be left in an unhealthy state.
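
To avoid this deadlock, whatever resource manages the cloud-controller-manager must tolerate the
very taint the controller is responsible for removing. As a minimal sketch, the relevant fragment
of the Pod template would look roughly like this (the full example manifest later in this post
shows it in context):

```yaml
# Fragment of a cloud-controller-manager Pod template (illustrative):
# this toleration lets the Pod schedule onto nodes that still carry the
# uninitialized taint, so the controller can run and then remove it.
tolerations:
- key: node.cloudprovider.kubernetes.io/uninitialized
  effect: NoSchedule
  operator: Exists
```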

### Example: Cloud controller manager not scheduling due to not-ready taint

The next example would be possible in situations where the container network interface (CNI) is
waiting for IP address information from the cloud-controller-manager (CCM), and the CCM has not
tolerated the taint which would be removed by the CNI.

The [Kubernetes documentation describes][kubedocs1] the `node.kubernetes.io/not-ready` taint as follows:

> "The Node controller detects whether a Node is ready by monitoring its health and adds or removes this taint accordingly."

One of the conditions that can lead to a `Node` resource having this taint is when the container
network has not yet been initialized on that node. As the cloud-controller-manager is responsible
for adding the IP addresses to a `Node` resource, and the IP addresses are needed by the container
network controllers to properly configure the container network, it is possible in some
circumstances for a node to become permanently stuck as not ready and uninitialized.

This situation occurs for a similar reason as the first example, although in this case, the
`node.kubernetes.io/not-ready` taint is used with the no execute effect and thus will cause the
cloud-controller-manager not to run on the node with the taint. If the cloud-controller-manager is
not able to execute, then it will not initialize the node. This will cascade into the container
network controllers not being able to run properly, and the node will end up carrying both the
`node.cloudprovider.kubernetes.io/uninitialized` and `node.kubernetes.io/not-ready` taints,
leaving the cluster in an unhealthy state.
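
In this failure mode, the stuck node ends up carrying both taints at once, roughly like the
following sketch (abbreviated, values illustrative):

```yaml
# Taints on a node stuck in this state: the NoExecute taint keeps the
# cloud-controller-manager Pod off the node, so neither taint is removed.
spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    effect: NoExecute
```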

## Our Recommendations

There is no one “correct way” to run a cloud-controller-manager. The details will depend on the
specific needs of the cluster administrators and users. When planning your clusters and the
lifecycle of the cloud-controller-managers please consider the following guidance:

For cloud-controller-managers running in the same cluster they are managing:

1. Use host network mode, rather than the pod network: in most cases, a cloud controller manager
   will need to communicate with an API service endpoint associated with the infrastructure.
   Setting “hostNetwork” to true will ensure that the cloud controller is using the host
   networking instead of the container network and, as such, will have the same network access as
   the host operating system. It will also remove the dependency on the networking plugin. This
   will ensure that the cloud controller has access to the infrastructure endpoint (always check
   your networking configuration against your infrastructure provider’s instructions).
2. Use a scalable resource type. `Deployments` and `DaemonSets` are useful for controlling the
   lifecycle of a cloud controller. They make it easy to run multiple copies for redundancy
   as well as to use the Kubernetes scheduling to ensure proper placement in the cluster. When using
   these primitives to control the lifecycle of your cloud controllers and running multiple
   replicas, you must remember to enable leader election, or else your controllers will collide
   with each other which could lead to nodes not being initialized in the cluster.
3. Target the controller manager containers to the control plane. There might exist other
   controllers which need to run outside the control plane (for example, Azure’s node manager
   controller). Still, the controller managers themselves should be deployed to the control plane.
   Use a node selector or affinity stanza to direct the scheduling of cloud controllers to the
   control plane to ensure that they are running in a protected space. Cloud controllers are vital
   to adding and removing nodes in a cluster as they form a link between Kubernetes and the
   physical infrastructure. Running them on the control plane will help to ensure that they run
   with a similar priority as other core cluster controllers and that they have some separation
   from non-privileged user workloads.
   1. It is worth noting that an anti-affinity stanza to prevent cloud controllers from running
      on the same host is also very useful to ensure that a single node failure will not degrade
      the cloud controller performance.
4. Ensure that the tolerations allow operation. Use tolerations on the manifest for the cloud
   controller container to ensure that it will schedule to the correct nodes and that it can run
   in situations where a node is initializing. This means that cloud controllers should tolerate
   the `node.cloudprovider.kubernetes.io/uninitialized` taint, and they should also tolerate any
   taints associated with the control plane (for example, `node-role.kubernetes.io/control-plane`
   or `node-role.kubernetes.io/master`). It can also be useful to tolerate the
   `node.kubernetes.io/not-ready` taint to ensure that the cloud controller can run even when the
   node is not yet available for health monitoring.

For cloud-controller-managers that will not be running on the cluster they manage (for example,
in a hosted control plane on a separate cluster), the rules are much more constrained by the
dependencies of the environment of the cluster running the cloud-controller-manager. The advice
for running on a self-managed cluster may not be appropriate as the types of conflicts and network
constraints will be different. Please consult the architecture and requirements of your topology
for these scenarios.

### Example

This is an example of a Kubernetes Deployment highlighting the guidance shown above. It is
important to note that this is for demonstration purposes only. For production use, please
consult your cloud provider’s documentation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: cloud-controller-manager
  annotations:
    kubernetes.io/description: "Cloud controller manager for my infrastructure"
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cloud-controller-manager
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloud-controller-manager
    spec:
      containers: # the container details will depend on your specific cloud controller manager
      - name: cloud-controller-manager
        command:
        - /bin/my-infrastructure-cloud-controller-manager
        - --leader-elect=true
        - -v=1
        image: registry/my-infrastructure-cloud-controller-manager:latest
        resources:
          requests:
            cpu: 200m
            memory: 50Mi
      hostNetwork: true # these Pods are part of the control plane
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: cloud-controller-manager
      tolerations: # tolerate the control plane taints and the taints present during initialization
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
        operator: Exists
```

When deciding how to deploy your cloud controller manager, it is worth noting that
cluster-proportional, or resource-based, pod autoscaling is not recommended. Running multiple
replicas of a cloud controller manager is good practice for ensuring high availability and
redundancy, but does not contribute to better performance. In general, only a single instance
of a cloud controller manager will be reconciling a cluster at any given time.

[migration-blog]: /blog/2024/05/20/completing-cloud-provider-migration/
[kep2392]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md
[kep1281]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy
[kep2133]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers
[csi]: https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-
[kep625]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md
[ccm]: /docs/concepts/architecture/cloud-controller/
[chicken-and-egg]: /docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg
[kubedocs0]: /docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
[kubedocs1]: /docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready
