Commit 9cd2a31

add blog post 2025-02-14 on cloud controller managers and the chicken and egg problem

---
layout: blog
title: "The Cloud Controller Manager Chicken and Egg Problem"
date: 2025-02-14
slug: cloud-controller-manager-chicken-egg-problem
author: >
  Antonio Ojea,
  Michael McCune
draft: true
---

Kubernetes 1.31
[completed the largest migration in Kubernetes history][migration-blog], removing the in-tree
cloud provider. While the component migration is now done, this leaves some additional
complexity for users and installer projects (for example, kOps or Cluster API). We will go
over those additional steps and failure points and make recommendations for cluster owners.
The migration was complex, and some logic had to be extracted from the core components into
four new subsystems.

1. **Cloud controller manager** ([KEP-2392][kep2392])
2. **API server network proxy** ([KEP-1281][kep1281])
3. **kubelet credential provider plugins** ([KEP-2133][kep2133])
4. **Storage migration to use [CSI][csi]** ([KEP-625][kep625])

The [cloud controller manager is part of the control plane][ccm]. It is a critical component
that replaces some functionality that existed previously in the kube-controller-manager and the
kubelet.

![Components of Kubernetes](https://kubernetes.io/images/docs/components-of-kubernetes.svg)

One of the most critical functionalities of the cloud controller manager is the node controller,
which is responsible for the initialization of the nodes.

As you can see in the following diagram, when the **kubelet** starts, it registers the `Node`
object with the apiserver, tainting the node so it can be processed first by the
cloud-controller-manager. The initial `Node` object is missing the cloud-provider-specific
information, such as the node addresses and the labels with cloud-provider details like the
region and instance type.

```mermaid
sequenceDiagram
    autonumber
    rect rgb(191, 223, 255)
        Kubelet->>+Kube-apiserver: Create Node
        Note over Kubelet: Taint:<br/> node.cloudprovider.kubernetes.io
        Kube-apiserver->>-Kubelet: Node Created
    end
    Note over Kube-apiserver: Node is Not Ready<br/> Tainted, Missing Node Addresses*, ...
    Note over Kube-apiserver: Send Updates
    rect rgb(200, 150, 255)
        Kube-apiserver->>+Cloud-controller-manager: Watch: New Node Created
        Note over Cloud-controller-manager: Initialize Node:<br/>Cloud Provider Labels, Node Addresses, ...
        Cloud-controller-manager->>-Kube-apiserver: Update Node
    end
    Note over Kube-apiserver: Node is Ready
```
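
To make the mechanism concrete, here is a minimal sketch of the taint as it appears on a freshly
registered `Node` object, before the cloud-controller-manager has initialized it (the node name
is hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-01   # hypothetical node name
spec:
  taints:
  # added by the kubelet at registration when it runs with --cloud-provider=external;
  # the cloud-controller-manager removes it once node initialization succeeds
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule
```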

This new initialization process adds some latency to node readiness. Previously, the kubelet
was able to initialize the node at the same time it created it. Since the logic has moved to
the cloud-controller-manager, this can cause a [chicken and egg problem][chicken-and-egg]
during cluster bootstrapping for Kubernetes architectures that do not deploy the controller
manager in the same way as the other control plane components: commonly as static pods,
standalone binaries, or daemonsets/deployments that tolerate the taints and use `hostNetwork`
(more on this below).

## Examples of the dependency problem

As noted above, it is possible during bootstrapping for the cloud-controller-manager to be
unschedulable and, as such, the cluster will not initialize properly. The following are a few
concrete examples of how this problem can manifest and the root causes behind them.

These examples assume you are running your cloud-controller-manager using a Kubernetes resource
(e.g. Deployment, DaemonSet, or similar) to control its lifecycle. Because these methods
rely on Kubernetes to schedule the cloud-controller-manager, care must be taken to ensure it
will schedule properly.

### Example: Cloud controller manager not scheduling due to uninitialized taint

As [noted in the Kubernetes documentation][kubedocs0], when the kubelet is started with the command line
flag `--cloud-provider=external`, its corresponding `Node` object will have a no schedule taint
named `node.cloudprovider.kubernetes.io/uninitialized` added. Because the cloud-controller-manager
is responsible for removing the no schedule taint, this can create a situation where a
cloud-controller-manager that is being managed by a Kubernetes resource, such as a `Deployment`
or `DaemonSet`, may not be able to schedule.

If the cloud-controller-manager is not able to be scheduled during the initialization of the
control plane, then the resulting `Node` objects will all have the
`node.cloudprovider.kubernetes.io/uninitialized` no schedule taint. It also means that this taint
will not be removed as the cloud-controller-manager is responsible for its removal. If the no
schedule taint is not removed, then critical workloads, such as the container network interface
controllers, will not be able to schedule, and the cluster will be left in an unhealthy state.
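
One way to avoid this deadlock is to make sure the cloud-controller-manager Pod itself tolerates
the taint it is responsible for removing. A minimal sketch of the relevant Pod spec fragment
(the full example manifest later in this post includes it, along with the other recommended
tolerations):

```yaml
# Pod spec fragment for a cloud-controller-manager workload
tolerations:
- effect: NoSchedule
  key: node.cloudprovider.kubernetes.io/uninitialized
  operator: Exists
```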

### Example: Cloud controller manager not scheduling due to not-ready taint

The next example is possible in situations where the container network interface (CNI) is
waiting for IP address information from the cloud-controller-manager (CCM), and the CCM has not
tolerated the taint which would be removed by the CNI.

The [Kubernetes documentation describes][kubedocs1] the `node.kubernetes.io/not-ready` taint as follows:

> "The Node controller detects whether a Node is ready by monitoring its health and adds or removes this taint accordingly."

One of the conditions that can lead to a `Node` resource having this taint is when the container
network has not yet been initialized on that node. As the cloud-controller-manager is responsible
for adding the IP addresses to a `Node` resource, and the IP addresses are needed by the container
network controllers to properly configure the container network, it is possible in some
circumstances for a node to become permanently stuck as not ready and uninitialized.

This situation occurs for a similar reason to the first example, although in this case the
`node.kubernetes.io/not-ready` taint is used with the no execute effect and thus will cause the
cloud-controller-manager not to run on the node with the taint. If the cloud-controller-manager is
not able to execute, then it will not initialize the node. This will cascade into the container
network controllers not being able to run properly, and the node will end up carrying both the
`node.cloudprovider.kubernetes.io/uninitialized` and `node.kubernetes.io/not-ready` taints,
leaving the cluster in an unhealthy state.
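
In that stuck state, the `Node` object carries both taints at once. A hedged sketch of how that
might look (node name and field ordering are illustrative):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-01   # hypothetical node name
spec:
  taints:
  # still present because the cloud-controller-manager has not initialized the node
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule
  # added by the node controller while the node reports NotReady;
  # the NoExecute effect keeps non-tolerating pods, including the CCM, off this node
  - key: node.kubernetes.io/not-ready
    effect: NoExecute
```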

## Our Recommendations

There is no one “correct way” to run a cloud-controller-manager. The details will depend on the
specific needs of the cluster administrators and users. When planning your clusters and the
lifecycle of the cloud-controller-managers, please consider the following guidance:

For cloud-controller-managers running in the same cluster they are managing:

1. Use host network mode, rather than the pod network: in most cases, a cloud controller manager
   will need to communicate with an API service endpoint associated with the infrastructure.
   Setting `hostNetwork` to true will ensure that the cloud controller is using the host
   networking instead of the container network and, as such, will have the same network access as
   the host operating system. It will also remove the dependency on the networking plugin. This
   will ensure that the cloud controller has access to the infrastructure endpoint (always check
   your networking configuration against your infrastructure provider’s instructions).
2. Use a scalable resource type. `Deployments` and `DaemonSets` are useful for controlling the
   lifecycle of a cloud controller. They allow easy access to running multiple copies for redundancy
   as well as using the Kubernetes scheduling to ensure proper placement in the cluster. When using
   these primitives to control the lifecycle of your cloud controllers and running multiple
   replicas, you must remember to enable leader election, or else your controllers will collide
   with each other, which could lead to nodes not being initialized in the cluster.
3. Target the controller manager containers to the control plane. There might exist other
   controllers which need to run outside the control plane (for example, Azure’s node manager
   controller). Still, the controller managers themselves should be deployed to the control plane.
   Use a node selector or affinity stanza to direct the scheduling of cloud controllers to the
   control plane to ensure that they are running in a protected space. Cloud controllers are vital
   to adding and removing nodes to a cluster as they form a link between Kubernetes and the
   physical infrastructure. Running them on the control plane will help to ensure that they run
   with a similar priority as other core cluster controllers and that they have some separation
   from non-privileged user workloads.
   1. It is worth noting that an anti-affinity stanza to prevent cloud controllers from running
      on the same host is also very useful to ensure that a single node failure will not degrade
      the cloud controller performance.
4. Ensure that the tolerations allow operation. Use tolerations on the manifest for the cloud
   controller container to ensure that it will schedule to the correct nodes and that it can run
   in situations where a node is initializing. This means that cloud controllers should tolerate
   the `node.cloudprovider.kubernetes.io/uninitialized` taint, and it should also tolerate any
   taints associated with the control plane (for example, `node-role.kubernetes.io/control-plane`
   or `node-role.kubernetes.io/master`). It can also be useful to tolerate the
   `node.kubernetes.io/not-ready` taint to ensure that the cloud controller can run even when the
   node is not yet available for health monitoring.

For cloud-controller-managers that will not be running on the cluster they manage (for example,
in a hosted control plane on a separate cluster), the rules are much more constrained by the
dependencies of the environment of the cluster running the cloud-controller-manager. The advice
for running on a self-managed cluster may not be appropriate as the types of conflicts and network
constraints will be different. Please consult the architecture and requirements of your topology
for these scenarios.

### Example

This is an example of a Kubernetes Deployment highlighting the guidance shown above. It is
important to note that this is for demonstration purposes only; for production use, please
consult your cloud provider’s documentation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kubernetes.io/description: "Cloud controller manager for my infrastructure"
  labels:
    app.kubernetes.io/name: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cloud-controller-manager
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloud-controller-manager
    spec:
      containers: # the container details will depend on your specific cloud controller manager
      - name: cloud-controller-manager
        command:
        - /bin/my-infrastructure-cloud-controller-manager
        - --leader-elect=true
        - -v=1
        image: registry/my-infrastructure-cloud-controller-manager:latest
        resources:
          requests:
            cpu: 200m
            memory: 50Mi
      hostNetwork: true # these Pods are part of the control plane
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: cloud-controller-manager
      tolerations:
      # tolerate the control plane taints, both the current and legacy role names
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
        operator: Exists
```

When deciding how to deploy your cloud controller manager, it is worth noting that
cluster-proportional, or resource-based, pod autoscaling is not recommended. Running multiple
replicas of a cloud controller manager is good practice for ensuring high availability and
redundancy, but it does not contribute to better performance. In general, only a single instance
of a cloud controller manager will be reconciling a cluster at any given time.
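
Leader election is what makes running several replicas safe: the replicas compete for a lock and
only the elected leader reconciles, while the others wait on standby. As a hedged sketch, the
leader-election flags on the container command might look like the following; the exact flags and
defaults can vary by provider, so consult your cloud provider’s documentation:

```yaml
# container command fragment; flag values shown are illustrative
command:
- /bin/my-infrastructure-cloud-controller-manager
- --leader-elect=true
- --leader-elect-lease-duration=15s
- --leader-elect-renew-deadline=10s
- --leader-elect-retry-period=2s
```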

[migration-blog]: /blog/2024/05/20/completing-cloud-provider-migration/
[kep2392]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md
[kep1281]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy
[kep2133]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers
[csi]: https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-
[kep625]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md
[ccm]: /docs/concepts/architecture/cloud-controller/
[chicken-and-egg]: /docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg
[kubedocs0]: /docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
[kubedocs1]: /docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready