Commit 7252545

Merge branch 'main' into id/translate-protocols-for-services-page
2 parents ee422d2 + 37b54fe commit 7252545

File tree

88 files changed

+3080
-420
lines changed

OWNERS_ALIASES

Lines changed: 0 additions & 15 deletions
@@ -52,10 +52,8 @@ aliases:
     - sajibAdhi
   sig-docs-de-owners: # Admins for German content
     - bene2k1
-    - rlenferink
   sig-docs-de-reviews: # PR reviews for German content
     - bene2k1
-    - rlenferink
   sig-docs-en-owners: # Admins for English content
     - dipesh-rawat
     - divya-mohan0209
@@ -73,7 +71,6 @@ aliases:
     - katcosgrove
     - kbhawkey
     - mengjiao-liu
-    - mickeyboxell
     - natalisucks
     - nate-double-u
     - reylejano
@@ -105,19 +102,16 @@ aliases:
     - dipesh-rawat
     - divya-mohan0209
   sig-docs-hi-reviews: # PR reviews for Hindi content
-    - Babapool
     - bishal7679
     - dipesh-rawat
     - divya-mohan0209
     - niranjandarshann
   sig-docs-id-owners: # Admins for Indonesian content
     - ariscahyadi
-    - danninov
     - girikuncoro
     - habibrosyad
   sig-docs-id-reviews: # PR reviews for Indonesian content
     - ariscahyadi
-    - danninov
     - girikuncoro
     - habibrosyad
   sig-docs-it-owners: # Admins for Italian content
@@ -169,8 +163,6 @@ aliases:
     - mengjiao-liu
     - my-git9
     - SataQiu
-    - Sea-n
-    - tanjunchen
     - tengqm
     - windsonsea
     - xichengliudui
@@ -180,33 +172,26 @@ aliases:
     - chenrui333
     - howieyuen
     # idealhack
-    - kinzhi
     - mengjiao-liu
     - my-git9
     # pigletfly
     - SataQiu
-    - Sea-n
-    - tanjunchen
     - tengqm
     - windsonsea
     - xichengliudui
     - ydFu
   sig-docs-pt-owners: # Admins for Portuguese content
-    - devlware
     - edsoncelio
     - jcjesus
     - stormqueen1990
   sig-docs-pt-reviews: # PR reviews for Portugese content
-    - devlware
     - edsoncelio
     - jcjesus
     - mrerlison
     - stormqueen1990
   sig-docs-vi-owners: # Admins for Vietnamese content
-    - huynguyennovem
     - truongnh1992
   sig-docs-vi-reviews: # PR reviews for Vietnamese content
-    - huynguyennovem
     - truongnh1992
   sig-docs-ru-owners: # Admins for Russian content
     - Arhell
Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
1+
---
2+
layout: blog
3+
title: "The Cloud Controller Manager Chicken and Egg Problem"
4+
date: 2025-02-14
5+
slug: cloud-controller-manager-chicken-egg-problem
6+
author: >
7+
Antonio Ojea,
8+
Michael McCune
9+
---
10+
11+
Kubernetes 1.31
12+
[completed the largest migration in Kubernetes history][migration-blog], removing the in-tree
13+
cloud provider. While the component migration is now done, this leaves some additional
14+
complexity for users and installer projects (for example, kOps or Cluster API). We will go
15+
over those additional steps and failure points and make recommendations for cluster owners.
16+
This migration was complex and some logic had to be extracted from the core components,
17+
building four new subsystems.
18+
19+
1. **Cloud controller manager** ([KEP-2392][kep2392])
20+
2. **API server network proxy** ([KEP-1281][kep1281])
21+
3. **kubelet credential provider plugins** ([KEP-2133][kep2133])
22+
4. **Storage migration to use [CSI][csi]** ([KEP-625][kep625])
23+
24+
The [cloud controller manager is part of the control plane][ccm]. It is a critical component
25+
that replaces some functionality that existed previously in the kube-controller-manager and the
26+
kubelet.
27+
28+
![Components of Kubernetes](https://kubernetes.io/images/docs/components-of-kubernetes.svg)
29+
30+
One of the most critical functionalities of the cloud controller manager is the node controller,
31+
which is responsible for the initialization of the nodes.
32+
33+
As you can see in the following diagram, when the **kubelet** starts, it registers the `Node`
34+
object with the apiserver, tainting the node so it can be processed first by the
35+
cloud-controller-manager. The initial `Node` is missing the cloud-provider-specific information,
36+
such as the node addresses and the labels that carry the node, region, and instance type
37+
information.
38+
39+
```mermaid
sequenceDiagram
  autonumber
  rect rgb(191, 223, 255)
  Kubelet->>+Kube-apiserver: Create Node
  Note over Kubelet: Taint:<br/> node.cloudprovider.kubernetes.io
  Kube-apiserver->>-Kubelet: Node Created
  end
  Note over Kube-apiserver: Node is Not Ready<br/> Tainted, Missing Node Addresses*, ...
  Note over Kube-apiserver: Send Updates
  rect rgb(200, 150, 255)
  Kube-apiserver->>+Cloud-controller-manager: Watch: New Node Created
  Note over Cloud-controller-manager: Initialize Node:<br/>Cloud Provider Labels, Node Addresses, ...
  Cloud-controller-manager->>-Kube-apiserver: Update Node
  end
  Note over Kube-apiserver: Node is Ready
```
56+
57+
This new initialization process adds some latency to the node readiness. Previously, the kubelet
58+
was able to initialize the node at the same time it created the node. Since the logic has moved
59+
to the cloud-controller-manager, this can cause a [chicken and egg problem][chicken-and-egg]
60+
during cluster bootstrapping for those Kubernetes architectures that do not deploy the
61+
controller manager in the same way as the other control plane components, which commonly run as
62+
static pods, standalone binaries, or DaemonSets/Deployments that tolerate the taints and use
63+
`hostNetwork` (more on this below).
64+
65+
## Examples of the dependency problem
66+
67+
As noted above, it is possible during bootstrapping for the cloud-controller-manager to be
68+
unschedulable and as such the cluster will not initialize properly. The following are a few
69+
concrete examples of how this problem can manifest and the root causes that might lead to
70+
it.
71+
72+
These examples assume you are running your cloud-controller-manager using a Kubernetes resource
73+
(e.g. Deployment, DaemonSet, or similar) to control its lifecycle. Because these methods
74+
rely on Kubernetes to schedule the cloud-controller-manager, care must be taken to ensure it
75+
will schedule properly.
76+
77+
### Example: Cloud controller manager not scheduling due to uninitialized taint
78+
79+
As [noted in the Kubernetes documentation][kubedocs0], when the kubelet is started with the command line
80+
flag `--cloud-provider=external`, its corresponding `Node` object will have a no schedule taint
81+
named `node.cloudprovider.kubernetes.io/uninitialized` added. Because the cloud-controller-manager
82+
is responsible for removing the no schedule taint, this can create a situation where a
83+
cloud-controller-manager that is being managed by a Kubernetes resource, such as a `Deployment`
84+
or `DaemonSet`, may not be able to schedule.
85+
86+
If the cloud-controller-manager is not able to be scheduled during the initialization of the
87+
control plane, then the resulting `Node` objects will all have the
88+
`node.cloudprovider.kubernetes.io/uninitialized` no schedule taint. It also means that this taint
89+
will not be removed as the cloud-controller-manager is responsible for its removal. If the no
90+
schedule taint is not removed, then critical workloads, such as the container network interface
91+
controllers, will not be able to schedule, and the cluster will be left in an unhealthy state.
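
As a rough illustration of the stuck state described above, a control plane `Node` that no cloud-controller-manager has initialized might look something like the following fragment. The node name is hypothetical, and the taint's `value` is an assumption; the key and effect are the parts taken from the text.

```yaml
# Sketch only: a Node that has registered but has not yet been initialized
# by a cloud-controller-manager. The node name is hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: example-control-plane-0
spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"      # assumed value; scheduling depends on the key and effect
    effect: NoSchedule
```

Any Pod without a matching toleration, including a cloud-controller-manager Pod, cannot be scheduled onto this node until the taint is removed.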
92+
93+
### Example: Cloud controller manager not scheduling due to not-ready taint
94+
95+
The next example would be possible in situations where the container network interface (CNI) is
96+
waiting for IP address information from the cloud-controller-manager (CCM), and the CCM does not
97+
tolerate the taint that would be removed by the CNI.
98+
99+
The [Kubernetes documentation describes][kubedocs1] the `node.kubernetes.io/not-ready` taint as follows:
100+
101+
> "The Node controller detects whether a Node is ready by monitoring its health and adds or removes this taint accordingly."
102+
103+
One of the conditions that can lead to a `Node` resource having this taint is when the container
104+
network has not yet been initialized on that node. As the cloud-controller-manager is responsible
105+
for adding the IP addresses to a `Node` resource, and the IP addresses are needed by the container
106+
network controllers to properly configure the container network, it is possible in some
107+
circumstances for a node to become stuck as not ready and uninitialized permanently.
108+
109+
This situation occurs for a similar reason as the first example, although in this case, the
110+
`node.kubernetes.io/not-ready` taint is used with the no execute effect and thus will cause the
111+
cloud-controller-manager not to run on the node with the taint. If the cloud-controller-manager is
112+
not able to execute, then it will not initialize the node. It will cascade into the container
113+
network controllers not being able to run properly, and the node will end up carrying both the
114+
`node.cloudprovider.kubernetes.io/uninitialized` and `node.kubernetes.io/not-ready` taints,
115+
leaving the cluster in an unhealthy state.
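
To make the end state concrete, the taints on such a node could look roughly like the fragment below. This is a sketch under assumptions: the `value` shown for the uninitialized taint is assumed, and whether the `not-ready` taint appears with one or both effects depends on your cluster.

```yaml
# Fragment of a Node's spec (sketch): taints as they might appear on a node
# that is stuck both uninitialized and not ready.
spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"        # assumed value
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    effect: NoExecute    # Pods that do not tolerate this cannot run on the node
```

A cloud-controller-manager Pod needs a toleration for each of these entries before it can run on the node and begin initialization.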
116+
117+
## Our Recommendations
118+
119+
There is no one “correct way” to run a cloud-controller-manager. The details will depend on the
120+
specific needs of the cluster administrators and users. When planning your clusters and the
121+
lifecycle of the cloud-controller-managers, please consider the following guidance:
122+
123+
For cloud-controller-managers running in the same cluster they are managing:
124+
125+
1. Use host network mode, rather than the pod network: in most cases, a cloud controller manager
126+
will need to communicate with an API service endpoint associated with the infrastructure.
127+
Setting `hostNetwork` to `true` will ensure that the cloud controller is using the host
128+
networking instead of the container network and, as such, will have the same network access as
129+
the host operating system. It will also remove the dependency on the networking plugin. This
130+
will ensure that the cloud controller has access to the infrastructure endpoint (always check
131+
your networking configuration against your infrastructure provider’s instructions).
132+
2. Use a scalable resource type. `Deployments` and `DaemonSets` are useful for controlling the
133+
lifecycle of a cloud controller. They allow easy access to running multiple copies for redundancy
134+
as well as using the Kubernetes scheduling to ensure proper placement in the cluster. When using
135+
these primitives to control the lifecycle of your cloud controllers and running multiple
136+
replicas, you must remember to enable leader election, or else your controllers will collide
137+
with each other, which could lead to nodes not being initialized in the cluster.
138+
3. Target the controller manager containers to the control plane. There might exist other
139+
controllers which need to run outside the control plane (for example, Azure’s node manager
140+
controller). Still, the controller managers themselves should be deployed to the control plane.
141+
Use a node selector or affinity stanza to direct the scheduling of cloud controllers to the
142+
control plane to ensure that they are running in a protected space. Cloud controllers are vital
143+
to adding nodes to and removing nodes from a cluster as they form a link between Kubernetes and the
144+
physical infrastructure. Running them on the control plane will help to ensure that they run
145+
with a similar priority as other core cluster controllers and that they have some separation
146+
from non-privileged user workloads.
147+
1. It is worth noting that an anti-affinity stanza to prevent cloud controllers from running
148+
on the same host is also very useful to ensure that a single node failure will not degrade
149+
the cloud controller performance.
150+
4. Ensure that the tolerations allow operation. Use tolerations on the manifest for the cloud
151+
controller container to ensure that it will schedule to the correct nodes and that it can run
152+
in situations where a node is initializing. This means that cloud controllers should tolerate
153+
the `node.cloudprovider.kubernetes.io/uninitialized` taint, and they should also tolerate any
154+
taints associated with the control plane (for example, `node-role.kubernetes.io/control-plane`
155+
or `node-role.kubernetes.io/master`). It can also be useful to tolerate the
156+
`node.kubernetes.io/not-ready` taint to ensure that the cloud controller can run even when the
157+
node is not yet available for health monitoring.
158+
159+
For cloud-controller-managers that will not be running on the cluster they manage (for example,
160+
in a hosted control plane on a separate cluster), the rules are much more constrained by the
161+
dependencies of the environment of the cluster running the cloud-controller-manager. The advice
162+
for running on a self-managed cluster may not be appropriate as the types of conflicts and network
163+
constraints will be different. Please consult the architecture and requirements of your topology
164+
for these scenarios.
165+
166+
### Example
167+
168+
This is an example of a Kubernetes Deployment highlighting the guidance shown above. It is
169+
important to note that this is for demonstration purposes only; for production use, please
170+
consult your cloud provider’s documentation.
171+
172+
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cloud-controller-manager
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloud-controller-manager
      annotations:
        kubernetes.io/description: Cloud controller manager for my infrastructure
    spec:
      containers: # the container details will depend on your specific cloud controller manager
      - name: cloud-controller-manager
        command:
        - /bin/my-infrastructure-cloud-controller-manager
        - --leader-elect=true
        - -v=1
        image: registry/my-infrastructure-cloud-controller-manager:latest
        resources:
          requests:
            cpu: 200m
            memory: 50Mi
      hostNetwork: true # these Pods are part of the control plane
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
        operator: Exists
```
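
The recommendations also mention the `node-role.kubernetes.io/control-plane` taint, which the toleration list in the Deployment above does not include. On clusters whose control plane nodes carry that taint (an assumption that depends on how your cluster was installed), an extra entry along these lines could be appended to the list. Treat it as a sketch, not a required setting.

```yaml
# Sketch: an additional entry for the Deployment's
# spec.template.spec.tolerations list shown above.
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/control-plane
  operator: Exists
```

Because the Deployment's `nodeSelector` targets nodes labeled `node-role.kubernetes.io/control-plane`, tolerating the taint of the same name keeps the selector and the tolerations consistent with each other.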
234+
235+
When deciding how to deploy your cloud controller manager, it is worth noting that
236+
cluster-proportional, or resource-based, pod autoscaling is not recommended. Running multiple
237+
replicas of a cloud controller manager is good practice for ensuring high availability and
238+
redundancy, but does not contribute to better performance. In general, only a single instance
239+
of a cloud controller manager will be reconciling a cluster at any given time.
240+
241+
[migration-blog]: /blog/2024/05/20/completing-cloud-provider-migration/
242+
[kep2392]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md
243+
[kep1281]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy
244+
[kep2133]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers
245+
[csi]: https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-
246+
[kep625]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md
247+
[ccm]: /docs/concepts/architecture/cloud-controller/
248+
[chicken-and-egg]: /docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg
249+
[kubedocs0]: /docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
250+
[kubedocs1]: /docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready
