Update autoscaling from zero enhancement proposal with support for platform-aware autoscale from zero (#11962)
This commit updates the contract between the cluster-autoscaler Cluster API provider and the infrastructure provider's controllers that reconcile the Infrastructure Machine Template to support platform-aware autoscale from 0 in clusters consisting of nodes heterogeneous in CPU architecture and OS.
With this commit, the infrastructure providers implementing controllers to reconcile the status of their Infrastructure Machine Templates for supporting autoscale from 0 will be able to fill the status.nodeInfo stanza with additional information about the nodes.
The status.nodeInfo stanza has type corev1.NodeSystemInfo, mirroring the content that the rendered Node objects store in their status field.
When that information is present, the cluster-autoscaler can use it to build the node template labels `kubernetes.io/arch` and `kubernetes.io/os`.
Suppose the pending pods that trigger the cluster autoscaler have a node selector or a requiredDuringSchedulingIgnoredDuringExecution node affinity on the architecture or operating system of the nodes where they can run. In that case, the autoscaler will be able to filter the node group options according to the architecture or operating system requested by the pod.
The users could already provide this information to the cluster autoscaler through the labels capacity annotation. However, there is no similar capability to support future labels/taints through information set by the reconcilers of the status of Infrastructure Machine Templates.
Signed-off-by: aleskandro <[email protected]>
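To illustrate the filtering the commit message describes, here is a minimal Go sketch. It is not the cluster-autoscaler's code: `nodeGroup`, `matchesSelector`, and `filterNodeGroups` are hypothetical names, and only plain node selectors are modeled (node affinity is left out).

```go
package main

import "fmt"

// nodeGroup is a hypothetical stand-in for a scalable node group and the
// labels its simulated (template) node would carry.
type nodeGroup struct {
	Name           string
	TemplateLabels map[string]string
}

// matchesSelector reports whether every key/value pair of the pod's node
// selector is present, with the same value, in the template labels.
func matchesSelector(selector, labels map[string]string) bool {
	for key, want := range selector {
		got, ok := labels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// filterNodeGroups keeps only the groups whose template labels satisfy the selector.
func filterNodeGroups(groups []nodeGroup, selector map[string]string) []nodeGroup {
	var out []nodeGroup
	for _, g := range groups {
		if matchesSelector(selector, g.TemplateLabels) {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	groups := []nodeGroup{
		{Name: "md-amd64", TemplateLabels: map[string]string{"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"}},
		{Name: "md-arm64", TemplateLabels: map[string]string{"kubernetes.io/arch": "arm64", "kubernetes.io/os": "linux"}},
	}
	// A pending pod that can only run on arm64 nodes.
	selector := map[string]string{"kubernetes.io/arch": "arm64"}
	for _, g := range filterNodeGroups(groups, selector) {
		fmt.Println(g.Name)
	}
}
```

With the two example groups above, only the arm64 group survives the filter, so a scale-up from zero would not pick a group the pod cannot run on.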
docs/book/src/developer/providers/contracts/infra-machine.md (42 additions & 2 deletions)
@@ -502,7 +502,7 @@ If implementing the pause behavior, providers SHOULD surface the paused status o
### InfraMachineTemplate: support cluster autoscaling from zero
-As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement a capacity field in machine templates to inform the cluster autoscaler about the resources available on that machine type.
+As described in the enhancement [Opt-in Autoscaling from Zero][Opt-in Autoscaling from Zero], providers may implement the `capacity` and `nodeInfo` fields in machine templates to inform the cluster autoscaler about the resources available on that machine type, the architecture, and the operating system it runs.
Building on the `FooMachineTemplate` example from above, this shows the addition of a status and capacity field:
@@ -525,19 +525,59 @@ type FooMachineTemplateStatus struct {
-When rendered to a manifest, the machine template status capacity field representing an instance with 500 megabytes of RAM, 1 CPU core, and 1 NVidia GPU would look like this:
+When rendered to a manifest, the machine template status capacity field representing an amd64 linux instance with 500 megabytes of RAM, 1 CPU core, and 1 NVidia GPU should look like this:
```
 status:
   capacity:
     memory: 500mb
     cpu: "1"
     nvidia.com/gpu: "1"
+  nodeInfo:
+    architecture: amd64
+    operatingSystem: linux
```
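As a rough illustration of how a status type could produce that shape, the following Go sketch serializes a status value with `capacity` and `nodeInfo`. The types here are local, hypothetical stand-ins: a real provider would use `corev1.ResourceList` for the capacity, and the commit message names `corev1.NodeSystemInfo` as the type of `nodeInfo`. The output is JSON rather than YAML so the sketch needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeInfo is a local stand-in (assumption) for the two corev1.NodeSystemInfo
// fields relevant to autoscaling from zero.
type nodeInfo struct {
	Architecture    string `json:"architecture,omitempty"`
	OperatingSystem string `json:"operatingSystem,omitempty"`
}

// fooMachineTemplateStatus is a simplified, hypothetical version of the
// FooMachineTemplateStatus example; Capacity stands in for corev1.ResourceList.
type fooMachineTemplateStatus struct {
	Capacity map[string]string `json:"capacity,omitempty"`
	NodeInfo *nodeInfo         `json:"nodeInfo,omitempty"`
}

// renderStatus serializes the status to JSON.
func renderStatus(s fooMachineTemplateStatus) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	status := fooMachineTemplateStatus{
		Capacity: map[string]string{
			"memory":         "500mb",
			"cpu":            "1",
			"nvidia.com/gpu": "1",
		},
		NodeInfo: &nodeInfo{Architecture: "amd64", OperatingSystem: "linux"},
	}
	out, err := renderStatus(status)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Making `NodeInfo` a pointer with `omitempty` keeps the stanza out of the rendered status when the provider has nothing to report, which matches the opt-in nature of the field.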
+
+If the information in the `nodeInfo` field is not available, the result of the autoscaling from zero operation will depend
+on the cluster autoscaler implementation. For example, the Cluster API implementation of the Kubernetes Cluster Autoscaler
+will assume the host is running either the architecture set in the `CAPI_SCALE_ZERO_DEFAULT_ARCH` environment variable of
+the cluster autoscaler pod environment, or the amd64 architecture and Linux operating system as default values.
+
+See [autoscaling](../../../tasks/automated-machine-management/autoscaling.md).
+
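The fallback order described above could be sketched in Go as follows. This is an assumption-laden illustration, not the autoscaler's implementation: `resolvePlatform` is a hypothetical helper, and the environment lookup is passed in as a function so the logic can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
)

// defaultArchEnv is the environment variable named in the documentation.
const defaultArchEnv = "CAPI_SCALE_ZERO_DEFAULT_ARCH"

// resolvePlatform (hypothetical) picks the architecture and operating system
// for a simulated node. Values reported in status.nodeInfo win; otherwise the
// architecture falls back to the environment variable and then to amd64, and
// the operating system falls back to linux.
func resolvePlatform(arch, osName string, getenv func(string) string) (string, string) {
	if arch == "" {
		arch = getenv(defaultArchEnv)
	}
	if arch == "" {
		arch = "amd64"
	}
	if osName == "" {
		osName = "linux"
	}
	return arch, osName
}

func main() {
	// No nodeInfo reported by the provider: the result depends on the environment.
	arch, osName := resolvePlatform("", "", os.Getenv)
	fmt.Println(arch, osName)
}
```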
## Typical InfraMachine reconciliation workflow
A machine infrastructure provider must respond to changes to its InfraMachine resources. This process is
The information stored in the `status.nodeInfo` field will be used by the cluster autoscaler's scheduler simulator to determine the simulated node's labels `kubernetes.io/arch` and `kubernetes.io/os`. This logic will be implemented in the cluster autoscaler's ClusterAPI cloud provider code.
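A minimal Go sketch of that mapping, under the assumption that a label is only set when the corresponding `nodeInfo` value is non-empty (the helper name `templateLabels` is hypothetical and does not come from the autoscaler code base):

```go
package main

import "fmt"

// Well-known node label keys named in the proposal text.
const (
	labelArch = "kubernetes.io/arch"
	labelOS   = "kubernetes.io/os"
)

// templateLabels (hypothetical) derives the simulated node's platform labels
// from the architecture and operating system reported in status.nodeInfo.
// Empty values are skipped so that absent information adds no label.
func templateLabels(architecture, operatingSystem string) map[string]string {
	labels := map[string]string{}
	if architecture != "" {
		labels[labelArch] = architecture
	}
	if operatingSystem != "" {
		labels[labelOS] = operatingSystem
	}
	return labels
}

func main() {
	fmt.Println(templateLabels("arm64", "linux"))
}
```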
#### MachineSet and MachineDeployment Annotations
In cases where a user needs to provide specific resource information for a