With the changes introduced into `cluster-api` described in [this](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md#upgrade-strategy) proposal, a user can now opt in to scaling nodes from 0.
This entails a number of things which I will describe in detail.
The following actions need to be taken to enable cluster autoscaling:
## Set Capacity field
To do that, simply define some values for the new field called `capacity` in the `AWSMachineTemplate`, like this (the metadata and resource values below are illustrative):
```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2 # use the version matching your CAPA release
kind: AWSMachineTemplate
metadata:
  name: test-machine-template # illustrative name
spec:
  template:
    spec:
      instanceType: m5.large # illustrative instance type
status:
  capacity:
    # advertise what the instance type actually provides,
    # e.g. nvidia.com/gpu: "1" for GPU instance types
    cpu: "2"
    memory: 8G
```

To read more about what values are available, consult the proposal. These values can be overridden by selected annotations on the MachineTemplate.
## Add two necessary annotations to MachineDeployment
There are two annotations which need to be applied to the MachineDeployment, like this (the name and size bounds below are illustrative, while the annotation keys are fixed):
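
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: managed-md-0 # illustrative name
  annotations:
    # the minimum and maximum size of the node group managed by the autoscaler
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
spec:
  # ... the rest of the MachineDeployment spec is unchanged
```
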
These are necessary for the autoscaler to be able to pick up the deployment and scale it. Read more about these [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#enabling-autoscaling).
## Install and start cluster-autoscaler
Now comes the tricky part. In order for this to work, you need the cluster-autoscaler binary located [here](https://github.com/kubernetes/autoscaler).

You have two options: use Helm to install the autoscaler, or use the command line (which is faster if you are testing).

In either case, you need the following options:

- namespace
- cloud-provider
- scale-down-delay-after-add
- scale-down-delay-after-delete
- scale-down-delay-after-failure
- scale-down-unneeded-time
- expander
- kubeconfig
- cloud-config

These last two values are crucial for the autoscaler to work. `cloud-config` is the kubeconfig of the management cluster; if you are using a service account to access it, you also have the option to define that (read more about it in the autoscaler's repository). `kubeconfig`, on the other hand, is the kubeconfig of the workload cluster. It needs both because the MachineDeployment is in the control-plane cluster while the actual nodes and pods are in the workload cluster.

Therefore, you have to install cluster-autoscaler into the _control-plane_ cluster.
I have a handy script to launch the autoscaler, which looks something like this (the kubeconfig paths and timing values below are illustrative):
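
```bash
#!/bin/bash
# Launch cluster-autoscaler against a Cluster API management cluster.
# --cloud-config points at the management cluster's kubeconfig, while
# --kubeconfig points at the workload cluster's kubeconfig.
cluster-autoscaler \
    --cloud-provider=clusterapi \
    --namespace=default \
    --kubeconfig="${HOME}/.kube/workload.kubeconfig" \
    --cloud-config="${HOME}/.kube/management.kubeconfig" \
    --scale-down-delay-after-add=10s \
    --scale-down-delay-after-delete=10s \
    --scale-down-delay-after-failure=10s \
    --scale-down-unneeded-time=30s \
    --expander=random \
    --v=4
```
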
The Helm equivalent is a bit more complex and either needs to mount in the kubeconfig from somewhere or be pointed to it.
## Permissions
This depends on your scenario. Read more about it [here](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler).
Since this is Cluster API Provider AWS, you would need to look for the AWS provider settings [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md).
Further, the service account associated with the cluster-autoscaler requires permission to `get` and `list` the Cluster API machine template infrastructure objects.
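
A minimal sketch of the RBAC this implies (the ClusterRole and binding names, and the service account's namespace, are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler-machinetemplates
rules:
  - apiGroups:
      - infrastructure.cluster.x-k8s.io
    resources:
      - awsmachinetemplates
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler-machinetemplates
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler-machinetemplates
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system # match your cluster-autoscaler installation
```
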
There is a document located [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node) describing under what circumstances the autoscaler won't be able to scale down. Read it carefully.
This has some ramifications when scaling back down to 0: it will only work if all pods are removed from the node, and the node must not be able to schedule even the aws-node and kube-proxy pods. Hence there is a tiny manual step of cordoning off the last node in order to scale back down to 0.
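
For example (the node name is illustrative):

```bash
# Mark the last remaining node as unschedulable so that the autoscaler
# can remove it and take the node group down to 0.
kubectl cordon ip-10-0-0-42.eu-west-2.compute.internal
```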
## Conclusion
Once the cluster-autoscaler is running, you will start seeing nodes pop in as soon as there is some load on the cluster.
To test it, simply create and inflate a deployment along these lines (the image, resource request, and replica count are illustrative):
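
```bash
# A deployment of pods that do nothing but request CPU...
kubectl create deployment inflate --image=registry.k8s.io/pause:3.9
kubectl set resources deployment inflate --requests=cpu=1
# ...then inflate it so the pending pods trigger a scale-up.
kubectl scale deployment inflate --replicas=10
```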