# Cluster Autoscaler on Huawei Cloud

## Overview
The cluster autoscaler for [Huawei Cloud](https://www.huaweicloud.com/) scales worker nodes within any
specified Huawei Cloud Container Engine (CCE) cluster's node pool that has the `Autoscaler` label turned on.
It runs as a Deployment on a worker node in the cluster. This README covers the steps required
to get the cluster autoscaler up and running.

Notes:

1. The cluster autoscaler must run on CCE v1.15.6 (Kubernetes v1.15) or later.
2. The node pool attached to the CCE cluster must have the `Autoscaler` flag turned on, and its minimum and maximum
numbers of nodes must be set. Node pools can be managed under `Resource Management` in the CCE console.
3. If a warning about installing the `autoscaler addon` appears after creating a node pool with the `Autoscaler` flag on,
ignore it and DO NOT install the addon.
4. Do not build your image on a Huawei Cloud ECS. Build the image on a machine that has access to the Google Container Registry (GCR).
| 17 | + |
| 18 | +## Deployment Steps |
| 19 | +### Build Image |
| 20 | +#### Environment |
| 21 | +1. Download Project |
| 22 | + |
| 23 | + Get the latest `autoscaler` project and download it to `${GOPATH}/src/k8s.io`. |
| 24 | + |
| 25 | + This is used for building your image, so the machine you use here should be able to access GCR. Do not use a Huawei |
| 26 | + Cloud ECS. |
| 27 | + |
| 28 | +2. Go environment |
| 29 | + |
| 30 | + Make sure you have Go installed in the above machine. |
| 31 | + |
| 32 | +3. Docker environment |
| 33 | + |
| 34 | + Make sure you have Docker installed in the above machine. |
| 35 | + |
#### Build and push the image
Execute the following commands in the `autoscaler/cluster-autoscaler` directory of the autoscaler project downloaded previously.
The following steps use Huawei SoftWare Repository for Container (SWR) as an example registry.

1. Build the `cluster-autoscaler` binary:
   ```
   make build-in-docker
   ```
2. Build the Docker image:
   ```
   docker build -t {Image repository address}/{Organization name}/{Image name:tag} .
   ```
   For example:
   ```
   docker build -t swr.cn-north-4.myhuaweicloud.com/{Organization name}/cluster-autoscaler:dev .
   ```
   Follow the `Pull/Push Image` section of `Interactive Walkthroughs` under the SWR console to find the image repository address and organization name,
   and also refer to `My Images` -> `Upload Through Docker Client` in the SWR console.

3. Log in to SWR:
   ```
   docker login -u {Encoded username} -p {Encoded password} {SWR endpoint}
   ```

   For example:
   ```
   docker login -u cn-north-4@ABCD1EFGH2IJ34KLMN -p 1a23bc45678def9g01hi23jk4l56m789nop01q2r3s4t567u89v0w1x23y4z5678 swr.cn-north-4.myhuaweicloud.com
   ```
   Follow the `Pull/Push Image` section of `Interactive Walkthroughs` under the SWR console to find the encoded username, encoded password and SWR endpoint,
   and also refer to `My Images` -> `Upload Through Docker Client` in the SWR console.

4. Push the Docker image to SWR:
   ```
   docker push {Image repository address}/{Organization name}/{Image name:tag}
   ```

   For example:
   ```
   docker push swr.cn-north-4.myhuaweicloud.com/{Organization name}/cluster-autoscaler:dev
   ```

5. For the cluster autoscaler to function normally, make sure the `Sharing Type` of the image is `Public`.
   If CCE has trouble pulling the image, go to the SWR console and check whether the `Sharing Type` of the image is
   `Private`. If it is, click the `Edit` button on the top right and set the `Sharing Type` to `Public`.

### Deploy Cluster Autoscaler
#### Configure credentials
The autoscaler needs a `ServiceAccount`, which is granted permissions to the cluster's resources, and a `Secret`, which
stores credential (AK/SK in this case) information for authenticating with Huawei Cloud.

Examples of the `ServiceAccount` and `Secret` are provided in [examples/cluster-autoscaler-svcaccount.yaml](examples/cluster-autoscaler-svcaccount.yaml)
and [examples/cluster-autoscaler-secret.yaml](examples/cluster-autoscaler-secret.yaml). Modify the Secret
object yaml file with your credentials.

The following parameters are required in the Secret object yaml file:

- `identity-endpoint`

  Find the identity endpoint for different regions [here](https://support.huaweicloud.com/en-us/api-iam/iam_01_0001.html),
  and fill in this field with `https://{Identity Endpoint}/v3.0`.

  For example, for region `cn-north-4`, fill in the `identity-endpoint` as
  ```
  identity-endpoint=https://iam.cn-north-4.myhuaweicloud.com/v3.0
  ```

- `project-id`

  Follow this link to find the project ID: [Obtaining a Project ID](https://support.huaweicloud.com/en-us/api-servicestage/servicestage_api_0023.html)

- `access-key` and `secret-key`

  Create and find the Huawei Cloud access key and secret key
  required by the Secret object yaml file by referring to [Access Keys](https://support.huaweicloud.com/en-us/usermanual-ca/ca_01_0003.html)
  and [My Credentials](https://support.huaweicloud.com/en-us/usermanual-ca/ca_01_0001.html).

- `region`

  Fill in the region of the CCE cluster here. For example, for region `Beijing4`:
  ```
  region=cn-north-4
  ```

- `domain-id`

  The required domain ID is the Huawei Cloud [Account ID](https://support.huaweicloud.com/en-us/api-servicestage/servicestage_api_0048.html).
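Putting these fields together, the Secret might be sketched as below. This is a hypothetical layout for orientation only — the `metadata` values are placeholders, and the real field names and encoding must follow [examples/cluster-autoscaler-secret.yaml](examples/cluster-autoscaler-secret.yaml):

```yaml
# Hypothetical sketch only -- follow the actual layout in
# examples/cluster-autoscaler-secret.yaml. All values below are
# placeholders, not real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-autoscaler-credentials
  namespace: kube-system
stringData:
  identity-endpoint: https://iam.cn-north-4.myhuaweicloud.com/v3.0
  project-id: "{your project id}"
  access-key: "{your access key}"
  secret-key: "{your secret key}"
  region: cn-north-4
  domain-id: "{your account id}"
```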
#### Configure deployment
An example deployment file is provided at [examples/cluster-autoscaler-deployment.yaml](examples/cluster-autoscaler-deployment.yaml).
Change the `image` to the image you just pushed, the `cluster-name` to the CCE cluster's ID, and `nodes` to your
own configuration of the node pool, with the format
```
{Minimum number of nodes}:{Maximum number of nodes}:{Node pool name}
```
The above parameters should match the parameters of the node pool you created. Currently, Huawei CCE only supports
autoscaling against a single node pool.

More configuration options can be added to the cluster autoscaler, such as `scale-down-delay-after-add` and `scale-down-unneeded-time`.
See the available configuration options [here](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca).
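The `nodes` value is a plain `{min}:{max}:{name}` string. As a rough illustration of its shape (this is a hypothetical helper, not the autoscaler's own parsing code), it can be split and validated like this:

```python
def parse_nodes_spec(spec: str):
    """Split a '{min}:{max}:{node pool name}' spec into its parts.

    Illustrative only -- the cluster autoscaler parses its --nodes
    flag internally; this just shows what a valid spec looks like.
    """
    parts = spec.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected min:max:name, got {spec!r}")
    lo, hi, name = int(parts[0]), int(parts[1]), parts[2]
    if lo < 0 or hi < lo or not name:
        raise ValueError(f"invalid bounds or empty pool name in {spec!r}")
    return lo, hi, name

print(parse_nodes_spec("1:10:my-node-pool"))  # (1, 10, 'my-node-pool')
```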

#### Deploy cluster autoscaler on CCE

1. Log in to a machine which can manage the CCE cluster with `kubectl`.

   Make sure the machine has kubectl access to the CCE cluster. We recommend using a worker node to manage the cluster. Follow
   the instructions in
   [Connecting to a Kubernetes Cluster Using kubectl](https://support.huaweicloud.com/intl/en-us/usermanual-cce/cce_01_0107.html)
   to set up kubectl access to the CCE cluster if you cannot execute `kubectl` on your machine.

2. Create the Service Account:
   ```
   kubectl create -f cluster-autoscaler-svcaccount.yaml
   ```

3. Create the Secret:
   ```
   kubectl create -f cluster-autoscaler-secret.yaml
   ```

4. Create the cluster autoscaler deployment:
   ```
   kubectl create -f cluster-autoscaler-deployment.yaml
   ```

### Testing
Now the cluster autoscaler should be successfully deployed on the cluster. Check it on the CCE UI console, or execute
```
kubectl get pods -n kube-system
```

To see whether it functions correctly, deploy a Service to the cluster, then increase and decrease the workload on the
Service. The cluster autoscaler should autoscale the node pool with `Autoscaler` on to accommodate the load.

A simple testing method looks like this:
- Create a Service listening for HTTP requests.

- Create an HPA or AOM policy for pods to be autoscaled:
  * AOM policy: To create an AOM policy, go into the deployment, click the `Scaling` tab and click the `Add Scaling Policy`
    button on the Huawei Cloud UI.
  * HPA policy: There are two ways to create an HPA policy.
    * Follow these instructions to create an HPA policy through the UI:
      [Scaling a Workload](https://support.huaweicloud.com/intl/en-us/usermanual-cce/cce_01_0208.html)
    * Install the [metrics server](https://github.com/kubernetes-sigs/metrics-server) yourself and create an HPA policy
      by executing something like this:
      ```
      kubectl autoscale deployment [Deployment name] --cpu-percent=50 --min=1 --max=10
      ```
      The above command creates an HPA policy on the deployment with a target average CPU usage of 50%. The number of
      pods will grow if average CPU usage is above 50%, and will shrink otherwise. The `min` and `max` parameters set
      the minimum and maximum number of pods of this deployment.
- Generate load on the above Service.

  Example tools for generating load on an HTTP service are:
  * [The `hey` command](https://github.com/rakyll/hey)
  * The `busybox` image:
    ```
    kubectl run --generator=run-pod/v1 -it --rm load-generator --image=busybox /bin/sh

    # send an infinite loop of queries to the service
    while true; do wget -q -O- {Service access address}; done
    ```

  Feel free to use other tools with similar functionality.

- Wait for pods to be added: as load increases, more pods will be added by HPA or AOM.

- Wait for nodes to be added: when there are insufficient resources for the additional pods, new nodes will be added to the
  cluster by the cluster autoscaler.

- Stop the load.

- Wait for pods to be removed: as load decreases, pods will be removed by HPA or AOM.

- Wait for nodes to be removed: as pods are removed from nodes, several nodes will become underutilized or empty,
  and will be removed by the cluster autoscaler.
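The load-generation step can also be scripted. Below is a minimal, self-contained Python sketch that fires a burst of HTTP requests; it spins up a throwaway local server purely as a stand-in for your Service access address, which you would substitute in practice:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Ok(BaseHTTPRequestHandler):
    """Trivial handler standing in for the Service under test."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # silence per-request logging
        pass

# Stand-in server; in a real test, point `url` at your Service
# access address instead and drop the local server entirely.
server = HTTPServer(("127.0.0.1", 0), Ok)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Fire a burst of requests; keep looping in a real test until
# HPA adds pods and the autoscaler adds nodes.
responses = [urllib.request.urlopen(url).read() for _ in range(50)]
print(len(responses), responses[0])  # 50 b'ok'
server.shutdown()
```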


## Notes

1. A Huawei Cloud CCE cluster does not yet support autoscaling against multiple node pools within a single cluster, but
this is currently in development. For now, make sure there is only one node pool with the `Autoscaler` label
on in the CCE cluster.
2. If the version of the CCE cluster is v1.15.6 or older, log statements similar to the following may be present in the
autoscaler pod logs:
   ```
   E0402 13:25:05.472999       1 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSINode: the server could not find the requested resource
   ```
   This is normal and will be fixed in a future version of CCE.

## Support & Contact Info

Interested in Cluster Autoscaler on Huawei Cloud? Want to talk? Have questions, concerns or great ideas?

Please reach out to us at `[email protected]`.