Commit 8135baa

Merge pull request #231182 from MGoedtel/task70791
Metrics Server VPA article
2 parents 61cd064 + f2d18d2 commit 8135baa

3 files changed (+209 −48 lines)

articles/aks/TOC.yml

Lines changed: 5 additions & 3 deletions
@@ -199,8 +199,10 @@
   href: aks-diagnostics.md
 - name: Integrate ACR with an AKS cluster
   href: cluster-container-registry-integration.md
-- name: Use Vertical Pod Autoscaler (preview)
+- name: Use Vertical Pod Autoscaler
   href: vertical-pod-autoscaler.md
+- name: Metrics Server VPA Throttling
+  href: use-metrics-server-vertical-pod-autoscaler.md
 - name: Scale an AKS cluster
   href: scale-cluster.md
 - name: Stop/Deallocate nodes with Scale-down Mode
@@ -349,15 +351,15 @@
   items:
   - name: Create an OIDC Issuer for your cluster
     href: use-oidc-issuer.md
-  - name: Workload identity (preview)
+  - name: Workload identity
     items:
     - name: Overview
       href: workload-identity-overview.md
     - name: Deploy and configure cluster
       href: workload-identity-deploy-cluster.md
     - name: Modernize your app with workload identity
       href: workload-identity-migrate-from-pod-identity.md
-  - name: Use Azure AD pod identity (preview)
+  - name: Use Azure AD pod identity
     href: use-azure-ad-pod-identity.md
 - name: Use Pod Sandboxing
   href: use-pod-sandboxing.md

articles/aks/use-metrics-server-vertical-pod-autoscaler.md

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@

---
title: Configure Metrics Server VPA in Azure Kubernetes Service (AKS)
description: Learn how to vertically autoscale your Metrics Server pods on an Azure Kubernetes Service (AKS) cluster.
ms.topic: article
ms.date: 03/21/2023
---

# Configure Metrics Server VPA in Azure Kubernetes Service (AKS)

[Metrics Server][metrics-server-overview] is a scalable, efficient source of container resource metrics for the Kubernetes built-in autoscaling pipelines. With Azure Kubernetes Service (AKS), vertical pod autoscaling is enabled for the Metrics Server. The Metrics Server is commonly used by other Kubernetes add-ons, such as the [Horizontal Pod Autoscaler][horizontal-pod-autoscaler].

Vertical Pod Autoscaler (VPA) enables you to adjust the resource limits when the Metrics Server is experiencing consistent CPU and memory resource constraints.
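
If you want to confirm that the VPA sidecar is running alongside the Metrics Server before you tune anything, a quick check such as the following can help. This is a minimal sketch; it assumes the managed deployment is named `metrics-server` and that the sidecar container is named `metrics-server-vpa`, matching the log command used later in this article.

```bash
# List the containers in the metrics-server deployment. The output should
# include both "metrics-server" and the "metrics-server-vpa" sidecar.
kubectl -n kube-system get deployment metrics-server \
  -o jsonpath='{.spec.template.spec.containers[*].name}'
```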

## Before you begin

Your AKS cluster must be running Kubernetes version 1.24 or higher.
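
You can verify the cluster version with the Azure CLI or from the nodes themselves. This is a minimal sketch that assumes placeholder resource group and cluster names (`myResourceGroup` and `myAKSCluster`); substitute your own values.

```bash
# Show the Kubernetes version reported by AKS for the cluster.
az aks show --resource-group myResourceGroup --name myAKSCluster \
  --query kubernetesVersion --output tsv

# Or check the version running on each node (VERSION column).
kubectl get nodes
```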

## Metrics Server throttling

If the Metrics Server throttling rate is high and the memory usage of its two pods is unbalanced, the Metrics Server requires more resources than the default values provide.
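
Before you change the configuration, it can help to look at the two Metrics Server pods for restarts and uneven memory consumption. This is a minimal sketch that assumes the pods carry the common `k8s-app=metrics-server` label; verify the label on your cluster before relying on it.

```bash
# Check the Metrics Server pods for restarts or CrashLoopBackOff.
kubectl -n kube-system get pods -l k8s-app=metrics-server

# Compare the CPU and memory usage of the two pods. A large gap between them
# suggests the default resource values are too low.
kubectl -n kube-system top pods -l k8s-app=metrics-server
```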

To update the coefficient values, create a ConfigMap in the overlay *kube-system* namespace to override the values in the Metrics Server specification. Perform the following steps to update the Metrics Server.

1. Create a ConfigMap file named *metrics-server-config.yaml* and copy in the following manifest.

    ```yml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-server-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
        baseCPU: 100m
        cpuPerNode: 1m
        baseMemory: 100Mi
        memoryPerNode: 8Mi
    ```

    In this ConfigMap example, the resource limits and requests are changed to the following:

    * cpu: (100+1n) millicore
    * memory: (100+8n) mebibyte

    Where *n* is the number of nodes. For example, on a 10-node cluster the Metrics Server is assigned 100 + (1 × 10) = 110 millicores of CPU and 100 + (8 × 10) = 180 mebibytes of memory.

2. Create the ConfigMap using the [kubectl apply][kubectl-apply] command and specify the name of your YAML manifest:

    ```bash
    kubectl apply -f metrics-server-config.yaml
    ```

3. Restart the Metrics Server pods. There are two Metrics Server pods; the following command deletes one by name, so run it for each pod.

    ```bash
    kubectl -n kube-system delete po metrics-server-pod-name
    ```

4. To verify the updated resources took effect, run the following command to review the Metrics Server VPA log.

    ```bash
    kubectl -n kube-system logs metrics-server-pod-name -c metrics-server-vpa
    ```

    The following example output shows that the updated throttling settings were applied.

    ```output
    ERROR: logging before flag.Parse: I0315 23:12:33.956112 1 pod_nanny.go:68] Invoked by [/pod_nanny --config-dir=/etc/config --cpu=44m --extra-cpu=0.5m --memory=51Mi --extra-memory=4Mi --poll-period=180000 --threshold=5 --deployment=metrics-server --container=metrics-server]
    ERROR: logging before flag.Parse: I0315 23:12:33.956159 1 pod_nanny.go:69] Version: 1.8.14
    ERROR: logging before flag.Parse: I0315 23:12:33.956171 1 pod_nanny.go:85] Watching namespace: kube-system, pod: metrics-server-545d8b77b7-5nqq9, container: metrics-server.
    ERROR: logging before flag.Parse: I0315 23:12:33.956175 1 pod_nanny.go:86] storage: MISSING, extra_storage: 0Gi
    ERROR: logging before flag.Parse: I0315 23:12:33.957441 1 pod_nanny.go:116] cpu: 100m, extra_cpu: 1m, memory: 100Mi, extra_memory: 8Mi
    ERROR: logging before flag.Parse: I0315 23:12:33.957456 1 pod_nanny.go:145] Resources: [{Base:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} ExtraPerNode:{i:{value:0 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:104857600 scale:0} d:{Dec:<nil>} s:100Mi Format:BinarySI} ExtraPerNode:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} Name:memory}]
    ```

Be cautious with the *baseCPU*, *cpuPerNode*, *baseMemory*, and *memoryPerNode* values, because AKS doesn't validate the ConfigMap. As a recommended practice, increase the values gradually to avoid unnecessary resource consumption. Proactively monitor resource usage when updating or creating the ConfigMap. A large number of resource requests could negatively impact the node.
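
After the pods restart, you can confirm the values the resizer actually applied and keep monitoring consumption. This is a minimal sketch; the container index is an assumption, so adjust it if the `metrics-server` container isn't first in the pod spec.

```bash
# Show the CPU/memory requests and limits currently set on the metrics-server container.
kubectl -n kube-system get deployment metrics-server \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'

# Monitor actual usage to make sure the new values leave enough headroom.
kubectl -n kube-system top pods | grep metrics-server
```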

## Manually configure Metrics Server resource usage

The Metrics Server VPA adjusts resource usage by the number of nodes. If the cluster scales up or down often, the Metrics Server might restart frequently. In this case, you can bypass VPA and manually control its resource usage. This method is an alternative to the steps described in the previous section; don't perform both.

If you would like to bypass VPA for the Metrics Server and manually control its resource usage, perform the following steps.

1. Create a ConfigMap file named *metrics-server-config.yaml* and copy in the following manifest.

    ```yml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-server-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
        baseCPU: 100m
        cpuPerNode: 0m
        baseMemory: 100Mi
        memoryPerNode: 0Mi
    ```

    This ConfigMap example changes the resource limits and requests to the following:

    * cpu: 100 millicore
    * memory: 100 mebibyte

    Changing the number of nodes doesn't trigger autoscaling; you can confirm this with the sketch after these steps.

2. Create the ConfigMap using the [kubectl apply][kubectl-apply] command and specify the name of your YAML manifest:

    ```bash
    kubectl apply -f metrics-server-config.yaml
    ```

3. Restart the Metrics Server pods. There are two Metrics Server pods; the following command deletes one by name, so run it for each pod.

    ```bash
    kubectl -n kube-system delete po metrics-server-pod-name
    ```

4. To verify the updated resources took effect, run the following command to review the Metrics Server VPA log.

    ```bash
    kubectl -n kube-system logs metrics-server-pod-name -c metrics-server-vpa
    ```

    The following example output shows that the updated throttling settings were applied.

    ```output
    ERROR: logging before flag.Parse: I0315 23:12:33.956112 1 pod_nanny.go:68] Invoked by [/pod_nanny --config-dir=/etc/config --cpu=44m --extra-cpu=0.5m --memory=51Mi --extra-memory=4Mi --poll-period=180000 --threshold=5 --deployment=metrics-server --container=metrics-server]
    ERROR: logging before flag.Parse: I0315 23:12:33.956159 1 pod_nanny.go:69] Version: 1.8.14
    ERROR: logging before flag.Parse: I0315 23:12:33.956171 1 pod_nanny.go:85] Watching namespace: kube-system, pod: metrics-server-545d8b77b7-5nqq9, container: metrics-server.
    ERROR: logging before flag.Parse: I0315 23:12:33.956175 1 pod_nanny.go:86] storage: MISSING, extra_storage: 0Gi
    ERROR: logging before flag.Parse: I0315 23:12:33.957441 1 pod_nanny.go:116] cpu: 100m, extra_cpu: 0m, memory: 100Mi, extra_memory: 0Mi
    ERROR: logging before flag.Parse: I0315 23:12:33.957456 1 pod_nanny.go:145] Resources: [{Base:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} ExtraPerNode:{i:{value:0 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:104857600 scale:0} d:{Dec:<nil>} s:100Mi Format:BinarySI} ExtraPerNode:{i:{value:0 scale:0} d:{Dec:<nil>} s: Format:BinarySI} Name:memory}]
    ```
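
To confirm the fixed values stay in place as the cluster scales, you can change the node count and re-check the deployment. This is a minimal sketch with hypothetical resource group, cluster, and node pool names (`myResourceGroup`, `myAKSCluster`, `nodepool1`); substitute your own values and choose a node count appropriate for your cluster.

```bash
# Scale the node pool to a different node count.
az aks nodepool scale --resource-group myResourceGroup --cluster-name myAKSCluster \
  --name nodepool1 --node-count 4

# The requests and limits on the metrics-server container should remain 100m / 100Mi.
# The container index is an assumption; adjust it if needed.
kubectl -n kube-system get deployment metrics-server \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```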

## Troubleshooting

1. If you use the following ConfigMap, the Metrics Server VPA customizations aren't applied. You need to add a unit for `baseCPU`.

    ```yml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metrics-server-config
      namespace: kube-system
      labels:
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: EnsureExists
    data:
      NannyConfiguration: |-
        apiVersion: nannyconfig/v1alpha1
        kind: NannyConfiguration
        baseCPU: 100
        cpuPerNode: 1m
        baseMemory: 100Mi
        memoryPerNode: 8Mi
    ```

    The following example output shows that the updated throttling settings aren't applied.

    ```output
    ERROR: logging before flag.Parse: I0316 23:32:08.383389 1 pod_nanny.go:68] Invoked by [/pod_nanny --config-dir=/etc/config --cpu=44m --extra-cpu=0.5m --memory=51Mi --extra-memory=4Mi --poll-period=180000 --threshold=5 --deployment=metrics-server --container=metrics-server]
    ERROR: logging before flag.Parse: I0316 23:32:08.383430 1 pod_nanny.go:69] Version: 1.8.14
    ERROR: logging before flag.Parse: I0316 23:32:08.383441 1 pod_nanny.go:85] Watching namespace: kube-system, pod: metrics-server-7d78876589-hcrff, container: metrics-server.
    ERROR: logging before flag.Parse: I0316 23:32:08.383446 1 pod_nanny.go:86] storage: MISSING, extra_storage: 0Gi
    ERROR: logging before flag.Parse: I0316 23:32:08.384554 1 pod_nanny.go:192] Unable to decode Nanny Configuration from config map, using default parameters
    ERROR: logging before flag.Parse: I0316 23:32:08.384565 1 pod_nanny.go:116] cpu: 44m, extra_cpu: 0.5m, memory: 51Mi, extra_memory: 4Mi
    ERROR: logging before flag.Parse: I0316 23:32:08.384589 1 pod_nanny.go:145] Resources: [{Base:{i:{value:44 scale:-3} d:{Dec:<nil>} s:44m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:53477376 scale:0} d:{Dec:<nil>} s:51Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
    ```

2. For clusters running Kubernetes version 1.23 and higher, the Metrics Server has a *PodDisruptionBudget* that ensures the number of available Metrics Server pods is at least one. If you see output similar to the following after running `kubectl -n kube-system get po`, the customized resource usage might be too small. Increase the coefficient values to resolve it, and see the sketch after this example output for ways to inspect the failing pods.

    ```output
    metrics-server-679b886d4-pxwdf    1/2   CrashLoopBackOff   6 (36s ago)   6m33s
    metrics-server-679b886d4-svxxx    1/2   CrashLoopBackOff   6 (54s ago)   6m33s
    metrics-server-7d78876589-hcrff   2/2   Running            0             37m
    ```
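
To dig into why the pods are crash looping, you can inspect the PodDisruptionBudget and the pod events, for example to look for `OOMKilled` terminations or failed probes. This is a minimal sketch; the pod name is a placeholder taken from the example output above.

```bash
# Confirm the Metrics Server PodDisruptionBudget and how many pods it allows to be unavailable.
kubectl -n kube-system get pdb

# Review the pod events and last termination state for OOMKilled or probe failures.
kubectl -n kube-system describe pod metrics-server-679b886d4-pxwdf
```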

## Next steps

Metrics Server is a component in the core metrics pipeline. For more information, see [Metrics Server API design][metrics-server-api-design].

<!-- EXTERNAL LINKS -->
[kubectl-apply]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#apply
[metrics-server-overview]: https://kubernetes-sigs.github.io/metrics-server/
[metrics-server-api-design]: https://github.com/kubernetes/design-proposals-archive/blob/main/instrumentation/resource-metrics-api.md

<!--- INTERNAL LINKS --->
[horizontal-pod-autoscaler]: concepts-scale.md#horizontal-pod-autoscaler

articles/aks/vertical-pod-autoscaler.md

Lines changed: 1 addition & 45 deletions
@@ -3,7 +3,7 @@ title: Vertical Pod Autoscaling (preview) in Azure Kubernetes Service (AKS)
 description: Learn how to vertically autoscale your pod on an Azure Kubernetes Service (AKS) cluster.
 ms.topic: article
 ms.custom: devx-track-azurecli
-ms.date: 01/12/2023
+ms.date: 03/17/2023
 ---

 # Vertical Pod Autoscaling (preview) in Azure Kubernetes Service (AKS)
@@ -394,50 +394,6 @@ Vertical Pod autoscaling uses the `VerticalPodAutoscaler` object to automaticall
 
 The Vertical Pod Autoscaler uses the `lowerBound` and `upperBound` attributes to decide whether to delete a Pod and replace it with a new Pod. If a Pod has requests less than the lower bound or greater than the upper bound, the Vertical Pod Autoscaler deletes the Pod and replaces it with a Pod that meets the target attribute.
 
-## Metrics server VPA throttling
-
-With AKS clusters version 1.24 and higher, vertical pod autoscaling is enabled for the metrics server. VPA enables you to adjust the resource limit when the metrics server is experiencing consistent CPU and memory resource constraints.
-
-If the metrics server throttling rate is high and the memory usage of its two pods are unbalanced, this indicates the metrics server requires more resources than the default values specified.
-
-To update the coefficient values, create a ConfigMap in the overlay *kube-system* namespace to override the values in the metrics server specification. Perform the following steps to update the metrics server.
-
-1. Create a ConfigMap file named *metrics-server-config.yaml* and copy in the following manifest.
-
-    ```yml
-    apiVersion: v1
-    kind: ConfigMap
-    metadata:
-      name: metrics-server-config
-      namespace: kube-system
-      labels:
-        kubernetes.io/cluster-service: "true"
-        addonmanager.kubernetes.io/mode: EnsureExists
-    data:
-      NannyConfiguration: |-
-        apiVersion: nannyconfig/v1alpha1
-        kind: NannyConfiguration
-        baseCPU: 100m
-        cpuPerNode: 1m
-        baseMemory: 100Mi
-        memoryPerNode: 8Mi
-    ```
-
-    In this ConfigMap example, it changes the resource limit and request to the following:
-
-    * cpu: (100+1n) millicore
-    * memory: (100+8n) mebibyte
-
-    Where *n* is the number of nodes.
-
-2. Create the ConfigMap using the [kubectl apply][kubectl-apply] command and specify the name of your YAML manifest:
-
-    ```bash
-    kubectl apply -f metrics-server-config.yaml
-    ```
-
-Be cautious of the *baseCPU*, *cpuPerNode*, *baseMemory*, and the *memoryPerNode* as the ConfigMap won't be validated by AKS. As a recommended practice, increase the value gradually to avoid unnecessary resource consumption. Proactively monitor resource usage when updating or creating the ConfigMap. A large number of resource requests could negatively impact the node.
-
 ## Next steps
 
 This article showed you how to automatically scale resource utilization, such as CPU and memory, of cluster nodes to match application requirements. You can also use the horizontal pod autoscaler to automatically adjust the number of pods that run your application. For steps on using the horizontal pod autoscaler, see [Scale applications in AKS][scale-applications-in-aks].
