Commit 054ec91

Author: Michael Burke

GA support for Custom Metric Autoscaler

1 parent 2bb3e4e commit 054ec91

9 files changed: +519 -99 lines changed

modules/nodes-pods-autoscaling-custom-about.adoc

Lines changed: 10 additions & 4 deletions
@@ -6,17 +6,21 @@
 [id="nodes-pods-autoscaling-custom-about_{context}"]
 = Understanding the custom metrics autoscaler
 
-The custom metrics autoscaler uses the Kubernetes-based Event Driven Autoscaler (KEDA) and is built on top of the {product-title} horizontal pod autoscaler (HPA).
+The Custom Metrics Autoscaler Operator scales your pods up and down based on custom, external metrics from specific applications. Your other applications continue to use other scaling methods. You configure _triggers_, also known as scalers, which are the source of events and metrics that the custom metrics autoscaler uses to determine how to scale. The custom metrics autoscaler uses a metrics API to convert the external metrics to a form that {product-title} can use. The custom metrics autoscaler creates a horizontal pod autoscaler (HPA) that performs the actual scaling.
 
-The Custom Metrics Autoscaler Operator scales your pods up and down based on custom, external metrics from specific applications. Your other applications continue to use other scaling methods. You configure _triggers_, also known as scalers, which are the source of events and metrics that the custom metrics autoscaler uses to determine how to scale. The custom metrics autoscaler uses a metrics API to convert the external metrics to a form that {product-title} can use. The custom metrics autoscaler creates a horizontal pod autoscaler (HPA) that performs the actual scaling. The custom metrics autoscaler currently supports only the Prometheus trigger, which can use the installed {product-title} monitoring or an external Prometheus server as the metrics source.
-
-To use the custom metrics autoscaler, you create a `ScaledObject` or `ScaledJob` object, which defines the scaling metadata. You specify the deployment or job to scale, the source of the metrics to scale on (trigger), and other parameters such as the minimum and maximum replica counts allowed.
+To use the custom metrics autoscaler, you create a `ScaledObject` or `ScaledJob` object, which is a custom resource (CR) that defines the scaling metadata. You specify the deployment or job to scale, the source of the metrics to scale on (trigger), and other parameters such as the minimum and maximum replica counts allowed.
 
 [NOTE]
 ====
 You can create only one scaled object or scaled job for each workload that you want to scale. Also, you cannot use a scaled object or scaled job and the horizontal pod autoscaler (HPA) on the same workload.
 ====
 
+The custom metrics autoscaler, unlike the HPA, can scale to zero. If you set the `minReplicaCount` value in the custom metrics autoscaler CR to `0`, the custom metrics autoscaler scales the workload down from 1 to 0 replicas or up from 0 to 1 replica. This is known as the _activation phase_. After scaling up to 1 replica, the HPA takes control of the scaling. This is known as the _scaling phase_.
+
+Some triggers allow you to change the number of replicas that are scaled by the custom metrics autoscaler. In all cases, the parameter that configures the activation phase uses the same name as the scaling parameter, prefixed with _activation_. For example, if the `threshold` parameter configures scaling, `activationThreshold` configures activation. Configuring the activation and scaling phases gives you more flexibility with your scaling policies. For example, you can set a higher activation value to prevent scaling up or down if the metric is particularly low.
+
+The activation value takes priority over the scaling value if the two would produce different decisions. For example, if the `threshold` is set to `10` and the `activationThreshold` is set to `50`, and the metric reports `40`, the scaler is not active and the pods are scaled to zero even though the HPA would require 4 instances.
+
 ////
 [NOTE]
 ====
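For illustration, a minimal `ScaledObject` sketch combining these settings might look like the following. The object name, namespace, deployment name, and Prometheus query are hypothetical, and the `threshold` and `activationThreshold` values mirror the example above:

[source,yaml]
----
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaledobject # hypothetical name
  namespace: example-namespace # hypothetical namespace
spec:
  scaleTargetRef:
    name: example-deployment # hypothetical deployment to scale
  minReplicaCount: 0 # permits scaling to zero (activation phase)
  maxReplicaCount: 10
  triggers:
  - type: prometheus # the trigger (scaler) that supplies the external metric
    metadata:
      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092 # in-cluster monitoring endpoint
      query: sum(rate(http_requests_total[1m])) # hypothetical query
      threshold: "10" # scaling-phase target value
      activationThreshold: "50" # activation-phase value; takes priority over threshold
----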
@@ -35,3 +39,5 @@ Successfully set ScaleTarget replica count
 ----
 Successfully updated ScaleTarget
 ----
+
+You can temporarily pause the autoscaling of a workload object, if needed. For example, you could pause autoscaling before performing cluster maintenance.
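KEDA implements pausing through an annotation on the scaled object. As a minimal sketch, assuming a hypothetical scaled object named `example-scaledobject`:

[source,terminal]
----
$ oc annotate scaledobject example-scaledobject autoscaling.keda.sh/paused-replicas="4"
----

The `autoscaling.keda.sh/paused-replicas` annotation pins the workload at the given replica count; removing the annotation resumes autoscaling.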
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
// Module included in the following assemblies:
//
// * nodes/nodes-pods-autoscaling-custom.adoc

:_content-type: PROCEDURE
[id="nodes-pods-autoscaling-custom-audit_{context}"]
= Configuring audit logging

// Text borrowed from gathering-cluster-data.adoc. Make into snippet?

You can gather audit logs, which are a security-relevant, chronological set of records documenting the sequence of activities by individual users, administrators, or other components of the system that have affected it.

For example, audit logs can help you understand where an autoscaling request is coming from. This is key information when backends are getting overloaded by autoscaling requests made by user applications and you need to determine which application is causing the problem. You can configure auditing for the Custom Metrics Autoscaler Operator by editing the `KedaController` custom resource. The logs are sent to an audit log file on a volume that is secured by using a persistent volume claim in the `KedaController` CR.
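The `logOutputVolumeClaim` field shown in the procedure below expects an existing persistent volume claim in the `openshift-keda` namespace. As a minimal sketch, such a claim might look like the following; the access mode and storage size are assumptions to adapt to your storage class and retention settings:

[source,yaml]
----
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-audit-log # must match the logOutputVolumeClaim value
  namespace: openshift-keda
spec:
  accessModes:
  - ReadWriteOnce # assumption; use a mode your storage class supports
  resources:
    requests:
      storage: 1Gi # assumption; size the claim for your log lifetime settings
----

You can confirm that the claim is bound by running `oc get pvc pvc-audit-log -n openshift-keda`.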

// You can view the audit log file directly or use the `oc adm must-gather` CLI. The `oc adm must-gather` CLI collects the log along with other information from your cluster that is most likely needed for debugging issues, such as resource definitions and service logs.

.Prerequisites

* The Custom Metrics Autoscaler Operator must be installed.

.Procedure

. Edit the `KedaController` custom resource to add the `auditConfig` stanza:
+
[source,yaml]
----
kind: KedaController
apiVersion: keda.sh/v1alpha1
metadata:
  name: keda
  namespace: openshift-keda
spec:
  ...
  metricsServer:
    ...
    auditConfig:
      logFormat: "json" <1>
      logOutputVolumeClaim: "pvc-audit-log" <2>
      policy:
        rules: <3>
        - level: Metadata
        omitStages: "RequestReceived" <4>
        omitManagedFields: false <5>
      lifetime: <6>
        maxAge: "2"
        maxBackup: "1"
        maxSize: "50"
----
<1> Specifies the output format of the audit log, either `legacy` or `json`.
<2> Specifies an existing persistent volume claim for storing the log data. All requests coming to the API server are logged to this persistent volume claim. If you leave this field empty, the log data is sent to stdout.
<3> Specifies which events should be recorded and what data they should include:
+
* `None`: Do not log events.
* `Metadata`: Log only the metadata for the request, such as user, timestamp, and so forth. Do not log the request text and the response text. This is the default.
* `Request`: Log only the metadata and the request text but not the response text. This option does not apply for non-resource requests.
* `RequestResponse`: Log event metadata, request text, and response text. This option does not apply for non-resource requests.
+
<4> Specifies stages for which no event is created.
<5> Specifies whether to omit the managed fields of the request and response bodies from being written to the API audit log, either `true` to omit the fields or `false` to include the fields.
<6> Specifies the size and lifespan of the audit logs:
+
* `maxAge`: The maximum number of days to retain audit log files, based on the timestamp encoded in their filename.
* `maxBackup`: The maximum number of audit log files to retain. Set to `0` to retain all audit log files.
* `maxSize`: The maximum size, in megabytes, of an audit log file before it gets rotated.

.Verification

////
. Use the `oc adm must-gather` CLI to collect the audit log file:
+
[source,terminal]
----
oc adm must-gather -- /usr/bin/gather_audit_logs
----
////

. View the audit log file directly:

.. Obtain the name of the `keda-metrics-apiserver-*` pod:
+
[source,terminal]
----
$ oc get pod -n openshift-keda
----
+
.Example output
+
[source,terminal]
----
NAME                                                  READY   STATUS    RESTARTS   AGE
custom-metrics-autoscaler-operator-5cb44cd75d-9v4lv   1/1     Running   0          8m20s
keda-metrics-apiserver-65c7cc44fd-rrl4r               1/1     Running   0          2m55s
keda-operator-776cbb6768-zpj5b                        1/1     Running   0          2m55s
----

.. View the log data by using a command similar to the following:
+
[source,terminal]
----
$ oc logs keda-metrics-apiserver-<hash> | grep -i metadata <1>
----
<1> Optional: You can use the `grep` command to specify the log level to display: `Metadata`, `Request`, `RequestResponse`.
+
For example:
+
[source,terminal]
----
$ oc logs keda-metrics-apiserver-65c7cc44fd-rrl4r | grep -i metadata
----
+
.Example output
+
[source,terminal]
----
...
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"4c81d41b-3dab-4675-90ce-20b87ce24013","stage":"ResponseComplete","requestURI":"/healthz","verb":"get","user":{"username":"system:anonymous","groups":["system:unauthenticated"]},"sourceIPs":["10.131.0.1"],"userAgent":"kube-probe/1.26","responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2023-02-16T13:00:03.554567Z","stageTimestamp":"2023-02-16T13:00:03.555032Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}}
...
----

. Alternatively, you can view a specific log:
+
.. Use a command similar to the following to log in to the `keda-metrics-apiserver-*` pod:
+
[source,terminal]
----
$ oc rsh pod/keda-metrics-apiserver-<hash> -n openshift-keda
----
+
For example:
+
[source,terminal]
----
$ oc rsh pod/keda-metrics-apiserver-65c7cc44fd-rrl4r -n openshift-keda
----

.. Change to the `/var/audit-policy/` directory:
+
[source,terminal]
----
sh-4.4$ cd /var/audit-policy/
----

.. List the available logs:
+
[source,terminal]
----
sh-4.4$ ls
----
+
.Example output
+
[source,terminal]
----
log-2023.02.17-14:50  policy.yaml
----

.. View the log, as needed:
+
[source,terminal]
----
sh-4.4$ cat <log_name>/<pvc_name> | grep -i <log_level> <1>
----
<1> Optional: You can use the `grep` command to specify the log level to display: `Metadata`, `Request`, `RequestResponse`.
+
For example:
+
[source,terminal]
----
sh-4.4$ cat log-2023.02.17-14:50/pvc-audit-log | grep -i Request
----
+
.Example output
+
[source,terminal]
----
...
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"63e7f68c-04ec-4f4d-8749-bf1656572a41","stage":"ResponseComplete","requestURI":"/openapi/v2","verb":"get","user":{"username":"system:aggregator","groups":["system:authenticated"]},"sourceIPs":["10.128.0.1"],"responseStatus":{"metadata":{},"code":304},"requestReceivedTimestamp":"2023-02-17T13:12:55.035478Z","stageTimestamp":"2023-02-17T13:12:55.038346Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:discovery\" of ClusterRole \"system:discovery\" to Group \"system:authenticated\""}}
...
----
