Commit 05fd28d

committed
addressing SME feedback
1 parent f8b9aee commit 05fd28d

File tree

1 file changed: +15 −7 lines changed


modules/configuring-metric-based-autoscaling.adoc

Lines changed: 15 additions & 7 deletions
@@ -33,16 +33,15 @@ To set up autoscaling for your inference service in standard deployments, you mu
 ----
 kind: InferenceService
 metadata:
-  # ...
+  name: my-inference-service
+  namespace: my-namespace
   annotations:
-    # ...
     serving.kserve.io/autoscalerClass: keda
 spec:
   predictor:
-    # ...
     minReplicas: 1
     maxReplicas: 5
-    autoscaling:
+    autoScaling:
       metrics:
       - type: External
         external:
@@ -51,9 +50,8 @@ spec:
             serverAddress: "https://thanos-querier.openshift-monitoring.svc:9092"
             query: vllm:num_requests_waiting
           authenticationRef:
-            authModes: bearer
-            authenticationRef:
-              name: inference-prometheus-auth
+            name: inference-prometheus-auth
+          authModes: bearer
           target:
             type: Value
             value: 2
@@ -62,5 +60,14 @@ spec:
 The example configuration sets up the inference service to autoscale between 1 and 5 replicas based on the number of requests waiting to be processed, as indicated by the `vllm:num_requests_waiting` metric.
 . Click *Save*.
 
+.Verification
+* Confirm that the KEDA `ScaledObject` resource is created and that the *minReplicaCount*, *maxReplicaCount*, and *Target* values match your configuration:
++
+[source,terminal]
+----
+oc get scaledobject -n <namespace>
+oc describe scaledobject <scaledobject-name> -n <namespace>
+----
+* Check
 //[role="_additional-resources"]
 //.Additional resources
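Assembled from the hunks above, the patched autoscaling section of the `InferenceService` might read roughly as follows. This is a sketch, not the full file: the `apiVersion` value and the `metric:` nesting of the Thanos context lines are assumptions, since the lines between the hunks are not shown in this diff.

```yaml
apiVersion: serving.kserve.io/v1beta1   # assumed; not shown in the diff
kind: InferenceService
metadata:
  name: my-inference-service
  namespace: my-namespace
  annotations:
    serving.kserve.io/autoscalerClass: keda
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 5
    autoScaling:
      metrics:
      - type: External
        external:
          metric:                     # nesting inferred; intermediate lines are elided in the diff
            serverAddress: "https://thanos-querier.openshift-monitoring.svc:9092"
            query: vllm:num_requests_waiting
          authenticationRef:
            name: inference-prometheus-auth
          authModes: bearer
          target:
            type: Value
            value: 2
```

Note the two fixes this commit applies: `authModes` moves out from under `authenticationRef` (which had been nested inside itself), and the field name is corrected from `autoscaling` to `autoScaling`.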
