Commit e881087

Added comments in HPAs for scale up/down behavior
1 parent 66a2615 commit e881087

2 files changed: 24 additions, 0 deletions

ai/vllm-deployment/hpa/gpu-horizontal-pod-autoscaler.yaml

Lines changed: 16 additions & 0 deletions
@@ -32,6 +32,14 @@ spec:
         averageValue: 20
   behavior:
     scaleUp:
+      # The stabilizationWindowSeconds is set to 0 to allow for immediate
+      # scaling up. This is a trade-off:
+      # - For highly volatile workloads, immediate scaling is critical to
+      #   maintain performance and responsiveness.
+      # - However, this also introduces a risk of over-scaling if the workload
+      #   spikes are very brief. A non-zero value would make the scaling
+      #   less sensitive to short-lived spikes, but would delay scale-up
+      #   when the load persists.
       stabilizationWindowSeconds: 0
       policies:
       - type: Pods
@@ -42,6 +50,14 @@ spec:
         periodSeconds: 15
       selectPolicy: Max
     scaleDown:
+      # The stabilizationWindowSeconds is set to 30 to prevent the HPA from
+      # scaling down too aggressively. The controller looks at the replica
+      # counts it recommended over the last 30 seconds and uses the highest,
+      # so it scales down only after the lower load has persisted. This
+      # smooths scaling and prevents "flapping" (rapidly scaling up and
+      # down). A larger value makes scale-down more conservative, which
+      # helps with fluctuating metrics, but it can raise costs because
+      # resources are released more slowly after load decreases.
       stabilizationWindowSeconds: 30
       policies:
       - type: Percent
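
The effect of the two stabilizationWindowSeconds values in this file can be sketched in a few lines of Python. This is a simplified model of the documented HPA behavior, not the controller's code: the function name `stabilize` and its parameters are hypothetical, and the sketch ignores scaling policies, tolerance, and min/max replica bounds. Within the scale-up window the lowest recent recommendation is used; within the scale-down window the highest is used.

```python
from typing import Iterable, Tuple


def stabilize(
    current: int,
    latest: int,
    history: Iterable[Tuple[float, int]],
    now: float,
    up_window: float,
    down_window: float,
) -> int:
    """Return the replica count after applying both stabilization windows.

    history holds (timestamp_seconds, recommended_replicas) pairs from
    earlier evaluations; latest is the recommendation computed at `now`.
    All names here are illustrative, not part of any Kubernetes API.
    """
    up_rec = latest    # lowest recommendation inside the scale-up window
    down_rec = latest  # highest recommendation inside the scale-down window
    for ts, rec in history:
        age = now - ts
        if age < up_window:
            up_rec = min(up_rec, rec)
        if age < down_window:
            down_rec = max(down_rec, rec)
    result = current
    if result < up_rec:
        result = up_rec      # scale up only as far as the window allows
    if result > down_rec:
        result = down_rec    # scale down only as far as the window allows
    return result


if __name__ == "__main__":
    # Scale-up window 0: a jump from 2 to 6 replicas is applied at once.
    print(stabilize(2, 6, [(0, 2)], now=5, up_window=0, down_window=30))
    # Scale-down window 30: a drop to 2 is held back while a recommendation
    # of 4 is still less than 30 seconds old.
    print(stabilize(4, 2, [(0, 4), (15, 4)], now=20, up_window=0, down_window=30))
```

With `up_window=0` no earlier recommendation can hold a scale-up back, which is the "immediate scaling" the scaleUp comment describes. With `down_window=30` the drop takes effect only once every recommendation from the last 30 seconds is at or below the new value.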

ai/vllm-deployment/hpa/horizontal-pod-autoscaler.yaml

Lines changed: 8 additions & 0 deletions
@@ -34,6 +34,14 @@ spec:
     # The scaling behavior can be customized to control how quickly the
     # deployment scales up or down.
     scaleDown:
+      # The stabilizationWindowSeconds is set to 30 to prevent the HPA from
+      # scaling down too aggressively. The controller looks at the replica
+      # counts it recommended over the last 30 seconds and uses the highest,
+      # so it scales down only after the lower load has persisted. This
+      # smooths scaling and prevents "flapping" (rapidly scaling up and
+      # down). A larger value makes scale-down more conservative, which
+      # helps with fluctuating metrics, but it can raise costs because
+      # resources are released more slowly after load decreases.
       stabilizationWindowSeconds: 30
       policies:
       - type: Percent
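
The flapping-versus-cost trade-off mentioned in the scaleDown comment can be made concrete by replaying a made-up trace of raw recommendations. The sketch below is an illustration under stated assumptions, not HPA code: `simulate`, the 15-second evaluation interval, and the trace values are all hypothetical, scale-up is assumed to be immediate, and scaling policies are ignored.

```python
from typing import List


def simulate(recommendations: List[int], interval: float, down_window: float) -> List[int]:
    """Replay raw recommendations sampled every `interval` seconds.

    Scale-up is applied immediately; scale-down uses the highest
    recommendation whose age is below `down_window` seconds.
    """
    current = recommendations[0]
    applied = []
    for i, latest in enumerate(recommendations):
        # The latest sample always counts; older ones count while age < window.
        recent = [
            rec
            for k, rec in enumerate(reversed(recommendations[: i + 1]))
            if k == 0 or k * interval < down_window
        ]
        ceiling = max(recent)
        if latest > current:
            current = latest
        elif ceiling < current:
            current = ceiling
        applied.append(current)
    return applied


def changes(applied: List[int]) -> int:
    """Count how many times the replica count changed."""
    return sum(1 for a, b in zip(applied, applied[1:]) if a != b)


if __name__ == "__main__":
    trace = [4, 2, 4, 2, 4, 2, 2, 2]  # hypothetical noisy recommendations
    for window in (0, 30):
        applied = simulate(trace, interval=15, down_window=window)
        print(window, applied, "changes:", changes(applied), "replica-samples:", sum(applied))
```

On this trace a window of 0 follows every dip (five replica changes), while a window of 30 holds four replicas until the low recommendation persists (one change) at the price of more replica-time in total, which is the cost side of the trade-off.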