feat: add HPA-based pod autoscaling to okd deployments #8798
skettkepalli wants to merge 1 commit into EclipseFdn:main
Conversation
Can we enable scaling on the number of requests or open connections? What we have seen is that CPU is under-utilized while connections are rejected, because open connections cannot complete while they are blocked on IO.
Memory is not a good indicator to scale on; the pods usually run at max memory.
My 2 cents: under-utilized CPU with threads starving for connections is a clear indication that we don't need to scale up; we should fine-tune the thread pool and connection pool instead. Spinning up more replicas will do more harm than good in this case.
Scaling based on thread count or connection metrics is technically possible, but it requires KEDA (Kubernetes Event-Driven Autoscaling), since the standard Horizontal Pod Autoscaler (HPA) only supports CPU and memory metrics by default. At the moment there are a couple of open questions: whether the Custom Metrics Autoscaler / KEDA operator is installed in our OKD cluster, and whether KEDA can query our existing remote Prometheus endpoint to retrieve the required metrics. Additionally, KEDA uses its own resource type (ScaledObject) to define autoscaling behavior. This means it cannot reuse a standalone HPA directly; instead, the scaling configuration would have to be defined through a KEDA ScaledObject, which internally manages the HPA.
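For reference, a connection-based trigger would look roughly like this as a KEDA ScaledObject. This is a sketch only: the deployment name, Prometheus address, metric name, and threshold are all hypothetical and would depend on what our remote Prometheus actually exposes.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-connections        # hypothetical name
spec:
  scaleTargetRef:
    name: myapp                  # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        # assumed remote Prometheus endpoint; needs to be reachable from KEDA
        serverAddress: https://prometheus.example.org
        # hypothetical query for open connections per app
        query: sum(open_connections{app="myapp"})
        # scale out when average open connections per pod exceed this value
        threshold: "500"
```

KEDA would create and manage the underlying HPA from this object, which is why a standalone HPA for the same deployment would conflict with it.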
Indeed, as I mentioned, we need a better understanding of what connection values are good for our setup. However, we will need some way of scaling on top of that as well, maybe a more conservative one.
A link from Fred regarding custom metrics for scaling: https://kubernetes.io/docs/concepts/workloads/autoscaling/horizontal-pod-autoscale/#scaling-on-custom-metrics
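Per those docs, an HPA can also target custom metrics directly, provided a custom metrics adapter (e.g. a Prometheus adapter) is installed to serve them. A minimal sketch, with the deployment name, metric name, and target value all assumed for illustration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp                  # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          # hypothetical metric; must be exposed through a custom metrics adapter
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "500"
```

This avoids KEDA entirely, but still depends on the cluster having a metrics adapter that can serve the custom metric, which is the same open question as above.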
Summary of Changes:
Autoscaling behaviour
Scale-up: reacts immediately, doubling the fleet size every 15s to absorb traffic spikes quickly.
Scale-down: conservative; uses a scaleDownWindow stabilisation window and removes at most 10% of pods per minute, preventing flapping on brief traffic dips.
Triggers: CPU ≥ 70% or memory ≥ 80% average utilization across pods.
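The behaviour described above maps onto the `behavior` and `metrics` fields of an `autoscaling/v2` HPA. A sketch of what the manifest presumably looks like; the deployment name, replica bounds, and the exact stabilisation window value (300s here) are assumptions, while the scaling policies and utilization targets follow the summary:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa                     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp                       # hypothetical Deployment
  minReplicas: 2                      # assumed bounds
  maxReplicas: 16
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # react immediately to spikes
      policies:
        - type: Percent
          value: 100                  # allow doubling the replica count
          periodSeconds: 15           # every 15s
    scaleDown:
      stabilizationWindowSeconds: 300 # assumed scaleDownWindow value
      policies:
        - type: Percent
          value: 10                   # remove at most 10% of pods
          periodSeconds: 60           # per minute
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # CPU >= 70% average across pods
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80      # memory >= 80% average across pods
```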