helm: add optional VerticalPodAutoscaler for the operator Deployment #12565
QuentinBisson wants to merge 1 commit into strimzi:main
The cluster operator has fixed resource requests and limits, but the
optimal values vary significantly depending on the number of Kafka
clusters and topics being managed. Without VPA, operators must tune
resources manually and reactively.
Add an opt-in VPA resource (disabled by default) that targets the
strimzi-cluster-operator Deployment. When enabled, VPA recommends and
optionally applies CPU/memory adjustments automatically.
Configuration:
  verticalPodAutoscaler:
    enabled: true        # requires VPA CRDs on the cluster
    updateMode: "Auto"   # Auto | Recreate | Initial | Off
Disabled by default to avoid requiring VPA CRDs in all environments.
Signed-off-by: QuentinBisson <quentin@giantswarm.io>
scholzj left a comment
Thanks for the PR. However - as mentioned in the previous PR(s) - before spamming us with 7 PRs for the same thing, you should ideally first talk with us about whether the things are actually considered useful and if we are interested in them. You should also check whether they should be opened as separate PRs or not.
Also, please make sure to use the PR template we use! And make sure to update the Helm Chart's README.md file with the new options!
I'm not sure this should be included. Vertical Pod Autoscaling of the Strimzi operator requires proper knowledge and experience of the operator. If you need it, I think you should add it yourself.
The concern about operator-specific VPA knowledge is valid, which is exactly why I propose shipping this disabled by default. Happy to update the PR with clearer documentation around the risks if that helps.
A few things on this. I haven't used Vertical Pod Autoscaler that much, but I'm not sure we want to have something like this in the Helm charts. I think it also creates another path that is "supported" by us but not maintained or tested. If someone has that need, they can create it without having it in the Helm chart. We are not using this in the regular YAML manifests, and I think our Helm charts just follow what we have there. So those are two things that, from my side, speak against adding it to the Helm charts.
Okay I take Kafka clusters, but
Are they common? I mean, users who know they will need more resources can change the value. The users should know what they are doing and what is needed. Anyway, as I said, I don't much like the idea of having something that is not really maintained from our side (as we are not using it regularly) and not tested at all.
I'm not sure VPA fixes any OOM, given Java's tendency to use any memory you throw at it. OOM is, in general, the result of some misconfiguration.
I understand your concern and I can definitely add it to my umbrella chart. I'll close this pull request then :)
Problem
The cluster operator has fixed resources.requests/limits in values.yaml, but the optimal values vary significantly with the number of Kafka clusters, topics, and users being managed. Without VPA, operators must tune resources manually based on observation, and OOMKill events are common in larger deployments.
Changes
Add an opt-in VerticalPodAutoscaler resource targeting the strimzi-cluster-operator Deployment.
- Disabled by default (verticalPodAutoscaler.enabled: false), so no VPA CRDs are required in environments that don't use VPA
- updateMode (Auto | Recreate | Initial | Off)
- controlledResources
Backwards compatibility
Disabled by default — no impact on existing installations.
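For readers who want to try this outside the chart, the following is a rough sketch of the kind of object such an option would presumably render when verticalPodAutoscaler.enabled is true. The chart template itself is not shown in this thread, so the object name, the wildcard containerName, and the controlledResources values are assumptions for illustration. Only the target Deployment name and the updateMode values come from the description above.

```yaml
# Sketch only: not the template from this PR.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: strimzi-cluster-operator      # assumed object name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: strimzi-cluster-operator    # the Deployment named in the description
  updatePolicy:
    updateMode: "Auto"                # Auto | Recreate | Initial | Off
  resourcePolicy:
    containerPolicies:
      - containerName: "*"            # assumed: apply to every container in the pod
        controlledResources: ["cpu", "memory"]   # assumed values
```

Setting updateMode to "Off" makes VPA publish recommendations in the object's status without evicting or resizing pods, which may be a lower-risk way to observe its suggestions before letting it act.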