
helm: add optional VerticalPodAutoscaler for the operator Deployment#12565

Closed
QuentinBisson wants to merge 1 commit into strimzi:main from QuentinBisson:feat/vpa

Conversation

@QuentinBisson
Contributor

Problem

The cluster operator has fixed resources.requests/limits in values.yaml, but the optimal values vary significantly with the number of Kafka clusters, topics, and users being managed. Without VPA, operators must tune resources manually based on observation, and OOMKill events are common in larger deployments.

Changes

Add an opt-in VerticalPodAutoscaler resource targeting the strimzi-cluster-operator Deployment.

  • Disabled by default (verticalPodAutoscaler.enabled: false) — no VPA CRDs are required in environments that don't use VPA
  • Configurable updateMode (Auto | Recreate | Initial | Off)
  • Controls both CPU and memory via controlledResources
Example configuration:

  verticalPodAutoscaler:
    enabled: true
    updateMode: "Auto"
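Under these values, the chart template gating the new resource might look like the following sketch (the file path and label values are assumptions for illustration, not the actual chart contents):

```yaml
# templates/vpa.yaml (hypothetical sketch of the opt-in template)
{{- if .Values.verticalPodAutoscaler.enabled }}
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: strimzi-cluster-operator
  labels:
    app: strimzi
spec:
  # Point the VPA at the operator Deployment
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: strimzi-cluster-operator
  updatePolicy:
    updateMode: {{ .Values.verticalPodAutoscaler.updateMode | quote }}
  resourcePolicy:
    containerPolicies:
      - containerName: strimzi-cluster-operator
        controlledResources: ["cpu", "memory"]
{{- end }}
```

Because the whole manifest sits inside the `if` block, clusters without the VPA CRDs installed never see the resource.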

Backwards compatibility

Disabled by default — no impact on existing installations.

The cluster operator has fixed resource requests and limits, but the
optimal values vary significantly depending on the number of Kafka
clusters and topics being managed. Without VPA, operators must tune
resources manually and reactively.

Add an opt-in VPA resource (disabled by default) that targets the
strimzi-cluster-operator Deployment. When enabled, VPA recommends and
optionally applies CPU/memory adjustments automatically.

Configuration:
  verticalPodAutoscaler:
    enabled: true          # requires VPA CRDs on the cluster
    updateMode: "Auto"     # Auto | Recreate | Initial | Off

Disabled by default to avoid requiring VPA CRDs in all environments.
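When enabled with updateMode: "Auto", the rendered resource would look roughly like this (a sketch based on the autoscaling.k8s.io/v1 VPA API; any field not named in the PR description is an assumption):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: strimzi-cluster-operator
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: strimzi-cluster-operator
  updatePolicy:
    updateMode: "Auto"   # VPA may evict pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: strimzi-cluster-operator
        controlledResources: ["cpu", "memory"]
```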

Signed-off-by: QuentinBisson <quentin@giantswarm.io>
Member

@scholzj scholzj left a comment


Thanks for the PR. However - as mentioned in the previous PR(s) - before spamming us with 7 PRs for the same thing, you should ideally first talk with us about whether the things are actually considered useful and if we are interested in them. You should also check whether they should be opened as separate PRs or not.

Also, please make sure to use the PR template we use! And make sure to update the Helm Chart's README.md file with the new options!


I'm not sure this should be included. Vertical Pod Autoscaling of the Strimzi operator requires proper knowledge and experience of the operator. If you need it, I think you should add it yourself.

@QuentinBisson
Contributor Author

The concern about operator-specific VPA knowledge is valid, which is exactly why I propose shipping this disabled by default (verticalPodAutoscaler.enabled: false). Users who enable it are opting in knowingly. This is the same pattern used for PodDisruptionBudget and NetworkPolicy in this very chart — both are off by default and both require platform knowledge to use correctly.

Happy to update the PR with clearer documentation around the risks if that helps.

@im-konge
Member

im-konge commented Apr 2, 2026

A few things on this. I haven't used the Vertical Pod Autoscaler that much, but I'm not sure we want to have something like this in the Helm charts. It also creates another path that is "supported" by us but not maintained and tested. If someone has such a desire and need, they can create it without it having to be in the Helm chart. We are not using this in the regular YAML manifests, and I think our Helm charts should just follow what we have there. So those are actually two things that would be, from my side, against adding it to the Helm charts.

> Kafka clusters, topics, and users being managed.

Okay, I take Kafka clusters, but KafkaTopics and KafkaUsers are managed by two different operators, so this autoscaling would not help in those areas.

> OOMKill events are common in larger deployments

Are they common? Users who know they will need more resources can change the values themselves. Users should know what they are doing and what is needed.

Anyway, as I said, I don't much like the idea of having something that is not really maintained on our side (as we do not use it regularly) and not tested at all.

@scholzj
Copy link
Copy Markdown
Member

scholzj commented Apr 2, 2026

> > OOMKill events are common in larger deployments
>
> Are they common? I mean, users that know they will need more resources, they can change the value. The users should know what they are doing and what is needed.
>
> Anyway, as I said, I don't like much the idea of having something that is not that maintained from our side (as we are not using it regularly) and not tested at all.

I'm not sure VPA fixes any OOM, given Java's tendency to use any memory you throw at it. OOM is, in general, the result of some misconfiguration.

@QuentinBisson
Contributor Author

I understand your concern, and I can definitely add it to my umbrella chart. I'll close this pull request then :)
