
Broker decommissioner permission error #1217

@shastaxc

Description


I deployed a 3-broker Redpanda cluster to OpenShift Kubernetes Distribution using the Helm chart. I initially deployed with the broker decommissioner disabled and everything looked good. I then updated the release with the broker decommissioner enabled, and now the sidecar container (the operator) is spamming the following errors:

{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.PersistentVolumeClaim"}
{"level":"error","ts":"2025-12-26T18:24:04.075Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.Pod","error":"failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"pods\" in API group \"\" at the cluster scope"}
{"level":"error","ts":"2025-12-26T18:24:04.076Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.PersistentVolumeClaim","error":"failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"persistentvolumeclaims\" in API group \"\" at the cluster scope"}
{"level":"error","ts":"2025-12-26T18:24:04.076Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.StatefulSet","error":"failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope"}

This is deployed to namespace "birch-ns".
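
The scope mismatch in those errors can be checked directly by impersonating the ServiceAccount with kubectl auth can-i. A minimal sketch, using the names from my deployment (given the RBAC shown further down, I'd expect the first check to pass and the second to fail):

# Namespace-scoped list, which the chart's namespaced Role should cover
kubectl auth can-i list pods \
  --as=system:serviceaccount:birch-ns:redpanda-birch -n birch-ns

# Cluster-scoped list, which is what the operator's watch is attempting per the error above
kubectl auth can-i list pods \
  --as=system:serviceaccount:birch-ns:redpanda-birch --all-namespaces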

Here is the override values file I used in my install:

fullnameOverride: "redpanda-birch"

image:
  repository: harbor.dice.ste-is.com/docker.redpanda.com/redpandadata/redpanda
  tag: "v25.1.12"

podTemplate:
  spec:
    securityContext:
      fsGroup: null
      runAsUser: null

console:
  enabled: true
  image:
    registry: harbor.dice.ste-is.com/docker.redpanda.com
    repository: redpandadata/console
    tag: "v3.1.0"
  fullnameOverride: "redpanda-console-birch"
  podSecurityContext:
    runAsUser: null
    fsGroup: null
    fsGroupChangePolicy: null
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: redpanda-console-birch.hq.ste-is.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - redpanda-console-birch.hq.ste-is.com

external:
  enabled: true
  service:
    enabled: false

logging:
  usageStats:
    enabled: false

monitoring:
  enabled: true
  enableHttp2: false

resources:
  cpu:
    cores: 1
    overprovisioned: true
  memory:
    enable_memory_locking: true
    container:
      min: 10Gi
      max: 10Gi

storage:
  persistentVolume:
    size: 150Gi
    storageClass: ""

statefulset:
  replicas: 3
  sideCars:
    image:
      repository: harbor.dice.ste-is.com/docker.redpanda.com/redpandadata/redpanda-operator
      tag: v25.1.3
    pvcUnbinder:
      enabled: false
    brokerDecommissioner:
      enabled: true
      decommissionAfter: 120s
      decommissionRequeueTimeout: 30s
    configWatcher:
      enabled: true
    controllers:
      enabled: true

  initContainers:
    fsValidator:
      enabled: false
    setDataDirOwnership:
      enabled: false
  initContainerImage:
    repository: harbor.dice.ste-is.com/docker.io/library/busybox
    tag: 1.36.1

# Disable tuning because it requires a privileged container
tuning:
  tune_aio_events: false
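
For context, the chart was installed roughly like this; the repo alias, release name, and local values filename below are just what I use and may differ in your setup:

# Add the Redpanda chart repo and apply the override values shown above
helm repo add redpanda https://charts.redpanda.com
helm repo update
helm upgrade --install redpanda-birch redpanda/redpanda \
  --namespace birch-ns \
  --values redpanda-birch-values.yaml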

I'm not sure why it thinks it needs cluster-scoped permissions to list StatefulSets or Pods, but I confirmed that the ServiceAccount is linked to the decommissioner Role that grants those permissions in the namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: redpanda-birch-decommission
  namespace: birch-ns
rules:
- verbs:
  - create
  - patch
  apiGroups:
  - ""
  resources:
  - events
- verbs:
  - delete
  - get
  - list
  - watch
  apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - ""
  resources:
  - pods
  - secrets
- verbs:
  - get
  - list
  - watch
  apiGroups:
  - apps
  resources:
  - statefulsets
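
The link between that Role and the ServiceAccount can be inspected from the namespace; the binding name below is a placeholder for whatever the chart created:

# Wide output shows which Role each binding points at and which ServiceAccounts it covers
kubectl get rolebinding -n birch-ns -o wide

# Then inspect the binding that references redpanda-birch-decommission
kubectl describe rolebinding <decommission-binding-name> -n birch-ns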

And there is a ClusterRole bound to the SA that gives it patch permission on PersistentVolumes:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: redpanda-birch-birch-ns-decommission
rules:
- verbs:
  - patch
  apiGroups:
  - ""
  resources:
  - persistentvolumes

So as far as I can tell, it should have all the permissions it needs, yet the logs keep reporting that permissions are missing.

As for actual functionality: I tested by deleting the PV, PVC, and Pod for one of the brokers (roughly the steps sketched below) and letting the StatefulSet spin up replacements. rpk cluster health then showed the new broker running (bringing the cluster back to 3 nodes online) and the old broker listed under "Nodes down". That health state persisted for a few hours and the old broker was never decommissioned. I was only able to get the cluster back to a healthy state by manually decommissioning the old broker with rpk redpanda admin brokers decommission <old_id> --force, which is what I expected the broker decommissioner to do automatically.
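
For reference, the broker-replacement test looked roughly like this. The Pod and PVC names follow the chart's usual naming with fullnameOverride "redpanda-birch"; the container name, PV name, and broker ID are placeholders or assumptions and will differ per cluster:

# Simulate a lost broker: remove one broker's storage and Pod, then let the StatefulSet recreate them
# (the PVC deletion only completes once the Pod using it is gone)
kubectl delete pvc datadir-redpanda-birch-0 -n birch-ns --wait=false
kubectl delete pod redpanda-birch-0 -n birch-ns
kubectl delete pv <pv-name>

# Check membership: the replacement broker joins, but the old node ID stays under "Nodes down"
kubectl exec -n birch-ns redpanda-birch-0 -c redpanda -- rpk cluster health

# What eventually cleared the down node, run by hand:
kubectl exec -n birch-ns redpanda-birch-0 -c redpanda -- \
  rpk redpanda admin brokers decommission <old_id> --force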
