Description
I deployed using the Helm chart to OpenShift Kubernetes Distribution, running a 3-broker Redpanda cluster. I initially deployed with the broker decommissioner disabled and everything looked good. I then updated the release with the broker decommissioner enabled, and now the sidecar container (operator) is spamming the following errors:
{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2025-12-26T18:24:04.065Z","msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.PersistentVolumeClaim"}
{"level":"error","ts":"2025-12-26T18:24:04.075Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.Pod","error":"failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"pods\" in API group \"\" at the cluster scope"}
{"level":"error","ts":"2025-12-26T18:24:04.076Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.PersistentVolumeClaim","error":"failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"persistentvolumeclaims\" in API group \"\" at the cluster scope"}
{"level":"error","ts":"2025-12-26T18:24:04.076Z","msg":"Failed to watch","logger":"UnhandledError","reflector":"pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285","type":"*v1.StatefulSet","error":"failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User \"system:serviceaccount:birch-ns:redpanda-birch\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope"}
This is deployed to namespace "birch-ns".
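The denials are specifically about the cluster scope, not about anything missing inside the namespace. A quick way to see that split is kubectl auth can-i with impersonation (the commands below are an illustrative check on my part, not something from the chart docs):

# Namespace-scoped list should be allowed for the sidecar's ServiceAccount...
kubectl auth can-i list pods --namespace birch-ns \
  --as system:serviceaccount:birch-ns:redpanda-birch

# ...while the cluster-scoped list the informer attempts should be denied.
kubectl auth can-i list pods --all-namespaces \
  --as system:serviceaccount:birch-ns:redpanda-birch

The same check applies to persistentvolumeclaims and statefulsets.apps.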
Here is the override values file I used in my install:
fullnameOverride: "redpanda-birch"
image:
  repository: harbor.dice.ste-is.com/docker.redpanda.com/redpandadata/redpanda
  tag: "v25.1.12"
podTemplate:
  spec:
    securityContext:
      fsGroup: null
      runAsUser: null
console:
  enabled: true
  image:
    registry: harbor.dice.ste-is.com/docker.redpanda.com
    repository: redpandadata/console
    tag: "v3.1.0"
  fullnameOverride: "redpanda-console-birch"
  podSecurityContext:
    runAsUser: null
    fsGroup: null
    fsGroupChangePolicy: null
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: redpanda-console-birch.hq.ste-is.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - redpanda-console-birch.hq.ste-is.com
external:
  enabled: true
  service:
    enabled: false
logging:
  usageStats:
    enabled: false
monitoring:
  enabled: true
  enableHttp2: false
resources:
  cpu:
    cores: 1
    overprovisioned: true
  memory:
    enable_memory_locking: true
    container:
      min: 10Gi
      max: 10Gi
storage:
  persistentVolume:
    size: 150Gi
    storageClass: ""
statefulset:
  replicas: 3
  sideCars:
    image:
      repository: harbor.dice.ste-is.com/docker.redpanda.com/redpandadata/redpanda-operator
      tag: v25.1.3
    pvcUnbinder:
      enabled: false
    brokerDecommissioner:
      enabled: true
      decommissionAfter: 120s
      decommissionRequeueTimeout: 30s
    configWatcher:
      enabled: true
    controllers:
      enabled: true
  initContainers:
    fsValidator:
      enabled: false
    setDataDirOwnership:
      enabled: false
  initContainerImage:
    repository: harbor.dice.ste-is.com/docker.io/library/busybox
    tag: 1.36.1
# Disable tuning because it requires a privileged container
tuning:
  tune_aio_events: false
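For completeness, the deployment itself was a plain Helm install/upgrade with that file, roughly along these lines (the release name, chart reference, and filename here are reconstructions, not copied from my shell history):

helm repo add redpanda https://charts.redpanda.com
helm upgrade --install redpanda-birch redpanda/redpanda \
  --namespace birch-ns \
  --values redpanda-birch-values.yaml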
I'm not sure why it thinks it needs cluster-scoped permissions to list StatefulSets or Pods, but I confirmed that the ServiceAccount is linked to the decommissioner Role that grants those permissions within the namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: redpanda-birch-decommission
  namespace: birch-ns
rules:
  - verbs:
      - create
      - patch
    apiGroups:
      - ""
    resources:
      - events
  - verbs:
      - delete
      - get
      - list
      - watch
    apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ""
    resources:
      - pods
      - secrets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - apps
    resources:
      - statefulsets
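The RoleBinding that actually links the ServiceAccount to this Role isn't pasted here, but the linkage can be confirmed with something like the following (the binding name is a guess based on the Role name, not copied verbatim):

kubectl get rolebinding --namespace birch-ns -o wide
kubectl describe rolebinding redpanda-birch-decommission --namespace birch-ns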
And there is a ClusterRole bound to the SA that gives it a PersistentVolume patch permission:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: redpanda-birch-birch-ns-decommission
rules:
  - verbs:
      - patch
    apiGroups:
      - ""
    resources:
      - persistentvolumes
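Going by the error messages alone, the informers appear to want the same get/list/watch verbs at the cluster scope, which would correspond to something like the ClusterRole and ClusterRoleBinding below. To be clear, this is only my sketch of what the errors seem to ask for (the names are made up), not a manifest the chart renders, and I have not applied it, since I'd expect the chart's own RBAC to be sufficient:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: redpanda-birch-informer-workaround  # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: redpanda-birch-informer-workaround  # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: redpanda-birch-informer-workaround
subjects:
  - kind: ServiceAccount
    name: redpanda-birch
    namespace: birch-ns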
So as far as I can tell, it should have all the permissions it needs, but the logs are spamming an error saying it is missing permissions. As far as actual functionality, I tested by deleting the PV, PVC, and Pod for one of the brokers, then letting the STS spin up new ones. Checking the cluster health with rpk cluster health would show the new broker running (bringing it back to 3 nodes online) and the old broker listed in the "Nodes Down" list. This health state persisted for a few hours and the old broker was never decommissioned as I'd expect the broker decommissioner to do. I was only able to get the cluster back to a healthy state by manually decommissioning the old broker with rpk redpanda admin brokers decommission <old_id> --force, which is what I expected the broker decommissioner to do automatically.