Describe the bug
Found this issue with the ACK controllers used in sharding mode, e.g. configured with:
- name: ACK_WATCH_SELECTORS
  value: environment in (sbx)
If resource X is moved from shard-1 to shard-2 (e.g. by changing the environment label), both controllers will reconcile that resource X (a manual restart of the controllers solves this problem).
Observation: only after we edit resource X again:
- the controller from shard-1 stops reconciling it (desired behaviour)
- and it is only reconciled by the controller from shard-2
Steps to reproduce
We have 2 shards:

ack-system-shard-1:
- name: ACK_WATCH_SELECTORS
  value: environment in (sbx)

ack-system-shard-2:
- name: ACK_WATCH_SELECTORS
  value: environment in (ci)
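For reference, these selector strings are standard Kubernetes label selectors. A minimal sketch (variable names are illustrative) of how each shard's selector evaluates against the Bucket labels used below, before and after the environment label is changed:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// The two ACK_WATCH_SELECTORS values from the shard configs above.
	shard1, _ := labels.Parse("environment in (sbx)")
	shard2, _ := labels.Parse("environment in (ci)")

	// Bucket labels before and after the edit performed in the steps below.
	before := labels.Set{"location": "us-east-1", "environment": "sbx"}
	after := labels.Set{"location": "us-east-1", "environment": "ci"}

	fmt.Println(shard1.Matches(before), shard2.Matches(before)) // true false -> only shard-1 should own it
	fmt.Println(shard1.Matches(after), shard2.Matches(after))   // false true -> only shard-2 should own it
}
```

After the relabel only shard-2's selector matches, yet, as the logs below show, both controllers keep reconciling the resource.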
- Create this S3 Bucket:

apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  annotations:
    services.k8s.aws/region: us-east-1
  labels:
    location: us-east-1
    environment: sbx # --watch-selectors for `shard-1`
  name: itaiatu-sharding-bug3
  namespace: ci-clusters
spec:
  ...
  name: itaiatu-sharding-bug3
  tagging:
    tagSet:
      - key: Name
        value: itaiatu-sharding-bug3
Controller from shard-1 logs:
{"level":"debug","ts":"2025-06-30T10:21:37.717Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}
Controller from shard-2 logs:
<empty>
So far, this is the desired behaviour.
- Edit the S3 Bucket and change its label from environment: sbx to environment: ci.
Controller from shard-1 logs:
{"level":"debug","ts":"2025-06-30T10:23:42.614Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}
Controller from shard-2 logs:
{"level":"debug","ts":"2025-06-30T10:24:02.770Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}
Comparing the timestamps, we can see that both controllers reconciled the resource within a short interval of each other.
The problem arises because controller-runtime's cache is not invalidated when an object's labels change such that they no longer match the watch selector.
controller-runtime's cache and Kubernetes watch mechanisms only keep resources that match the watch selectors. If label changes cause an object to stop matching, the cache does not auto-invalidate or remove it because the API server doesn't emit "no longer matches" events. This design optimizes scalability but requires controllers to verify label selectors on reconcile and handle potential stale state gracefully.
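To make that last point concrete, here is a minimal, hypothetical sketch of the "verify label selectors on reconcile" mitigation, not ACK's actual implementation: BucketReconciler, ShardSelector, APIReader and the s3-controller import path are assumptions. It re-checks the shard's selector against the live object through an uncached reader (e.g. the one returned by mgr.GetAPIReader()), so a stale cache entry cannot mask a relabel:

```go
package shard

import (
	"context"

	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	// Import path assumed for illustration.
	s3v1alpha1 "github.com/aws-controllers-k8s/s3-controller/apis/v1alpha1"
)

// BucketReconciler is a hypothetical per-shard reconciler.
type BucketReconciler struct {
	client.Client                 // cached client from the manager
	APIReader     client.Reader   // uncached reader, e.g. mgr.GetAPIReader()
	ShardSelector labels.Selector // parsed from ACK_WATCH_SELECTORS, e.g. "environment in (sbx)"
}

func (r *BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Read the live object, bypassing the informer cache, so the current
	// labels are seen even if this shard's cache is stale.
	bucket := &s3v1alpha1.Bucket{}
	if err := r.APIReader.Get(ctx, req.NamespacedName, bucket); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// If the object no longer matches this shard's selector (the same one the
	// cache is restricted to, e.g. via cache.Options.DefaultLabelSelector),
	// skip it and let the other shard's controller own it.
	if !r.ShardSelector.Matches(labels.Set(bucket.GetLabels())) {
		return ctrl.Result{}, nil
	}

	// ... normal sync logic for this shard ...
	return ctrl.Result{}, nil
}
```

This only illustrates the mitigation mentioned above; it is not a proposed fix for the ACK runtime.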
Expected outcome
When a resource is moved from one shard to another, the controller of the old shard should stop reconciling it and only the controller of the new shard should reconcile it.
Environment
- Kubernetes version: 1.31
- Using EKS (yes/no), if so version? Yes, 1.31
- AWS service targeted (S3, RDS, etc.): S3 (controller 1.0.32)