
ACK Sharding: Resource reconciled by both shards #2545

@itaiatu

Description

Describe the bug
Found this issue with ACK controllers running in sharding mode:

- name: ACK_WATCH_SELECTORS
  value: environment in (sbx)

If resource X is moved from shard-1 to shard-2 (e.g., by changing the environment label), both controllers will reconcile resource X. A manual restart of the controllers solves the problem.
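
For context, a watch selector like this is typically wired into controller-runtime's cache so that each shard's informers only list and watch matching objects. A minimal sketch of that wiring, assuming controller-runtime v0.15+; the setup here is illustrative, not necessarily ACK's exact code path:

package main

import (
	"os"

	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	// e.g. "environment in (sbx)" for shard-1, "environment in (ci)" for shard-2
	sel, err := labels.Parse(os.Getenv("ACK_WATCH_SELECTORS"))
	if err != nil {
		panic(err)
	}

	// Every informer started by this manager will only list/watch objects
	// matching the shard's selector; objects outside it never enter the cache.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{DefaultLabelSelector: sel},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // controllers would be registered with mgr here
}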

Observation

Only after we edit resource X:
- Controller from shard-1 will stop reconciling it (desired behaviour)
- It will then be reconciled only by the controller from shard-2

Steps to reproduce

We have 2 shards:

  • ack-system-shard-1
- name: ACK_WATCH_SELECTORS
  value: environment in (sbx)
  • ack-system-shard-2
- name: ACK_WATCH_SELECTORS
  value: environment in (ci)
  1. Create this S3 Bucket:
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  annotations:
    services.k8s.aws/region: us-east-1
  labels:
    location: us-east-1
    environment: sbx # --watch-selectors for `shard-1`
  name: itaiatu-sharding-bug3
  namespace: ci-clusters
spec:
  ...
  name: itaiatu-sharding-bug3
  tagging:
    tagSet:
    - key: Name
      value: itaiatu-sharding-bug3

Controller from shard-1 logs

{"level":"debug","ts":"2025-06-30T10:21:37.717Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}

Controller from shard-2 logs

<empty>

Up to this point, this is the desired behaviour.

  2. Edit the S3 Bucket and change its label from environment: sbx to environment: ci.

Controller from shard-1 logs

{"level":"debug","ts":"2025-06-30T10:23:42.614Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}

Controller from shard-2 logs

{"level":"debug","ts":"2025-06-30T10:24:02.770Z","logger":"ackrt","msg":"> r.Sync","kind":"Bucket","namespace":"ci-clusters","name":"itaiatu-sharding-bug3","account":"381491899637","role":"arn:aws:iam::381491899637:role/ethos-core-ethos996-stage-or2-ack-s3-assumedrole","region":"us-east-1","generation":1}

Comparing the timestamps, we can see that both controllers reconciled the resource within a short interval of each other.

The problem appears to stem from controller-runtime's caching: the cache is not invalidated when an object's labels change such that it no longer matches the watch selector.

controller-runtime's cache and the Kubernetes watch mechanism only keep resources that match the watch selectors. If a label change causes an object to stop matching, the cache does not automatically invalidate or remove it, because the API server does not emit "no longer matches" events. This design optimizes for scalability, but it requires controllers to verify label selectors on reconcile and to handle potentially stale state gracefully.
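
A hedged sketch of that reconcile-time verification, assuming a controller-runtime reconciler; the type and field names here are illustrative, not ACK's actual implementation:

package shard

import (
	"context"

	corev1 "k8s.io/api/core/v1" // illustrative stand-in; a real reconciler would use the ACK resource type
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// shardReconciler re-checks the shard's selector on every reconcile so that an
// object whose labels moved it to another shard is skipped instead of synced.
type shardReconciler struct {
	client.Client                 // cached client from the manager
	Live     client.Reader        // mgr.GetAPIReader(): uncached, reads live state
	Selector labels.Selector      // parsed once from ACK_WATCH_SELECTORS
}

func (r *shardReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	obj := &corev1.ConfigMap{} // illustrative stand-in for the ACK resource
	// Read through the uncached reader: the local cache may still hold the
	// stale pre-relabel copy, which is exactly the situation described above.
	if err := r.Live.Get(ctx, req.NamespacedName, obj); err != nil {
		if apierrors.IsNotFound(err) {
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}
	// If the object's current labels no longer match this shard's selector,
	// do nothing; the shard whose selector now matches will reconcile it.
	if !r.Selector.Matches(labels.Set(obj.GetLabels())) {
		return ctrl.Result{}, nil
	}
	// ... normal sync logic for this shard ...
	return ctrl.Result{}, nil
}

The uncached read via the manager's GetAPIReader() matters here: checking the selector against the cached copy alone would still see the stale labels that caused the double reconcile in the first place.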


Expected outcome
When a resource is moved from one shard to another, the old controller shard should stop reconciling it, and only the new shard should reconcile it.
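
As an aside, controller-runtime can also express a shard's selector as an event filter via predicate.LabelSelectorPredicate. Note that a predicate evaluates the (possibly stale) cached copy of the object, so it complements rather than replaces the reconcile-time check sketched above. A minimal sketch using shard-2's selector from the reproduction:

package shard

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// shard2Predicate builds an event filter for shard-2's selector,
// environment in (ci); the key and values mirror the reproduction above.
func shard2Predicate() (predicate.Predicate, error) {
	return predicate.LabelSelectorPredicate(metav1.LabelSelector{
		MatchExpressions: []metav1.LabelSelectorRequirement{{
			Key:      "environment",
			Operator: metav1.LabelSelectorOpIn,
			Values:   []string{"ci"},
		}},
	})
}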

Environment

  • Kubernetes version: 1.31
  • Using EKS: yes, version 1.31
  • AWS service targeted: S3 (controller version 1.0.32)
