sharder: only move controlled objects after successfully draining main object #639

@timebertt

Description

What would you like to be added:

Currently, the sharder moves controlled objects without coordinating with the drain operation on the main object. I.e., a controlled object might be moved before the shard has acknowledged the drain operation by removing the shard label from the main object.

The sharder could wait for the main object to be drained successfully before moving controlled objects.
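
A minimal sketch of such a gate, not the sharder's actual implementation: the label key `example.dev/shard`, the `Object` type, and the `canMoveControlled` helper are hypothetical names introduced only for illustration. It assumes that a drain is acknowledged once the main object no longer carries the old shard's label.

```go
package main

// shardLabel is a hypothetical label key used only in this sketch.
const shardLabel = "example.dev/shard"

// Object is a minimal stand-in for the metadata of a Kubernetes object.
type Object struct {
	Name   string
	Labels map[string]string
}

// canMoveControlled reports whether controlled objects of the given main
// object may be moved to desiredShard. It returns true only once the main
// object is no longer assigned to another shard, i.e., the shard label has
// been removed (drain acknowledged) or already points to desiredShard.
func canMoveControlled(main Object, desiredShard string) bool {
	current, assigned := main.Labels[shardLabel]
	return !assigned || current == desiredShard
}
```

With such a check, the sharder would skip (and later retry) moving the Deployment while the Website still carries the old shard's label.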

Why is this needed:

When a controlled object is moved before the main object has been drained successfully, the controller might observe the changed assignment of the controlled object before that of the main object (there is no guaranteed order of watch events across different resources).
The old shard might then perform another reconciliation of the main object without knowing about the controlled object, which has already disappeared from its watch cache. This can lead to conflicts or uncoordinated actions.

I observed this situation in one e2e test run:

2025-08-17T00:32:30.745Z        DEBUG   Draining object from shard      {"controller": "sharder", "controllerGroup": "sharding.timebertt.dev", "controllerKind": "ControllerRing", "ControllerRing": {"name":"webhosting-operator"}, "namespace": "", "name": "webhosting-operator", "reconcileID": "a9c88b9d-425f-47ca-85bd-c79a814667cc", "resource": {"group":"webhosting.timebertt.dev","resource":"websites"}, "object": {"name":"foo-63","namespace":"e2e-webhosting-operator-9242be26-fmwrs"}, "currentShard": "webhosting-operator-9465f75c8-dbc7f"}
2025-08-17T00:32:30.755Z        DEBUG   Moving object   {"controller": "sharder", "controllerGroup": "sharding.timebertt.dev", "controllerKind": "ControllerRing", "ControllerRing": {"name":"webhosting-operator"}, "namespace": "", "name": "webhosting-operator", "reconcileID": "a9c88b9d-425f-47ca-85bd-c79a814667cc", "resource": {"group":"apps","resource":"deployments"}, "object": {"name":"foo-63-a7dc52","namespace":"e2e-webhosting-operator-9242be26-fmwrs"}}
2025-08-17T00:32:30.867Z        DEBUG   admission       Assigning object for ControllerRing     {"object": {"name":"foo-63-a7dc52","namespace":"e2e-webhosting-operator-9242be26-fmwrs"}, "namespace": "e2e-webhosting-operator-9242be26-fmwrs", "name": "foo-63-a7dc52", "resource": {"group":"apps","version":"v1","resource":"deployments"}, "user": "system:serviceaccount:sharding-system:sharder", "requestID": "edd99a4c-37a7-4904-961f-53d28efdda42", "controllerRing": {"name":"webhosting-operator"}, "shard": "webhosting-operator-9465f75c8-bknn4"}
2025-08-17T00:32:30.872Z        DEBUG   Reconciling website     {"controller": "website", "controllerGroup": "webhosting.timebertt.dev", "controllerKind": "Website", "Website": {"name":"foo-63","namespace":"e2e-webhosting-operator-9242be26-fmwrs"}, "namespace": "e2e-webhosting-operator-9242be26-fmwrs", "name": "foo-63", "reconcileID": "e367c39e-795f-4ff1-b480-9145d219980c"}
2025-08-17T00:32:30.966Z        DEBUG   events  Error reconciling Deployment: deployments.apps "foo-63-a7dc52" already exists   {"type": "Warning", "object": {"kind":"Website","namespace":"e2e-webhosting-operator-9242be26-fmwrs","name":"foo-63","uid":"8ee329e0-45e0-4e1e-93a7-c741764aae8d","apiVersion":"webhosting.timebertt.dev/v1alpha1","resourceVersion":"3471"}, "reason": "ReconcilerError"}
2025-08-17T00:32:30.981Z        DEBUG   Draining object {"controller": "website", "controllerGroup": "webhosting.timebertt.dev", "controllerKind": "Website", "Website": {"name":"foo-63","namespace":"e2e-webhosting-operator-9242be26-fmwrs"}, "namespace": "e2e-webhosting-operator-9242be26-fmwrs", "name": "foo-63", "reconcileID": "2fe864fd-66a5-4800-bf4a-4470cbacd474"}

Here, we observe the following order of events:

  1. The sharder starts draining the main object (Website)
  2. The sharder moves controlled objects (including the Deployment)
  3. The Deployment is reassigned by the sharder webhook
  4. The old shard observes the Deployment change and reconciles the Website again
  5. The reconciliation errors because the controller tries to create the Deployment that already exists
  6. The old shard acknowledges the drain operation and removes the shard label from the Website

If the controller uses deterministic names for controlled objects, this is not a serious problem: the create call fails instead of producing a duplicate object. E.g., the webhosting-operator (as observed above) runs into a conflict/already exists error until all involved objects (main and controlled objects) have been fully reassigned.

The current sharder design assumes that sharded controllers can handle this form of intermediate inconsistencies with the usual measures for handling eventual consistency (e.g., deterministic names).
It should be investigated whether the sharder can and should only move controlled objects after draining the main object, or whether the controller must handle these situations itself.
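
To illustrate the controller-side option, here is a minimal sketch of how deterministic names turn the race into a retryable error rather than a duplicate object. Everything in it is an assumption for illustration: the in-memory `store`, the `ErrAlreadyExists` sentinel, and the hash-based naming scheme are hypothetical and not taken from the webhosting-operator.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// ErrAlreadyExists is a hypothetical stand-in for the API server's
// "already exists" error.
var ErrAlreadyExists = errors.New("already exists")

// deterministicName derives the controlled object's name from its owner,
// so that every shard computes the same name for the same owner.
// The hashing scheme is an assumption made for this sketch.
func deterministicName(namespace, owner string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + owner))
	return owner + "-" + hex.EncodeToString(sum[:3])
}

// store is a minimal in-memory stand-in for the API server.
type store map[string]bool

// Create adds the object or fails if an object with that name exists.
func (s store) Create(name string) error {
	if s[name] {
		return ErrAlreadyExists
	}
	s[name] = true
	return nil
}

// ensureCreated creates the object. If it already exists (e.g., because the
// shard's cache is stale), it asks the caller to retry later instead of
// treating the situation as a hard failure.
func ensureCreated(s store, name string) (retry bool, err error) {
	err = s.Create(name)
	if errors.Is(err, ErrAlreadyExists) {
		return true, nil
	}
	return false, err
}
```

Because both the old and the new shard compute the same name, a second create attempt cannot produce a second Deployment; it only signals that the reconciliation should be retried once the caches have caught up.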

Metadata

Labels: enhancement (New feature or request)
Status: Backlog