Open
Labels
bug (Something isn't working)
Description
Long story short
When using kopf.index() with BatchingSettings(worker_limit=WORKER_LIMIT), the operator can deadlock if WORKER_LIMIT is less than the number of objects of the indexed resource kind (e.g. an index on Pod, a cluster with 100 Pods, and WORKER_LIMIT=50). Change-detection handlers never trigger, and the operator appears to hang after the index is only partially populated.
Root Cause
This is a deadlock caused by kopf's operator readiness mechanism:
- Each resource type has its own Scheduler with the configured worker_limit
- During startup, kopf spawns one async worker per object to perform initial indexing
- After indexing its object, each worker blocks waiting for global operator readiness (all resources and objects indexed)
- With worker_limit=1, only 1 worker can run per resource kind
- Deadlock: If we have 2 objects of the same kind to index, Worker #1 is blocked waiting for Worker #2 to complete indexing, but Worker #2 can't start because Worker #1 occupies the only available slot
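
The sequence above can be reproduced outside kopf with a small asyncio model. This is only a sketch under stated assumptions: the semaphore stands in for the per-kind Scheduler limit, the event stands in for the global readiness flag, and the names simulate, slots, and all_indexed are made up for illustration and are not kopf APIs.

```python
import asyncio


async def simulate(num_objects: int, worker_limit: int, timeout: float = 0.2) -> bool:
    """Return True if every object gets indexed, False if the run stalls."""
    slots = asyncio.Semaphore(worker_limit)  # hypothetical stand-in for the Scheduler limit
    all_indexed = asyncio.Event()            # hypothetical stand-in for operator readiness
    indexed = 0

    async def worker() -> None:
        nonlocal indexed
        async with slots:                    # occupy one worker slot
            indexed += 1                     # "index" this object
            if indexed == num_objects:
                all_indexed.set()            # the last object marks the operator as ready
            await all_indexed.wait()         # wait for readiness while still holding the slot

    tasks = [asyncio.create_task(worker()) for _ in range(num_objects)]
    try:
        await asyncio.wait_for(asyncio.gather(*tasks), timeout)
        return True
    except asyncio.TimeoutError:
        # Nobody finished in time: the slot holders wait for objects that can never start.
        return False


if __name__ == "__main__":
    print("2 objects, limit 1:", asyncio.run(simulate(2, 1)))
    print("2 objects, limit 2:", asyncio.run(simulate(2, 2)))
```

In this model the run stalls whenever worker_limit is below num_objects, because every occupied slot is held by a worker that waits for objects still queued behind it.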
Code Location (kopf internals)
The blocking occurs at kopf/_core/reactor/processing.py:106:
await operator_indexed.wait_for(True) # Blocks until ALL objects are indexed
But the Scheduler prevents spawning new workers at kopf/_cogs/aiokits/aiotasks.py:347-349:
def _can_spawn(self) -> bool:
    return (not self._pending_coros.empty() and
            (self._limit is None or len(self._running_tasks) < self._limit))
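
To make the arithmetic of that check concrete, here is a simplified stand-in that uses plain counters instead of the scheduler's queue and task set. The function can_spawn and its parameters are hypothetical and exist only for this illustration.

```python
from typing import Optional


def can_spawn(pending: int, running: int, limit: Optional[int]) -> bool:
    """Simplified version of the quoted check, with counters instead of queue/task objects."""
    return pending > 0 and (limit is None or running < limit)


# One worker is parked on the readiness wait, one object is still queued, limit is 1:
blocked = can_spawn(pending=1, running=1, limit=1)

# The same situation without a limit lets the second worker start:
unblocked = can_spawn(pending=1, running=1, limit=None)
```

With a limit of 1, a worker that is parked on the readiness wait still counts as running, so the check stays false for every queued object of that kind.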
Kopf version
1.40.0
Kubernetes version
1.27.11
Python version
No response
Code
Logs
Additional information
This PR is trying to fix the issue #1218