Clarify conflict resolution and add pre-partitioning pattern
- Clarify that transaction-based conflict resolution applies regardless
of reserve_jobs setting (True or False)
- Add new section "Job Reservation vs Pre-Partitioning" documenting
the alternative workflow where orchestrators explicitly divide jobs
before distributing to workers
- Include comparison table for when to use each approach
docs/src/design/autopopulate-2.0-spec.md (39 additions, 10 deletions)
### Conflict Resolution

Conflict resolution relies on the transaction surrounding each `make()` call. This applies regardless of whether `reserve_jobs=True` or `reserve_jobs=False`:

- With `reserve_jobs=False`: Workers query `key_source` directly and may attempt the same key
- With `reserve_jobs=True`: Job reservation reduces conflicts but doesn't eliminate them entirely

When two workers attempt to populate the same key:

1. Both call `make()` for the same key
2. First worker's `make()` transaction commits, inserting the result
3. Second worker's `make()` transaction fails with duplicate key error
4. Second worker catches the error and moves to the next job

**Why this is acceptable**:

- The `make()` transaction guarantees data integrity
- Duplicate key error is a clean, expected signal
- With `reserve_jobs=True`, conflicts are rare (requires near-simultaneous reservation)
- Wasted computation is minimal compared to locking complexity
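
The flow above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: `sqlite3` stands in for the real database, and `populate_one`/`make` are hypothetical names; the point is that the transaction around `make()` plus a caught duplicate-key error is the entire conflict-resolution mechanism.

```python
# Sketch (assumed names, sqlite3 as a stand-in database) of transaction-based
# conflict resolution: each make() runs inside a transaction, and a duplicate
# key error from a racing worker is caught and treated as "job already done".
import sqlite3


def populate_one(conn, key, make):
    """Attempt one job; return True if this worker inserted the result."""
    try:
        with conn:                      # transaction wraps the make() call
            make(conn, key)             # computes and inserts the result row
        return True
    except sqlite3.IntegrityError:      # duplicate key: another worker won
        return False                    # clean, expected signal -- move on


# Demo: two attempts race on the same key; exactly one insert succeeds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (key INTEGER PRIMARY KEY, value TEXT)")


def make(conn, key):
    conn.execute("INSERT INTO result VALUES (?, ?)", (key, "computed"))


first = populate_one(conn, 1, make)    # True: inserted the result
second = populate_one(conn, 1, make)   # False: duplicate key, skipped
```

Note that the losing worker performs no cleanup: the failed transaction rolls back automatically, which is why no extra locking on the jobs table is needed.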

### Job Reservation vs Pre-Partitioning

The job reservation mechanism (`reserve_jobs=True`) allows workers to dynamically claim jobs from a shared queue. However, some orchestration systems may prefer to **pre-partition** jobs before distributing them to workers:

```python
# Pre-partitioning example: orchestrator divides work explicitly
```

Both approaches benefit from the same transaction-based conflict resolution as a safety net.
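
The spec's own pre-partitioning example is truncated in this diff, but the idea can be sketched as follows. All names here (`partition`, the `subject_id` keys) are illustrative, not the real API: the orchestrator splits the pending keys into disjoint slices and hands each worker one slice, so workers never compete for the same key.

```python
# Hedged sketch of pre-partitioning (illustrative names, not the actual API):
# the orchestrator assigns each worker a disjoint slice of the pending keys.

def partition(keys, n_workers):
    """Round-robin split of pending keys into n_workers disjoint slices."""
    return [keys[i::n_workers] for i in range(n_workers)]


pending = [{"subject_id": s} for s in range(10)]  # stand-in for key_source
slices = partition(pending, 3)

# Every key is assigned exactly once, to exactly one worker.
assert sum(len(s) for s in slices) == len(pending)
print([len(s) for s in slices])  # -> [4, 3, 3]
```

Because the slices are disjoint, duplicate-key conflicts cannot arise between correctly partitioned workers; the transaction around `make()` remains as a safety net against misconfigured partitions.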
### Orphaned Job Handling
Orphaned jobs are reserved jobs from crashed or terminated processes. The API does not provide an algorithmic method for detecting or clearing orphaned jobs because this is dependent on the orchestration system (e.g., Slurm job IDs, Kubernetes pod status, process heartbeats).
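
As one illustration of an orchestrator-side policy (the spec deliberately leaves this out of the API), a single-host orchestrator might check whether a reserved job's recorded `pid` is still alive and clear the reservation if not. The `pid_alive` helper and the reservation dicts below are hypothetical, not part of the spec:

```python
# Illustrative orphan detection for a single-host orchestrator (hypothetical
# helper, not part of the spec's API): a reservation whose recorded pid no
# longer exists is treated as orphaned and eligible for clearing.
import os


def pid_alive(pid):
    """True if a process with this pid exists (POSIX semantics)."""
    try:
        os.kill(pid, 0)            # signal 0: existence check, sends nothing
        return True
    except ProcessLookupError:     # no such process
        return False
    except PermissionError:        # exists, but owned by another user
        return True


# Stand-in for rows read from the jobs table's reserved entries.
reservations = [{"key": 1, "pid": os.getpid()}]   # this process: alive

orphaned = [r for r in reservations if not pid_alive(r["pid"])]
print(orphaned)  # this process is alive, so nothing is orphaned here
```

Cluster orchestrators would substitute their own liveness signal (Slurm job state, Kubernetes pod status, heartbeat timestamps) for the `pid` check.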