Commit 86e21f4

Clarify conflict resolution and add pre-partitioning pattern

- Clarify that transaction-based conflict resolution applies regardless of the `reserve_jobs` setting (True or False)
- Add new section "Job Reservation vs Pre-Partitioning" documenting the alternative workflow where orchestrators explicitly divide jobs before distributing to workers
- Include comparison table for when to use each approach
1 parent 77c7cf5 commit 86e21f4

File tree

1 file changed: +39 -10 lines changed


docs/src/design/autopopulate-2.0-spec.md

Lines changed: 39 additions & 10 deletions
```diff
@@ -423,22 +423,51 @@ The jobs table is created with a primary key derived from the target table's for
 
 ### Conflict Resolution
 
-Job reservation is performed via `update1()` for each key individually before calling `make()`. The client provides its own `pid`, `host`, and `connection_id` information. No transaction-level locking is used.
+Conflict resolution relies on the transaction surrounding each `make()` call. This applies regardless of whether `reserve_jobs=True` or `reserve_jobs=False`:
 
-**Conflict scenario** (rare):
-1. Two workers reserve the same job nearly simultaneously
-2. Both run `make()` for the same key
-3. First worker's `make()` transaction commits, inserting the result
-4. Second worker's `make()` transaction fails with a duplicate key error
-5. Second worker catches the error and moves to the next job
+- With `reserve_jobs=False`: Workers query `key_source` directly and may attempt the same key
+- With `reserve_jobs=True`: Job reservation reduces conflicts but doesn't eliminate them entirely
+
+When two workers attempt to populate the same key:
+1. Both call `make()` for the same key
+2. First worker's `make()` transaction commits, inserting the result
+3. Second worker's `make()` transaction fails with a duplicate key error
+4. Second worker catches the error and moves to the next job
 
 **Why this is acceptable**:
-- Conflicts are rare in practice (requires near-simultaneous reservation)
-- The `make()` transaction already guarantees data integrity
+- The `make()` transaction guarantees data integrity
 - Duplicate key error is a clean, expected signal
-- Avoids locking overhead on the high-traffic jobs table
+- With `reserve_jobs=True`, conflicts are rare (requires near-simultaneous reservation)
 - Wasted computation is minimal compared to locking complexity
```

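The worker-side behavior described in the updated text can be sketched as a simple loop. This is a minimal illustration, not spec API: it assumes a DataJoint-style `populate(key, reserve_jobs=False)` call and `dj.errors.DuplicateError` as the duplicate-key signal; the actual autopopulate 2.0 interface may differ.

```python
import datajoint as dj

def run_worker(table, keys):
    """Attempt make() for each key, skipping keys another worker completed first."""
    for key in keys:
        try:
            # populate() wraps each make() in its own transaction (see above).
            table.populate(key, reserve_jobs=False)
        except dj.errors.DuplicateError:
            # Another worker's transaction committed this key first:
            # a clean, expected signal, so move on to the next job.
            continue
```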

### Job Reservation vs Pre-Partitioning

The job reservation mechanism (`reserve_jobs=True`) allows workers to dynamically claim jobs from a shared queue. However, some orchestration systems may prefer to **pre-partition** jobs before distributing them to workers:

```python
# Pre-partitioning example: orchestrator divides work explicitly
all_pending = FilteredImage.jobs.pending.fetch("KEY")

# Split jobs among workers (e.g., by worker index)
n_workers = 4
for worker_id in range(n_workers):
    worker_jobs = all_pending[worker_id::n_workers]  # Round-robin assignment
    # Send worker_jobs to worker via orchestration system (Slurm, K8s, etc.)

# Worker receives its assigned keys and processes them directly
for key in assigned_keys:
    FilteredImage.populate(key, reserve_jobs=False)
```

**When to use each approach**:

| Approach | Use Case |
|----------|----------|
| **Dynamic reservation** (`reserve_jobs=True`) | Simple setups, variable job durations, workers that start/stop dynamically |
| **Pre-partitioning** | Batch schedulers (Slurm, PBS), predictable job counts, avoiding reservation overhead |

Both approaches benefit from the same transaction-based conflict resolution as a safety net.

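For the batch-scheduler row in the table above, each worker can also derive its own partition from the scheduler environment instead of receiving keys from an orchestrator. This is a sketch under stated assumptions: a Slurm job array submitted as `--array=0-(N-1)`, Slurm's `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` variables, and the `FilteredImage` table from the example above; it is illustrative, not part of the spec.

```python
import os

# Each array task computes the same round-robin split independently.
worker_id = int(os.environ["SLURM_ARRAY_TASK_ID"])      # 0-based if submitted as --array=0-(N-1)
n_workers = int(os.environ["SLURM_ARRAY_TASK_COUNT"])

# Consistent only if all tasks see the same pending-job snapshot;
# the transaction around make() remains the safety net if they do not.
my_keys = FilteredImage.jobs.pending.fetch("KEY")[worker_id::n_workers]

for key in my_keys:
    FilteredImage.populate(key, reserve_jobs=False)
```
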
### Orphaned Job Handling

Orphaned jobs are reserved jobs from crashed or terminated processes. The API does not provide an algorithmic method for detecting or clearing orphaned jobs because this is dependent on the orchestration system (e.g., Slurm job IDs, Kubernetes pod status, process heartbeats).
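
As one concrete illustration of an orchestration-specific strategy, a process-based deployment could sweep its own host's reservations and release any whose worker process has exited. The sketch below is hypothetical: it assumes the jobs table exposes a `reserved` subset with the `pid` and `host` reservation fields mentioned earlier, plus standard DataJoint restriction and `delete()`; the spec does not define this API.

```python
import os
import socket

def clear_local_orphans(jobs):
    """Release reservations on this host whose worker process no longer exists (POSIX)."""
    host = socket.gethostname()
    # `jobs.reserved`, `pid`, and `host` are illustrative names, not spec API.
    keys, pids = (jobs.reserved & {"host": host}).fetch("KEY", "pid")
    for key, pid in zip(keys, pids):
        try:
            os.kill(int(pid), 0)      # signal 0 checks existence without killing
        except ProcessLookupError:
            # The reserving process is gone: treat the job as orphaned.
            (jobs & key).delete()
```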
