- Remove requirement that auto-populated tables have FK-only primary keys
  (this constraint is handled elsewhere, not by the jobs system)
- Clarify that jobs table PK includes only FK-derived attributes from
  the target table's primary key
- Add example showing how additional PK attributes are excluded
- Add comprehensive Hazard Analysis section covering:
  - Race conditions (reservation, refresh, completion)
  - State transitions (invalid, stuck, ignored)
  - Data integrity (stale jobs, sync, transactions)
  - Performance (table size, refresh speed)
  - Operational (accidental deletion, priority)
  - Migration (legacy table, version mixing)
docs/src/design/autopopulate-2.0-spec.md: 88 additions & 98 deletions
@@ -32,54 +32,11 @@ The existing `~jobs` table has significant limitations:
 
 ### Core Design Principles
 
-1. **Foreign-key-only primary keys**: Auto-populated tables cannot introduce new primary key attributes; their primary key must comprise only foreign key references
-2. **Per-table jobs**: Each computed table gets its own hidden jobs table
-3. **Native primary keys**: Jobs table uses the same primary key structure as its parent table (no hashes)
-4. **No FK constraints on jobs**: Jobs tables omit foreign key constraints for performance; stale jobs are cleaned by `refresh()`
-5. **Rich status tracking**: Extended status values for full lifecycle visibility
-6. **Automatic refresh**: `populate()` automatically refreshes the jobs queue (adding new jobs, removing stale ones)
-
-### Primary Key Constraint
-
-**Auto-populated tables (`dj.Imported` and `dj.Computed`) must have primary keys composed entirely of foreign key references.**
-
-This constraint ensures:
-- **1:1 key_source mapping**: Each entry in `key_source` corresponds to exactly one potential job
-- **Deterministic job identity**: A job's identity is fully determined by its parent records
-- **Simplified jobs table**: The jobs table can directly reference the same parents as the computed table
-
-```python
-# VALID: Primary key is entirely foreign keys
-@schema
-class FilteredImage(dj.Computed):
-    definition = """
-    -> Image
-    ---
-    filtered_image : <djblob>
-    """
-
-# VALID: Multiple foreign keys in primary key
-@schema
-class Comparison(dj.Computed):
-    definition = """
-    -> Image.proj(image_a='image_id')
-    -> Image.proj(image_b='image_id')
-    ---
-    similarity : float
-    """
-
-# INVALID: Additional primary key attribute not allowed
-@schema
-class Analysis(dj.Computed):
-    definition = """
-    -> Recording
-    analysis_method : varchar(32)  # NOT ALLOWED - adds to primary key
-    ---
-    result : float
-    """
-```
-
-**Legacy table support**: Existing tables that introduce additional primary key attributes (beyond foreign keys) can still use the jobs system, but their jobs table will only include the foreign-key-derived primary key attributes. This means multiple target rows may map to a single job entry. A deprecation warning will be issued for such tables.
+1. **Per-table jobs**: Each computed table gets its own hidden jobs table
+2. **FK-derived primary keys**: Jobs table primary key includes only attributes derived from foreign keys in the target table's primary key (not additional primary key attributes)
+3. **No FK constraints on jobs**: Jobs tables omit foreign key constraints for performance; stale jobs are cleaned by `refresh()`
+4. **Rich status tracking**: Extended status values for full lifecycle visibility
+5. **Automatic refresh**: `populate()` automatically refreshes the jobs queue (adding new jobs, removing stale ones)
 
 ## Architecture
 
@@ -91,7 +48,7 @@ Each `dj.Imported` or `dj.Computed` table `MyTable` will have an associated hidd
 # Job queue for MyTable
 subject_id : int
 session_id : int
-... # Same primary key attributes as MyTable (NO foreign key constraints)
+... # Only FK-derived primary key attributes (NO foreign key constraints)
 ---
 status : enum('pending', 'reserved', 'success', 'error', 'ignore')
 priority : int # Lower = more urgent (0 = highest priority, default: 5)
@@ -109,10 +66,10 @@ connection_id : bigint unsigned # MySQL connection ID
 version : varchar(255) # Code version (git hash, package version, etc.)
 ```
 
-**Important**: The jobs table has the same primary key *structure* as the target table but **no foreign key constraints**. This is intentional for performance:
-- Foreign key constraints add overhead on every insert/update/delete
-- Jobs tables are high-traffic (frequent reservations and completions)
-- Stale jobs (referencing deleted upstream records) are handled by `refresh()` instead
+**Important**: The jobs table primary key includes only those attributes that come through foreign keys in the target table's primary key. Additional primary key attributes (if any) are excluded. This means:
+- If a target table has primary key `(-> Subject, -> Session, method)`, the jobs table has primary key `(subject_id, session_id)` only
+- Multiple target rows may map to a single job entry when additional PK attributes exist
+- Jobs tables have **no foreign key constraints** for performance (stale jobs handled by `refresh()`)
 
 ### Access Pattern
 
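Illustration for the hunk above: a minimal sketch of the FK-derived key rule, assuming the target table's primary key is available as a list of attributes flagged by origin. `PKAttribute` and `jobs_primary_key` are hypothetical names used only for this note; they are not part of DataJoint's API, and the spec does not say how the derivation is implemented.

```python
from typing import NamedTuple


class PKAttribute(NamedTuple):
    """Hypothetical description of one primary key attribute."""
    name: str
    from_foreign_key: bool  # True if inherited through a `->` reference


def jobs_primary_key(target_pk):
    """Keep only the FK-derived attributes, preserving declaration order."""
    return [attr.name for attr in target_pk if attr.from_foreign_key]


# Target primary key (-> Subject, -> Session, method), as in the spec text.
target_pk = [
    PKAttribute("subject_id", True),
    PKAttribute("session_id", True),
    PKAttribute("method", False),
]
jobs_pk = jobs_primary_key(target_pk)

# Several target rows that differ only in `method` collapse to one job key.
target_rows = [
    {"subject_id": 1, "session_id": 1, "method": "a"},
    {"subject_id": 1, "session_id": 1, "method": "b"},
    {"subject_id": 1, "session_id": 2, "method": "a"},
]
job_keys = {tuple(row[name] for name in jobs_pk) for row in target_rows}
```

Here three target rows map to two job entries, which is the "multiple target rows may map to a single job entry" case described in the added text.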
@@ -378,25 +335,36 @@ Jobs tables follow the existing hidden table naming pattern:
 - Table `FilteredImage` (stored as `__filtered_image`)
 - Jobs table: `~filtered_image__jobs` (stored as `_filtered_image__jobs`)
 
-### Primary Key Matching (No Foreign Keys)
+### Primary Key Derivation
 
-The jobs table has the same primary key *attributes* as the target table, but **without foreign key constraints**:
+The jobs table primary key includes only those attributes derived from foreign keys in the target table's primary key:
 
 ```python
-# If FilteredImage has definition:
+# Example 1: FK-only primary key (simple case)
 @schema
 class FilteredImage(dj.Computed):
     definition = """
     -> Image
     ---
     filtered_image : <djblob>
     """
+# Jobs table primary key: (image_id) — same as target
 
-# The jobs table will have the same primary key (image_id),
-# but NO foreign key constraint to Image.
-# This is for performance - FK constraints add overhead.
+# Jobs table primary key: (recording_id) — excludes 'analysis_method'
+# One job entry covers all analysis_method values for a given recording
 ```
 
+The jobs table has **no foreign key constraints** for performance reasons.
+
 ### Stale Job Handling
 
 Stale jobs are pending jobs whose upstream records have been deleted. Since there are no FK constraints on jobs tables, these jobs remain until cleaned up by `refresh()`:
-The jobs table is created with the appropriate primary key structure matching the target table's foreign-key-derived attributes.
+The jobs table is created with a primary key derived from the target table's foreign key attributes.
 
 ### Conflict Resolution
 
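Illustration for the stale-job text above: a minimal sketch of what a refresh pass has to decide, using plain Python sets in place of database queries. `plan_refresh` is a hypothetical helper, not DataJoint's `refresh()`. Two assumptions go beyond the spec text shown in this diff: that keys already present in the target table are not re-queued, and that only `pending` jobs are treated as stale.

```python
def plan_refresh(key_source, target_keys, jobs):
    """Decide which job keys to add and which pending jobs are stale.

    key_source  -- set of FK-derived keys that are currently computable
    target_keys -- set of FK-derived keys already present in the target table
    jobs        -- dict mapping job key -> status string
    """
    # Assumption: a key is queued only if it is not yet computed or queued.
    to_add = key_source - target_keys - set(jobs)
    # Stale: pending jobs whose upstream records no longer exist.
    stale = {
        key
        for key, status in jobs.items()
        if status == "pending" and key not in key_source
    }
    return to_add, stale


key_source = {(1,), (2,), (3,)}
target_keys = {(1,)}
jobs = {(2,): "pending", (9,): "pending", (8,): "error"}

to_add, stale = plan_refresh(key_source, target_keys, jobs)
```

In this example `(3,)` is added and `(9,)` is removed as stale. The `error` job `(8,)` is left alone because the sketch follows the narrow definition of stale jobs as pending jobs.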
@@ -625,6 +593,61 @@ for jobs_table in schema.jobs:
 
 This replaces the legacy single `~jobs` table with direct access to per-table jobs.
 
+## Hazard Analysis
+
+This section identifies potential hazards and their mitigations.
+
+### Race Conditions
+
+| Hazard | Description | Mitigation |
+|--------|-------------|------------|
+| **Simultaneous reservation** | Two workers reserve the same pending job at nearly the same time | Acceptable: duplicate `make()` calls are resolved by transaction—second worker gets duplicate key error |
+| **Reserve during refresh** | Worker reserves a job while another process is running `refresh()` | No conflict: `refresh()` adds new jobs and removes stale ones; reservation updates existing rows |
+| **Concurrent refresh calls** | Multiple processes call `refresh()` simultaneously | Acceptable: may result in duplicate insert attempts, but primary key constraint prevents duplicates |
+| **Complete vs delete race** | One process completes a job while another deletes it | Acceptable: one operation succeeds, other becomes no-op (row not found) |
+| **Stuck in reserved** | Worker crashes while job is reserved (orphaned job) | Manual intervention required: `jobs.reserved.delete()` (see Orphaned Job Handling) |
+| **Success re-pended unexpectedly** | `refresh()` re-pends a success job when user expected it to stay | Only occurs if `keep_completed=True` AND key exists in `key_source` but not in target; document clearly |
+| **Ignore not respected** | Ignored jobs get processed anyway | Implementation must skip `status='ignore'` in `populate()` job fetching |
+
+### Data Integrity
+
+| Hazard | Description | Mitigation |
+|--------|-------------|------------|
+| **Stale job processed** | Job references deleted upstream data | `make()` will fail or produce invalid results; `refresh()` cleans stale jobs before processing |
+| **Jobs table out of sync** | Jobs table doesn't match `key_source` | `refresh()` synchronizes; call periodically or rely on `populate(refresh=True)` |
+| **Partial make failure** | `make()` partially succeeds then fails | DataJoint transaction rollback ensures atomicity; job marked as error |
+| **Error message truncation** | Error details exceed `varchar(2047)` | Full stack stored in `error_stack` (mediumblob); `error_message` is summary only |
+
+### Performance
+
+| Hazard | Description | Mitigation |
+|--------|-------------|------------|
+| **Large jobs table** | Jobs table grows very large with `keep_completed=True` | Default is `keep_completed=False`; provide guidance on periodic cleanup |
+| **Slow refresh on large key_source** | `refresh()` queries entire `key_source` | Can restrict refresh to subsets: `jobs.refresh(Subject & 'lab="smith"')` |
+| **Many jobs tables per schema** | Schema with many computed tables has many jobs tables | Jobs tables are lightweight; only created on first use |
+
+### Operational
+
+| Hazard | Description | Mitigation |
+|--------|-------------|------------|
+| **Accidental job deletion** | User runs `jobs.delete()` without restriction | `delete()` inherits from `delete_quick()` (no confirmation); users must apply restrictions carefully |
+| **Clearing active jobs** | User clears reserved jobs while workers are running | Document warning in Orphaned Job Handling; recommend coordinating with orchestrator |
+| **Priority confusion** | User expects higher number = higher priority | Document clearly: lower values are more urgent (0 = highest priority) |
+
+### Migration
+
+| Hazard | Description | Mitigation |
+|--------|-------------|------------|
+| **Legacy ~jobs table conflict** | Old `~jobs` table exists alongside new per-table jobs | Systems are independent; legacy table can be dropped manually |
+| **Mixed version workers** | Some workers use old system, some use new | Major release; do not support mixed operation—require full migration |
+| **Lost error history** | Migrating loses error records from legacy table | Document migration procedure; users can export legacy errors before migration |
+
 ## Future Extensions
 
 - [ ] Web-based dashboard for job monitoring
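Illustration for two rows of the Hazard Analysis above ("Concurrent refresh calls" and "Priority confusion"): a minimal sketch using Python's built-in `sqlite3` as a stand-in for the MySQL backend. The table layout is a simplified assumption based on the jobs definition in this diff, and `add_job` is a hypothetical helper, not a DataJoint call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        subject_id INTEGER NOT NULL,
        session_id INTEGER NOT NULL,
        status     TEXT    NOT NULL DEFAULT 'pending',
        priority   INTEGER NOT NULL DEFAULT 5,
        PRIMARY KEY (subject_id, session_id)
    )
    """
)


def add_job(key, priority=5):
    """Insert a job; return False if the key is already queued."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO jobs (subject_id, session_id, priority) "
                "VALUES (?, ?, ?)",
                (*key, priority),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key constraint rejects the duplicate insert attempt.
        return False


first = add_job((1, 1))
second = add_job((1, 1))  # second refresh pass attempting the same key
add_job((1, 2), priority=0)  # 0 = most urgent
add_job((2, 1), priority=9)

# Lower priority values are fetched first.
next_jobs = conn.execute(
    "SELECT subject_id, session_id FROM jobs "
    "WHERE status = 'pending' "
    "ORDER BY priority, subject_id, session_id"
).fetchall()
```

The duplicate insert is rejected by the primary key rather than by any application-level lock, and the job with `priority=0` comes out ahead of the default-priority job.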
@@ -667,43 +690,10 @@ The current system hashes primary keys to support arbitrary key types. The new s
 3. **Foreign keys**: Hash-based keys cannot participate in foreign key relationships
 4. **Simplicity**: No need for hash computation and comparison
 
-### Why Require Foreign-Key-Only Primary Keys?
-
-Restricting auto-populated tables to foreign-key-only primary keys provides:
-
-1. **1:1 job correspondence**: Each `key_source` entry maps to exactly one job, eliminating ambiguity about what constitutes a "job"
-2. **Matching key structure**: The jobs table primary key exactly matches the target table, enabling efficient stale detection via `key_source` comparison
-3. **Eliminates key_source complexity**: No need for custom `key_source` definitions to enumerate non-foreign-key combinations
-4. **Clearer data model**: The computation graph is fully determined by table dependencies
-5. **Simpler populate logic**: No need to handle partial key matching or key enumeration
-
-**What if I need multiple outputs per parent?**
-
-Use a part table pattern instead:
-
-```python
-# Instead of adding analysis_method to primary key: