Commit df94fcc

Add foreign-key-only primary key constraint to spec
Auto-populated tables must have primary keys composed entirely of foreign key references. This ensures 1:1 job correspondence and enables proper referential integrity for the jobs table.
1 parent 9ad4830 commit df94fcc

1 file changed: +89 -5 lines changed

docs/src/design/autopopulate-2.0-spec.md

Lines changed: 89 additions & 5 deletions
@@ -27,11 +27,54 @@ The existing `~jobs` table has significant limitations:
 
 ### Core Design Principles
 
-1. **Per-table jobs**: Each computed table gets its own hidden jobs table
-2. **Native primary keys**: Jobs table uses the same primary key structure as its parent table (no hashes)
-3. **Referential integrity**: Jobs are foreign-key linked to parent tables with cascading deletes
-4. **Rich status tracking**: Extended status values for full lifecycle visibility
-5. **Automatic refresh**: `populate()` automatically refreshes the jobs queue
+1. **Foreign-key-only primary keys**: Auto-populated tables cannot introduce new primary key attributes; their primary key must comprise only foreign key references
+2. **Per-table jobs**: Each computed table gets its own hidden jobs table
+3. **Native primary keys**: Jobs table uses the same primary key structure as its parent table (no hashes)
+4. **Referential integrity**: Jobs are foreign-key linked to parent tables with cascading deletes
+5. **Rich status tracking**: Extended status values for full lifecycle visibility
+6. **Automatic refresh**: `populate()` automatically refreshes the jobs queue
+
+### Primary Key Constraint
+
+**Auto-populated tables (`dj.Imported` and `dj.Computed`) must have primary keys composed entirely of foreign key references.**
+
+This constraint ensures:
+- **1:1 key_source mapping**: Each entry in `key_source` corresponds to exactly one potential job
+- **Deterministic job identity**: A job's identity is fully determined by its parent records
+- **Simplified jobs table**: The jobs table can directly reference the same parents as the computed table
+
+```python
+# VALID: Primary key is entirely foreign keys
+@schema
+class FilteredImage(dj.Computed):
+    definition = """
+    -> Image
+    ---
+    filtered_image : <djblob>
+    """
+
+# VALID: Multiple foreign keys in primary key
+@schema
+class Comparison(dj.Computed):
+    definition = """
+    -> Image.proj(image_a='image_id')
+    -> Image.proj(image_b='image_id')
+    ---
+    similarity : float
+    """
+
+# INVALID: Additional primary key attribute not allowed
+@schema
+class Analysis(dj.Computed):
+    definition = """
+    -> Recording
+    analysis_method : varchar(32)   # NOT ALLOWED - adds to primary key
+    ---
+    result : float
+    """
+```
+
+**Migration note**: Existing tables that violate this constraint will continue to work but cannot use the new jobs system. A deprecation warning will be issued.
 
 ## Architecture
 
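The constraint and the migration note in this hunk can be illustrated with a minimal, library-free sketch. It is not DataJoint's implementation: the helper names `non_fk_primary_attrs` and `supports_new_jobs` are hypothetical, the attribute name `recording_id` is assumed (the spec does not show the primary key of `Recording`), and the caller is assumed to already know which attributes a table inherits through foreign keys.

```python
import warnings


def non_fk_primary_attrs(primary_key, fk_attrs):
    """Return primary key attributes that are not inherited through a foreign key."""
    fk_attrs = set(fk_attrs)
    return [name for name in primary_key if name not in fk_attrs]


def supports_new_jobs(table_name, primary_key, fk_attrs):
    """Return True if the table could use the per-table jobs system.

    Hypothetical helper: it warns instead of raising, matching the migration
    note that existing tables keep working but cannot use the new jobs system.
    """
    extra = non_fk_primary_attrs(primary_key, fk_attrs)
    if extra:
        warnings.warn(
            f"{table_name}: primary key attributes {extra} are not foreign key "
            "references; the new jobs system is unavailable for this table.",
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return True


# The three tables from the spec examples (recording_id is an assumed name).
print(supports_new_jobs("FilteredImage", ["image_id"], {"image_id"}))
print(supports_new_jobs("Comparison", ["image_a", "image_b"], {"image_a", "image_b"}))
print(supports_new_jobs("Analysis", ["recording_id", "analysis_method"], {"recording_id"}))
```

Only the third call reports an offending attribute (`analysis_method`), which corresponds to the `INVALID` example in the spec.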
@@ -525,3 +568,44 @@ The current system hashes primary keys to support arbitrary key types. The new s
 2. **Query efficiency**: Native keys can use table indexes
 3. **Foreign keys**: Hash-based keys cannot participate in foreign key relationships
 4. **Simplicity**: No need for hash computation and comparison
+
+### Why Require Foreign-Key-Only Primary Keys?
+
+Restricting auto-populated tables to foreign-key-only primary keys provides:
+
+1. **1:1 job correspondence**: Each `key_source` entry maps to exactly one job, eliminating ambiguity about what constitutes a "job"
+2. **Proper referential integrity**: The jobs table can reference the same parent tables, enabling cascading deletes
+3. **Eliminates key_source complexity**: No need for custom `key_source` definitions to enumerate non-foreign-key combinations
+4. **Clearer data model**: The computation graph is fully determined by table dependencies
+5. **Simpler populate logic**: No need to handle partial key matching or key enumeration
+
+**What if I need multiple outputs per parent?**
+
+Use a part table pattern instead:
+
+```python
+from datetime import datetime
+
+# Instead of adding analysis_method to primary key:
+@schema
+class Analysis(dj.Computed):
+    definition = """
+    -> Recording
+    ---
+    timestamp : datetime
+    """
+
+    class Method(dj.Part):
+        definition = """
+        -> master
+        analysis_method : varchar(32)
+        ---
+        result : float
+        """
+
+    def make(self, key):
+        # timestamp has no default, so the master row must supply it
+        self.insert1({**key, 'timestamp': datetime.now()})
+        for method in ['pca', 'ica', 'nmf']:
+            result = run_analysis(key, method)  # user-defined computation (not shown)
+            self.Method.insert1({**key, 'analysis_method': method, 'result': result})
+```
+
+This pattern maintains the 1:1 job mapping while supporting multiple outputs per computation.
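
To make the 1:1 claim concrete, here is a small in-memory simulation of the master/part pattern. It is a sketch under stated assumptions and does not use DataJoint: plain lists stand in for the `Analysis` and `Analysis.Method` tables, `recording_id` is an assumed key attribute, and `run_analysis` is a dummy stand-in because the spec does not define it. The master row supplies a `timestamp` value since that attribute is declared without a default.

```python
from datetime import datetime


def run_analysis(key, method):
    """Dummy stand-in for the undefined run_analysis; returns a placeholder float."""
    return float(len(method))


# Two parent records, so key_source has two entries (recording_id is an assumed name).
key_source = [{"recording_id": 1}, {"recording_id": 2}]

master_rows = []  # stands in for Analysis
part_rows = []    # stands in for Analysis.Method


def make(key):
    """One call per job: insert one master row, then one part row per method."""
    master_rows.append({**key, "timestamp": datetime.now()})
    for method in ["pca", "ica", "nmf"]:
        part_rows.append(
            {**key, "analysis_method": method, "result": run_analysis(key, method)}
        )


# One job per key_source entry, because the master's primary key is foreign-key-only.
jobs = [dict(key) for key in key_source]
for job in jobs:
    make(job)

print(len(jobs), len(master_rows), len(part_rows))  # 2 2 6
```

Each parent record yields exactly one job and one master row, while the three per-method outputs live in the part rows.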