Remove FK constraints from jobs tables for performance
- Jobs tables have matching primary key structure but no FK constraints
- Stale jobs (from deleted upstream records) handled by refresh()
- Added created_time field for stale detection
- refresh() now returns {added, removed} counts
- Updated rationale sections to reflect performance-focused design
docs/src/design/autopopulate-2.0-spec.md
Lines changed: 49 additions & 22 deletions
@@ -30,9 +30,9 @@ The existing `~jobs` table has significant limitations:
 1. **Foreign-key-only primary keys**: Auto-populated tables cannot introduce new primary key attributes; their primary key must comprise only foreign key references
 2. **Per-table jobs**: Each computed table gets its own hidden jobs table
 3. **Native primary keys**: Jobs table uses the same primary key structure as its parent table (no hashes)
-4. **Referential integrity**: Jobs are foreign-key linked to parent tables with cascading deletes
+4. **No FK constraints on jobs**: Jobs tables omit foreign key constraints for performance; stale jobs are cleaned by `refresh()`
 5. **Rich status tracking**: Extended status values for full lifecycle visibility
-6. **Automatic refresh**: `populate()` automatically refreshes the jobs queue
+6. **Automatic refresh**: `populate()` automatically refreshes the jobs queue (adding new jobs, removing stale ones)

 ### Primary Key Constraint

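Design point 6 has `populate()` refresh the queue before working through it. The following is a minimal sketch of that loop in plain Python, not DataJoint's implementation. It assumes an in-memory list of job dicts standing in for the jobs table, and reads `priority` and `scheduled_time` with the meanings given in this spec's jobs table definition (higher priority processed first; eligible on or after `scheduled_time`). The helpers `next_jobs` and `populate` and the `refresh`/`make` callables are hypothetical.

```python
from datetime import datetime, timedelta

def next_jobs(jobs, now):
    """Pending jobs whose scheduled_time has arrived, highest priority first.

    Hypothetical helper: each job is a dict standing in for a jobs-table row.
    """
    eligible = [j for j in jobs
                if j["status"] == "pending" and j["scheduled_time"] <= now]
    # Higher priority first; ties broken by earliest scheduled_time (an assumption).
    return sorted(eligible, key=lambda j: (-j["priority"], j["scheduled_time"]))

def populate(jobs, refresh, make, now):
    """Sketch of populate(): refresh the queue, then run each eligible job."""
    counts = refresh()  # expected to return {'added': int, 'removed': int}
    processed = []
    for job in next_jobs(jobs, now):
        job["status"] = "reserved"
        job["reserved_time"] = now
        try:
            make(job["key"])
            job["status"] = "success"
        except Exception:
            job["status"] = "error"
        processed.append(job["key"])
    return counts, processed

# Usage: image 3 has the highest priority but is not due yet, so it is skipped.
now = datetime(2024, 1, 1, 12, 0)
jobs = [
    {"key": {"image_id": 1}, "status": "pending", "priority": 0,
     "scheduled_time": now - timedelta(hours=1)},
    {"key": {"image_id": 2}, "status": "pending", "priority": 5,
     "scheduled_time": now - timedelta(minutes=5)},
    {"key": {"image_id": 3}, "status": "pending", "priority": 9,
     "scheduled_time": now + timedelta(hours=1)},
]
counts, processed = populate(jobs, refresh=lambda: {"added": 0, "removed": 0},
                             make=lambda key: None, now=now)
print(processed)  # [{'image_id': 2}, {'image_id': 1}]
```

With several workers, the reservation step would have to be an atomic database update so two workers cannot claim the same job; this sketch ignores concurrency entirely.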
@@ -84,12 +84,13 @@ Each `dj.Imported` or `dj.Computed` table `MyTable` will have an associated hidd

 ```
 # Job queue for MyTable
--> ParentTable1
--> ParentTable2
-... # Same primary key structure as MyTable
+subject_id : int
+session_id : int
+... # Same primary key attributes as MyTable (NO foreign key constraints)
 ---
 status : enum('pending', 'reserved', 'success', 'error', 'ignore')
 priority : int # Higher priority = processed first (default: 0)
+created_time : datetime # When job was added to queue
 scheduled_time : datetime # Process on or after this time (default: now)
 reserved_time : datetime # When job was reserved (null if not reserved)
 completed_time : datetime # When job completed (null if not completed)
@@ -103,6 +104,11 @@ connection_id : bigint unsigned # MySQL connection ID
 version : varchar(255) # Code version (git hash, package version, etc.)
 ```

+**Important**: The jobs table has the same primary key *structure* as the target table but **no foreign key constraints**. This is intentional for performance:
+- Foreign key constraints add overhead on every insert/update/delete
+- Jobs tables are high-traffic (frequent reservations and completions)
+- Stale jobs (referencing deleted upstream records) are handled by `refresh()` instead
+
 ### Access Pattern

 Jobs are accessed as a property of the computed table:
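The performance note above trades cascading deletes for later cleanup. A small experiment makes the behavioral difference visible. It uses Python's built-in `sqlite3` in place of MySQL (an assumption for portability; it shows behavior only and says nothing about overhead). It creates an illustrative upstream `image` table and two queue tables sharing its primary key, one with an `ON DELETE CASCADE` foreign key and one without, then deletes an upstream row. All table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE image (image_id INTEGER PRIMARY KEY);

-- Old design: job row is FK-linked to its parent and cascades on delete
CREATE TABLE jobs_with_fk (
    image_id INTEGER PRIMARY KEY REFERENCES image(image_id) ON DELETE CASCADE,
    status TEXT NOT NULL DEFAULT 'pending'
);

-- New design: same primary key attribute, no FK constraint
CREATE TABLE jobs_no_fk (
    image_id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    created_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

conn.executemany("INSERT INTO image VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO jobs_with_fk (image_id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO jobs_no_fk (image_id) VALUES (?)", [(1,), (2,), (3,)])

conn.execute("DELETE FROM image WHERE image_id = 2")  # upstream record removed

with_fk = [r[0] for r in conn.execute("SELECT image_id FROM jobs_with_fk ORDER BY image_id")]
no_fk = [r[0] for r in conn.execute("SELECT image_id FROM jobs_no_fk ORDER BY image_id")]
print(with_fk)  # [1, 3]     the cascade removed the job for image 2
print(no_fk)    # [1, 2, 3]  the job for image 2 is now stale
```

Without the constraint, the row for image 2 stays in the queue until something removes it, which is the stale-job case that `refresh()` is meant to handle.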
@@ -166,15 +172,23 @@ class JobsTable(Table):
 """Dynamically generated based on parent table's primary key."""
-Refresh the jobs queue by scanning for missing entries.
+Refresh the jobs queue: add new jobs and remove stale ones.
+
+Operations performed:
+1. Add new jobs: (key_source & restrictions) - target - jobs → insert as 'pending'
+2. Remove stale jobs: pending jobs older than stale_timeout whose keys
+are no longer in key_source (upstream records were deleted)

-Computes: (key_source & restrictions) - target - jobs
-Inserts new entries with status='pending'.
+Args:
+restrictions: Conditions to filter key_source
+stale_timeout: Seconds after which pending jobs are checked for staleness.
+Jobs older than this are removed if their key is no longer
+in key_source. Default from config: jobs.stale_timeout (3600s)

 Returns:
-Number of new jobs added to queue.
+{'added': int, 'removed': int} - counts of jobs added and stale jobs removed
 """
 ...

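The two operations in the revised docstring can be modeled with Python sets. This is an in-memory sketch, not the spec's implementation: it assumes `key_source` and `target` are sets of primary-key tuples with restrictions already applied, and that `jobs` maps each key tuple to a dict holding `status` and `created_time`. The 3600-second default mirrors the documented `jobs.stale_timeout` default; the function itself is hypothetical.

```python
from datetime import datetime, timedelta

def refresh(key_source, target, jobs, now, stale_timeout=3600):
    """Sketch of jobs.refresh(): add new jobs, then drop stale pending ones.

    Hypothetical in-memory model: key_source and target are sets of
    primary-key tuples (restrictions already applied to key_source);
    jobs maps key tuple -> {'status': ..., 'created_time': ...}.
    """
    # 1. Add new jobs: (key_source & restrictions) - target - jobs
    new_keys = key_source - target - set(jobs)
    for key in new_keys:
        jobs[key] = {"status": "pending", "created_time": now}

    # 2. Remove stale jobs: pending, older than stale_timeout,
    #    and whose key is no longer in key_source
    cutoff = now - timedelta(seconds=stale_timeout)
    stale = [key for key, job in jobs.items()
             if job["status"] == "pending"
             and job["created_time"] < cutoff
             and key not in key_source]
    for key in stale:
        del jobs[key]

    return {"added": len(new_keys), "removed": len(stale)}

# Usage: image 2 was deleted upstream, image 3 is new, image 4 is already
# computed, and image 5 is missing upstream but its job is too recent to drop.
now = datetime(2024, 1, 1, 12, 0)
old = now - timedelta(hours=2)
jobs = {
    (1,): {"status": "pending", "created_time": old},
    (2,): {"status": "pending", "created_time": old},
    (5,): {"status": "pending", "created_time": now - timedelta(minutes=5)},
}
result = refresh(key_source={(1,), (3,), (4,)}, target={(4,)}, jobs=jobs, now=now)
print(result)  # {'added': 1, 'removed': 1}
```

Following the docstring, only `pending` jobs are candidates for removal, so reserved, errored, or ignored entries are never deleted by this step.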
@@ -335,9 +349,9 @@ Jobs tables follow the existing hidden table naming pattern:
 - Table `FilteredImage` (stored as `__filtered_image`)
 - Jobs table: `~filtered_image__jobs` (stored as `_filtered_image__jobs`)

-### Referential Integrity
+### Primary Key Matching (No Foreign Keys)

-The jobs table references the same parent tables as the computed table:
+The jobs table has the same primary key *attributes* as the target table, but **without foreign key constraints**:

 ```python
 # If FilteredImage has definition:
@@ -349,18 +363,31 @@ class FilteredImage(dj.Computed):
 filtered_image : <djblob>
 """

-# The jobs table will have:
-# -> Image (same foreign key reference)
-# This ensures cascading deletes work correctly
+# The jobs table will have the same primary key (image_id),
+# but NO foreign key constraint to Image.
+# This is for performance - FK constraints add overhead.
 ```

-### Cascading Behavior
+### Stale Job Handling

-When a parent record is deleted:
-1. The corresponding computed table record is deleted (existing behavior)
-2. The corresponding jobs table record is also deleted (new behavior)
+When upstream records are deleted, their corresponding jobs become "stale" (orphaned). Since there are no FK constraints, these jobs remain in the table until cleaned up:
+
+```python
+# refresh() handles stale jobs automatically
+result = FilteredImage.jobs.refresh()
+# Returns: {'added': 10, 'removed': 3} # 3 stale jobs cleaned up
+
+# Stale detection logic:
+# 1. Find pending jobs where created_time < (now - stale_timeout)
+# 2. Check if their keys still exist in key_source
+# 3. Remove jobs whose keys no longer exist
+```

-This prevents orphaned job records.
+**Why not use foreign key cascading deletes?**
+- FK constraints add overhead on every insert/update/delete operation
+- Jobs tables are high-traffic (frequent reservations and status updates)
+- Stale jobs are harmless until refresh; they simply won't match key_source
+- The `refresh()` approach is more efficient for batch cleanup

 ### Migration from Current System

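The batch-cleanup argument rests on stale removal being a single set-based statement rather than a cascade fired per deleted parent row. Here is a sketch of the three-step stale detection as one SQL `DELETE` with an anti-join, run through Python's `sqlite3`. It assumes `key_source` can be materialized as a table (here a plain table named `key_source`, hypothetical) and that `created_time` is stored as a `YYYY-MM-DD HH:MM:SS` string; a MySQL version would use native `DATETIME` columns and may differ in syntax.

```python
import sqlite3
from datetime import datetime, timedelta

def remove_stale(conn, now, stale_timeout=3600):
    """Delete stale pending jobs in one statement; return how many were removed.

    Assumes tables jobs(image_id, status, created_time) and key_source(image_id);
    created_time holds 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly as text.
    """
    cutoff = (now - timedelta(seconds=stale_timeout)).isoformat(sep=" ")
    cur = conn.execute(
        """
        DELETE FROM jobs
        WHERE status = 'pending'
          AND created_time < ?
          AND NOT EXISTS (
              SELECT 1 FROM key_source k WHERE k.image_id = jobs.image_id)
        """,
        (cutoff,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE key_source (image_id INTEGER PRIMARY KEY);
CREATE TABLE jobs (
    image_id INTEGER PRIMARY KEY,  -- same key attribute as the target, no FK
    status TEXT NOT NULL,
    created_time TEXT NOT NULL
);
""")

now = datetime(2024, 1, 1, 12, 0, 0)
old = (now - timedelta(hours=2)).isoformat(sep=" ")
recent = (now - timedelta(minutes=5)).isoformat(sep=" ")
conn.executemany("INSERT INTO key_source VALUES (?)", [(1,), (3,)])
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", [
    (1, "pending", old),     # key still in key_source: kept
    (2, "pending", old),     # upstream gone and older than timeout: stale
    (4, "pending", recent),  # upstream gone but too recent: kept for now
    (5, "error", old),       # not pending: never touched by this step
])

removed = remove_stale(conn, now)
remaining = [r[0] for r in conn.execute("SELECT image_id FROM jobs ORDER BY image_id")]
print(removed, remaining)  # 1 [1, 4, 5]
```

Because the job's primary key attributes match the target's, the `NOT EXISTS` comparison is a plain key-to-key lookup, with no hashing or `table_name` filter involved.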
@@ -557,7 +584,7 @@ Per-table jobs tables provide:
 1. **Better isolation**: Jobs for one table don't affect others
 2. **Simpler queries**: No need to filter by table_name
 3. **Native keys**: Primary keys are readable, not hashed
-4. **Referential integrity**: Automatic cleanup via foreign keys
+4. **High performance**: No FK constraints means minimal overhead on job operations
 5. **Scalability**: Each table's jobs can be indexed independently

 ### Why Remove Key Hashing?
@@ -574,7 +601,7 @@ The current system hashes primary keys to support arbitrary key types. The new s
 Restricting auto-populated tables to foreign-key-only primary keys provides:

 1. **1:1 job correspondence**: Each `key_source` entry maps to exactly one job, eliminating ambiguity about what constitutes a "job"
-2. **Proper referential integrity**: The jobs table can reference the same parent tables, enabling cascading deletes
+2. **Matching key structure**: The jobs table primary key exactly matches the target table, enabling efficient stale detection via `key_source` comparison
 3. **Eliminates key_source complexity**: No need for custom `key_source` definitions to enumerate non-foreign-key combinations
 4. **Clearer data model**: The computation graph is fully determined by table dependencies
 5. **Simpler populate logic**: No need to handle partial key matching or key enumeration