@@ -72,21 +72,22 @@ The `populate()` method orchestrates the job execution process:
    - For reserved jobs:
      - Updates job status to `reserved` during processing
      - Records execution metrics (duration, version)
-     - Updates status to `success` or `error` on completion
+     - On successful completion: remove job from the jobs table
+     - On error: update job status to `error`
      - Records errors and execution metrics
 
 4. **Cleanup**:
-   - Optionally purges invalid jobs
+   - Optionally purges orphaned/outdated jobs
 
 ## Job Cleanup Process
 
-The `purge_invalid_jobs` method maintains database consistency by removing invalid jobs:
+The `purge_jobs` method maintains database consistency by removing orphaned jobs:
 
-1. **Invalid Success Jobs**:
+1. **Orphaned Success Jobs**:
    - Identifies jobs marked as `success` but not present in the target table
    - These typically occur when target table entries are deleted
 
-2. **Invalid Incomplete Jobs**:
+2. **Orphaned Incomplete Jobs**:
    - Identifies jobs in `scheduled`/`error`/`ignore` state that are no longer in the `key_source`
    - These typically occur when upstream table entries are deleted
 
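Note on the hunk above: the two orphan categories can be illustrated with plain Python data structures. This is only a minimal sketch, not the `purge_jobs` implementation. The function name `find_orphaned_jobs`, its arguments, and the sample keys are all hypothetical; only the status names and the two rules come from the documented text.

```python
def find_orphaned_jobs(jobs, target_keys, key_source_keys):
    """Return the keys of jobs that match either orphan rule.

    jobs: dict mapping a job key (hashable) to its status string.
    target_keys: set of keys currently present in the target table.
    key_source_keys: set of keys currently present in the key_source.
    All three arguments are hypothetical stand-ins for database queries.
    """
    incomplete = {"scheduled", "error", "ignore"}
    orphaned = set()
    for key, status in jobs.items():
        # Rule 1: marked as success, but the target entry no longer exists.
        if status == "success" and key not in target_keys:
            orphaned.add(key)
        # Rule 2: incomplete, but the upstream key no longer exists.
        elif status in incomplete and key not in key_source_keys:
            orphaned.add(key)
    return orphaned


# Made-up sample data for illustration only.
jobs = {
    ("s1",): "success",    # still in the target table: kept
    ("s2",): "success",    # target entry deleted: orphaned
    ("s3",): "error",      # upstream entry deleted: orphaned
    ("s4",): "scheduled",  # still in key_source: kept
    ("s5",): "reserved",   # neither rule covers reserved jobs: kept
}
target_keys = {("s1",)}
key_source_keys = {("s1",), ("s2",), ("s4",)}
orphans = find_orphaned_jobs(jobs, target_keys, key_source_keys)
```

In this sketch `reserved` jobs are left alone because neither documented rule mentions them.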
@@ -106,16 +107,16 @@ The "freshness" and consistency of the jobs table depends on regular maintenance
    - Example: Run every few minutes in a cron job for active pipelines
    - Event-driven approach: `inserts` in upstream tables auto trigger this step
 
-2. **Cleanup** (`purge_invalid_jobs`):
-   - Removes invalid or outdated jobs
+2. **Cleanup** (`purge_jobs`):
+   - Removes orphaned or outdated jobs
    - Should be run periodically to maintain consistency
    - More resource-intensive than scheduling
    - Example: Run daily during low-activity periods
    - Event-driven approach: `deletes` in upstream or target tables auto trigger this step
 
 The balance between these operations affects:
 - How quickly new jobs are discovered and scheduled
-- How long invalid jobs remain in the table
+- How long orphaned jobs remain in the table
 - Database size and query performance
 - Overall system responsiveness
 
@@ -128,5 +129,5 @@ dj.config["min_scheduling_interval"] = 300 # 5 minutes
 # (implement as a cron job or scheduled task)
 def daily_cleanup():
     for table in your_pipeline_tables:
-        table.purge_invalid_jobs()
+        table.purge_jobs()
 ```
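Note on the `daily_cleanup` example: in a scheduled task, a failure on one table would abort the loop and leave the remaining tables unpurged. One possible variant, sketched under assumptions, catches errors per table and reports which ones failed. Only `purge_jobs()` comes from the diff. The `tables` argument, the logger name, and the `_FakeTable` class are hypothetical, and `_FakeTable` exists only so the sketch runs without a database.

```python
import logging

logger = logging.getLogger("job_maintenance")  # hypothetical logger name


def daily_cleanup(tables):
    """Call purge_jobs() on each table and return the tables whose purge raised."""
    failed = []
    for table in tables:
        try:
            table.purge_jobs()
        except Exception:
            # Log the traceback and continue with the remaining tables.
            logger.exception("purge_jobs failed for %r", table)
            failed.append(table)
    return failed


class _FakeTable:
    """Hypothetical stand-in for a pipeline table, used only for this demo."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.purged = False

    def __repr__(self):
        return f"_FakeTable({self.name!r})"

    def purge_jobs(self):
        if self.fail:
            raise RuntimeError("simulated database error")
        self.purged = True


tables = [_FakeTable("A"), _FakeTable("B", fail=True), _FakeTable("C")]
failed = daily_cleanup(tables)
```

Returning the failed tables lets the calling cron job or scheduler decide whether to retry or alert.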