@@ -310,37 +310,40 @@ archivebox/plugins/{plugin_name}/
 ## Implementation Checklist

 ### Phase 1: Schema Migration ✅
-- [ ] Add `Snapshot.current_step` (IntegerField 0-9, default=0)
-- [ ] Add `ArchiveResult.hook_name` (CharField, nullable) - just filename
-- [ ] Create migration: `0033_snapshot_current_step_archiveresult_hook_name.py`
+- [x] Add `Snapshot.current_step` (IntegerField 0-9, default=0)
+- [x] Add `ArchiveResult.hook_name` (CharField, nullable) - just filename
+- [x] Create migration: `0034_snapshot_current_step.py`

-### Phase 2: Core Logic Updates
-- [ ] Add `extract_step(hook_name)` utility in `archivebox/hooks.py`
+### Phase 2: Core Logic Updates ✅
+- [x] Add `extract_step(hook_name)` utility in `archivebox/hooks.py`
   - Extract first digit from `__XX_` pattern
   - Default to 9 for unnumbered hooks
-- [ ] Update `Snapshot.create_pending_archiveresults()` in `archivebox/core/models.py`:
+- [x] Add `is_background_hook(hook_name)` utility in `archivebox/hooks.py`
+  - Check for `.bg.` in filename
+- [x] Update `Snapshot.create_pending_archiveresults()` in `archivebox/core/models.py`:
   - Discover all hooks (not plugins)
   - Create one AR per hook with `hook_name` set
-- [ ] Update `ArchiveResult.run()` in `archivebox/core/models.py`:
+- [x] Update `ArchiveResult.run()` in `archivebox/core/models.py`:
   - If `hook_name` set: run single hook
   - If `hook_name` None: discover all plugin hooks (existing behavior)
-- [ ] Add `Snapshot.advance_step_if_ready()` method:
+- [x] Add `Snapshot.advance_step_if_ready()` method:
   - Check if all foreground ARs in current step finished
   - Increment `current_step` if ready
   - Ignore background hooks (.bg) in completion check
-- [ ] Integrate with `SnapshotMachine.is_finished()` in `archivebox/core/statemachines.py`:
+- [x] Integrate with `SnapshotMachine.is_finished()` in `archivebox/core/statemachines.py`:
   - Call `advance_step_if_ready()` before checking if done

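The two filename helpers named in the Phase 2 checklist could look roughly like this. It is a minimal sketch based only on the bullet points (first digit of the `__XX_` prefix, default of 9 for unnumbered hooks, `.bg.` in the filename); the regex and the handling of a missing `hook_name` are assumptions, not the code in `archivebox/hooks.py`.

```python
import re
from typing import Optional

# Assumed pattern: two digits between "__" and "_",
# e.g. "on_Snapshot__63_media.bg.py" has the prefix "__63_".
_STEP_PATTERN = re.compile(r"__(\d)\d_")


def extract_step(hook_name: Optional[str]) -> int:
    """Return the step (0-9) encoded as the first digit of the __XX_ prefix.

    Unnumbered hooks, and a missing hook_name, fall back to step 9.
    """
    if not hook_name:
        return 9
    match = _STEP_PATTERN.search(hook_name)
    return int(match.group(1)) if match else 9


def is_background_hook(hook_name: Optional[str]) -> bool:
    """Return True when the filename carries the ".bg." marker."""
    return bool(hook_name) and ".bg." in hook_name
```

For example, `extract_step("on_Snapshot__63_media.bg.py")` yields step 6 and `is_background_hook` reports it as a background hook, while a filename without a numeric prefix lands in step 9.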
-### Phase 3: Worker Coordination
-- [ ] Update worker AR claiming query in `archivebox/workers/worker.py`:
+### Phase 3: Worker Coordination ✅
+- [x] Update worker AR claiming query in `archivebox/workers/worker.py`:
   - Filter: `extract_step(ar.hook_name) <= snapshot.current_step`
-  - Note: May need to denormalize or use clever query since step is derived
-  - Alternative: Claim any AR in QUEUED state, check step in Python before processing
+  - Claims ARs in QUEUED state, checks step in Python before processing
+  - Orders by hook_name for deterministic execution within step

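The interplay between claiming and step advancement can be illustrated without Django. In the sketch below, `FakeAR`, `FakeSnapshot`, `claimable_results`, and the status strings are hypothetical stand-ins for illustration only; the real logic lives in `archivebox/workers/worker.py` and `archivebox/core/models.py` and may differ.

```python
import re
from dataclasses import dataclass, field
from typing import List


def extract_step(hook_name: str) -> int:
    """First digit of the __XX_ prefix, or 9 for unnumbered hooks."""
    match = re.search(r"__(\d)\d_", hook_name)
    return int(match.group(1)) if match else 9


def is_background_hook(hook_name: str) -> bool:
    return ".bg." in hook_name


# Assumed status values; the real model may use different names.
UNFINISHED = {"queued", "started"}


@dataclass
class FakeAR:
    """Hypothetical stand-in for ArchiveResult."""
    hook_name: str
    status: str = "queued"


@dataclass
class FakeSnapshot:
    """Hypothetical stand-in for Snapshot."""
    current_step: int = 0
    results: List[FakeAR] = field(default_factory=list)

    def advance_step_if_ready(self) -> bool:
        """Increment current_step once no foreground AR in it is unfinished."""
        blocking = [
            ar for ar in self.results
            if extract_step(ar.hook_name) == self.current_step
            and not is_background_hook(ar.hook_name)  # background hooks never block
            and ar.status in UNFINISHED
        ]
        if blocking or self.current_step >= 9:
            return False
        self.current_step += 1
        return True


def claimable_results(snapshot: FakeSnapshot) -> List[FakeAR]:
    """Queued ARs whose step has been reached, ordered by hook_name."""
    ready = [
        ar for ar in snapshot.results
        if ar.status == "queued"
        and extract_step(ar.hook_name) <= snapshot.current_step
    ]
    return sorted(ready, key=lambda ar: ar.hook_name)
```

Because the step is derived from the filename rather than stored, the check runs in Python after the queued rows are fetched, which matches the approach the checklist settled on. Sorting by `hook_name` keeps execution order stable even when two hooks share the same numeric prefix.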
-### Phase 4: Hook Renumbering
-- [ ] Renumber hooks per renumbering map below
-- [ ] Add `.bg` suffix to long-running hooks
-- [ ] Test all hooks still work after renumbering
+### Phase 4: Hook Renumbering ✅
+- [x] Renumber hooks per renumbering map below
+- [x] Add `.bg` suffix to long-running hooks (media, gallerydl, forumdl, papersdl)
+- [x] Move parse_* hooks to step 7 (70-79)
+- [x] Test all hooks still work after renumbering

 ## Migration Path

@@ -353,25 +356,34 @@ No special migration needed:

 ### Renumbering Map

-**Current → New:**
-```
-git/on_Snapshot__12_git.py → git/on_Snapshot__62_git.py
-media/on_Snapshot__51_media.py → media/on_Snapshot__63_media.bg.py
-gallerydl/on_Snapshot__52_gallerydl.py → gallerydl/on_Snapshot__64_gallerydl.bg.py
-forumdl/on_Snapshot__53_forumdl.py → forumdl/on_Snapshot__65_forumdl.bg.py
-papersdl/on_Snapshot__54_papersdl.py → papersdl/on_Snapshot__66_papersdl.bg.py
-
-readability/on_Snapshot__52_readability.py → readability/on_Snapshot__55_readability.py
-mercury/on_Snapshot__53_mercury.py → mercury/on_Snapshot__56_mercury.py
-
-singlefile/on_Snapshot__37_singlefile.py → singlefile/on_Snapshot__50_singlefile.py
-screenshot/on_Snapshot__34_screenshot.js → screenshot/on_Snapshot__51_screenshot.js
-pdf/on_Snapshot__35_pdf.js → pdf/on_Snapshot__52_pdf.js
-dom/on_Snapshot__36_dom.js → dom/on_Snapshot__53_dom.js
-title/on_Snapshot__32_title.js → title/on_Snapshot__54_title.js
-headers/on_Snapshot__33_headers.js → headers/on_Snapshot__55_headers.js
-
-wget/on_Snapshot__50_wget.py → wget/on_Snapshot__61_wget.py
+**Completed Renames:**
+```
+# Step 5: DOM Extraction (sequential, non-background)
+singlefile/on_Snapshot__37_singlefile.py → singlefile/on_Snapshot__50_singlefile.py ✅
+screenshot/on_Snapshot__34_screenshot.js → screenshot/on_Snapshot__51_screenshot.js ✅
+pdf/on_Snapshot__35_pdf.js → pdf/on_Snapshot__52_pdf.js ✅
+dom/on_Snapshot__36_dom.js → dom/on_Snapshot__53_dom.js ✅
+title/on_Snapshot__32_title.js → title/on_Snapshot__54_title.js ✅
+readability/on_Snapshot__52_readability.py → readability/on_Snapshot__55_readability.py ✅
+headers/on_Snapshot__33_headers.js → headers/on_Snapshot__55_headers.js ✅
+mercury/on_Snapshot__53_mercury.py → mercury/on_Snapshot__56_mercury.py ✅
+htmltotext/on_Snapshot__54_htmltotext.py → htmltotext/on_Snapshot__57_htmltotext.py ✅
+
+# Step 6: Post-DOM Extraction (background for long-running)
+wget/on_Snapshot__50_wget.py → wget/on_Snapshot__61_wget.py ✅
+git/on_Snapshot__12_git.py → git/on_Snapshot__62_git.py ✅
+media/on_Snapshot__51_media.py → media/on_Snapshot__63_media.bg.py ✅
+gallerydl/on_Snapshot__52_gallerydl.py → gallerydl/on_Snapshot__64_gallerydl.bg.py ✅
+forumdl/on_Snapshot__53_forumdl.py → forumdl/on_Snapshot__65_forumdl.bg.py ✅
+papersdl/on_Snapshot__54_papersdl.py → papersdl/on_Snapshot__66_papersdl.bg.py ✅
+
+# Step 7: URL Extraction (parse_* hooks moved from step 6)
+parse_html_urls/on_Snapshot__60_parse_html_urls.py → parse_html_urls/on_Snapshot__70_parse_html_urls.py ✅
+parse_txt_urls/on_Snapshot__62_parse_txt_urls.py → parse_txt_urls/on_Snapshot__71_parse_txt_urls.py ✅
+parse_rss_urls/on_Snapshot__61_parse_rss_urls.py → parse_rss_urls/on_Snapshot__72_parse_rss_urls.py ✅
+parse_netscape_urls/on_Snapshot__63_parse_netscape_urls.py → parse_netscape_urls/on_Snapshot__73_parse_netscape_urls.py ✅
+parse_jsonl_urls/on_Snapshot__64_parse_jsonl_urls.py → parse_jsonl_urls/on_Snapshot__74_parse_jsonl_urls.py ✅
+parse_dom_outlinks/on_Snapshot__40_parse_dom_outlinks.js → parse_dom_outlinks/on_Snapshot__75_parse_dom_outlinks.js ✅
 ```

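A rename map in this `old → new` format can be sanity-checked mechanically. The sketch below is one possible approach, not part of the change itself: `parse_renames` and `hook_number` are hypothetical helper names, and `RENAMES` holds only four entries copied from the map for brevity.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# A few entries copied from the map; paste the full list to check everything.
RENAMES = """
readability/on_Snapshot__52_readability.py → readability/on_Snapshot__55_readability.py
headers/on_Snapshot__33_headers.js → headers/on_Snapshot__55_headers.js
media/on_Snapshot__51_media.py → media/on_Snapshot__63_media.bg.py
parse_html_urls/on_Snapshot__60_parse_html_urls.py → parse_html_urls/on_Snapshot__70_parse_html_urls.py
"""


def hook_number(path: str) -> Optional[int]:
    """Two-digit number from the __XX_ prefix, or None if the hook is unnumbered."""
    match = re.search(r"__(\d\d)_", path)
    return int(match.group(1)) if match else None


def parse_renames(text: str) -> List[Tuple[str, str]]:
    """Split each 'old → new' line into a pair, skipping blanks and # comments."""
    pairs = []
    for line in text.splitlines():
        line = line.replace("✅", "").strip()
        if not line or line.startswith("#"):
            continue
        old, new = (part.strip() for part in line.split("→"))
        pairs.append((old, new))
    return pairs


pairs = parse_renames(RENAMES)

# Numeric prefixes used by more than one renamed hook.
counts = Counter(hook_number(new) for _, new in pairs)
shared_numbers = sorted(n for n, c in counts.items() if n is not None and c > 1)

# Renamed hooks that gained the .bg marker.
background = [new for _, new in pairs if ".bg." in new]
```

On the sample entries, `shared_numbers` contains 55, since `readability` and `headers` both carry that prefix in the map. Both hooks still belong to step 5, and ordering by `hook_name` keeps their relative order deterministic.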
 ## Testing Strategy