Conversation

@pull pull bot commented Dec 31, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


claude and others added 30 commits December 31, 2025 00:21
- Add _derive_persona_paths() in configset.py to automatically derive
  CHROME_USER_DATA_DIR and CHROME_EXTENSIONS_DIR from ACTIVE_PERSONA
  when not explicitly set. This allows plugins to use these paths
  without knowing about the persona system.

- Update chrome_utils.js launchChromium() to accept userDataDir option
  and pass --user-data-dir to Chrome. Also cleans up SingletonLock
  before launch.

- Update killZombieChrome() to clean up SingletonLock files from all
  persona chrome_user_data directories after killing zombies.

- Update chrome_cleanup() in misc/util.py to handle persona-based
  user data directories when cleaning up stale Chrome state.

- Simplify on_Crawl__20_chrome_launch.bg.js to use CHROME_USER_DATA_DIR
  and CHROME_EXTENSIONS_DIR from env (derived by get_config()).

Config priority flow:
  ACTIVE_PERSONA=WorkAccount (set on crawl/snapshot)
  -> get_config() derives:
     CHROME_USER_DATA_DIR = PERSONAS_DIR/WorkAccount/chrome_user_data
     CHROME_EXTENSIONS_DIR = PERSONAS_DIR/WorkAccount/chrome_extensions
  -> hooks receive these as env vars without needing persona logic
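
A minimal sketch of this derivation, using hypothetical names (derive_persona_paths, a PERSONAS_DIR default) rather than the exact configset.py implementation:

```python
from pathlib import Path

def derive_persona_paths(config: dict) -> dict:
    """Sketch: fill in Chrome paths from ACTIVE_PERSONA when they are
    not explicitly set, so plugins never need persona-aware logic."""
    persona = config.get('ACTIVE_PERSONA')
    if not persona:
        return config
    personas_dir = Path(config.get('PERSONAS_DIR', 'personas'))
    config.setdefault('CHROME_USER_DATA_DIR', str(personas_dir / persona / 'chrome_user_data'))
    config.setdefault('CHROME_EXTENSIONS_DIR', str(personas_dir / persona / 'chrome_extensions'))
    return config
```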
Documents a 7-phase refactoring to use machine.Process as the core data
model for all subprocess management:

- Phase 1: Add parent FK and process_type to Process model
- Phase 2: Add lifecycle methods (launch, kill, poll, wait)
- Phase 3: Update hook system to create Process records
- Phase 4-5: Track workers/orchestrator/supervisord as Process
- Phase 6: Create root Process on CLI invocation
- Phase 7: Admin UI with tree visualization

Enables full process hierarchy tracking from CLI → binary execution.
Key addition: Process.current() class method (like Machine.current())
that auto-creates/retrieves the Process record for the current OS process.

Benefits:
- Uses PPID lookup to find parent Process automatically
- Detects process_type from sys.argv
- Cached with validation (like Machine.current())
- Eliminates need for thread-local context management

Simplified Phase 3 (workers) and Phase 4 (CLI) to just call
Process.current() instead of manual Process creation.
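
A rough sketch of how Process.current() could work under these constraints; model_cls, machine, and the _guess_type() helper are illustrative stand-ins, not the actual API:

```python
import os
import psutil

def _guess_type(cmdline: list[str]) -> str:
    # Crude stand-in for the real sys.argv-based detection.
    joined = ' '.join(cmdline)
    if 'supervisord' in joined:
        return 'supervisord'
    if 'worker' in joined:
        return 'worker'
    return 'cli'

def process_current(model_cls, machine):
    """Sketch of Process.current(): get-or-create the DB record for the
    current OS process, linking the parent Process via PPID lookup."""
    proc = psutil.Process(os.getpid())
    # Find the parent Process record via PPID, if one was registered.
    parent = model_cls.objects.filter(machine=machine, pid=proc.ppid()).first()
    record, _ = model_cls.objects.get_or_create(
        machine=machine,
        pid=proc.pid,
        defaults={
            'parent': parent,
            'cmd': proc.cmdline(),             # more reliable than sys.argv
            'process_type': _guess_type(proc.cmdline()),
            'started_at': proc.create_time(),  # epoch seconds; the real field is likely a datetime
        },
    )
    return record
```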
PIDs are recycled by the OS, so all Process queries now:
- Filter by machine=Machine.current() (PIDs unique per machine)
- Filter by started_at within PID_REUSE_WINDOW (24h)
- Validate start time matches OS via psutil.Process.create_time()

Added:
- ProcessManager.get_by_pid() for safe PID lookups
- Process.cleanup_stale_running() to mark orphaned RUNNING as EXITED
- START_TIME_TOLERANCE (5s) for start time comparison
- Uses psutil.Process.create_time() for accurate started_at
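
A sketch of the safe lookup, assuming a Django-style manager; the PID_REUSE_WINDOW constant mirrors the 24h window described above:

```python
from datetime import timedelta
from django.utils import timezone

PID_REUSE_WINDOW = timedelta(hours=24)

def get_by_pid(model_cls, machine, pid: int):
    """Sketch: only match Process rows started recently on this machine,
    so a recycled PID cannot resolve to a stale record."""
    cutoff = timezone.now() - PID_REUSE_WINDOW
    return (model_cls.objects
            .filter(machine=machine, pid=pid, started_at__gte=cutoff)
            .order_by('-started_at')
            .first())
```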
Phase 2 now includes line-by-line mapping of:

- run_hook(): Create Process record, use Process.launch(), parse
  JSONL for child binary Process records
- process_is_alive(): Accept Path or Process, use Process.is_alive()
- kill_process(): Accept Path or Process, use Process.kill()
- ArchiveResult.run(): Pass self.process as parent_process to run_hook()
- ArchiveResult.update_from_output(): Read from Process.stdout/stderr
- Snapshot.cleanup(): Kill via Process model, fallback to PID files
- Snapshot.has_running_background_hooks(): Check via Process model

Hook JSONL contract updated to support {"type": "Process"} records
for tracking binary executions within hooks.
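
A sketch of parsing that contract, assuming hooks emit one JSON object per line interleaved with plain log output:

```python
import json

def parse_hook_jsonl(stdout_text: str) -> list[dict]:
    """Sketch: pull {"type": "Process"} records out of a hook's JSONL
    stdout so child binary executions can be tracked."""
    records = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith('{'):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # hooks may interleave non-JSON log lines
        if record.get('type') == 'Process':
            records.append(record)
    return records
```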
Phase 3.3 now includes:
- Module-level _supervisord_db_process variable
- start_new_supervisord_process(): Create Process record after Popen
- stop_existing_supervisord_process(): Update Process status on shutdown
- Process hierarchy diagram showing CLI → supervisord → workers chain

Key insight: PPID-based linking works because workers call Process.current()
in on_startup(), which finds supervisord's Process via PPID lookup.
New section 1.5 adds @property proc that returns psutil.Process ONLY if:
- PID exists in OS
- OS start time matches our started_at (within tolerance)
- We're on the same machine

Safety features:
- Validates start time via psutil.Process.create_time()
- Optional command validation (binary name matches)
- Returns None instead of wrong process on PID reuse

Also adds convenience methods:
- is_running: Check via validated psutil
- get_memory_info(): RSS/VMS if running
- get_cpu_percent(): CPU usage if running
- get_children_pids(): Child PIDs from OS

Updated kill() to use self.proc for safe killing - never kills
a recycled PID since we validate start time first.
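
A sketch of the validation behind the proc property, assuming started_at is available as epoch seconds and using the 5s START_TIME_TOLERANCE noted above:

```python
import psutil

START_TIME_TOLERANCE = 5  # seconds

def validated_proc(pid: int, started_at_ts: float):
    """Sketch of the `proc` property: return a psutil.Process only when
    the OS start time matches ours; return None on PID reuse."""
    try:
        proc = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return None
    if abs(proc.create_time() - started_at_ts) > START_TIME_TOLERANCE:
        return None  # same PID, different process
    return proc
```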
- Add comprehensive default CHROME_ARGS in config.json with 55+ flags
  for deterministic rendering, security, performance, and UI suppression

- Update chrome_utils.js launchChromium() to read CHROME_ARGS and
  CHROME_ARGS_EXTRA from environment variables (set by get_config())

- Add getEnvArray() helper to parse JSON arrays or comma-separated
  strings from environment variables

- Separate args into three categories:
  1. baseArgs: Static flags from CHROME_ARGS config (configurable)
  2. dynamicArgs: Runtime-computed flags (port, sandbox, headless, etc.)
  3. extraArgs: User overrides from CHROME_ARGS_EXTRA

- Add CHROME_SANDBOX config option to control --no-sandbox flag

Args are now configurable via:
  - config.json defaults
  - ArchiveBox.conf file
  - Environment variables
  - Per-crawl/snapshot config overrides
- Create Persona class in personas/models.py for managing browser
  profiles/identities used for archiving sessions

- Each Persona has:
  - chrome_user_data_dir: Chrome profile directory
  - chrome_extensions_dir: Installed extensions
  - cookies_file: Cookies for wget/curl
  - config_file: Persona-specific config overrides

- Add Persona methods:
  - cleanup_chrome(): Remove stale SingletonLock/SingletonSocket files
  - get_config(): Load persona config from config.json
  - save_config(): Save persona config to config.json
  - ensure_dirs(): Create persona directory structure
  - all(): Iterator over all personas
  - get_active(): Get persona based on ACTIVE_PERSONA config
  - cleanup_chrome_all(): Clean up all personas

- Update chrome_cleanup() in misc/util.py to use Persona.cleanup_chrome_all()
  instead of manual directory iteration

- Add convenience functions:
  - cleanup_chrome_for_persona(name)
  - cleanup_chrome_all_personas()
- Remove standalone convenience functions (cleanup_chrome_for_persona,
  cleanup_chrome_all_personas) to reduce LOC
- Change Persona.get_active(config) to accept config dict as argument
  instead of calling get_config() internally, since the caller needs
  to pass user/crawl/snapshot/archiveresult context for proper config
- Convert Persona from plain Python class to Django model with ModelWithConfig
- Add config JSONField for persona-specific config overrides
- Add get_derived_config() method that returns config with derived paths:
  - CHROME_USER_DATA_DIR, CHROME_EXTENSIONS_DIR, COOKIES_FILE, ACTIVE_PERSONA

- Update get_config() to accept persona parameter in merge chain:
  get_config(persona=crawl.persona, crawl=crawl, snapshot=snapshot)

- Remove _derive_persona_paths() - derivation now happens in Persona model

- Merge order (highest to lowest priority):
  1. snapshot.config
  2. crawl.config
  3. user.config
  4. persona.get_derived_config()  <- NEW
  5. environment variables
  6. ArchiveBox.conf file
  7. plugin defaults
  8. core defaults

Usage:
  config = get_config(persona=crawl.persona, crawl=crawl)
  config['CHROME_USER_DATA_DIR']  # derived from persona
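
A sketch of that merge order; the empty placeholder dicts stand in for the real core/plugin/file/env loaders, which are not shown here:

```python
def merged_config(sources: list[dict]) -> dict:
    """Apply sources lowest-priority first so later dicts win."""
    config: dict = {}
    for source in sources:
        config.update(source)
    return config

def get_config(persona=None, user=None, crawl=None, snapshot=None):
    # Numbered by the priority list above (8 = lowest, 1 = highest).
    sources = [
        {},  # 8. core defaults (placeholder)
        {},  # 7. plugin defaults (placeholder)
        {},  # 6. ArchiveBox.conf file (placeholder)
        {},  # 5. environment variables (placeholder)
        persona.get_derived_config() if persona else {},   # 4
        user.config if user else {},                       # 3
        crawl.config if crawl else {},                     # 2
        snapshot.config if snapshot else {},               # 1
    ]
    return merged_config(sources)
```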
Comprehensive plan for implementing JSONL-based CLI piping:
- Phase 1: Model prerequisites (ArchiveResult.from_json, tags_str fix)
- Phase 2: Extract shared apply_filters() to cli_utils.py
- Phase 3: Implement pass-through behavior for all create commands
- Phase 4-6: Test infrastructure with pytest-django, unit/integration tests

Key changes from original plan:
- ArchiveResult.from_json() identified as missing prerequisite
- Pass-through documented as new feature to implement
- archivebox run updated to create-or-update pattern
- conftest.py redesigned to use pytest-django with isolated tmp_path
- Standardized on tags_str field name across all models
- Reordered phases: implement before test
Multiple hooks in the same plugin directory were overwriting each
other's stdout.log, stderr.log, hook.pid, and cmd.sh files. Now
each hook uses filenames prefixed with its hook name:
- on_Snapshot__20_chrome_tab.bg.stdout.log
- on_Snapshot__20_chrome_tab.bg.stderr.log
- on_Snapshot__20_chrome_tab.bg.pid
- on_Snapshot__20_chrome_tab.bg.sh

Updated:
- hooks.py run_hook() to use hook-specific names
- core/models.py cleanup and update_from_output methods
- Plugin scripts to no longer write redundant hook.pid files
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
---
## Summary by cubic
Prevented hook file collisions by giving each hook its own stdout, stderr, pid, and cmd filenames. This fixes mixed logs and ensures correct cleanup and status checks when multiple hooks run in the same plugin directory.

- **Bug Fixes**
  - hooks.py: write hook-specific stdout/stderr/pid/cmd files and exclude them from new_files; derive cmd.sh from pid for safe kill.
  - core/models.py: read hook-specific logs; exclude hook output files when computing outputs; cleanup and background detection use *.pid.
  - Plugins: stop writing redundant hook.pid files; minor chrome utils cleanup.

Written for commit 754b096.
Simplifies the comma-separated parsing logic to:
- If value contains '[', parse as JSON array
- Otherwise, parse as comma-separated values

This prevents incorrect splitting of arguments containing internal commas
when there's only one argument. For arguments with commas, users should
use JSON format: CHROME_ARGS='["--arg1,val", "--arg2"]'
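
A Python rendering of the same rule, for illustration (the real getEnvArray lives in chrome_utils.js):

```python
import json

def get_env_array(value: str) -> list[str]:
    """Sketch of the parsing rule: '[' means JSON array, otherwise the
    value is split on commas."""
    value = (value or '').strip()
    if not value:
        return []
    if '[' in value:
        return json.loads(value)
    return [part.strip() for part in value.split(',') if part.strip()]
```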

Also exports getEnvArray in module.exports for consistency.

Co-authored-by: Nick Sweeting <[email protected]>
…ling logic on model methods (#1734)

---
## Summary by cubic
Added an implementation plan to centralize subprocess handling on the machine.Process model. It covers process hierarchy, Process.current(), safe lifecycle methods (launch/kill/wait), PID reuse protection, and phased changes across hooks, workers, CLI, migrations, and admin.

Written for commit 3ae9410.
…#1735)

This change consolidates duplicated logic between chrome_utils.js and
extension installer hooks, as well as between Python plugin tests:

JavaScript changes:
- Add getExtensionsDir() to centralize extension directory path calculation
- Add installExtensionWithCache() to handle extension install + cache workflow
- Add CLI commands for new utilities
- Refactor all 3 extension installers (ublock, istilldontcareaboutcookies,
  twocaptcha) to use shared utilities, reducing each from ~115 lines to ~60
- Update chrome_launch hook to use getExtensionsDir()

Python test changes:
- Add chrome_test_helpers.py with shared Chrome session management utilities
- Refactor infiniscroll and modalcloser tests to use shared helpers
- setup_chrome_session(), cleanup_chrome(), get_test_env() now centralized
- Add chrome_session() context manager for automatic cleanup

Net result: ~208 lines of code removed while maintaining the same functionality.
- Update Crawl.output_dir_parent to use username instead of user_id
  for consistency with Snapshot paths
- Add domain from first URL to Crawl path structure for easier debugging:
  users/{username}/crawls/YYYYMMDD/{domain}/{crawl_id}/
- Add CRAWL_OUTPUT_DIR to config passed to Snapshot hooks so chrome_tab
  can find the shared Chrome session from the Crawl
- Update comment in chrome_tab hook to reflect new config source
- Add setup_test_env, launch_chromium_session, kill_chromium_session
  to chrome_test_helpers.py for extension tests
- Add chromium_session context manager for cleaner test code
- Refactor ublock, istilldontcareaboutcookies, twocaptcha tests to use
  shared helpers (~450 lines removed)
- Refactor screenshot, dom, pdf tests to use shared get_test_env
  and get_lib_dir (~60 lines removed)
- Net reduction: 228 lines of duplicate code
- Add get_machine_type() to chrome_test_helpers.py
- Update get_test_env() to include MACHINE_TYPE
- Refactor test_chrome.py to import from shared helpers
- Removes ~50 lines of duplicate code
- Import shared Chrome test helpers
- Add test_singlefile_with_chrome_session() to verify CDP connection
- Add test_singlefile_disabled_skips() for config testing
- Update existing test to use get_test_env()
claude and others added 26 commits December 31, 2025 09:02
New helpers in chrome_test_helpers.py:
- get_plugin_dir(__file__) - get plugin dir from test file path
- get_hook_script(dir, pattern) - find hook script by glob pattern
- run_hook() - run hook script and return (returncode, stdout, stderr)
- parse_jsonl_output() - parse JSONL from hook output
- run_hook_and_parse() - convenience combo of above two
- LIB_DIR, NODE_MODULES_DIR - lazy-loaded module constants
- _LazyPath class for deferred path resolution

Updated test files to use simpler patterns:
- screenshot/tests/test_screenshot.py
- dom/tests/test_dom.py
- pdf/tests/test_pdf.py
- singlefile/tests/test_singlefile.py

Before: PLUGIN_DIR = Path(__file__).parent.parent
After:  PLUGIN_DIR = get_plugin_dir(__file__)

Before: LIB_DIR = get_lib_dir(); NODE_MODULES_DIR = LIB_DIR / 'npm' / 'node_modules'
After:  from chrome_test_helpers import LIB_DIR, NODE_MODULES_DIR
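
A sketch of the deferred-resolution idea behind _LazyPath; the actual class may differ:

```python
from pathlib import Path

class _LazyPath:
    """Sketch: defer a (possibly slow) path lookup until first use, so
    module-level constants like LIB_DIR cost nothing at import time."""
    def __init__(self, resolver):
        self._resolver = resolver
        self._path = None

    def __fspath__(self) -> str:
        if self._path is None:
            self._path = Path(self._resolver())
        return str(self._path)

    def __truediv__(self, other):
        return Path(self.__fspath__()) / other

# e.g. LIB_DIR = _LazyPath(lambda: '/usr/local/share/archivebox/lib')  # hypothetical resolver
```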
Changed Snapshot.cleanup() to gracefully terminate background hooks:
1. Send SIGTERM to all background hook processes first
2. Wait up to each hook's plugin-specific timeout
3. Send SIGKILL only to hooks still running after their timeout

Added graceful_terminate_background_hooks() function in hooks.py that:
- Collects all .pid files from output directory
- Validates process identity using mtime
- Sends SIGTERM to all valid processes in phase 1
- Polls each process for up to its plugin-specific timeout
- Sends SIGKILL as last resort if timeout expires
- Returns status for each hook (sigterm/sigkill/already_dead/invalid)
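
A condensed sketch of the two-phase flow (the mtime-based identity validation and returncode capture are omitted here):

```python
import os
import signal
import time

def graceful_terminate(pids_with_timeouts: list[tuple[int, float]]) -> dict:
    """Sketch: SIGTERM every hook first, then give each PID up to its
    plugin-specific timeout before escalating to SIGKILL."""
    results: dict[int, str] = {}
    for pid, _ in pids_with_timeouts:              # phase 1: SIGTERM all
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            results[pid] = 'already_dead'
    for pid, timeout in pids_with_timeouts:        # phase 2: poll, then SIGKILL
        if pid in results:
            continue
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                os.kill(pid, 0)                    # signal 0 = liveness probe
            except ProcessLookupError:
                results[pid] = 'sigterm'           # exited within its timeout
                break
            time.sleep(0.1)
        else:                                      # timeout expired, still alive
            try:
                os.kill(pid, signal.SIGKILL)
                results[pid] = 'sigkill'
            except ProcessLookupError:
                results[pid] = 'sigterm'
    return results
```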
- Add getMachineType, getLibDir, getNodeModulesDir, getTestEnv CLI commands to chrome_utils.js
  These are now the single source of truth for path calculations
- Update chrome_test_helpers.py with call_chrome_utils() dispatcher
- Add get_test_env_from_js(), get_machine_type_from_js(), kill_chrome_via_js() helpers
- Update cleanup_chrome and kill_chromium_session to use JS killChrome
- Remove unused Chrome binary search lists from singlefile hook (~25 lines)
- Update readability, mercury, favicon, title tests to use shared helpers
Added 10 practical examples demonstrating the JSONL piping architecture:
1. Basic archive with auto-cascade
2. Retry failed extractions (by status, plugin, domain)
3. Pinboard bookmark import with jq
4. GitHub repo filtering with jq regex
5. Selective extraction (screenshots only)
6. Bulk tag management
7. Deep documentation crawling
8. RSS feed monitoring
9. Archive audit with jq aggregation
10. Incremental backup with diff

Also added auto-cascade principle: `archivebox run` automatically
creates Snapshots from Crawls and ArchiveResults from Snapshots,
so intermediate commands are only needed for customization.
Extended graceful_terminate_background_hooks() to:
- Reap processes with os.waitpid() to get exit codes
- Write returncode to .returncode file for update_from_output()
- Return detailed result dict with status, returncode, and pid

Updated update_from_output() to:
- Read .returncode and .stderr.log files
- Determine status from returncode if no ArchiveResult JSONL record
- Include stderr in output_str for failed hooks
- Handle signal termination (negative returncodes like -9 for SIGKILL)
- Clean up .returncode files along with other hook output files
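
A sketch of the returncode-to-status inference, using the Unix convention that negative values mean death by signal:

```python
import signal

def interpret_returncode(returncode: int) -> tuple[str, str]:
    """Sketch: map an exit code from a .returncode file to (status, note)
    when the hook emitted no ArchiveResult JSONL record."""
    if returncode == 0:
        return 'succeeded', ''
    if returncode < 0:
        # Negative returncodes mean death by signal, e.g. -9 == SIGKILL.
        return 'failed', f'killed by {signal.Signals(-returncode).name}'
    return 'failed', f'exited with code {returncode}'
```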
- get_machine_type() matches JS getMachineType()
- get_lib_dir() matches JS getLibDir()
- get_node_modules_dir() matches JS getNodeModulesDir()
- get_extensions_dir() matches JS getExtensionsDir()
- find_chromium() matches JS findChromium()
- kill_chrome() matches JS killChrome()
- get_test_env() matches JS getTestEnv()

All functions now try JS first (single source of truth) with Python fallback.
Added backward compatibility aliases for old names.
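
A sketch of the try-JS-first dispatcher, assuming chrome_utils.js prints JSON to stdout; the script path, timeout, and fallback format are illustrative:

```python
import json
import platform
import subprocess

def call_chrome_utils(command: str, *args: str):
    """Sketch: ask chrome_utils.js first (single source of truth), and
    return None so the caller can fall back to pure Python."""
    try:
        result = subprocess.run(
            ['node', 'chrome_utils.js', command, *args],
            capture_output=True, text=True, timeout=15, check=True,
        )
        return json.loads(result.stdout)
    except (OSError, subprocess.SubprocessError, json.JSONDecodeError):
        return None

def get_machine_type() -> str:
    value = call_chrome_utils('getMachineType')
    if value is not None:
        return value
    # Pure-Python fallback (assumed shape, e.g. 'linux-x86_64').
    return f'{platform.system().lower()}-{platform.machine()}'
```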
- Trimmed from 10 to 8 focused examples
- Emphasize CLI args for DB filtering (efficient), jq for transforms
- Added key examples showing that `run` emits JSONL, enabling chained processing:
  - #4: Retry failed with different binary/timeout via jq transform
  - #8: Recursive link following (run → jq filter → crawl → run)
- Removed redundant jq domain filtering (use --url__icontains instead)
- Updated summary table with "Retry w/ Changes" and "Chain Processing" patterns
---
## Summary by cubic
Switch background hook cleanup to a graceful termination flow using plugin-specific timeouts, only SIGKILLing if needed. This improves reliability and records accurate exit codes and stderr for better result reporting.

- **Refactors**
  - Added graceful_terminate_background_hooks(): send SIGTERM to all hooks, wait per plugin timeout, SIGKILL remaining, reap with waitpid, write .returncode files.
  - Snapshot.cleanup() now uses merged config (get_config) to apply plugin-specific timeouts and terminate hooks gracefully.
  - update_from_output() reads .returncode and .stderr.log, infers status when no JSONL (handles signals like -9/-15), includes stderr on failures, and cleans up .returncode files.

Written for commit 524e8e9.
Phase 1: Model Prerequisites
- Add ArchiveResult.from_json() and from_jsonl() methods
- Fix Snapshot.to_json() to use tags_str (consistent with Crawl)

Phase 2: Shared Utilities
- Create archivebox/cli/cli_utils.py with shared apply_filters()
- Update 7 CLI files to import from cli_utils.py instead of duplicating

Phase 3: Pass-Through Behavior
- Add pass-through to crawl create (non-Crawl records pass unchanged)
- Add pass-through to snapshot create (Crawl records + others pass through)
- Add pass-through to archiveresult create (Snapshot records + others)
- Add create-or-update behavior to run command:
  - Records WITHOUT id: Create via Model.from_json()
  - Records WITH id: Lookup existing, re-queue
  - Outputs JSONL of all processed records for chaining

Phase 4: Test Infrastructure
- Create archivebox/tests/conftest.py with pytest-django fixtures
- Include CLI helpers, output assertions, database assertions

Phase 6: Config Update
- Update supervisord_util.py: orchestrator -> run command

This enables Unix-style piping:
  archivebox crawl create URL | archivebox run
  archivebox archiveresult list --status=failed | archivebox run
  curl API | jq transform | archivebox crawl create | archivebox run
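
A sketch of the create-or-update loop behind `archivebox run`; model_registry and retry_at are assumptions, and to_json()'s exact return type isn't specified here:

```python
import json
import sys

from django.utils import timezone

def run_from_stdin(model_registry: dict):
    """Sketch of the create-or-update pattern over piped JSONL:
    records without an id are created via Model.from_json(), records
    with an id are looked up and re-queued, and every processed record
    is re-emitted as JSONL for chaining."""
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        Model = model_registry[record['type']]        # e.g. 'Crawl', 'Snapshot'
        if record.get('id'):
            obj = Model.objects.get(id=record['id'])  # existing: look up
        else:
            obj = Model.from_json(record)             # new: create
        obj.retry_at = timezone.now()                 # hypothetical re-queue field
        obj.save()
        print(json.dumps(obj.to_json()))              # assuming to_json() returns a dict
```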
This consolidates scattered subprocess management logic into the Process model:

- terminate(): Graceful SIGTERM → wait → SIGKILL (replaces stop_worker, etc.)
- kill_tree(): Kill process and all OS children (replaces os.killpg logic)
- kill_children_db(): Kill DB-tracked child processes
- get_running(): Query running processes by type (replaces get_all_worker_pids)
- get_running_count(): Count running processes (replaces get_running_worker_count)
- stop_all(): Stop all processes of a type
- get_next_worker_id(): Get next worker ID for spawning

Added Phase 8 to TODO documenting ~390 lines that can be deleted after
consolidation, including workers/pid_utils.py which becomes obsolete.

Also includes migration 0002 for parent FK and process_type fields.
DELETED:
- workers/pid_utils.py (-192 lines) - replaced by Process model methods

SIMPLIFIED:
- crawls/models.py Crawl.cleanup() (80 lines -> 10 lines)
- hooks.py: deleted process_is_alive() and kill_process() (-45 lines)

UPDATED to use Process model:
- core/models.py: Snapshot.cleanup() and has_running_background_hooks()
- machine/models.py: Binary.cleanup()
- workers/worker.py: Worker.on_startup/shutdown, get_running_workers, start
- workers/orchestrator.py: Orchestrator.on_startup/shutdown, is_running

All subprocess management now uses:
- Process.current() for registering current process
- Process.get_running() / get_running_count() for querying
- Process.cleanup_stale_running() for cleanup
- safe_kill_process() for validated PID killing

Total line reduction: ~250 lines
Adds a new CLI command `archivebox pluginmap` that displays:
- ASCII art diagrams of all core state machines (Crawl, Snapshot,
  ArchiveResult, Binary)
- Lists all auto-detected on_Modelname_xyz hooks grouped by model/event
- Shows hook execution order (step 0-9), plugin name, and background status

Usage:
  archivebox pluginmap              # Show all diagrams and hooks
  archivebox pluginmap -m Snapshot  # Filter to specific model
  archivebox pluginmap -a           # Include disabled plugins
  archivebox pluginmap -q           # Output JSON only
Add comprehensive unit tests for the CLI piping architecture:
- test_cli_crawl.py: crawl create/list/update/delete tests
- test_cli_snapshot.py: snapshot create/list/update/delete tests
- test_cli_archiveresult.py: archiveresult create/list/update/delete tests
- test_cli_run.py: run command create-or-update and pass-through tests

Extend tests_piping.py with:
- TestPassThroughBehavior: tests for pass-through behavior in all commands
- TestPipelineAccumulation: tests for accumulating records through pipeline

All tests use pytest fixtures from conftest.py with isolated DATA_DIR.
- Add pwd validation in Process.launch() to prevent crashes
- Fix psutil returncode handling (use wait() return value, not returncode attr)
- Add None check for proc.pid in cleanup_stale_running()
- Add stale process cleanup in Orchestrator.is_running()
- Ensure orchestrator process_type is correctly set to ORCHESTRATOR
- Fix KeyboardInterrupt handling (exit code 0 for graceful shutdown)
- Throttle cleanup_stale_running() to once per 30 seconds for performance
- Fix worker process_type to use TypeChoices.WORKER consistently
- Fix get_running_workers() API to return list of dicts (not Process objects)
- Only delete PID files after successful kill or confirmed stale
- Fix migration index names to match between SQL and Django state
- Remove db_index=True from process_type (index created manually)
- Update documentation to reflect actual implementation
- Add explanatory comments to empty except blocks
- Fix exit codes to use Unix convention (128 + signal number)

Co-authored-by: Nick Sweeting <[email protected]>
- Add prominent view mode switcher with List/Grid toggle buttons
- Improve filter sidebar CSS with modern styling, rounded corners
- Add live progress bar for in-progress snapshots showing hooks status
- Show plugin icons only when output directory has content
- Display archive result output_size sum from new field
- Show hooks succeeded/total count in size column
- Add get_progress_stats() method to Snapshot model
- Add CSS for progress spinner and status badges
- Update grid view template with progress indicator for archiving cards
- Add tests for admin views, search, and progress stats
Resolved conflicts by keeping Process model changes and accepting dev changes for unrelated files. Ensured pid_utils.py remains deleted as intended by this PR.

Co-authored-by: Nick Sweeting <[email protected]>
…anup, and migration

- Fix Process.current() to store psutil cmdline instead of sys.argv for accurate validation
- Fix worker process_type detection: explicitly set to WORKER after registration
- Fix ArchiveResultWorker.start() to use Process.TypeChoices.WORKER consistently
- Fix migration to be explicitly irreversible (SQLite doesn't support DROP COLUMN)
- Fix get_running_workers() to return process_id instead of incorrectly named worker_id
- Fix safe_kill_process() to wait for termination and escalate to SIGKILL if needed
- Fix migration to include all indexes in state_operations (parent_id, process_type)
- Fix documentation to use Machine.current() scoping and StatusChoices constants

Co-authored-by: Nick Sweeting <[email protected]>
@pull pull bot locked and limited conversation to collaborators Dec 31, 2025
@pull pull bot added the ⤵️ pull label Dec 31, 2025
@pull pull bot merged commit bbbfffd into CrazyForks:dev Dec 31, 2025
1 check passed