Phase 27: Watcher Service & User-Initiated Scan #59

Merged
SimplicityGuy merged 87 commits into main from gsd/phase-27-watcher-service-user-initiated-scan
May 14, 2026

Conversation

@SimplicityGuy
Owner

Summary

Phase 27: Watcher Service & User-Initiated Scan
Goal: Each file server continuously streams new file arrivals to the application server, and the administrator can also trigger an explicit scan of any path on any agent from the admin UI.
Status: Verified ✓ · Threat-secure ✓ · UAT pass (3/3)

Ships the agent-side filesystem watcher and the operator-facing user-initiated scan flow. Adds two new services: phaze-agent-watcher (a filesystem observer built on watchdog) and phaze-agent-worker (the SAQ consumer for the per-agent queue). The /pipeline/ admin UI gains a Trigger Scan card, an HTMX scan-progress card that polls every 2s and halts on a terminal status, and a Recent Scans mini-table. End-to-end: drop a file under the watcher's root → settle for 10s → FileRecord appears bound to the agent's LIVE sentinel batch; trigger a scan from the UI → API enqueues scan_directory → agent walks the tree → chunked POST → batch transitions to COMPLETED → polling halts.

Changes

Plan 27-01: Watcher Foundation (Wave 0)

  • watchdog dep added; AgentSettings gains watcher knobs (settle period, debounce, stuck-file cap, polling-mode flag)
  • New phaze.tasks._shared.agent_bootstrap (whoami-with-retry, construct_agent_client)
  • Extended tests/test_task_split.py import-boundary tuple (forbids phaze.tasks.agent_worker from the watcher graph)

Plan 27-02: Wire Schemas (Wave 1)

  • FileUpsertChunk gains batch_id field
  • New ScanBatchPatch (Literal["running","completed","failed"] -- LIVE excluded at the schema layer)
  • New ScanBatchPatchResponse, ScanDirectoryPayload, TriggerScanForm
  • All four new schemas pin extra="forbid"

Plan 27-03: Controller HTTP API (Wave 2)

  • New PATCH /api/internal/agent/scan-batches/{batch_id} with 403-before-state-machine cross-tenant guard
  • POST /api/internal/agent/files accepts optional batch_id with the same 403-before-records-loop guard
  • New PhazeAgentClient.patch_scan_batch method
  • Same-state echoes as zero-DB-write 200; disallowed transitions return 409; LIVE in body returns 422

Plan 27-04: scan_directory Task (Wave 3)

  • phaze.tasks.scan.scan_directory(scan_path, batch_id) -- chunked HTTP-only directory walk
  • os.walk(followlinks=False) per Pitfall 4; per-file OSError skip per D-12; NFC normalization on all path fields per Pitfall 3
  • Per-chunk PATCH with running counts; terminal PATCH to COMPLETED/FAILED
  • Registered in agent_worker.settings.functions

Plan 27-05: Watcher Package (Wave 3)

  • New phaze.agent_watcher: Debouncer (3600s stuck-file eviction), WatcherEventHandler (cross-thread bridge via call_soon_threadsafe), Poster (HTTP egress), __main__
  • 16+ unit tests including thread-bridge isolation, NFC normalization, stuck-file cap, OSError vanish
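
A minimal sketch of the cross-thread bridge named above, assuming a plain `threading.Thread` as a stand-in for watchdog's observer thread. The names `demo_bridge`, `touch`, and `on_created` are illustrative and are not taken from the phaze.agent_watcher package; only `loop.call_soon_threadsafe` is the mechanism the PR describes.

```python
import asyncio
import threading


async def demo_bridge() -> tuple[list[str], bool]:
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    touched: list[str] = []
    ran_on_loop_thread: list[bool] = []
    done = asyncio.Event()

    def touch(path: str) -> None:
        # Executes on the event-loop thread, so unsynchronized mutation is safe.
        touched.append(path)
        ran_on_loop_thread.append(threading.get_ident() == loop_thread)
        done.set()

    def on_created(path: str) -> None:
        # Stand-in for a handler callback running on the observer thread:
        # it never calls touch() directly, it only schedules it on the loop.
        loop.call_soon_threadsafe(touch, path)

    worker = threading.Thread(target=on_created, args=("/data/music/a.mp3",))
    worker.start()
    await asyncio.wait_for(done.wait(), timeout=5)
    worker.join()
    return touched, all(ran_on_loop_thread)
```

Calling `asyncio.run(demo_bridge())` returns the touched paths plus a flag confirming that every `touch` ran on the loop thread, which is the property the thread-bridge isolation tests are described as checking.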

Plan 27-06: Admin UI (Wave 3)

  • New routers/pipeline_scans.py: POST /pipeline/scans, GET /pipeline/scans/{id} (HTMX poll partial), GET /pipeline/scans/agent-roots (HTMX swap target)
  • 6 new partial templates (trigger_scan_card, scan_path_picker, scan_progress_card, recent_scans_table, scan_status_pill, scan_submit_error)
  • 3-layer subpath traversal guard (NFC + .. rejection + prefix validation against agent.scan_roots)
  • dashboard.html extended with the Trigger Scan card and Recent Scans section
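
For reviewers, the three guard layers could look roughly like the sketch below. It assumes POSIX paths and absolute scan roots; `resolve_scan_path` and its error messages are hypothetical and do not mirror the actual code in routers/pipeline_scans.py.

```python
import posixpath
import unicodedata


def resolve_scan_path(scan_root: str, subpath: str, allowed_roots: list[str]) -> str:
    """Return the absolute path to scan, or raise ValueError."""
    # Layer 1: NFC-normalize so equivalent Unicode spellings compare equal.
    root = posixpath.normpath(unicodedata.normalize("NFC", scan_root))
    sub = unicodedata.normalize("NFC", subpath)

    # Layer 2: reject absolute subpaths and any ".." segment outright.
    if sub.startswith("/") or ".." in sub.split("/"):
        raise ValueError("subpath must be relative and must not contain '..'")

    # Layer 3: the root must be registered for the agent, and the joined
    # path must still sit underneath it.
    roots = {posixpath.normpath(unicodedata.normalize("NFC", r)) for r in allowed_roots}
    if root not in roots:
        raise ValueError("scan_root is not one of the agent's scan_roots")
    candidate = posixpath.normpath(posixpath.join(root, sub)) if sub else root
    if posixpath.commonpath([root, candidate]) != root:
        raise ValueError("resolved path escapes scan_root")
    return candidate
```

For example, `resolve_scan_path("/data/music", "sets/2024", ["/data/music"])` yields `/data/music/sets/2024`, while a subpath of `../etc` is rejected at layer 2 before any joining happens.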

Plan 27-07: Deployment + Docs (Wave 5)

  • docker-compose.yml watcher service (:ro mount; restart: unless-stopped)
  • .env.example documents all required agent-mode vars + host/container hostname distinction
  • Per-service README at src/phaze/agent_watcher/README.md
  • STATE.md accumulated context for Phase 27

Requirements Addressed

| REQ | Description |
| --- | --- |
| DIST-02 | Agent-side worker + watcher process model with HTTP-only boundary to the application server |
| SCAN-01 | Continuous filesystem watching with settle-period gate |
| SCAN-02 | Per-file ingestion: SHA-256 + metadata streamed to controller |
| SCAN-03 | User-initiated scan via admin UI with progress visibility |
| SCAN-04 | Idempotent natural-key (agent_id, original_path) ingestion (no duplicates on re-walk) |

Verification

  • Automated verification: 5/5 must-have observable truths verified (27-VERIFICATION.md)
  • Human UAT: 3/3 tests passed live on rancher-desktop / linux-arm64 (27-HUMAN-UAT.md)
    • Test 1 -- End-to-end file drop → FileRecord under LIVE batch
    • Test 2 -- Admin UI scan trigger → progress polling → terminal halt
    • Test 3 -- Visual layout verification of admin UI
  • Security audit: threats_open: 0 β€” 20/24 mitigated, 4/24 accepted (27-SECURITY.md)
  • Test suite: 96 tasks tests + 149 router/agent_watcher/config tests pass on the post-UAT tree

UAT Gaps Closed During Live Bring-Up (14)

| Gap | One-liner |
| --- | --- |
| 1 | Phase 26 spillover: SAQ Worker rejected timeout/retries/keep_result kwargs |
| 2/3 | Auto-migrate + ensure_dev_agent at api startup (fresh-DB bring-up was broken) |
| 4 | .env.example missing required agent-mode vars + host/container guidance |
| 5 | Surface readable error on missing watcher env (was hidden behind ValidationError) |
| 6 | Watcher fresh-install quickstart in README |
| 7 | Watcher stdout logger (healthy vs hung was indistinguishable) |
| 8 | macOS bind-mount inotify gap → PollingObserver mode |
| 9 | Seed LIVE-sentinel ScanBatch alongside dev-agent |
| 10 | Dev-seeder prefers PHAZE_AGENT_SCAN_ROOTS |
| 11 | Tailwind SRI hash mismatch + test env isolation |
| 12 | scan_progress 500 on tz-aware created_at (postgres TIMESTAMPTZ vs test schema divergence) |
| 13 | docker-compose missing agent-worker service (no SAQ consumer for phaze-agent-<agent_id>) |
| 14 | Dashboard 500 on tz-aware created_at (sibling of gap-12) |

Each gap shipped as its own atomic fix(27-uat-gaps): gap-N commit with a regression test that would have caught the original bug. Gap-14 also lands an AST-based regression test that forbids the datetime.now(...).replace(tzinfo=None) antipattern in any router file.
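
To illustrate the AST-based check, here is a rough sketch of how such a detector might be written with the standard `ast` module. The function name `find_naive_now_calls` is made up for this example; the real regression test may walk router files differently.

```python
import ast


def find_naive_now_calls(source: str) -> list[int]:
    """Return line numbers of `<x>.now(...).replace(tzinfo=None)` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr != "replace":
            continue
        inner = node.func.value
        # The receiver of .replace() must itself be a call to something.now(...).
        is_now_call = (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Attribute)
            and inner.func.attr == "now"
        )
        # Only flag calls that explicitly pass tzinfo=None.
        strips_tz = any(
            kw.arg == "tzinfo"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is None
            for kw in node.keywords
        )
        if is_now_call and strips_tz:
            hits.append(node.lineno)
    return hits
```

A test could read each router file, pass its text to this function, and assert the returned list is empty. Working on the syntax tree rather than with a regex avoids false matches in comments and strings.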

Key Decisions

  • CSRF deferred to Phase 29: private-LAN single-operator deployment; no public exposure; documented as AR-27-01 in SECURITY.md.
  • Agent-worker service shipped now, not Phase 29: Phase 26 D-04 had scheduled the agent-side SAQ worker for docker-compose.agent.yml in Phase 29; UAT Test 2 surfaced that Phase 27 requires the consumer to demonstrate the user-initiated scan reaching COMPLETED. Added in gap-13.
  • elapsed_seconds is now a public shared helper: gap-14 promoted it from _elapsed_seconds in pipeline_scans.py and consolidated the inline duplicate in pipeline.dashboard. Backed by an AST-based antipattern test.
  • scan_directory chunks at 500 records/request (per Phase 27 D-08).
  • Subpath validation is dispositioned as accept at the schema layer and mitigate at the router layer: legitimate subpaths need slashes and hyphens, so a schema-level regex would be over-restrictive; semantic validation belongs in the controller.

🤖 Generated with Claude Code

SimplicityGuy and others added 30 commits May 13, 2026 09:36
Phase 27 UI-SPEC.md: visual + interaction contracts for the Trigger Scan
card, Scan Path Picker HTMX swap, Scan Progress poll partial, Recent
Scans mini-table, and shared scan status pill. Inherits all design
tokens (color, type, spacing) from the existing Phaze design system in
templates/base.html and established class patterns; introduces no new
visual primitives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the technical foundation for Phase 27 by mapping each
CONTEXT.md decision to existing codebase patterns (PhazeAgentClient,
AgentTaskRouter, Phase 26 D-08 cross-tenant guard, HTMX poll-halt) and
surfacing the watchdog-asyncio thread bridge as the central pattern.
Documents mtime-stability landmines across rsync/cp/wget/editor write
patterns plus the Validation Architecture for Nyquist Dimension 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
STATE.md frontmatter said milestone: v3.0 but milestone_name was already
"Distributed Agents" (the v4.0 milestone). v3.0 (Cross-Service Intelligence
& File Enrichment) shipped 2026-04-04. Fix the version to match the active
milestone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add watchdog>=4.0 to [project].dependencies (resolves to 6.0.0)
- Extend AgentSettings with four new fields per D-03 / D-11:
  watcher_settle_seconds (10), watcher_max_pending_seconds (3600),
  watcher_sweep_interval_seconds (2), scan_chunk_size (500)
- Each field wired via AliasChoices(PHAZE_WATCHER_* / PHAZE_SCAN_CHUNK_SIZE)
- Add parametrized defaults + env-var alias tests in test_config_role_split.py

Wave 0 foundation for Phase 27 watcher service.
- New phaze.tasks._shared.agent_bootstrap module exports
  _WHOAMI_BACKOFF_S, construct_agent_client, whoami_with_retry
- Refactor agent_worker.py to import from _shared (preserves _whoami_with_retry
  back-compat alias so internal call sites are unchanged)
- Tighten whoami_with_retry: short-circuit on AgentApiAuthError (Pitfall 7)
  with operator-actionable "auth invalid; check PHAZE_AGENT_TOKEN" hint and
  ERROR-level log; no backoff entries consumed before short-circuit
- T-27-04 mitigation: cleartext token never escapes construct_agent_client
- Update test_agent_startup_banner.py to patch construct_agent_client and
  the shared module's _WHOAMI_BACKOFF_S (test target moved with the function)
- Add 5 new tests in test_shared_agent_bootstrap.py covering all 4 behaviors
  + token-leak guard

D-17 import-boundary invariant: shared module imports only phaze.config,
phaze.services.agent_client, phaze.schemas.agent_identity (no Postgres stack).
- New tests/test_agent_watcher/ package marker + conftest with three
  fixtures (tmp_watcher_root, fake_clock, mock_api_client) so Plan 05 can
  write test_debouncer/test_observer/test_main with zero scaffolding cost
- Extend tests/test_task_split.py with two new subprocess-isolated cases:
  * test_agent_watcher_does_not_import_phaze_database -- Phase 27 D-22 +
    Pitfall 5 extension; forbidden tuple adds phaze.tasks.agent_worker
    (watcher uses asyncio.run, not SAQ; dragging in agent_worker would
    require PHAZE_AGENT_QUEUE). Skipped pre-Plan-05 via importlib.util
    find_spec predicate; becomes a hard gate when Plan 05 creates the
    phaze.agent_watcher package.
  * test_shared_bootstrap_stays_postgres_free -- enforces D-17 invariant
    on phaze.tasks._shared.agent_bootstrap; passes immediately (no DB stack
    in the import graph).

Existing agent_worker subprocess case continues to pass (no regression).
- 3 tasks complete (watchdog dep + AgentSettings knobs; shared agent
  bootstrap with Pitfall 7 short-circuit; test scaffolding + boundary tests)
- 14 plan-scoped tests pass + 1 conditional skip (waits for Plan 05)
- No regressions in existing test suite; all quality gates green
fileConfig() default disable_existing_loggers=True silently kills every
Python logger not listed in alembic.ini (only root/sqlalchemy/alembic).
After any test in tests/test_migrations/ runs, all phaze.* loggers are
disabled and pytest caplog cannot capture from them -- which surfaced as
test_whoami_with_retry_short_circuits_on_auth_error failing only in the
full suite, never in isolation.

Why: same-process test pollution from migration tests breaks caplog for
any subsequent test that asserts on a logged message. Setting
disable_existing_loggers=False keeps the alembic logger config additive,
which is the canonical Alembic recommendation.
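
The effect can be reproduced with the standard library alone. The sketch below uses a throwaway ini file in place of alembic.ini; the logger name `phaze.demo` and the helper `disabled_after_file_config` are hypothetical.

```python
import logging
import logging.config
import os
import tempfile

# Minimal config that, like the alembic.ini described above, does not list phaze.* loggers.
INI = """\
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=generic

[logger_root]
level=WARNING
handlers=console

[handler_console]
class=StreamHandler
args=(sys.stderr,)
formatter=generic

[formatter_generic]
format=%(levelname)s [%(name)s] %(message)s
"""


def disabled_after_file_config(logger_name: str, disable_existing: bool) -> bool:
    logger = logging.getLogger(logger_name)  # exists before fileConfig runs
    with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as fh:
        fh.write(INI)
    try:
        logging.config.fileConfig(fh.name, disable_existing_loggers=disable_existing)
    finally:
        os.unlink(fh.name)
    return logger.disabled
```

With `disable_existing=True` (the `fileConfig` default) the pre-existing logger comes back disabled; with `False` it stays enabled, which is the behaviour the fix relies on.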

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extend FileUpsertChunk with batch_id: uuid.UUID | None = None
- When present, binds chunk to specific ScanBatch (Plan 03 wires resolver)
- When absent, controller resolves agent's LIVE sentinel batch
- Drop unused `from __future__ import annotations` (pydantic needs uuid at
  runtime to build the validator; bare runtime import matches sibling schemas)
- All 5 behaviors covered: default None, explicit UUID, non-UUID rejection,
  extra="forbid" preserved for unknown fields, JSON schema exposes uuid|null
- Phase 25/26 callers continue to validate (additive optional field)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New module phaze.schemas.agent_scan_batches:
  * ScanBatchPatch (request body): four optional fields
    (total_files, processed_files, status, error_message) with
    extra="forbid"; status is Literal["running","completed","failed"]
    -- "live" is intentionally absent (D-10 schema-layer guard on the
    watcher's terminal sentinel state)
  * ScanBatchPatchResponse (echo body): full row echo per D-Discretion §4
    -- saves the agent a follow-up GET; loose `status: str` mirrors the
    sibling ExecutionLogPatchResponse shape
- 9 tests cover: running/completed/failed acceptance, live + garbage
  rejection, optional-progress-counts, no-ge-on-ints, extra=forbid,
  empty-body validates, response-row-echo, JSON schema Literal alts
  exclude "live"
- Module is contract-only; Plan 03 wires the endpoint and cross-tenant
  guard (T-27-02 mitigation lives at the router layer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, D-06)

- Append ScanDirectoryPayload to phaze.schemas.agent_tasks (after
  ScanLiveSetPayload): three fields (scan_path, batch_id: uuid.UUID,
  agent_id) with extra="forbid". Carries the per-job snapshot for the
  agent's scan_directory SAQ task (D-23 invariant: agent never reads
  state back from the controller mid-job).
- New module phaze.schemas.pipeline_scans:
  * TriggerScanForm -- operator-submitted form body for POST
    /pipeline/scans. Three fields (agent_id, scan_root, subpath=""),
    extra="forbid". Semantic validation (NFC + prefix + .. rejection)
    happens at the router layer (T-27-03 disposition).
- 9 new tests: 5 for ScanDirectoryPayload (minimal valid, non-UUID
  rejection, extra-forbid, field-set, no-models/current-path) + 4 for
  TriggerScanForm (default-empty subpath, explicit subpath, extra-forbid,
  required-fields). Existing invariant tests extended to include the new
  ScanDirectoryPayload class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks shipped:
- FileUpsertChunk.batch_id optional field (D-09)
- ScanBatchPatch + ScanBatchPatchResponse module (D-10; LIVE excluded
  at schema layer)
- ScanDirectoryPayload (D-14) + TriggerScanForm (D-06)

45 schema tests pass; 11 Phase 25/26 router tests pass (no regression);
ruff + ruff-format + mypy all clean. Four auto-fixes documented inline
(ruff TC003 false-positive resolved by dropping __future__ annotations
in agent_files.py; docstring grep collision adjusted; ruff I001 auto-
fixed import order; JSON-schema LIVE-exclusion promoted to standalone
test for D-10 regression coverage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…client method (D-10, T-27-01)

- New router phaze.routers.agent_scan_batches with handler ordering:
  404 (unknown) -> 403 (cross-tenant, T-27-01) -> 422 (status='live' via
  Literal) -> 200 idempotent same-state echo -> 409 (illegal transition) ->
  200 applied. Cross-tenant guard mirrors agent_proposals.py:62-76
  byte-for-byte so a leaked batch_id cannot be probed via 409 timing.
- _SCAN_TRANSITIONS dict gates RUNNING -> {COMPLETED, FAILED}; LIVE
  intentionally absent (watcher's terminal sentinel).
- Same-state PATCH with no other set fields is a zero-DB-write echo --
  matches Phase 26 D-08 invariant (no updated_at bump).
- PhazeAgentClient.patch_scan_batch inherits the tenacity retry funnel +
  AgentApiError hierarchy via _request; sends model_dump(exclude_unset=True)
  so default-None fields don't clobber server-side state.
- 11 router contract tests cover all branches; Test 9 specifically asserts
  403 (not 409) when agent B PATCHes agent A's COMPLETED batch -- proves
  the cross-tenant check runs BEFORE state-machine eval.
- 1 respx client test verifies URL, exclude_unset wire body, and response
  model validation.
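
The handler ordering in this commit can be summarized as a pure decision function. This is a sketch only: `Batch`, `patch_status_code`, and the `TRANSITIONS` dict are simplified stand-ins for the router and `_SCAN_TRANSITIONS`, and the 422 branch follows the order listed in the commit message rather than the point at which a framework would actually validate the body.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed shape: only RUNNING may move on; "live" is deliberately absent.
TRANSITIONS = {"running": {"completed", "failed"}}
PATCHABLE = {"running", "completed", "failed"}


@dataclass
class Batch:
    agent_id: str
    status: str


def patch_status_code(batch: Optional[Batch], caller_id: str, new_status: Optional[str]) -> int:
    if batch is None:
        return 404  # unknown batch_id
    if batch.agent_id != caller_id:
        return 403  # cross-tenant check runs before any state-machine logic
    if new_status is not None and new_status not in PATCHABLE:
        return 422  # e.g. status="live"
    if new_status is None or new_status == batch.status:
        return 200  # same-state echo, no write needed
    if new_status not in TRANSITIONS.get(batch.status, set()):
        return 409  # illegal transition
    return 200  # transition applied
```

Because the ownership check precedes the transition lookup, agent B patching agent A's COMPLETED batch gets 403 rather than 409, matching the behaviour Test 9 is described as asserting.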
…9/D-18/D-21, T-27-02)

- Insert a resolution block BEFORE the records loop in agent_files.upsert_files:
    * body.batch_id present  -> session.get(ScanBatch, id); 404 if missing;
      403 if batch.agent_id != caller.id (T-27-02). Mirrors the Phase 26
      D-08 cross-tenant guard placement byte-for-byte.
    * body.batch_id absent   -> SELECT ScanBatch.id WHERE agent_id=? AND
      status='live'. The Phase 24 partial UQ uq_scan_batches_agent_id_live
      guarantees exactly one row exists for any registered agent, so
      .scalar_one() is safe.
- Stamp `data["batch_id"] = resolved_batch_id` on every record alongside the
  existing AUTH-01 `agent_id` stamp; the existing upsert SET clause already
  copies excluded.batch_id, so the field flows through atomically.
- Auto-enqueue path is untouched -- SCAN-02 invariant preserved (Test 5
  verifies extract_file_metadata still fires for new INSERTs).
- 5 new contract tests cover all branches; Test 3 explicitly asserts ZERO
  FileRecord rows insert when a cross-tenant 403 fires (atomicity proof,
  T-27-02 mitigation).
- Existing test_agent_files.py fixture now seeds the LIVE sentinel
  (Phase 24 D-11 invariant; pre-Phase-27 fixtures pre-date the
  agent-registration side effect, so we add it here to keep the Phase 25/26
  contract behaviorally unchanged).
- Import phaze.routers.agent_scan_batches in main.py (alphabetic order
  between agent_proposals and agent_tracklists).
- app.include_router(agent_scan_batches.router) added in the Phase 26
  internal-agent block; Plan 06 will land pipeline_scans.router after.
- New non-async test test_router_registered_in_main_app asserts the path
  prefix /api/internal/agent/scan-batches is reachable on the production
  create_app() app (NOT just the smoke-app fixture) and that a PATCH
  method is bound there. This closes the wiring acceptance gap.
Records the Wave 2 controller landing: PATCH /scan-batches/{batch_id} +
batch_id resolution on POST /files + PhazeAgentClient.patch_scan_batch +
main.py wiring. 991 tests passing, no regression, 17 new tests, 3 atomic
commits, 0 non-trivial deviations from the agent_proposals.py mirror.
…(D-11..D-13)

Walks scan_path on the agent host, SHA-256s each known-extension file via
asyncio.to_thread, POSTs FileUpsertChunk(batch_id=...) of 500 records via
ctx['api_client'].upsert_files, and PATCHes the ScanBatch's processed_files
after each chunk. Terminal PATCH carries status='completed' + total_files=N
on a clean walk, or status='failed' + error_message on a missing scan_path
or AgentApiServerError after retries.
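
As a rough illustration of the chunk-then-PATCH rhythm described here, the sketch below slices any iterable into chunks of 500 and records the running count a per-chunk PATCH would carry. The helpers `chunked` and `running_counts` are hypothetical, and the HTTP calls are replaced by a comment.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def chunked(records: Iterable[dict], size: int = 500) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def running_counts(records: Iterable[dict], size: int = 500) -> list[int]:
    """Return the processed_files value after each chunk."""
    processed = 0
    counts = []
    for chunk in chunked(records, size):
        # The real task would POST the chunk here, then PATCH the batch.
        processed += len(chunk)
        counts.append(processed)
    return counts
```

For 1001 records this produces chunks of 500, 500, and 1, with running counts of 500, 1000, and 1001, which lines up with the "exact 500/500/1 chunking at 1001 files" test mentioned in this commit.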

Mitigations encoded:
- Pitfall 3 (NFC drift): unicodedata.normalize("NFC", ...) applied to all
  three path fields (original_path, original_filename, current_path).
- Pitfall 4 (symlink traversal): os.walk(scan_root, followlinks=False).
- D-12 mid-walk OSError: per-file try/except logs a warning and continues.
- AUTH-01: scan_directory NEVER stamps agent_id or id -- the controller
  resolves both from the bearer token.
- D-13 + Phase 26 D-25: NO imports of phaze.database, phaze.models,
  phaze.services.ingestion, or sqlalchemy. Helper _classify duplicates
  the EXTENSION_MAP lookup logic so we avoid importing services.ingestion
  (which transitively imports phaze.models).
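
A compact sketch of how the walk-level mitigations fit together, using only the standard library. `AUDIO_EXTENSIONS` is a hypothetical stand-in for the EXTENSION_MAP lookup, and the `sha256_hash` key is an assumed field name; hashing is done inline here rather than via asyncio.to_thread.

```python
import hashlib
import os
import unicodedata

AUDIO_EXTENSIONS = frozenset({".mp3", ".flac"})  # hypothetical subset


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def walk_records(scan_root: str) -> list[dict]:
    records = []
    # followlinks=False: do not descend into symlinked directories.
    for dirpath, _dirnames, filenames in os.walk(scan_root, followlinks=False):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in AUDIO_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # file vanished or is unreadable: skip it, keep walking
            # No agent_id or id here: the controller derives both from the token.
            records.append({
                "original_path": nfc(path),
                "original_filename": nfc(name),
                "current_path": nfc(path),
                "sha256_hash": digest,
            })
    return records
```

Normalizing at the point where records are built means a filename stored in decomposed form on disk is reported in the same composed form on every re-walk.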

11 unit tests cover: extension filter, exact 500/500/1 chunking at 1001
files, monotonic per-chunk PATCH counts, terminal completed PATCH,
terminal failed PATCH on missing path, OSError skip, NFC normalization,
agent_id/id omission, batch_id propagation on every chunk, symlink
non-traversal, extra-kwargs ValidationError.

12th test (registration) is for Task 2 and is intentionally failing here.
Adds scan_directory to the SAQ worker's functions list so AgentTaskRouter
can enqueue it by name on the per-agent queue (Phase 27 D-13). Import
ordering follows alphabetic (scan_directory before scan_live_set).

Placed between scan_live_set and execute_approved_batch per 27-PATTERNS.md
line 642 -- keeps the scan-family tasks contiguous.

The Phase 26 D-25 import-boundary invariant (no phaze.database / phaze.models /
sqlalchemy in agent_worker's transitive import graph) is preserved: scan.py's
new scan_directory uses only phaze.config, phaze.constants, phaze.schemas.*,
phaze.services.hashing, and phaze.services.agent_client -- all Postgres-free.
Verified by tests/test_task_split.py::test_agent_worker_does_not_import_phaze_database.

The previously-deselected registration test
(tests/test_tasks/test_scan_directory.py::test_scan_directory_registered_in_agent_worker_settings)
now passes -- closes the 12th test in that file.
Wave 3 first task: implement the three asyncio-side primitives for the
always-on watcher.

- Debouncer: dict[str, _PendingEntry] state machine driven by
  time.monotonic(); touch() inserts/refreshes; sweep() returns ready
  paths after settle_period and evicts stuck paths after max_pending
  (D-02 cap, T-27-05 mitigation). Snapshot iteration via list(...) is
  the Pitfall-2 safe-mutation pattern.
- WatcherEventHandler: watchdog -> asyncio bridge. Subscribes to
  FileCreatedEvent + FileModifiedEvent only (D-01); filters by
  EXTENSION_MAP for MUSIC/VIDEO categories (SCAN-03); NFC-normalizes
  src_path (Pitfall 3); dispatches via loop.call_soon_threadsafe -- the
  only sanctioned cross-thread primitive (Pitfall 2). Accepts both str
  and bytes src_paths from watchdog with a graceful drop on undecodable
  byte sequences.
- Poster: chunk-of-1 POST adapter. Stats + SHA-256 off-loop via
  asyncio.to_thread; vanished-path OSError dropped at DEBUG (Pitfall 1);
  FileUpsertChunk(files=[record]) omits batch_id so the controller
  resolves the LIVE sentinel from the bearer token (D-18); all three
  path fields NFC-normalized; AgentApi{Client,Server,}Error all logged
  via logger.exception (never re-raised) so a single record failure
  cannot crash the sweep loop.

10 unit tests green (5 debouncer + 5 observer). Thread-bridge invariant
verified: test_event_handler_uses_call_soon_threadsafe asserts touch()
is NEVER invoked directly on the test thread.
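
For readers who want a concrete picture of the Debouncer, here is a minimal sketch under stated assumptions: a path is ready once it has gone `settle` seconds without a touch, and is evicted without being returned once `max_pending` seconds have passed since it was first seen. The injectable `clock` parameter and the `_Pending` field names are illustrative, not taken from the package.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class _Pending:
    first_seen: float
    last_touch: float


class Debouncer:
    def __init__(self, settle: float = 10.0, max_pending: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._settle = settle
        self._max_pending = max_pending
        self._clock = clock
        self._pending: dict[str, _Pending] = {}

    def __len__(self) -> int:
        return len(self._pending)

    def touch(self, path: str) -> None:
        now = self._clock()
        entry = self._pending.get(path)
        if entry is None:
            self._pending[path] = _Pending(first_seen=now, last_touch=now)
        else:
            entry.last_touch = now  # refresh: the file is still being written

    def sweep(self) -> list[str]:
        now = self._clock()
        ready = []
        # Iterate over a snapshot so entries can be deleted during the loop.
        for path, entry in list(self._pending.items()):
            if now - entry.last_touch >= self._settle:
                ready.append(path)
                del self._pending[path]
            elif now - entry.first_seen >= self._max_pending:
                del self._pending[path]  # stuck file: evict without posting
        return ready
```

Injecting the clock lets tests advance time deterministically, in the same spirit as the `fake_clock` fixture mentioned in the test scaffolding commit.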

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SimplicityGuy and others added 23 commits May 13, 2026 18:35
SAQ 0.26.3's Worker.__init__ does not accept timeout, retries, or
keep_result -- they are per-Job settings. Passing them through the
settings dict broke `saq phaze.tasks.controller.settings` (and the
agent_worker equivalent) on a fresh docker compose stack with TypeError.

Drop the three keys from both settings dicts; preserve the project's
policy defaults (600s timeout, 4 retries, 3600s ttl) via a Queue-level
before_enqueue hook in phaze.tasks._shared.queue_defaults that applies
them only when the Job is still at its SAQ default (preserving
caller-supplied per-job overrides).
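
The "apply only when still at the library default" rule can be sketched as below. `FakeJob` is a stand-in dataclass rather than SAQ's Job class, and the library default values in `SAQ_DEFAULTS` are assumptions for illustration; the project values (600s, 4 retries, 3600s) come from this commit message.

```python
from dataclasses import dataclass

SAQ_DEFAULTS = {"timeout": 10, "retries": 1, "ttl": 600}  # assumed library defaults
PROJECT_DEFAULTS = {"timeout": 600, "retries": 4, "ttl": 3600}


@dataclass
class FakeJob:
    function: str
    timeout: int = 10
    retries: int = 1
    ttl: int = 600


def apply_project_defaults(job: FakeJob) -> None:
    """Upgrade fields still at the library default; leave overrides alone."""
    for field, library_default in SAQ_DEFAULTS.items():
        if getattr(job, field) == library_default:
            setattr(job, field, PROJECT_DEFAULTS[field])
```

One trade-off of this approach is that a caller who explicitly requests a value equal to the library default cannot be told apart from a caller who passed nothing, so that value is also replaced.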

Regression tests:
- test_before_enqueue_applies_project_defaults
- test_before_enqueue_preserves_explicit_overrides
- test_controller_settings_construct_real_worker (would have caught the
  original TypeError -- now passes)
- test_agent_worker_settings_construct_real_worker (same, for the
  agent-side dict)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…artup

GAP-2 (migrations): the api lifespan only opened the engine for a SELECT 1
connectivity check -- it never ran `alembic upgrade head`. On a fresh
docker compose stack the agents/files tables did not exist and every
request 500'd. Wire `phaze.database.run_migrations` into the lifespan
BEFORE the engine SELECT 1 so the schema is at head before any router
runs. Idempotent + gated by the new `settings.auto_migrate` knob
(env: PHAZE_AUTO_MIGRATE, default true).

GAP-3 (seed dev agent): migration 012 seeds the legacy agent ONLY when
backfilling a populated v3.0 files table. On a fresh DB no agent exists,
so the watcher's /whoami returns 403 and the container restart-loops.
Add `phaze.services.agent_bootstrap.ensure_dev_agent` -- on an empty
agents table it seeds a single `dev-agent` row with a sha256'd bearer
(either operator-supplied via PHAZE_DEV_AGENT_TOKEN or freshly random).
The cleartext bearer is logged once at INFO so the operator can copy it
into the watcher's .env. Gated by `settings.dev_seed_agent`
(env: PHAZE_DEV_SEED_AGENT, default false).

Regression tests:
- test_run_migrations_invokes_alembic_upgrade_head
- test_run_migrations_is_idempotent
- test_run_migrations_skips_when_auto_migrate_false
- test_api_lifespan_runs_migrations_on_startup (verifies the call-order
  invariant: run_migrations BEFORE engine.begin BEFORE ensure_dev_agent)
- test_ensure_dev_agent_seeds_when_table_empty
- test_ensure_dev_agent_noop_when_agent_exists
- test_ensure_dev_agent_uses_env_token_when_set
- test_ensure_dev_agent_disabled_in_prod

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.env.example previously documented the four optional PHAZE_WATCHER_*
tunables but NOT the three required agent-mode vars
(PHAZE_AGENT_API_URL, PHAZE_AGENT_TOKEN, PHAZE_AGENT_SCAN_ROOTS) nor
the host-vs-container hostname distinction (postgres/redis service DNS
when in docker compose vs localhost when running on host via uv).

Add explicit sections for:
- Host vs Container hostname rule (callout at the top)
- Gap 2/3 bring-up knobs (PHAZE_AUTO_MIGRATE, PHAZE_DEV_SEED_AGENT,
  PHAZE_DEV_AGENT_TOKEN)
- Required agent-mode env vars with example values and operator notes

Regression tests:
- test_env_example_documents_all_required_agent_mode_vars
- test_env_example_documents_auto_migrate_and_dev_seed
- test_env_example_explains_host_vs_container

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the watcher died with a raw pydantic ValidationError stack
trace when PHAZE_AGENT_API_URL (or another required AgentSettings field)
was missing. The operator-facing Pitfall-7 hint
("auth invalid; check PHAZE_AGENT_TOKEN") emitted by whoami_with_retry
was never reached because the validator tripped first.

Wrap the get_settings() call in main() with try/except ValidationError.
On failure, emit one ERROR log per failed field (with the field name and
its mapped env-var name like PHAZE_AGENT_API_URL), log the full pydantic
exception at DEBUG for troubleshooting, then sys.exit(1) so docker
compose restart-cycles with a meaningful logline.

Regression test:
- test_main_logs_actionable_error_on_missing_env: monkeypatches env to
  remove PHAZE_AGENT_API_URL, asserts the ERROR log mentions the var by
  name AND uses the "missing"/"required" keywords AND exit code is 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent_watcher README documented env vars but lacked a sequenced
bring-up walkthrough. Operators bringing up a fresh docker compose stack
had to puzzle out the order of api startup + dev-agent seeding + token
copy + watcher startup themselves.

Add a "Fresh Install Quickstart" section that walks through the entire
flow end-to-end:
- copy .env.example, host-vs-container hostname rule
- enable PHAZE_DEV_SEED_AGENT, pick a token
- bring up postgres + redis, then api + worker (migrations + seeding
  happen automatically in the api lifespan)
- bring up the watcher and verify with `docker logs watcher`
- production checklist for disabling the dev-seed path

Docs only; no test required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After Phase 27 UAT Gap 2 / Gap 3 wired `run_migrations` and
`ensure_dev_agent` into the api lifespan, the pre-existing Phase 4 gap
tests (test_lifespan_creates_queue_on_startup and
test_lifespan_disconnects_queue_on_shutdown) started failing because
the lifespan now opens a real DB connection before reaching the
Queue/engine mocks. Patch the new entry points (run_migrations,
ensure_dev_agent, async_session) so these tests stay unit-level.

No behavioural change -- only test plumbing alignment with the new
lifespan order documented in test_main_lifespan.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 6 commits, what each fixed, the regression test that would
have caught the original bug, and the auxiliary lifespan-test fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The initial Gap 3 fix used `count(*) > 0` to detect an "already populated"
agents table, but Migration 012 inserts a `legacy-application-server` row
with `revoked_at=NOW()` and `token_hash=NULL` as a marker. That row cannot
authenticate, so on a fresh DB the watcher still has no usable agent -- but
the naive count check would no-op.

Refine the check to count USABLE agents (`revoked_at IS NULL AND token_hash
IS NOT NULL`). Production migrations from v3.0 data leave the legacy row
as a revoked marker; the dev-seeder now correctly seeds past it.
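
The difference between the two checks is easy to see against an in-memory SQLite table. This is a sketch: the table layout is reduced to three columns, the `name` column and the timestamp value are assumptions, and the real code runs against Postgres through SQLAlchemy.

```python
import sqlite3


def count_agents(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (naive row count, usable agent count)."""
    naive = conn.execute("SELECT count(*) FROM agents").fetchone()[0]
    usable = conn.execute(
        "SELECT count(*) FROM agents "
        "WHERE revoked_at IS NULL AND token_hash IS NOT NULL"
    ).fetchone()[0]
    return naive, usable


def demo() -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agents (name TEXT, token_hash TEXT, revoked_at TEXT)")
    # Production-shaped legacy marker: revoked and tokenless, so it cannot authenticate.
    conn.execute(
        "INSERT INTO agents VALUES ('legacy-application-server', NULL, '2026-05-13T00:00:00Z')"
    )
    try:
        return count_agents(conn)
    finally:
        conn.close()
```

With only the revoked legacy marker present, the naive count is 1 while the usable count is 0, so a seeder keyed on the usable count still proceeds.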

Test that would have caught this: the new
`test_ensure_dev_agent_seeds_past_revoked_legacy_marker` deletes the
tokenless conftest legacy, inserts a production-shaped revoked legacy,
then asserts ensure_dev_agent still seeds a usable dev-agent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 27's watcher runs via `asyncio.run(main())` and never goes through
uvicorn's logging configuration. Without an explicit handler, EVERY
logger.info/error/etc call in the watcher (startup banner, sweep
warnings, post failures, evictions) was swallowed -- operators saw an
empty `docker logs phaze-watcher-1` even when the process was healthy
and posting files. A healthy watcher was indistinguishable from a hung
one.

Add `_configure_logging()` at the top of main() that attaches a single
stdout StreamHandler to the root logger and sets root level to INFO.
Idempotent: re-running adds no duplicate handler.
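
One way to get this idempotency, sketched with the standard library, is to mark the handler on creation and skip attachment when a marked handler is already present. The `_phaze_stdout` marker attribute and the format string are assumptions for this example; the actual `_configure_logging()` may detect its handler differently.

```python
import logging
import sys


def configure_logging() -> None:
    root = logging.getLogger()
    # Look for a handler this function attached on an earlier call.
    already = any(getattr(h, "_phaze_stdout", False) for h in root.handlers)
    if not already:
        handler = logging.StreamHandler(sys.stdout)
        handler._phaze_stdout = True  # hypothetical marker attribute
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        root.addHandler(handler)
    root.setLevel(logging.INFO)
```

Calling the function twice leaves exactly one marked handler on the root logger, so a restart path that re-enters `main()` does not double every log line.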

Test that would have caught this:
`test_configure_logging_attaches_stdout_handler` resets root handlers,
invokes the function, asserts exactly one stdout StreamHandler is
present and root level <= INFO. Also asserts idempotency via a second
invocation.

Surfaced during Phase 27 UAT live bringup -- the watcher container was
"Up 38 seconds" with zero log lines, leaving us unable to tell whether
it was working or stuck. This is the seventh gap closed in the UAT loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
watchdog's native Observer relies on inotify on Linux, but macOS docker
bind mounts (rancher-desktop / Docker Desktop) do NOT propagate inotify
events through 9p/virtiofs. The watcher's Observer schedules without
error but never fires -- files are visible inside the container but no
events reach the WatcherEventHandler.

Add an opt-in `PHAZE_WATCHER_POLLING_MODE` config that swaps the
native Observer for watchdog's PollingObserver. Native remains the
default so production Linux deployments keep their efficient inotify
backend; macOS devs running UAT via docker compose set the env var to
work around the bind-mount limitation.

Tests that would have caught the wiring bug:
- `test_main_uses_polling_observer_when_flag_set` asserts PollingObserver
  is constructed and native Observer is NOT touched when the flag is true.
- `test_main_uses_native_observer_by_default` asserts the default path
  uses the native Observer (no Polling).

.env.example documents the new knob with the macOS context. Surfaced
during Phase 27 UAT -- eighth gap in the live-bringup loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migration 012 explicitly seeds a LIVE-sentinel ScanBatch for the
`legacy-application-server` agent so POST /api/internal/agent/files can
resolve `batch_id=None` via the `uq_scan_batches_agent_id_live` partial
unique index. The dev-seeder created the agent but skipped this step,
so the watcher's chunk-of-1 upserts hit `scalar_one()` and crashed with
`sqlalchemy.exc.NoResultFound: No row was found when one was required`.

Add a `ScanBatch(agent_id=dev-agent, scan_path='<watcher>', status='live')`
insert immediately after the Agent insert.

Test that would have caught this: extended
`test_ensure_dev_agent_seeds_when_table_empty` now asserts the LIVE
sentinel ScanBatch exists with the canonical `<watcher>` scan_path
marker after seeding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In docker-compose mode SCAN_PATH is the HOST filesystem path used as the
bind-mount source (e.g. /Users/Robert/phaze-watch-test), while
PHAZE_AGENT_SCAN_ROOTS is the IN-CONTAINER path the agent's watcher
walks (e.g. /data/music). The original seeder copied settings.scan_path
into the dev-agent's scan_roots column, which wrote the host path --
the watcher then tried to schedule a watchdog Observer on the host
path from inside the container and crashed with FileNotFoundError.

Read PHAZE_AGENT_SCAN_ROOTS directly from os.environ (comma-split,
matching AgentSettings._split_scan_roots). Fall back to
settings.scan_path only when the agent env var is unset.
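
The lookup order can be sketched roughly as follows. `resolve_scan_roots` is a hypothetical name, and this sketch additionally treats an empty or whitespace-only value as unset, which is an assumption rather than confirmed seeder behavior.

```python
from typing import Mapping


def resolve_scan_roots(env: Mapping[str, str], fallback_scan_path: str) -> list[str]:
    """Prefer the in-container PHAZE_AGENT_SCAN_ROOTS; fall back to the host SCAN_PATH."""
    raw = env.get("PHAZE_AGENT_SCAN_ROOTS", "")
    # Comma-split and drop blank entries, e.g. "/data/music, /data/sets".
    roots = [part.strip() for part in raw.split(",") if part.strip()]
    return roots or [fallback_scan_path]
```

With both variables set to different values, the agent row gets the in-container path, which is the case the regression test pins.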

Test that would have caught this:
test_ensure_dev_agent_uses_phaze_agent_scan_roots_env_when_set sets
both vars to different values and asserts the agent row gets the
PHAZE_AGENT_SCAN_ROOTS value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes:

1. **Tailwind SRI mismatch (Gap 11):** base.html pinned the Tailwind CDN
   URL to @4 (major-version-only). jsdelivr silently ships newer 4.x
   point releases under that URL, and the previously-pinned SRI hash
   stops matching. Browsers BLOCK script execution on SRI mismatch, so
   Tailwind never loads and the entire admin UI renders unstyled. Pin
   to @4.3.0 with a matching SRI computed against the current served
   content.

2. **Test env isolation:** the project's docker-compose .env now defines
   runtime overrides like PHAZE_WATCHER_POLLING_MODE=true and
   PHAZE_WATCHER_SETTLE_SECONDS=3. pydantic-settings reads .env files
   into every BaseSettings() construction, which silently changed
   which code path tests exercised. Add an autouse conftest fixture
   that points BaseSettings classes at env_file=None for the test
   session and delenv's known toggle vars so neither .env nor a
   developer's shell env can leak in.
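
For reference, a sketch of the two checks involved in the SRI fix: computing a sha384 integrity value from served bytes, and rejecting a CDN URL that is pinned only to a major version. The helper names and the `example` package URL in the usage note are hypothetical.

```python
import base64
import hashlib
import re


def sri_sha384(content: bytes) -> str:
    """Build the value for an integrity= attribute from the served bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def is_fully_pinned(url: str) -> bool:
    """True when the URL carries a full major.minor.patch version after '@'."""
    return re.search(r"@\d+\.\d+\.\d+(?=/|$)", url) is not None
```

A URL such as `https://cdn.example.invalid/npm/example@4` fails `is_fully_pinned`, while the same URL ending in `@4.3.0` passes; only the fully pinned form can keep a stable hash.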

Tests added (would have caught Gap 11):
- test_every_cdn_script_pins_a_specific_version -- static check that
  SRI-protected URLs in base.html aren't unpinned (e.g. @4 alone).
- test_cdn_sri_hashes_match_served_content -- network-using check that
  every pinned SRI hash matches what the CDN currently serves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alpine v3 does NOT process `:class` on the <html> element unless <html>
carries an x-data directive (Alpine's scanner starts at <body>). The
previous binding `<html :class="$store.theme.dark ? 'dark' : ''">` was
silently inert: clicking the toggle button mutated `$store.theme.mode`
but the <html> .dark class was never added or removed afterward -- the
visual theme was permanently stuck at whatever the pre-flash script
chose on initial load.

Fix:
- Drop the inert `:class` binding from <html>.
- Move dark-class application into a single function `_applyTheme(mode)`
  that flips `document.documentElement.classList.toggle('dark', ...)`
  directly.
- Call it from three places: the pre-flash IIFE (first paint), the
  Alpine store's `set()` method (toggle click), and a
  `prefers-color-scheme` media query change listener (OS-level switch
  while in `auto` mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Alembic-migrated postgres schema declares scan_batches.created_at as
TIMESTAMP WITH TIME ZONE, so asyncpg materializes it as a tz-aware
datetime. `_elapsed_seconds` did
`datetime.now(UTC).replace(tzinfo=None) - batch.created_at` which
crashed with `TypeError: can't subtract offset-naive and offset-aware
datetimes`. The scan_progress endpoint returned 500 and the admin UI's
polling card went blank during UAT Test 2.

Surfaced because the test suite hides the divergence β€” SQLAlchemy's
create_all generates TIMESTAMP WITHOUT TIME ZONE columns, so loaded
ScanBatch rows in tests were tz-naive and the subtraction worked.
Production schema differs from test schema.

Fix: compare aware-to-aware. If `created_at` is unexpectedly tz-naive
(test fixtures that bypass the DB), treat it as UTC so the helper
returns a meaningful value either way.
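
A minimal sketch of the aware-to-aware comparison, assuming a standalone function that takes the timestamp directly. The name `elapsed_seconds_sketch` and the injectable `now` parameter are hypothetical; the real helper lives in pipeline_scans.py.

```python
from datetime import datetime, timezone
from typing import Optional


def elapsed_seconds_sketch(created_at: datetime, now: Optional[datetime] = None) -> int:
    """Whole seconds since created_at, always comparing aware to aware."""
    if created_at.tzinfo is None:
        # Defensive fallback: a naive value (e.g. from a test fixture) is taken as UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return int((now - created_at).total_seconds())
```

Keeping `now` aware instead of stripping its tzinfo means the subtraction works for both the Postgres-backed rows and naive fixture values.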

Tests that would have caught this (regardless of test/prod schema
divergence):
- `test_elapsed_seconds_handles_tz_aware_created_at` constructs an
  aware datetime in Python and calls the helper directly.
- `test_elapsed_seconds_handles_tz_naive_created_at_as_utc` covers the
  defensive fallback path so test-fixture loaders keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user-initiated scan flow enqueues scan_directory + extract_file_metadata
onto the per-agent SAQ queue `phaze-agent-<agent_id>`, but Phase 27's
docker-compose.yml shipped only:
  - `worker`   → controller queue (PHAZE_ROLE=control)
  - `watcher`  → filesystem observer (no SAQ consumer)

so jobs sat in Redis with status="queued" forever. The UI's scan_progress
card polled correctly (gap-12 ✓) but `total_files` stayed 0 and the card
never transitioned to COMPLETED -- breaking 27-UAT Test 2's "terminal halt"
contract.

Phase 26 D-04's comment scheduled the agent-side worker for Phase 29's
docker-compose.agent.yml overlay, but Phase 27 UAT requires it today.

Fix:
- New `agent-worker` service in docker-compose.yml running
  `uv run saq phaze.tasks.agent_worker.settings` with PHAZE_ROLE=agent.
  Binds to PHAZE_AGENT_QUEUE=phaze-agent-dev-agent (the dev seeder's
  agent_id). Will be parameterized in Phase 29.
- Defer essentia import in `phaze.tasks.functions`: move
  `from phaze.services.analysis import analyze_file` into a function-scoped
  loader. essentia-tensorflow is gated out of linux-arm64 by pyproject.toml's
  environment markers; the eager import made agent_worker's module load fail
  on Apple Silicon even though scan_directory / extract_file_metadata never
  touch essentia. process_file behavior on x86_64 is unchanged -- the loader
  is called at runtime when CPU-bound analysis is dispatched to the process
  pool.
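
The deferred import can be sketched as a cached, function-scoped loader. `lazy_attr` and `process_file_sketch` are hypothetical names, and the module path is parameterized here only so the sketch runs without the phaze package installed.

```python
import importlib
from functools import lru_cache
from typing import Any, Callable


@lru_cache(maxsize=None)
def lazy_attr(module_name: str, attr: str) -> Callable[..., Any]:
    """Import module_name on first use and return one attribute from it."""
    return getattr(importlib.import_module(module_name), attr)


def process_file_sketch(
    path: str,
    module_name: str = "phaze.services.analysis",  # heavy import, resolved at call time
    attr: str = "analyze_file",
) -> Any:
    """Resolve the analysis function only when analysis is actually dispatched."""
    analyze = lazy_attr(module_name, attr)
    return analyze(path)
```

Importing the module that defines `process_file_sketch` no longer pulls in the heavy dependency, so tasks that never call it load cleanly on platforms where that dependency is unavailable.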

Regression test (`tests/test_phase04_gaps.py`):
- `test_docker_compose_has_agent_worker_consuming_agent_queue` parses
  docker-compose.yml and asserts at least one service runs
  `saq phaze.tasks.agent_worker.settings` with PHAZE_ROLE=agent.

Live-verified on rancher-desktop / linux-arm64: scan_directory(batch_id=2b7e...)
went from status="queued" to status="complete" within seconds of the new
service coming up; GET /pipeline/scans/{id} now returns the COMPLETED
partial with no hx-trigger and no hx-get (polling halts as designed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g of gap-12)

Gap-12 patched `pipeline_scans._elapsed_seconds` to compare aware-to-aware
when computing elapsed scan time, but `pipeline.dashboard` (the Recent
Scans table loader) carried an inline duplicate of the pre-gap-12
antipattern:

    now = datetime.now(UTC).replace(tzinfo=None)
    batch._elapsed_seconds = int((now - batch.created_at).total_seconds())

The duplicate did not surface during gap-12 because the dashboard had no
non-LIVE ScanBatch rows to walk. Once gap-13 brought up the agent-worker
and the first user-initiated scan completed, the Recent Scans loop hit a
real tz-aware `created_at` from postgres and the entire dashboard route
500'd:

    TypeError: can't subtract offset-naive and offset-aware datetimes
      at src/phaze/routers/pipeline.py:157

User saw the page as "inaccessible" with an empty Recent Scans table.

Fix:
- Promote `_elapsed_seconds` → `elapsed_seconds` in pipeline_scans.py
  (now a public shared helper). One definition, one tested behavior.
- `pipeline.dashboard` imports and calls `elapsed_seconds` instead of
  re-implementing the math inline.
- Drop the now-unused `datetime` / `UTC` imports in pipeline.py.

Regression test (`tests/test_routers/test_pipeline_scans.py`):
- `test_no_router_uses_tz_naive_now_antipattern` walks the router package
  AST and fails on any `datetime.now(...).replace(tzinfo=None)` pattern.
  Catches gap-12, gap-14, and any future sibling instance in one rule.
- Existing `_elapsed_seconds` tests updated to import `elapsed_seconds`
  (public rename).
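
A rough sketch of how an AST rule like this can work, shown on a source string rather than the router package. `find_naive_now` is a hypothetical name, and matching on attribute names alone is a simplifying assumption.

```python
import ast


def find_naive_now(source: str) -> list[int]:
    """Return line numbers of `<x>.now(...).replace(tzinfo=None)` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr == "replace"):
            continue
        # The outer call must pass tzinfo=None as a keyword argument.
        strips_tz = any(
            kw.arg == "tzinfo"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is None
            for kw in node.keywords
        )
        # The receiver of .replace() must itself be a call to `.now(...)`.
        inner = func.value
        is_now_call = (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Attribute)
            and inner.func.attr == "now"
        )
        if strips_tz and is_now_call:
            hits.append(node.lineno)
    return hits
```

A test can read each router file, pass its text through this function, and fail when the returned list is non-empty.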

Live-verified: GET /pipeline/ now returns 200 with the Recent Scans table
populated (1 row -- the completed dev-agent /data/music scan with the
green COMPLETED pill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing-up)

All three Phase 27 UAT tests are PASS:
  1. End-to-end file drop → FileRecord under LIVE batch         (9 gaps closed)
  2. Admin UI scan trigger → progress polling → terminal halt   (1 gap closed: gap-13)
  3. Visual layout verification of admin UI                     (1 gap closed: gap-14)

The remaining 3 gaps (10-12) landed between Test 1 and Test 2 and closed
prerequisites for the polling-card behavior. Full gap inventory:

  gap-1   SAQ Worker kwargs (Phase 26 spillover)
  gap-2/3 Auto-migrate + ensure_dev_agent at api startup
  gap-4   .env.example required vars + host/container guidance
  gap-5   Surface readable error on missing watcher env
  gap-6   Watcher fresh-install quickstart
  gap-7   Watcher stdout logger
  gap-8   PollingObserver for macOS bind mounts
  gap-9   Seed LIVE-sentinel ScanBatch alongside dev-agent
  gap-10  Dev-seeder prefers PHAZE_AGENT_SCAN_ROOTS
  gap-11  Tailwind SRI mismatch + test env isolation
  gap-12  scan_progress 500 on tz-aware created_at
  gap-13  docker-compose missing agent-worker service
  gap-14  Dashboard 500 on tz-aware created_at (sibling of gap-12)

Each gap was committed atomically with a regression test that would have
caught the original bug. UAT status flipped from `testing` to `complete`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 27's verifier left status=human_needed because three of its checks
require a browser + live docker stack + real-time settle timer. All three
have now been performed and passed against the docker-compose bring-up
on rancher-desktop / linux-arm64 (see 27-HUMAN-UAT.md):

  1. End-to-end file drop → FileRecord under LIVE batch (PASS, 9 gaps closed)
  2. Admin UI scan trigger → polling → terminal halt (PASS, gap-13 closed)
  3. Visual layout verification (PASS, gap-14 closed)

Promotes status human_needed → pass. Unlocks /gsd-ship 27.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
threats_open: 0 -- all 24 plan-time threats verified CLOSED or accepted.

gsd-security-auditor audited the post-UAT state of the branch (after the
14 `fix(27-uat-gaps):` commits) to confirm plan-time mitigations survived
the UAT churn unchanged. One mitigation was hardened during UAT (T-27-03
substring `if ".." in joined` upgraded to component-level
`PurePosixPath.parts` check per WR-01); all others verified by grep gates
+ live test invocation against the current tree.
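
The difference between the two T-27-03 checks can be illustrated with a short sketch. The function names are hypothetical; only the `".." in joined` substring test and the `PurePosixPath.parts` approach come from the text above.

```python
from pathlib import PurePosixPath


def substring_check(joined: str) -> bool:
    """Original check: flags any path whose text contains two dots in a row."""
    return ".." in joined


def component_check(joined: str) -> bool:
    """Hardened check: flags only a path component that is exactly '..'."""
    return ".." in PurePosixPath(joined).parts
```

Both reject `/data/music/../etc`, but only the substring version also rejects a legitimate name such as `/data/music/my..mix.mp3`.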

Accepted risks documented:
  AR-27-01  T-27-07 CSRF deferred to Phase 29 (private-LAN single-operator)
  AR-27-02  Concurrent overlapping scans (idempotent UQ absorbs)
  AR-27-03  Watcher catch-up on startup (PROJECT.md scope lock for v4.0)
  AR-27-04  Dev-seed bearer cleartext in API logs (gap-3; intentional dev
            path, gated on empty agents table + dev_seed_agent=True; never
            triggers in production)

Closes the security gate ahead of /gsd-ship 27.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
STATE.md updated to mark Phase 27 as shipped. Milestone v4.0 progress
moves from 67% (4/6 phases, 26/33 plans) to 83% (5/6, 33/33 plans). Two
phases remain in v4.0 -- Phase 28 (Distributed Execution Dispatch) and
Phase 29 (Operational Hardening per CONTEXT § Deferred Ideas).

PR: #59

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 99.81949% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/phaze/agent_watcher/__main__.py 98.86% 1 Missing ⚠️


SimplicityGuy and others added 3 commits May 14, 2026 12:19
Closes 16/46 Codecov-flagged uncovered lines surfaced on PR #59.

Lines closed:
  agent_watcher/observer.py:64,68-70,90       (5 lines) → 100%
  agent_watcher/poster.py:94-99               (6 lines) → 100%
  routers/agent_scan_batches.py:99            (1 line)  → 100%
  tasks/_shared/agent_bootstrap.py:105-107    (3 lines) → 100%
  tasks/scan.py:82                            (1 line)  → 91.86%

New tests (9):
  test_event_handler_drops_empty_src_path
  test_event_handler_drops_path_when_fsdecode_raises
  test_event_handler_ignores_directories_in_on_modified
  test_post_one_swallows_agent_api_error_branches (parametrized: 4xx/5xx/catch-all)
  test_whoami_with_retry_short_circuits_on_auth_error_in_final_attempt
  test_resolve_chunk_size_falls_back_when_not_agent_settings
  test_defensive_live_409_when_literal_bypassed

Each test pins a defensive branch a future refactor might silently bypass:
the fsdecode failure log, the directory-event guard on on_modified, each
of the three AgentApi* drop paths in Poster, the Pitfall-7 hint surfacing
when a token rotates mid-bootstrap, the ControlSettings fallback for
_resolve_chunk_size, and the defensive LIVE-status 409.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lopes

Closes 12 more of the Codecov-flagged lines from PR #59. Two files
formerly at 90-92% now sit at 100%.

Lines closed:
  routers/pipeline_scans.py:120, 207, 255-260  (5 lines) → 100%
  tasks/scan.py:212-225                        (7 lines) → 100%

New tests (5):
  test_scan_directory_aborts_with_failed_patch_on_server_error
    -- 5xx upsert during walk: abort, terminal failed-PATCH succeeds,
      return shape pinned to {status:"failed", reason:"controller_5xx"}.
  test_scan_directory_terminal_failed_patch_also_fails
    -- same as above, but the terminal failed-PATCH ALSO 503s. Verifies
      the inner-except suppression: no second exception escapes, the
      return envelope still surfaces, and the "terminal failed-PATCH
      also failed" log message fires for triage.
  test_get_scan_progress_unknown_id_returns_404
    -- GET /pipeline/scans/{unknown_uuid} → 404 "scan batch not found".
  test_post_scans_prefix_mismatch_via_direct_handler_invocation
    -- defensive prefix-check branch (line 207) is structurally
      unreachable under normal inputs because the literal-membership
      check dominates and well-formed joined paths always prefix-match.
      Monkeypatches unicodedata.normalize to rewrite the joined path
      out from under the predicate, simulating a hypothetical future
      normalization edge case. Pins the 400 envelope.
  test_post_scans_enqueue_failure_with_secondary_commit_also_failing
    -- WR-06 inner-except: when Redis-down causes the enqueue to fail
      AND a Postgres-down kills the secondary commit, the handler
      MUST still return the 503 envelope (no 500 escape). Verifies
      the inner try/except log, the session.rollback() call, and the
      503 envelope copy.

Total Codecov gap progress: 28/46 lines closed across 3 commits (batch-A 16, batch-B 12). Remaining: agent_watcher/__main__.py 18 lines (sweep
loop, role guard, signal fallback, __name__ entry).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last batch of Codecov gaps from PR #59. agent_watcher/__main__.py
goes from 79.55% to 98.86% -- only line 245 (the `if __name__ == "__main__":`
entry-point bootstrap) is left, and that's intractable to test directly.

Lines closed:
  __main__.py:105-118     sweep_loop full body + inner post-failure path
  __main__.py:114-115     outer-except wrapping sweep iteration failures
  __main__.py:163-164     wrong-role guard (PHAZE_ROLE != agent)
  __main__.py:196-201     NotImplementedError signal-handler fallback

New tests (4):
  test_sweep_loop_posts_ready_logs_evicted_then_exits
  test_sweep_loop_outer_except_swallows_sweep_failure
  test_main_raises_when_settings_is_not_agent_settings
  test_main_swallows_signal_handler_not_implemented

Codecov gap progress (PR #59):
  Initial   46 lines uncovered across 7 files (91.69% patch coverage)
  Batch-A   16 lines covered (4 files → 100%)
  Batch-B   12 lines covered (2 more files → 100%)
  Batch-C   17 lines covered (last file 79.55% → 98.86%)
  Final     1 line uncovered (the __main__ entrypoint; intractable)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SimplicityGuy SimplicityGuy merged commit 4efb4a4 into main May 14, 2026
34 checks passed
@SimplicityGuy SimplicityGuy deleted the gsd/phase-27-watcher-service-user-initiated-scan branch May 14, 2026 19:47
SimplicityGuy added a commit that referenced this pull request May 14, 2026
Phase 27 (Watcher Service & User-Initiated Scan) merged into main on
2026-05-14 as commit 4efb4a4. Status flips shipped → ready_to_plan so
the next phase (Phase 28 -- Distributed Execution Dispatch) can be
planned.

Milestone v4.0 progress: 5/6 phases (83%), 33/33 plans complete. Phase
28 and Phase 29 remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>