v0.4.4.4 feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes by WZ · Pull Request #231 · WZ/dops-assistant

WZ · 2026-05-30T20:48:02Z

Summary

Deep mode (Step 3): a bounded, read-only re-examination of an investigation's hypothesis-loop conclusion. It resurrects a ruled-out cause or weakens a confirmed one using deeper read-only re-queries plus the Step 2 corroboration keystone.

Ships OFF by default. Today's bounded deep mode only re-judges the existing RCA's hypotheses — it can't yet investigate freely for the real cause (that's the planned autonomous orchestrator). Until that lands, the whole feature is gated behind config.agent.deepModeEnabled (default false):

server injects window.__DEEP_MODE_ENABLED__ only when enabled; the "Deep investigate" button is hidden otherwise
the deep_mode_investigate WS handler and the deep-from-start chain reject when off (server-authoritative, defense in depth)
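
A minimal sketch of the two-sided gate described above, assuming a simplified config shape. `AgentConfig`, `GateResult`, `checkDeepModeGate`, and `deepModeGlobalSnippet` are hypothetical names for illustration; only `deepModeEnabled`, `window.__DEEP_MODE_ENABLED__`, and the rejection message come from this PR, and the real handler wiring may differ.

```typescript
// Hypothetical shapes for illustration; the real config and handler types are not shown in this PR.
interface AgentConfig {
  deepModeEnabled?: boolean; // assumed optional here; absent is treated as OFF
}

type GateResult =
  | { ok: true }
  | { ok: false; error: string };

// Server-side check: anything other than an explicit `true` counts as disabled,
// so the handler rejects even if a client sends the message with the button hidden.
function checkDeepModeGate(agent: AgentConfig): GateResult {
  if (agent.deepModeEnabled === true) {
    return { ok: true };
  }
  return { ok: false, error: "Deep mode is not enabled" };
}

// Snippet injected into the page: a static literal with no interpolation,
// emitted only when the flag is on. How the server injects it is an assumption.
function deepModeGlobalSnippet(agent: AgentConfig): string {
  return agent.deepModeEnabled === true
    ? "<script>window.__DEEP_MODE_ENABLED__ = true;</script>"
    : "";
}
```

The client-side global only controls visibility; the server check is what actually enforces the gate.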

Also in this PR: plain-English copy for the deep-mode stream + report (lead-with-takeaway phrasing), live "Testing hypotheses" progress in the Step 2 loop, and a dedicated structured deep-mode stream component.

Validation

Driven end-to-end on a real incident (a genuinely-down k8s deployment) over the live WebSocket:

Step-2 hypothesis loop ran on real evidence → confirmed a root cause.
Deep mode refute path fired: re-queried deeper in a widened window → found the confirming evidence absent → flipped the verdict. Persisted report.deepMode.outcome = confirmation-shaken.
Both gate states verified: off → button hidden + handler rejects ("Deep mode is not enabled"); on → global injected + handler runs the full stream.

Tests

tsc --noEmit clean.
Full suite: 2405 passing (172 files).
New: schema test locking the ships-OFF default; updated RcaReport copy assertions.

Pre-Landing Review

Adversarial review verdict: ship as-is. The gate is airtight (server-side enforcement in every path), no XSS in the injected global (static literal, no interpolation), and read-only is guaranteed by construction (filterToReadOnlyTools unconditional on the re-query path).

Two quality findings — must-fix before flipping deepModeEnabled on, but they do not block landing a default-OFF feature:

deepEvidenceCount is dropped from the deep-mode payload, so a total query failure could present "holds — deeper evidence backs it up" when zero checks actually succeeded. Surface the count (or a "0 checks" warning) before enabling.
The widened re-query window is labeled "Incident window" in the prompt for metric/log/infra predictions, which can pull baseline-period anomalies → spurious verdicts. Anchor the prompt to the original onset before enabling.
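
One possible shape for the first fix, sketched under assumptions: the summary copy checks `deepEvidenceCount` before claiming that the conclusion holds. `DeepModeSummaryInput` and `summarizeDeepMode` are hypothetical names and the wording is illustrative; only `deepEvidenceCount` and the outcome names come from this PR.

```typescript
// Outcome names as listed in this PR's commit messages.
type DeepModeOutcome =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

// Hypothetical payload shape for illustration.
interface DeepModeSummaryInput {
  outcome: DeepModeOutcome;
  deepEvidenceCount: number; // number of deeper checks that actually succeeded
}

function summarizeDeepMode(input: DeepModeSummaryInput): string {
  // A "holds" verdict backed by zero successful checks is not evidence of anything.
  if (input.outcome === "holds" && input.deepEvidenceCount === 0) {
    return "Inconclusive: 0 checks succeeded, so the conclusion was not re-tested.";
  }
  if (input.outcome === "holds") {
    return `Holds: deeper evidence backs it up (${input.deepEvidenceCount} checks).`;
  }
  return `Outcome: ${input.outcome} (${input.deepEvidenceCount} checks).`;
}
```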

Follow-ups

CHANGELOG entry for 0.4.4.4 (deferred this run — to add).
The two before-enable findings above.
Autonomous orchestrator (design spec exists): the path to making deep mode genuinely useful — investigate for the real cause, not just re-judge the existing one.

🤖 Generated with Claude Code

Pure runDeepMode: re-examines the hypothesis loop's top ruled-out hypotheses with deeper injected evidence + the same discriminating keystone, resurrecting any the loop dismissed prematurely. Read-only by construction; gatherDeepEvidence injected so the control flow is unit-testable without LLM/MCP. 5 tests, tsc clean. Trigger/runner wiring + report/UI surface follow in subsequent commits.

deep_mode_investigate WS message (distinct from the deep_investigate chat msg) -> handler loads a completed investigation's stored hypotheses+ruledOut, rejoins predictions (matchRuledOutToPredictions), runs the deeper read-only re-query via deepModeReexamine (reuses investigation providers+model + hypothesis-requery), persists deepMode onto the report, streams progress.
- rca-types: DeepModeReexamination/DeepModeReport + report.deepMode field
- agents.ts: deepModeReexamine closure on createMastraAdapters
- ws-types: deep_mode_investigate client msg + deep_mode:* server msgs
- sanitize: DeepModeInvestigateMessageSchema; demo-mode blocks it
- ws-handler: validation + dispatch + handleDeepModeInvestigate (read-only)
- 3 new matchRuledOutToPredictions tests; tsc + deep-mode/sanitize/ws-handler green
Next: deep-from-start flag (workflow tail) + React UI.

…tep 3)
- 'Deep investigate' action on a completed investigation (shows only when the loop left ruled-out causes); sends deep_mode_investigate, reflects running/error state.
- InvestigationPane handles deep_mode:started/complete/error → updates the report in place.
- RcaReport renders a 'Deep Mode' section: resurrected-candidate (warning, a dismissed cause came back) vs rule-outs-confirmed (reassurance), with the per-hypothesis prior→deep verdict.
- 3 render tests; tsc + clean web build green.
Completes the on-demand trigger path end-to-end. Remaining: deep-from-start flag.

Two operator-visible signals for whether the synthesis loop is active:
1. Settings → LLM: read-only 'Hypothesis Loop' indicator — 'ON · N rounds' (teal) vs 'OFF · single-pass'. Surfaced via the existing /api/stacks/:id/llm/settings view (synthesisLoopRounds added). Deployment-level proof without firing an investigation.
2. Investigation metadata panel: a 'loop' MetaRow (outcome + ranked count) + a 'deep mode' MetaRow, shown only when those ran — per-investigation proof that synthesis wasn't single-pass.
DESIGN.md-consistent (mono labels, restrained teal, rounded-sm). tsc + llm-settings/SettingsPage tests + clean web build green.

Preview build off feat/deep-mode so deep mode + the N>1 GUI indicators can be deployed/verified before #231 merges. Distinct tag from the released 0.4.4.3 (clean-main build). Not tagged :latest.

Deep mode jumped straight from button to result — the deep_mode:tool_call events were emitted but unrendered. Now route step-by-step progress through the chat reasoning channel (chat:stream_start → chat:stream_delta reasoning → chat:stream_end) the deep_investigate follow-up already uses, so it shows as a live 'thinking' block in the Console + a plain-language summary message.
- deepModeReexamine: onProgress callback — announces each hypothesis as the loop reaches it ('↪ testing: X'), each re-query tool call, and the per-hypothesis verdict (resurrected vs still-ruled-out).
- handleDeepModeInvestigate: streams onProgress/onToolCall as reasoning deltas; ends with an outcome summary. deep_mode:started/complete/error still drive the button + report update.
- ws-types: dropped the now-unused deep_mode:tool_call.
Reuses ChatPane's existing reasoning display — no new client rendering. tsc + ws-handler tests + clean web build green.

…le-outs) Gap: the button gated on ruledOut.length>0, so a cleanly-confirmed investigation (0 rule-outs) — or any report where rule-outs weren't persisted — showed no button at all. Now gate on loopOutcome present (the Step 2 loop ran → deep mode is applicable). The handler treats 'loop ran but ruled nothing out' as a calm Console message, not a red error; genuine single-pass (no hypotheses) stays an error.

Gap: the Approach-A draft showed a live Testing-hypotheses feed (rank → test leader → rule-out) but the loop ran silently inside Synthesis — only the final report showed rule-outs. Now runHypothesisLoop emits onRound progress events (ranking / testing / verdict); synthesis maps them to onIteration under the Synthesis phase, so the PhaseStepper streams them live while investigating — e.g. 'Testing H1/3: …', 'Ruled out H1: … (absent)', 'Confirmed H2: …'. Reuses the existing iteration-event rendering — no UI change. onRound is optional → loop stays pure; existing tests unaffected. +1 progress test. tsc + loop/deep-mode/investigation tests green.
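
A rough sketch of how onRound events could be mapped to the progress lines quoted above. The `RoundEvent` shape and `formatRoundEvent` are hypothetical; the PR names only the three event kinds (ranking / testing / verdict) and the example strings, and the "Ranked N hypotheses" wording is an assumption.

```typescript
// Hypothetical event shape; only the three kinds are named in the commit message.
type RoundEvent =
  | { kind: "ranking"; total: number }
  | { kind: "testing"; index: number; total: number; title: string }
  | { kind: "verdict"; index: number; title: string; confirmed: boolean; reason?: string };

// Turn one loop event into the line shown in the stepper.
function formatRoundEvent(e: RoundEvent): string {
  switch (e.kind) {
    case "ranking":
      return `Ranked ${e.total} hypotheses`; // assumed wording
    case "testing":
      return `Testing H${e.index}/${e.total}: ${e.title}`;
    case "verdict":
      return e.confirmed
        ? `Confirmed H${e.index}: ${e.title}`
        : `Ruled out H${e.index}: ${e.title}${e.reason ? ` (${e.reason})` : ""}`;
  }
}
```

Keeping the formatter separate from the loop matches the commit's point that onRound is optional and the loop itself stays pure.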

Closes the 'deeper' gap: deep mode re-queried the same incident window the loop used, so it was 're-examine', not 'escalate'. Now widenTimeRange expands the window each side by max(duration, 30min) (~3x, centered) for the deep re-query, surfacing precursors/aftermath the narrow synthesis window missed.
- widenTimeRange (pure, defensive: only widens parseable ISO/epoch; passes Grafana relative ranges + undefined through unchanged). 4 tests.
- deepModeReexamine queries the widened window but keeps the ORIGINAL incident onset as the change-in-window anchor.
Note: cross-service following (the other 'deeper' dimension) still pending — needs dependency topology. tsc + deep-mode tests green.
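
The widening rule can be sketched as follows. This is not the PR's implementation: the `TimeRange` shape, the `toEpochMs` helper, treating numeric values as epoch milliseconds, and returning ISO strings are all assumptions made for illustration. Only the padding rule, max(duration, 30min) on each side, and the pass-through behaviour come from the commit message.

```typescript
// Hypothetical range shape; the real type is not shown in this PR.
interface TimeRange {
  from: string | number;
  to: string | number;
}

const MIN_PAD_MS = 30 * 60 * 1000; // 30 minutes

// Parse epoch-ms numbers or ISO-8601 strings; return undefined for anything else
// (for example Grafana relative values such as "now-1h").
function toEpochMs(value: string | number): number | undefined {
  if (typeof value === "number") {
    return Number.isFinite(value) ? value : undefined;
  }
  // Require a leading YYYY-MM-DD so relative expressions never reach Date.parse.
  if (!/^\d{4}-\d{2}-\d{2}/.test(value)) return undefined;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? undefined : ms;
}

// Pad each side by max(duration, 30min); leave unparseable input untouched.
function widenTimeRange(range: TimeRange | undefined): TimeRange | undefined {
  if (range === undefined) return undefined;
  const from = toEpochMs(range.from);
  const to = toEpochMs(range.to);
  if (from === undefined || to === undefined || to < from) return range;
  const pad = Math.max(to - from, MIN_PAD_MS);
  return {
    from: new Date(from - pad).toISOString(),
    to: new Date(to + pad).toISOString(),
  };
}
```

For a one-hour window the result spans three hours centered on the original; for windows shorter than 30 minutes the 30-minute floor dominates, so the widened window is more than 3x the original.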

…igations
The 2nd of the two trigger paths. New config flag agent.deepModeOnComplete: when on, an interactive (chat-dispatched) investigation that ran the loop and ruled causes out automatically chains the deep re-examination on completion — no second click; the result streams + lands in one pass.
- config: agent.deepModeOnComplete (Zod, default off)
- ws-handler: extracted runDeepModeStreamed() (shared by the on-demand trigger + the new chain); chat dispatch captures the report and chains when enabled
- Settings → LLM: read-only 'Deep Mode — auto' indicator (ON / OFF · on-demand) alongside the Hypothesis Loop one, via the llm/settings view
Scope: wired for the interactive path (where streaming matters); headless (webhook/poller) chaining is a follow-up. tsc + llm-settings/ws-handler/config tests + clean web build green.

The gap-#1 change over-corrected: gating on loopOutcome made the button appear on clean-confirm / no-rule-out investigations, where clicking dead-ends in 'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations with nothing to resurrect simply don't offer it (no dead-end button). (The earlier 'no button at all' was viewing pre-loop investigations, which correctly have no rule-outs either.)

… cause
Deep mode was only useful when the loop ruled causes out, so the button kept being absent (pre-loop / clean confirms) or dead-ending. Now it re-examines the loop's conclusion by STANDING and always does something useful:
- ruled-out causes -> try to RESURRECT (deeper evidence now satisfies?)
- the confirmed cause (no rule-outs) -> try to REFUTE (deeper evidence drops support?)
- deep-mode.ts: runDeepMode takes ReexamineTarget[] (priorStanding); flip logic per standing; outcomes resurrected-candidate | confirmation-shaken | holds | nothing-to-examine. buildReexamineTargets() picks resurrect vs refute mode.
- rca-types/agents/ws-handler/RcaReport/InvestigationPane threaded through; button gates on loopOutcome again (always useful now); summaries + metadata row + report section handle shaken/holds; deep-from-start chains on any loop run.
- tests rewritten for the standing API + refute mode + buildReexamineTargets.
tsc + deep-mode/RcaReport/ws-handler/investigation tests + clean build green.
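
The per-standing flip logic might look roughly like this. It is a synchronous simplification under stated assumptions: the `DeepVerdict` vocabulary, the injected `Judge` callback, the function name `runDeepModeSketch`, and the precedence of resurrected over shaken when both occur are all hypothetical. The outcome names and the `priorStanding` field come from the commit message.

```typescript
type PriorStanding = "ruled-out" | "confirmed";

interface ReexamineTarget {
  hypothesis: string;
  priorStanding: PriorStanding;
}

// Assumed verdict vocabulary: does the deeper evidence satisfy the prediction or not?
type DeepVerdict = "satisfied" | "absent";

type DeepModeOutcome =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

// Injected judge, so the control flow can be tested without LLM or MCP calls.
type Judge = (target: ReexamineTarget) => DeepVerdict;

function runDeepModeSketch(targets: ReexamineTarget[], judge: Judge): DeepModeOutcome {
  if (targets.length === 0) return "nothing-to-examine";
  let resurrected = false;
  let shaken = false;
  for (const t of targets) {
    const verdict = judge(t);
    // A ruled-out cause flips when deeper evidence now satisfies it.
    if (t.priorStanding === "ruled-out" && verdict === "satisfied") resurrected = true;
    // A confirmed cause flips when deeper evidence no longer supports it.
    if (t.priorStanding === "confirmed" && verdict === "absent") shaken = true;
  }
  if (resurrected) return "resurrected-candidate"; // precedence is an assumption
  if (shaken) return "confirmation-shaken";
  return "holds";
}
```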

…xpanded)
The deep-mode progress was piped through the chat 'thinking' block — collapsed by default, plain mono, no grouping. Replaced with a dedicated agent stream matching the design:
- structured AgentStreamEvent (verb/target/status/indent) replaces plain text; deepModeReexamine emits onStep events, ws-handler streams deep_mode:step + a final stats footer (examined/tools/resurrected/shaken/elapsed).
- new DeepModeStream component: status icons (◉/✓/✗), coral verbs, info-blue query targets, indented sub-steps with a left rail, always expanded.
- InvestigationPane accumulates steps + renders it above the report.
- dropped the chat:stream routing for deep mode (no more collapsed plain block).
tsc + ws-handler/deep-mode tests + clean web build green.
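
As an illustration of the structured event, here is a plain-text rendering sketch. The four field names and the three icons come from the commit message; the status names, the icon-to-status mapping, and `renderStep` are assumptions, and the actual DeepModeStream is a UI component rather than a string formatter.

```typescript
// Assumed status names; the commit lists only the icons.
type StepStatus = "running" | "ok" | "failed";

interface AgentStreamEvent {
  verb: string;   // e.g. "testing", "query"
  target: string; // hypothesis title or query target
  status: StepStatus;
  indent: number; // nesting depth for sub-steps
}

// Assumed mapping of icons to statuses.
const STATUS_ICON: Record<StepStatus, string> = {
  running: "◉",
  ok: "✓",
  failed: "✗",
};

// Render one step as an indented text line.
function renderStep(e: AgentStreamEvent): string {
  return `${"  ".repeat(e.indent)}${STATUS_ICON[e.status]} ${e.verb} ${e.target}`;
}
```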

…lish copy
Deep mode (Step 3) is hidden from users until the autonomous orchestrator lands. Today's bounded re-examination only re-judges the existing RCA's hypotheses; it can't investigate freely for the real cause. Until it can, expose it only behind config.agent.deepModeEnabled (default false):
- server injects window.__DEEP_MODE_ENABLED__ only when enabled; the 'Deep investigate' button is gated on it (hidden by default)
- deep_mode_investigate WS handler + deep-from-start chain reject when off
- reworded the deep-mode stream + report copy to plain English (lead-with-takeaway: 'Probably not the cause: … — the evidence that would confirm it isn't there')
- schema test locks the ships-OFF default

WZ added 3 commits May 30, 2026 13:47

WZ changed the title ~~feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes [DRAFT]~~ feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes Jun 1, 2026

WZ added 10 commits June 1, 2026 11:13

chore(release): bump VERSION 0.4.4.3 → 0.4.4.4 (deep-mode preview)

02740f1

WZ marked this pull request as ready for review June 2, 2026 04:33

WZ changed the title ~~feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes~~ v0.4.4.4 feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes Jun 2, 2026

WZ merged commit 75b1364 into main Jun 2, 2026
2 checks passed

WZ merged 14 commits into main from feat/deep-mode
