v0.4.4.4 feat: deep mode (Step 3) - skeptical re-examination of ruled-out causes #231
Merged
Pure runDeepMode: re-examines the hypothesis loop's top ruled-out hypotheses with deeper injected evidence + the same discriminating keystone, resurrecting any the loop dismissed prematurely. Read-only by construction; gatherDeepEvidence injected so the control flow is unit-testable without LLM/MCP. 5 tests, tsc clean. Trigger/runner wiring + report/UI surface follow in subsequent commits.
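A minimal sketch of the dependency-injection shape described above, assuming hypothetical type names (`RuledOutHypothesis`, `DeepVerdict`, `Keystone`) and a `topN` cutoff that are not taken from the PR. It only illustrates why injecting the evidence gatherer lets the control flow run in a unit test without an LLM or MCP server; the real `runDeepMode` signature may differ.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface RuledOutHypothesis {
  id: string;
  statement: string;
}

interface DeepVerdict {
  id: string;
  resurrected: boolean;
}

// Injected dependencies: the evidence source and the discriminating check.
type GatherDeepEvidence = (h: RuledOutHypothesis) => Promise<string[]>;
type Keystone = (h: RuledOutHypothesis, evidence: string[]) => boolean;

// Re-examines the top ruled-out hypotheses. The function performs no I/O of
// its own: everything that touches the outside world arrives as an argument.
async function runDeepModeSketch(
  ruledOut: RuledOutHypothesis[],
  gatherDeepEvidence: GatherDeepEvidence,
  keystone: Keystone,
  topN = 3, // assumed cutoff, not from the PR
): Promise<DeepVerdict[]> {
  const verdicts: DeepVerdict[] = [];
  for (const h of ruledOut.slice(0, topN)) {
    const evidence = await gatherDeepEvidence(h);
    // A hypothesis is resurrected when deeper evidence now passes the keystone.
    verdicts.push({ id: h.id, resurrected: keystone(h, evidence) });
  }
  return verdicts;
}
```

In a test, `gatherDeepEvidence` can be a stub that returns canned strings, so each branch (resurrected or still ruled out) is reachable deterministically.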
deep_mode_investigate WS message (distinct from the deep_investigate chat msg) -> handler loads a completed investigation's stored hypotheses+ruledOut, rejoins predictions (matchRuledOutToPredictions), runs the deeper read-only re-query via deepModeReexamine (reuses investigation providers+model + hypothesis-requery), persists deepMode onto the report, streams progress.
- rca-types: DeepModeReexamination/DeepModeReport + report.deepMode field
- agents.ts: deepModeReexamine closure on createMastraAdapters
- ws-types: deep_mode_investigate client msg + deep_mode:* server msgs
- sanitize: DeepModeInvestigateMessageSchema; demo-mode blocks it
- ws-handler: validation + dispatch + handleDeepModeInvestigate (read-only)
- 3 new matchRuledOutToPredictions tests; tsc + deep-mode/sanitize/ws-handler green
Next: deep-from-start flag (workflow tail) + React UI.
…tep 3)
- 'Deep investigate' action on a completed investigation (shows only when the loop left ruled-out causes); sends deep_mode_investigate, reflects running/error state.
- InvestigationPane handles deep_mode:started/complete/error → updates the report in place.
- RcaReport renders a 'Deep Mode' section: resurrected-candidate (warning, a dismissed cause came back) vs rule-outs-confirmed (reassurance), with the per-hypothesis prior→deep verdict.
- 3 render tests; tsc + clean web build green.
Completes the on-demand trigger path end-to-end. Remaining: deep-from-start flag.
Two operator-visible signals for whether the synthesis loop is active:
1. Settings → LLM: read-only 'Hypothesis Loop' indicator, 'ON · N rounds' (teal) vs 'OFF · single-pass'. Surfaced via the existing /api/stacks/:id/llm/settings view (synthesisLoopRounds added). Deployment-level proof without firing an investigation.
2. Investigation metadata panel: a 'loop' MetaRow (outcome + ranked count) + a 'deep mode' MetaRow, shown only when those ran; per-investigation proof that synthesis wasn't single-pass.
DESIGN.md-consistent (mono labels, restrained teal, rounded-sm). tsc + llm-settings/SettingsPage tests + clean web build green.
Preview build off feat/deep-mode so deep mode + the N>1 GUI indicators can be deployed/verified before #231 merges. Distinct tag from the released 0.4.4.3 (clean-main build). Not tagged :latest.
Deep mode jumped straight from button to result — the deep_mode:tool_call
events were emitted but unrendered. Now route step-by-step progress through
the chat reasoning channel (chat:stream_start → chat:stream_delta reasoning →
chat:stream_end) the deep_investigate follow-up already uses, so it shows as
a live 'thinking' block in the Console + a plain-language summary message.
- deepModeReexamine: onProgress callback — announces each hypothesis as the
loop reaches it ('↪ testing: X'), each re-query tool call, and the per-
hypothesis verdict (resurrected vs still-ruled-out).
- handleDeepModeInvestigate: streams onProgress/onToolCall as reasoning deltas;
ends with an outcome summary. deep_mode:started/complete/error still drive
the button + report update.
- ws-types: dropped the now-unused deep_mode:tool_call.
Reuses ChatPane's existing reasoning display — no new client rendering.
tsc + ws-handler tests + clean web build green.
…le-outs) Gap: the button gated on ruledOut.length>0, so a cleanly-confirmed investigation (0 rule-outs) — or any report where rule-outs weren't persisted — showed no button at all. Now gate on loopOutcome present (the Step 2 loop ran → deep mode is applicable). The handler treats 'loop ran but ruled nothing out' as a calm Console message, not a red error; genuine single-pass (no hypotheses) stays an error.
Gap: the Approach-A draft showed a live Testing-hypotheses feed (rank → test leader → rule-out) but the loop ran silently inside Synthesis — only the final report showed rule-outs. Now runHypothesisLoop emits onRound progress events (ranking / testing / verdict); synthesis maps them to onIteration under the Synthesis phase, so the PhaseStepper streams them live while investigating — e.g. 'Testing H1/3: …', 'Ruled out H1: … (absent)', 'Confirmed H2: …'. Reuses the existing iteration-event rendering — no UI change. onRound is optional → loop stays pure; existing tests unaffected. +1 progress test. tsc + loop/deep-mode/investigation tests green.
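The progress strings quoted above could come from a small formatter over the round events. The sketch below assumes a hypothetical `RoundEvent` union with `ranking`, `testing`, and `verdict` variants; the field names and the wording of the ranking line are illustrative and not taken from the PR.

```typescript
// Hypothetical event shape; only the three phases (ranking / testing /
// verdict) come from the commit message.
type RoundEvent =
  | { kind: "ranking"; total: number }
  | { kind: "testing"; rank: number; total: number; statement: string }
  | { kind: "verdict"; rank: number; statement: string; confirmed: boolean; reason?: string };

// Maps a loop progress event to the one-line text shown as an iteration.
function formatRoundEvent(e: RoundEvent): string {
  switch (e.kind) {
    case "ranking":
      return `Ranking ${e.total} hypotheses`;
    case "testing":
      return `Testing H${e.rank}/${e.total}: ${e.statement}`;
    case "verdict": {
      if (e.confirmed) return `Confirmed H${e.rank}: ${e.statement}`;
      const suffix = e.reason ? ` (${e.reason})` : "";
      return `Ruled out H${e.rank}: ${e.statement}${suffix}`;
    }
  }
}
```

Because the callback that receives these events is optional, a caller that omits it gets the same loop result with no side effects, which is what keeps the loop pure.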
Closes the 'deeper' gap: deep mode re-queried the same incident window the loop used, so it was 're-examine', not 'escalate'. Now widenTimeRange expands the window each side by max(duration, 30min) for the deep re-query (centered; 3x the original span when the window is 30 min or longer, more for shorter windows), surfacing precursors/aftermath the narrow synthesis window missed.
- widenTimeRange (pure, defensive: only widens parseable ISO/epoch; passes Grafana relative ranges + undefined through unchanged). 4 tests.
- deepModeReexamine queries the widened window but keeps the ORIGINAL incident onset as the change-in-window anchor.
Note: cross-service following (the other 'deeper' dimension) still pending; it needs dependency topology. tsc + deep-mode tests green.
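The widening rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the `TimeRange` shape, the treatment of numeric values as epoch milliseconds, and the `now` prefix check for Grafana relative ranges are all hypothetical.

```typescript
// Hypothetical range shape; numbers are assumed to be epoch milliseconds.
interface TimeRange {
  from: string | number;
  to: string | number;
}

const MIN_PAD_MS = 30 * 60 * 1000; // the 30-minute floor from the description

// Returns epoch milliseconds, or undefined for anything not safely parseable.
function toEpochMs(value: string | number): number | undefined {
  if (typeof value === "number") return Number.isFinite(value) ? value : undefined;
  // Relative expressions such as "now-1h" are not absolute instants.
  if (value.startsWith("now")) return undefined;
  const parsed = Date.parse(value);
  return Number.isNaN(parsed) ? undefined : parsed;
}

// Pads each side by max(duration, 30 min); unparseable input passes through.
function widenTimeRangeSketch(range: TimeRange | undefined): TimeRange | undefined {
  if (range === undefined) return undefined;
  const from = toEpochMs(range.from);
  const to = toEpochMs(range.to);
  if (from === undefined || to === undefined || to < from) return range;
  const pad = Math.max(to - from, MIN_PAD_MS);
  return {
    from: new Date(from - pad).toISOString(),
    to: new Date(to + pad).toISOString(),
  };
}
```

Under these assumptions a 10-minute window from 10:00 to 10:10 UTC becomes 09:30 to 10:40 (the 30-minute floor applies), while a 2-hour window from 10:00 to 12:00 becomes 08:00 to 14:00, three times the original span.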
…igations
The 2nd of the two trigger paths. New config flag agent.deepModeOnComplete: when on, an interactive (chat-dispatched) investigation that ran the loop and ruled causes out automatically chains the deep re-examination on completion; no second click, and the result streams + lands in one pass.
- config: agent.deepModeOnComplete (Zod, default off)
- ws-handler: extracted runDeepModeStreamed() (shared by the on-demand trigger + the new chain); chat dispatch captures the report and chains when enabled
- Settings → LLM: read-only 'Deep Mode — auto' indicator (ON / OFF · on-demand) alongside the Hypothesis Loop one, via the llm/settings view
Scope: wired for the interactive path (where streaming matters); headless (webhook/poller) chaining is a follow-up. tsc + llm-settings/ws-handler/config tests + clean web build green.
The gap-#1 change over-corrected: gating on loopOutcome made the button appear on clean-confirm / no-rule-out investigations, where clicking dead-ends in 'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations with nothing to resurrect simply don't offer it (no dead-end button). (The earlier 'no button at all' was viewing pre-loop investigations, which correctly have no rule-outs either.)
… cause
Deep mode was only useful when the loop ruled causes out, so the button kept being absent (pre-loop / clean confirms) or dead-ending. Now it re-examines the loop's conclusion by STANDING and always does something useful:
- ruled-out causes -> try to RESURRECT (deeper evidence now satisfies?)
- the confirmed cause (no rule-outs) -> try to REFUTE (deeper evidence drops support?)
- deep-mode.ts: runDeepMode takes ReexamineTarget[] (priorStanding); flip logic per standing; outcomes resurrected-candidate | confirmation-shaken | holds | nothing-to-examine. buildReexamineTargets() picks resurrect vs refute mode.
- rca-types/agents/ws-handler/RcaReport/InvestigationPane threaded through; button gates on loopOutcome again (always useful now); summaries + metadata row + report section handle shaken/holds; deep-from-start chains on any loop run.
- tests rewritten for the standing API + refute mode + buildReexamineTargets.
tsc + deep-mode/RcaReport/ws-handler/investigation tests + clean build green.
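The standing-based flip logic can be illustrated with a short sketch. The outcome names and `priorStanding` come from the commit message; everything else (the string-based hypothesis, the function signatures, and the precedence of outcomes when several targets flip) is an assumption made for the example.

```typescript
type Standing = "ruled-out" | "confirmed";
type DeepOutcome =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

// Hypothetical target shape; only priorStanding is named in the PR.
interface ReexamineTarget {
  hypothesis: string;
  priorStanding: Standing;
}

// Resurrect mode when there are rule-outs, refute mode otherwise.
function buildReexamineTargetsSketch(ruledOut: string[], confirmed?: string): ReexamineTarget[] {
  if (ruledOut.length > 0) {
    return ruledOut.map((h): ReexamineTarget => ({ hypothesis: h, priorStanding: "ruled-out" }));
  }
  return confirmed === undefined ? [] : [{ hypothesis: confirmed, priorStanding: "confirmed" }];
}

// A ruled-out cause flips when deeper evidence supports it;
// a confirmed cause flips when deeper evidence no longer does.
function flips(target: ReexamineTarget, deepEvidenceSupports: boolean): boolean {
  return target.priorStanding === "ruled-out" ? deepEvidenceSupports : !deepEvidenceSupports;
}

function summarizeOutcome(
  targets: ReexamineTarget[],
  supports: (t: ReexamineTarget) => boolean, // injected judgement, stubbed in tests
): DeepOutcome {
  if (targets.length === 0) return "nothing-to-examine";
  const flipped = targets.filter((t) => flips(t, supports(t)));
  if (flipped.some((t) => t.priorStanding === "ruled-out")) return "resurrected-candidate";
  if (flipped.some((t) => t.priorStanding === "confirmed")) return "confirmation-shaken";
  return "holds";
}
```

The point of keying on standing is that every loop result yields at least one target unless there were no hypotheses at all, so the re-examination always has something to check.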
…xpanded)
The deep-mode progress was piped through the chat 'thinking' block: collapsed by default, plain mono, no grouping. Replaced with a dedicated agent stream matching the design:
- structured AgentStreamEvent (verb/target/status/indent) replaces plain text; deepModeReexamine emits onStep events, ws-handler streams deep_mode:step + a final stats footer (examined/tools/resurrected/shaken/elapsed).
- new DeepModeStream component: status icons (◉/✓/✗), coral verbs, info-blue query targets, indented sub-steps with a left rail, always expanded.
- InvestigationPane accumulates steps + renders it above the report.
- dropped the chat:stream routing for deep mode (no more collapsed plain block).
tsc + ws-handler/deep-mode tests + clean web build green.
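As a rough illustration of the structured event, the sketch below models `AgentStreamEvent` with the four fields named above and renders one line per step as plain text. The status names and their mapping to the three icons are assumptions, and the real DeepModeStream is a React component rather than a string formatter.

```typescript
// Assumed status names; the PR only lists the icons (◉/✓/✗).
type StepStatus = "running" | "ok" | "failed";

interface AgentStreamEvent {
  verb: string; // e.g. "testing", "query"
  target: string; // hypothesis or query being acted on
  status: StepStatus;
  indent: number; // 0 for a top-level step, 1+ for sub-steps
}

const STATUS_ICON: Record<StepStatus, string> = {
  running: "◉",
  ok: "✓",
  failed: "✗",
};

// Plain-text rendering of one step; two spaces per indent level.
function renderStep(e: AgentStreamEvent): string {
  return `${"  ".repeat(e.indent)}${STATUS_ICON[e.status]} ${e.verb} ${e.target}`;
}
```

Keeping verb, target, and status as separate fields is what lets the client colour and group them independently, which a single pre-formatted text line cannot support.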
…lish copy
Deep mode (Step 3) is hidden from users until the autonomous orchestrator lands. Today's bounded re-examination only re-judges the existing RCA's hypotheses; it can't investigate freely for the real cause. Until it can, expose it only behind config.agent.deepModeEnabled (default false):
- server injects window.__DEEP_MODE_ENABLED__ only when enabled; the 'Deep investigate' button is gated on it (hidden by default)
- deep_mode_investigate WS handler + deep-from-start chain reject when off
- reworded the deep-mode stream + report copy to plain English (lead-with-takeaway: 'Probably not the cause: … — the evidence that would confirm it isn't there')
- schema test locks the ships-OFF default
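A sketch of the two-layer gate, under the assumption that the server builds a bootstrap snippet for the page and that the WS handler consults the same config. `AgentConfig`, `deepModeBootstrapScript`, and `checkDeepModeGate` are hypothetical names; only `deepModeEnabled` and `window.__DEEP_MODE_ENABLED__` come from the PR, and the real config is validated with Zod rather than a plain interface.

```typescript
interface AgentConfig {
  deepModeEnabled: boolean;
}

// Ships OFF: the default must stay false until the feature is ready.
const DEFAULT_AGENT_CONFIG: AgentConfig = { deepModeEnabled: false };

// Client hint: a static literal with no interpolation, emitted only when on.
function deepModeBootstrapScript(config: AgentConfig): string {
  return config.deepModeEnabled
    ? "<script>window.__DEEP_MODE_ENABLED__ = true;</script>"
    : "";
}

type GateResult = { ok: true } | { ok: false; error: string };

// Server-side check: the handler rejects regardless of what the client sends.
function checkDeepModeGate(config: AgentConfig): GateResult {
  return config.deepModeEnabled
    ? { ok: true }
    : { ok: false, error: "deep mode is disabled" };
}
```

The injected global only controls whether the button is shown; the handler check is the authoritative one, so a client that forges the message while the flag is off is still rejected.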
Summary
Deep mode (Step 3): a bounded, read-only re-examination of an investigation's hypothesis-loop conclusion. It resurrects a ruled-out cause or weakens a confirmed one using deeper read-only re-queries plus the Step 2 corroboration keystone.
Ships OFF by default. Today's bounded deep mode only re-judges the existing RCA's hypotheses — it can't yet investigate freely for the real cause (that's the planned autonomous orchestrator). Until that lands, the whole feature is gated behind
config.agent.deepModeEnabled (default false):
- the server injects window.__DEEP_MODE_ENABLED__ only when enabled; the "Deep investigate" button is hidden otherwise
- the deep_mode_investigate WS handler and the deep-from-start chain reject when off (server-authoritative, defense in depth)
Also in this PR: plain-English copy for the deep-mode stream + report (lead-with-takeaway phrasing), live "Testing hypotheses" progress in the Step 2 loop, and a dedicated structured deep-mode stream component.
Validation
Driven end-to-end on a real incident (a genuinely-down k8s deployment) over the live WebSocket:
report.deepMode.outcome = confirmation-shaken.
Tests
tsc --noEmit clean.
Pre-Landing Review
Adversarial review verdict: ship as-is. The gate is airtight (server-side enforcement in every path), no XSS in the injected global (static literal, no interpolation), and read-only is guaranteed by construction (filterToReadOnlyTools unconditional on the re-query path).
Two quality findings, must-fix before flipping deepModeEnabled on, but they do not block landing a default-OFF feature:
- deepEvidenceCount is dropped from the deep-mode payload, so a total query failure could present "holds — deeper evidence backs it up" when zero checks actually succeeded. Surface the count (or a "0 checks" warning) before enabling.
Follow-ups
0.4.4.4 (deferred this run; to add).
🤖 Generated with Claude Code