v0.4.4.4 feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes #231

Merged
WZ merged 14 commits into main from feat/deep-mode
Jun 2, 2026

WZ commented May 30, 2026

Summary

Deep mode (Step 3): a bounded, read-only re-examination of an investigation's hypothesis-loop conclusion. It resurrects a ruled-out cause or weakens a confirmed one using deeper read-only re-queries plus the Step 2 corroboration keystone.

Ships OFF by default. Today's bounded deep mode only re-judges the existing RCA's hypotheses — it can't yet investigate freely for the real cause (that's the planned autonomous orchestrator). Until that lands, the whole feature is gated behind config.agent.deepModeEnabled (default false):

  • server injects window.__DEEP_MODE_ENABLED__ only when enabled; the "Deep investigate" button is hidden otherwise
  • the deep_mode_investigate WS handler and the deep-from-start chain reject when off (server-authoritative, defense in depth)
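
The gate described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `resolveAgentConfig`, `renderDeepModeGlobal`, and `guardDeepMode` are hypothetical names, and only `deepModeEnabled`, `window.__DEEP_MODE_ENABLED__`, and the rejection message are taken from the description.

```typescript
// Hypothetical config shape; only the deepModeEnabled flag comes from the PR text.
interface AgentConfig {
  deepModeEnabled: boolean;
}

// Ships OFF: a missing or non-true flag resolves to false.
function resolveAgentConfig(raw: Partial<AgentConfig> = {}): AgentConfig {
  return { deepModeEnabled: raw.deepModeEnabled === true };
}

// The injected script is a static literal with no interpolation,
// and it is omitted entirely when the flag is off.
function renderDeepModeGlobal(config: AgentConfig): string {
  return config.deepModeEnabled
    ? "<script>window.__DEEP_MODE_ENABLED__ = true;</script>"
    : "";
}

type GuardResult = { ok: true } | { ok: false; error: string };

// Server-authoritative check: it runs in the handler no matter what the client UI shows.
function guardDeepMode(config: AgentConfig): GuardResult {
  return config.deepModeEnabled
    ? { ok: true }
    : { ok: false, error: "Deep mode is not enabled" };
}
```

Hiding the button is only a convenience here; the handler-side guard is what would actually enforce the gate if a client sent the message anyway.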

Also in this PR: plain-English copy for the deep-mode stream + report (lead-with-takeaway phrasing), live "Testing hypotheses" progress in the Step 2 loop, and a dedicated structured deep-mode stream component.

Validation

Driven end-to-end on a real incident (a genuinely-down k8s deployment) over the live WebSocket:

  • Step-2 hypothesis loop ran on real evidence → confirmed a root cause.
  • Deep mode refute path fired: re-queried deeper in a widened window → found the confirming evidence absent → flipped the verdict. Persisted report.deepMode.outcome = confirmation-shaken.
  • Both gate states verified: off → button hidden + handler rejects ("Deep mode is not enabled"); on → global injected + handler runs the full stream.

Tests

  • tsc --noEmit clean.
  • Full suite: 2405 passing (172 files).
  • New: schema test locking the ships-OFF default; updated RcaReport copy assertions.

Pre-Landing Review

Adversarial review verdict: ship as-is. The gate is airtight (server-side enforcement in every path), no XSS in the injected global (static literal, no interpolation), and read-only is guaranteed by construction (filterToReadOnlyTools unconditional on the re-query path).

Two quality findings — must-fix before flipping deepModeEnabled on, but they do not block landing a default-OFF feature:

  1. deepEvidenceCount is dropped from the deep-mode payload, so a total query failure could present "holds — deeper evidence backs it up" when zero checks actually succeeded. Surface the count (or a "0 checks" warning) before enabling.
  2. The widened re-query window is labeled "Incident window" in the prompt for metric/log/infra predictions, which can pull baseline-period anomalies → spurious verdicts. Anchor the prompt to the original onset before enabling.
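
One way the first finding might be addressed is sketched below. It assumes the payload carries `deepEvidenceCount` through (the PR says it is currently dropped); `DeepModePayload` and `describeOutcome` are hypothetical names, and the message wording is illustrative rather than the shipped copy.

```typescript
type DeepModeOutcome =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

// Hypothetical payload: assumes deepEvidenceCount is no longer dropped.
interface DeepModePayload {
  outcome: DeepModeOutcome;
  deepEvidenceCount: number;
}

function describeOutcome(payload: DeepModePayload): string {
  const n = payload.deepEvidenceCount;
  if (payload.outcome === "holds" && n === 0) {
    // A total query failure must not read as reassurance.
    return "Inconclusive: 0 checks succeeded, so the conclusion was not re-tested";
  }
  if (payload.outcome === "holds") {
    return `Holds: deeper evidence backs it up (${n} check${n === 1 ? "" : "s"})`;
  }
  // Other outcomes are passed through unchanged in this sketch.
  return payload.outcome;
}
```

The sketch only guards the "holds" case named in the finding; whether a zero count should also qualify the other outcomes is left open.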

Follow-ups

  • CHANGELOG entry for 0.4.4.4 (deferred this run — to add).
  • The two before-enable findings above.
  • Autonomous orchestrator (design spec exists): the path to making deep mode genuinely useful — investigate for the real cause, not just re-judge the existing one.

🤖 Generated with Claude Code

WZ added 3 commits May 30, 2026 13:47
Pure runDeepMode: re-examines the hypothesis loop's top ruled-out
hypotheses with deeper injected evidence + the same discriminating
keystone, resurrecting any the loop dismissed prematurely. Read-only by
construction; gatherDeepEvidence injected so the control flow is
unit-testable without LLM/MCP. 5 tests, tsc clean.
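
The injection idea in this commit can be illustrated with a small sketch. The signatures below are assumptions (the commit names `runDeepMode` and `gatherDeepEvidence` but not their types), the cap of three targets is invented for illustration, and the gatherer is synchronous here only for brevity; a real one would presumably be async.

```typescript
// Hypothetical types; not taken from the PR.
interface RuledOutHypothesis {
  id: string;
  claim: string;
}

interface DeepEvidence {
  // True when the deeper evidence satisfies the hypothesis's discriminating prediction.
  satisfiesPrediction: boolean;
}

type GatherDeepEvidence = (h: RuledOutHypothesis) => DeepEvidence;

interface Reexamination {
  id: string;
  resurrected: boolean;
}

// No I/O of its own: anything that would touch the LLM or MCP arrives through the callback.
function runDeepMode(
  ruledOut: RuledOutHypothesis[],
  gatherDeepEvidence: GatherDeepEvidence,
  maxTargets = 3, // assumed bound on how many top hypotheses are re-examined
): Reexamination[] {
  return ruledOut.slice(0, maxTargets).map((h) => ({
    id: h.id,
    resurrected: gatherDeepEvidence(h).satisfiesPrediction,
  }));
}
```

A unit test can then pass a fake gatherer that returns canned evidence and assert on the verdicts, without any model or tool call.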

Trigger/runner wiring + report/UI surface follow in subsequent commits.

deep_mode_investigate WS message (distinct from the deep_investigate chat
msg) -> handler loads a completed investigation's stored hypotheses+ruledOut,
rejoins predictions (matchRuledOutToPredictions), runs the deeper read-only
re-query via deepModeReexamine (reuses investigation providers+model +
hypothesis-requery), persists deepMode onto the report, streams progress.

- rca-types: DeepModeReexamination/DeepModeReport + report.deepMode field
- agents.ts: deepModeReexamine closure on createMastraAdapters
- ws-types: deep_mode_investigate client msg + deep_mode:* server msgs
- sanitize: DeepModeInvestigateMessageSchema; demo-mode blocks it
- ws-handler: validation + dispatch + handleDeepModeInvestigate (read-only)
- 3 new matchRuledOutToPredictions tests; tsc + deep-mode/sanitize/ws-handler green

Next: deep-from-start flag (workflow tail) + React UI.

…tep 3)

- 'Deep investigate' action on a completed investigation (shows only when
  the loop left ruled-out causes); sends deep_mode_investigate, reflects
  running/error state.
- InvestigationPane handles deep_mode:started/complete/error → updates the
  report in place.
- RcaReport renders a 'Deep Mode' section: resurrected-candidate (warning,
  a dismissed cause came back) vs rule-outs-confirmed (reassurance), with the
  per-hypothesis prior→deep verdict.
- 3 render tests; tsc + clean web build green.

Completes the on-demand trigger path end-to-end. Remaining: deep-from-start flag.
WZ changed the title from "feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes" to "[DRAFT] feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes" on Jun 1, 2026
WZ added 10 commits June 1, 2026 11:13
Two operator-visible signals for whether the synthesis loop is active:

1. Settings → LLM: read-only 'Hypothesis Loop' indicator — 'ON · N rounds'
   (teal) vs 'OFF · single-pass'. Surfaced via the existing
   /api/stacks/:id/llm/settings view (synthesisLoopRounds added). Deployment-
   level proof without firing an investigation.
2. Investigation metadata panel: a 'loop' MetaRow (outcome + ranked count) +
   a 'deep mode' MetaRow, shown only when those ran — per-investigation proof
   that synthesis wasn't single-pass.

DESIGN.md-consistent (mono labels, restrained teal, rounded-sm). tsc +
llm-settings/SettingsPage tests + clean web build green.

Preview build off feat/deep-mode so deep mode + the N>1 GUI indicators can
be deployed/verified before #231 merges. Distinct tag from the released
0.4.4.3 (clean-main build). Not tagged :latest.

Deep mode jumped straight from button to result: the deep_mode:tool_call
events were emitted but unrendered. Now route step-by-step progress through
the chat reasoning channel (chat:stream_start → chat:stream_delta reasoning →
chat:stream_end) the deep_investigate follow-up already uses, so it shows as
a live 'thinking' block in the Console + a plain-language summary message.

- deepModeReexamine: onProgress callback — announces each hypothesis as the
  loop reaches it ('↪ testing: X'), each re-query tool call, and the per-
  hypothesis verdict (resurrected vs still-ruled-out).
- handleDeepModeInvestigate: streams onProgress/onToolCall as reasoning deltas;
  ends with an outcome summary. deep_mode:started/complete/error still drive
  the button + report update.
- ws-types: dropped the now-unused deep_mode:tool_call.

Reuses ChatPane's existing reasoning display — no new client rendering.
tsc + ws-handler tests + clean web build green.

…le-outs)

Gap: the button gated on ruledOut.length>0, so a cleanly-confirmed
investigation (0 rule-outs) — or any report where rule-outs weren't
persisted — showed no button at all. Now gate on loopOutcome present (the
Step 2 loop ran → deep mode is applicable). The handler treats 'loop ran but
ruled nothing out' as a calm Console message, not a red error; genuine
single-pass (no hypotheses) stays an error.

Gap: the Approach-A draft showed a live Testing-hypotheses feed (rank → test
leader → rule-out) but the loop ran silently inside Synthesis — only the final
report showed rule-outs. Now runHypothesisLoop emits onRound progress events
(ranking / testing / verdict); synthesis maps them to onIteration under the
Synthesis phase, so the PhaseStepper streams them live while investigating —
e.g. 'Testing H1/3: …', 'Ruled out H1: … (absent)', 'Confirmed H2: …'.

Reuses the existing iteration-event rendering — no UI change. onRound is
optional → loop stays pure; existing tests unaffected. +1 progress test.
tsc + loop/deep-mode/investigation tests green.
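
The progress lines quoted in this commit could come from a formatter along these lines. The event fields are assumptions (the commit names only the three kinds: ranking, testing, verdict), and `formatRoundEvent` is a hypothetical helper.

```typescript
// Hypothetical event shape; only the three kinds come from the commit message.
type RoundEvent =
  | { kind: "ranking"; total: number }
  | { kind: "testing"; rank: number; total: number; title: string }
  | { kind: "verdict"; rank: number; title: string; confirmed: boolean; reason?: string };

// Maps a loop event to the one-line text shown under the Synthesis phase.
function formatRoundEvent(e: RoundEvent): string {
  switch (e.kind) {
    case "ranking":
      return `Ranked ${e.total} hypotheses`;
    case "testing":
      return `Testing H${e.rank}/${e.total}: ${e.title}`;
    case "verdict":
      return e.confirmed
        ? `Confirmed H${e.rank}: ${e.title}`
        : `Ruled out H${e.rank}: ${e.title}${e.reason ? ` (${e.reason})` : ""}`;
  }
}
```

Because the formatter is a pure mapping, the loop itself only needs an optional callback to emit events, which matches the commit's note that existing tests are unaffected.
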
Closes the 'deeper' gap: deep mode re-queried the same incident window the
loop used, so it was 're-examine', not 'escalate'. Now widenTimeRange expands
the window each side by max(duration, 30min) (~3x, centered) for the deep
re-query, surfacing precursors/aftermath the narrow synthesis window missed.

- widenTimeRange (pure, defensive: only widens parseable ISO/epoch; passes
  Grafana relative ranges + undefined through unchanged). 4 tests.
- deepModeReexamine queries the widened window but keeps the ORIGINAL incident
  onset as the change-in-window anchor.

Note: cross-service following (the other 'deeper' dimension) still pending —
needs dependency topology. tsc + deep-mode tests green.
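
The widening rule above (pad each side by max(duration, 30min), pass anything unparseable through) can be sketched as follows. The `TimeRange` shape is an assumption, as is treating numeric values as epoch milliseconds and returning each bound in the same form it arrived in; the PR does not show the real signature.

```typescript
// Assumed range shape: ISO strings, epoch milliseconds, or Grafana-style relative strings ("now-1h").
interface TimeRange {
  from: string | number;
  to: string | number;
}

const MIN_PAD_MS = 30 * 60 * 1000;

// Epoch ms for finite numbers and ISO-looking strings; null for everything else.
function toEpochMs(value: string | number): number | null {
  if (typeof value === "number") return Number.isFinite(value) ? value : null;
  // Require an ISO date prefix so relative ranges never reach Date.parse.
  if (!/^\d{4}-\d{2}-\d{2}T/.test(value)) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

function widenTimeRange(range?: TimeRange): TimeRange | undefined {
  if (range === undefined) return undefined;
  const from = toEpochMs(range.from);
  const to = toEpochMs(range.to);
  // Defensive: unparseable or inverted ranges are returned unchanged.
  if (from === null || to === null || to < from) return range;
  const pad = Math.max(to - from, MIN_PAD_MS);
  // Keep each bound in the form it arrived in (number stays number, ISO stays ISO).
  const like = (ms: number, original: string | number): string | number =>
    typeof original === "number" ? ms : new Date(ms).toISOString();
  return { from: like(from - pad, range.from), to: like(to + pad, range.to) };
}
```

For a one-hour window this yields a three-hour window centered on the original; a ten-minute window is padded by the 30-minute floor instead. The caller would still hold on to the original onset separately, since the commit keeps it as the change-in-window anchor.
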
…igations

The 2nd of the two trigger paths. New config flag agent.deepModeOnComplete:
when on, an interactive (chat-dispatched) investigation that ran the loop and
ruled causes out automatically chains the deep re-examination on completion —
no second click; the result streams + lands in one pass.

- config: agent.deepModeOnComplete (Zod, default off)
- ws-handler: extracted runDeepModeStreamed() (shared by the on-demand trigger
  + the new chain); chat dispatch captures the report and chains when enabled
- Settings → LLM: read-only 'Deep Mode — auto' indicator (ON / OFF · on-demand)
  alongside the Hypothesis Loop one, via the llm/settings view

Scope: wired for the interactive path (where streaming matters); headless
(webhook/poller) chaining is a follow-up. tsc + llm-settings/ws-handler/config
tests + clean web build green.

The gap-#1 change over-corrected: gating on loopOutcome made the button appear
on clean-confirm / no-rule-out investigations, where clicking dead-ends in
'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines
RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations
with nothing to resurrect simply don't offer it (no dead-end button).

(The earlier 'no button at all' was viewing pre-loop investigations, which
correctly have no rule-outs either.)

… cause

Deep mode was only useful when the loop ruled causes out, so the button kept
being absent (pre-loop / clean confirms) or dead-ending. Now it re-examines the
loop's conclusion by STANDING and always does something useful:
- ruled-out causes -> try to RESURRECT (deeper evidence now satisfies?)
- the confirmed cause (no rule-outs) -> try to REFUTE (deeper evidence drops support?)

- deep-mode.ts: runDeepMode takes ReexamineTarget[] (priorStanding); flip logic
  per standing; outcomes resurrected-candidate | confirmation-shaken | holds |
  nothing-to-examine. buildReexamineTargets() picks resurrect vs refute mode.
- rca-types/agents/ws-handler/RcaReport/InvestigationPane threaded through;
  button gates on loopOutcome again (always useful now); summaries + metadata
  row + report section handle shaken/holds; deep-from-start chains on any loop run.
- tests rewritten for the standing API + refute mode + buildReexamineTargets.
  tsc + deep-mode/RcaReport/ws-handler/investigation tests + clean build green.
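
The standing-based flip logic might look roughly like this. Names follow the commit (`ReexamineTarget`, `priorStanding`, `buildReexamineTargets`, the four outcomes), but the field layout, `LoopResult`, `flipped`, and `summarizeOutcome` are assumptions made for illustration.

```typescript
type PriorStanding = "ruled-out" | "confirmed";

interface ReexamineTarget {
  id: string;
  priorStanding: PriorStanding;
}

type DeepModeOutcome =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

// Hypothetical summary of what the Step 2 loop left behind.
interface LoopResult {
  confirmedId?: string;
  ruledOutIds: string[];
}

// Resurrect mode when the loop ruled anything out; otherwise refute mode against the confirmed cause.
function buildReexamineTargets(loop: LoopResult): ReexamineTarget[] {
  if (loop.ruledOutIds.length > 0) {
    return loop.ruledOutIds.map((id): ReexamineTarget => ({ id, priorStanding: "ruled-out" }));
  }
  return loop.confirmedId ? [{ id: loop.confirmedId, priorStanding: "confirmed" }] : [];
}

// A ruled-out cause flips when deeper evidence now supports it;
// a confirmed cause flips when deeper evidence no longer does.
function flipped(target: ReexamineTarget, supported: boolean): boolean {
  return target.priorStanding === "ruled-out" ? supported : !supported;
}

function summarizeOutcome(
  targets: ReexamineTarget[],
  supportedById: Record<string, boolean>,
): DeepModeOutcome {
  if (targets.length === 0) return "nothing-to-examine";
  // Missing entries count as "not supported" in this sketch.
  const flips = targets.filter((t) => flipped(t, supportedById[t.id] === true));
  if (flips.some((t) => t.priorStanding === "ruled-out")) return "resurrected-candidate";
  if (flips.some((t) => t.priorStanding === "confirmed")) return "confirmation-shaken";
  return "holds";
}
```

Treating a missing evidence entry as "not supported" means a failed re-query could shake a confirmed cause, which is the same hazard the review's `deepEvidenceCount` finding describes.
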
…xpanded)

The deep-mode progress was piped through the chat 'thinking' block — collapsed
by default, plain mono, no grouping. Replaced with a dedicated agent stream
matching the design:
- structured AgentStreamEvent (verb/target/status/indent) replaces plain text;
  deepModeReexamine emits onStep events, ws-handler streams deep_mode:step + a
  final stats footer (examined/tools/resurrected/shaken/elapsed).
- new DeepModeStream component: status icons (◉/✓/✗), coral verbs, info-blue
  query targets, indented sub-steps with a left rail, always expanded.
- InvestigationPane accumulates steps + renders it above the report.
- dropped the chat:stream routing for deep mode (no more collapsed plain block).

tsc + ws-handler/deep-mode tests + clean web build green.
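
A plain-text stand-in for the structured stream is sketched below. The field names (verb/target/status/indent) and the footer counters come from the commit; the status values, their mapping to the ◉/✓/✗ icons, and both render helpers are assumptions, and the real component is a React one rather than a string renderer.

```typescript
// Assumed status values; the commit lists the icons but not the states they stand for.
type StepStatus = "running" | "ok" | "failed";

interface AgentStreamEvent {
  verb: string;
  target: string;
  status: StepStatus;
  indent: number;
}

const STATUS_ICON: Record<StepStatus, string> = { running: "◉", ok: "✓", failed: "✗" };

// One line per step, two spaces per indent level for sub-steps.
function renderStep(e: AgentStreamEvent): string {
  return `${"  ".repeat(e.indent)}${STATUS_ICON[e.status]} ${e.verb} ${e.target}`;
}

interface StreamStats {
  examined: number;
  tools: number;
  resurrected: number;
  shaken: number;
  elapsedMs: number;
}

// Final stats footer appended after the last step.
function renderFooter(s: StreamStats): string {
  const seconds = (s.elapsedMs / 1000).toFixed(1);
  return `examined ${s.examined} · tools ${s.tools} · resurrected ${s.resurrected} · shaken ${s.shaken} · ${seconds}s`;
}
```

Keeping events structured rather than pre-formatted text is what lets the client choose icons, colors, and indentation on its own.
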
WZ marked this pull request as ready for review June 2, 2026 04:33
…lish copy

Deep mode (Step 3) is hidden from users until the autonomous orchestrator
lands. Today's bounded re-examination only re-judges the existing RCA's
hypotheses; it can't investigate freely for the real cause. Until it can,
expose it only behind config.agent.deepModeEnabled (default false):

- server injects window.__DEEP_MODE_ENABLED__ only when enabled; the
  'Deep investigate' button is gated on it (hidden by default)
- deep_mode_investigate WS handler + deep-from-start chain reject when off
- reworded the deep-mode stream + report copy to plain English
  (lead-with-takeaway: 'Probably not the cause: … — the evidence that
  would confirm it isn't there')
- schema test locks the ships-OFF default
WZ changed the title from "feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes" to "v0.4.4.4 feat: deep mode (Step 3) — skeptical re-examination of ruled-out causes" on Jun 2, 2026
WZ merged commit 75b1364 into main Jun 2, 2026
2 checks passed