
fix(status,prompts): finish the #2432 residuals left by the merged #2580 family #2595

Merged
rysweet merged 4 commits into main from fix/brain-eliminate-deterministic-fallbacks-1783182659 on Jul 4, 2026

rysweet (Owner) commented Jul 4, 2026

Context: the core #2432 work was superseded by the merged #2580 family

This branch began as a full parallel implementation of #2432 (eliminate silent
deterministic fallbacks + fix "zero active engineers"). While it was in flight,
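main merged a comprehensive solution to the same problem:

  • #2588 eliminated deterministic-default reasoner outcomes on parse-failure
    (explicit Err + brain_parse_error at the caller; parsers consume
    {"decision":...}/{"adjusted_urgency":...}).
  • #2591 reported the true live-engineer set for the dashboard "Active
    Engineers" count via a new operator_commands_dashboard::live_engineers module.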

Those supersede this branch's parallel reasoner/dashboard implementation, so the
merge in this PR adopts main's version wholesale for all overlapping
production code and drops this branch's now-redundant zero_fallback_tests.rs
(main ships its own zero_fallback_2580_tests). No duplicate/competing
implementation is shipped.

What this PR now contains (the residual gaps #2580 left)

Main touched neither the status provider nor the prompt files, so items 1 and 2
merge cleanly and are purely additive; item 3 is a small tracing cleanup:

1. status/provider.rs — finish design G4/G5 for the simard status surface

#2591 fixed the dashboard engineer count but left the status snapshot
(`resources.live_engineers`) on the buggy `pgrep -f 'simard-engineer'` (hyphen)
pattern. That pattern never matches the real `simard engineer run single-process …`
(space) argv, so it undercounts. Read-only live-daemon evidence at reconcile time:

| Signal | Observed |
|---|---|
| Real `…/simard engineer run single-process …` subprocesses | present |
| Live worktree dispatch claims (`.simard-engineer-claim` + live PID) | 3 live |
| Old status pattern `pgrep 'simard-engineer'` (hyphen) | never matches the space argv |
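Why the old pattern undercounts fits in a few lines. `pgrep -f` matches its pattern (an extended regular expression) against the full command line, and `simard-engineer` contains no regex metacharacters (a bare `-` is literal), so here it behaves like a plain substring test. This sketch assumes that equivalence; the argv strings are illustrative placeholders, not captured from the daemon, and `count_matching` is a hypothetical helper.

```rust
// Hypothetical helper: count command lines containing `pattern`, mimicking
// `pgrep -f PATTERN` for a pattern with no regex metacharacters.
fn count_matching(cmdlines: &[&str], pattern: &str) -> usize {
    cmdlines.iter().filter(|cmd| cmd.contains(pattern)).count()
}

fn main() {
    // Illustrative argv strings: `engineer` is a subcommand separated from
    // `simard` by a space, never joined to it with a hyphen.
    let cmdlines = [
        "/path/to/simard engineer run single-process",
        "/path/to/simard engineer run single-process",
        "/path/to/simard daemon",
    ];
    println!("hyphen pattern: {}", count_matching(&cmdlines, "simard-engineer")); // 0
    println!("space pattern:  {}", count_matching(&cmdlines, "simard engineer")); // 2
}
```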

`resources.live_engineers` now derives from the authoritative
`count_live_engineer_claims` (the single source of truth, design G4), matching
the dashboard surface, and the fragile pgrep pattern is retired (design G5).
Covered by a new test (`live_engineers_derives_from_live_worktree_claims`).
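A minimal sketch of counting engineers from worktree claims rather than process argv. It assumes each worktree directory directly under a root holds a `.simard-engineer-claim` file whose contents are the owning PID as decimal text; the real `count_live_engineer_claims` may lay out its state differently, and `count_live_claims` is a hypothetical name. Liveness is injected as a closure so the logic stays portable and testable; the demo uses a Linux-only `/proc/<pid>` probe.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical sketch: count worktrees whose claim file names a live PID.
// Assumed layout: <root>/<worktree>/.simard-engineer-claim containing a PID.
fn count_live_claims(root: &Path, is_alive: impl Fn(u32) -> bool) -> io::Result<usize> {
    let mut live = 0;
    for entry in fs::read_dir(root)? {
        let claim = entry?.path().join(".simard-engineer-claim");
        // Worktrees without a readable claim file are skipped, not counted.
        let Ok(text) = fs::read_to_string(&claim) else { continue };
        // A claim whose PID does not parse or is no longer alive is stale.
        if let Ok(pid) = text.trim().parse::<u32>() {
            if is_alive(pid) {
                live += 1;
            }
        }
    }
    Ok(live)
}

fn main() -> io::Result<()> {
    // Demo: one worktree claimed by this very process.
    let root = std::env::temp_dir().join(format!("claims-demo-{}", std::process::id()));
    let wt = root.join("worktree-1");
    fs::create_dir_all(&wt)?;
    fs::write(wt.join(".simard-engineer-claim"), std::process::id().to_string())?;
    // Linux-only liveness probe: treat a PID as alive if /proc/<pid> exists.
    let n = count_live_claims(&root, |pid| Path::new(&format!("/proc/{pid}")).exists())?;
    println!("live engineers: {n}");
    fs::remove_dir_all(&root)
}
```

Counting claims keyed by PID sidesteps argv spelling entirely, which is the property design G4 relies on.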

2. The four .md embedded-fallback prompts — consistency with #2588's parsers

#2588 updated the recipe YAMLs to emit structured output but left the `.md`
embedded fallbacks (used when the on-disk prompt is absent, and also on the
RustyClawd path) still mandating the `DECISION:` marker and "Do NOT output JSON".
That instruction now contradicts #2588's parsers, which consume a
`{"decision": …}` envelope. Refreshed `ooda_decide.md`, `ooda_brain.md`,
`ooda_orient.md`, and `merge_readiness_judge.md` to mandate the fenced JSON
envelope with a required `decision` field, preserving all pinned sentences
(STATUS: ACHIEVED gate, six lifecycle variants, DECISION marker, churn/stuck-loop).
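For illustration, here is a deliberately simplified extractor for the fenced envelope the refreshed prompts mandate. It is not the project's `recipe_output` chokepoint: it takes the first code fence tagged `json` and reads the string value of the `decision` key by hand (no escape handling, no real JSON parser), so production code should use a proper JSON library. `extract_decision` and the variant spellings `continue_skipping` / `advance_goal` are assumptions based on identifiers mentioned elsewhere on this page.

```rust
// Simplified sketch: pull the `decision` string out of the first json fence.
// Returns None when there is no fence or no string-valued `decision` key.
fn extract_decision(raw: &str) -> Option<String> {
    const FENCE: &str = "```json";
    const KEY: &str = "\"decision\"";
    // Locate the fenced body, ignoring banner/log noise around it.
    let start = raw.find(FENCE)? + FENCE.len();
    let body_len = raw[start..].find("```")?;
    let body = &raw[start..start + body_len];
    // Find the key, then expect `:` followed by a quoted string value.
    let after_key = &body[body.find(KEY)? + KEY.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(value[..end].to_string())
}

fn main() {
    // Noisy capture: banner text before the fence, a log line after it.
    let capture = "Copilot banner v1\n```json\n{\"decision\": \"continue_skipping\"}\n```\nINFO done";
    println!("{:?}", extract_decision(capture)); // Some("continue_skipping")
    // Output obeying the old prompt carries no envelope at all.
    println!("{:?}", extract_decision("DECISION: advance_goal")); // None
}
```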

3. parse_failure.rs — structured tracing

Converted a residual eprintln! (the gh-issue-filed success branch) to
tracing::info!, per the structured-tracing-only directive.

Validation

  • Full cargo test --lib on the merged tree: 7031 passed, 0 failed (the 7
    ignored tests are main's; the base_type_copilot integration tests skip when
    copilot is off PATH, as in CI).
  • cargo fmt --check, pre-commit clippy --release -D warnings, and pre-push
    clippy --all-targets --all-features --locked -D warnings + the
    memory_consolidation race-subset all pass — no --no-verify, no --admin.
  • git grep confirms no println!/eprintln! and no Bridge naming introduced.

Deploy

Does not touch the live daemon or ~/.simard (read-only inspection only). The
operator redeploys after merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rysweet and others added 4 commits July 4, 2026 17:18
…#2432 family)

Step 5e — design consolidation. Fold the four investigation threads
(parse-fail flow, extract.rs chokepoint coverage, distillation parser,
active-engineers telemetry) into one grounded target architecture for
eliminating silent deterministic fallbacks across the reasoning + telemetry
paths.

Grounded findings (file:line anchored in the doc):
- strip_recipe_noise sanitizer chokepoint is universally adopted (thread closed).
- The #2432 confidence-gated escalation ladder (run_brain_ladder, bounded by
  EscalationConfig: default 2 / hard cap 3) is wired for decide, orient,
  engineer-lifecycle, and merge-judge.
- Residual gaps: progress-checker is off the ladder (G1); distillation runs a
  parallel failure-class retry (G2); confidence.rs (verbalized confidence /
  self-consistency / ECE) is built but unwired (G3).
- Telemetry: three divergent live-engineer counts; count_live_engineers()
  greps "simard-engineer" (hyphen) but the real subprocess is "simard engineer"
  (space); on the live daemon this gave 17 real vs 1 matched (G4/G5). Design
  promotes count_live_engineer_claims() to the single source of truth.
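The escalation bound above (default 2, hard cap 3) can be sketched as a loop over rungs that stops at the first parsed decision. `EscalationConfig` is named in the design, but its field here is a guess, and `run_ladder` is a hypothetical stand-in for `run_brain_ladder`: each rung is a closure returning `Some` when its capture parses, and exhaustion is reported as an error carrying the number of attempts made.

```rust
// Hypothetical shape for the escalation bound: default 2 attempts, hard cap 3.
struct EscalationConfig {
    max_attempts: u32,
}

impl Default for EscalationConfig {
    fn default() -> Self {
        EscalationConfig { max_attempts: 2 }
    }
}

impl EscalationConfig {
    const HARD_CAP: u32 = 3;

    // Requested budget, clamped so no config can exceed the hard cap.
    fn budget(&self) -> u32 {
        self.max_attempts.min(Self::HARD_CAP)
    }
}

// Run rungs until one yields a parsed decision; Err(attempts) on exhaustion.
fn run_ladder<T>(cfg: &EscalationConfig, mut attempt: impl FnMut(u32) -> Option<T>) -> Result<T, u32> {
    let budget = cfg.budget();
    for rung in 1..=budget {
        if let Some(decision) = attempt(rung) {
            return Ok(decision);
        }
    }
    Err(budget)
}

fn main() {
    // Rung 1 fails to parse; rung 2 (e.g. a schema-repair retry) succeeds.
    let ok = run_ladder(&EscalationConfig::default(), |rung| (rung == 2).then(|| "advance_goal"));
    println!("{:?}", ok); // Ok("advance_goal")
    // A request above the cap is clamped to 3 attempts, then reported as exhausted.
    let exhausted: Result<&str, u32> = run_ladder(&EscalationConfig { max_attempts: 10 }, |_| None);
    println!("{:?}", exhausted); // Err(3)
}
```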

Adds the doc to docs/index.md and the mkdocs nav; mkdocs --strict and
scripts/verify-docs.sh both pass (15/0, 0 orphaned pages).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…2432)

Write failing tests FIRST that pin the target behaviour for eliminating
deterministic fallbacks in the Brain reasoners and fixing the dashboard
"zero active engineers" reading. Implementation lands in later steps; RED
contracts are #[ignore]d and un-ignored as each fix arrives.

GREEN locks (guard current good behaviour):
- retry recovers a real decision on schema-repair; retry budget bounded;
  exhaustion stays classified as a parse-failure (not a clean success)
- shared recipe_output chokepoint strips copilot banner+ANSI+logs and
  recovers the JSON payload; extractor consumes a {"decision":...} envelope
- every reasoner capture path routes through recipe_output:: (no bypass)
- active engineers = live (un-ended) subagent sessions; live worktree
  dispatch claims are counted
- distillation parses a banner/ANSI-polluted facts object
- ladder core introduces no legacy phase-adapter naming

RED (#[ignore], TDD-red until fix lands):
- ladder exhaustion must emit a dashboard-visible brain_parse_error metric,
  never a silent deterministic default
- a genuine take-no-action must be observably distinct from a parse-failure
- reasoner prompts must mandate a fenced JSON envelope with a `decision` field
- changed reasoner code must use structured tracing only (no stderr/stdout
  print macros)
- workboard active-engineers gauge must union live dispatch claims with the
  subagent registry (roots the "zero active engineers" defect)
- distillation must survive observed ~78%-failing capture shapes

Test sources avoid the literal print-macro / legacy-adapter tokens (built via
concat!) so the operator's git-grep contract cannot trip on the tests.
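The `concat!` approach works because `concat!` joins string literals at compile time: the forbidden token exists in the compiled string but never appears verbatim in the test source that `git grep` scans. A minimal sketch; the constant, the helper, and the scanned snippets are hypothetical.

```rust
// The token is assembled at compile time, so this source line does not itself
// contain the literal that a `git grep` contract searches for.
const STDERR_PRINT: &str = concat!("eprint", "ln!");

// Hypothetical check: does a source snippet contain the forbidden macro call?
fn uses_stderr_print(source: &str) -> bool {
    source.contains(STDERR_PRINT)
}

fn main() {
    let offending = concat!("fn f() { ", "eprint", "ln!(\"oops\"); }");
    let clean = "fn f() { tracing::info!(\"ok\"); }";
    println!("{}", uses_stderr_print(offending)); // true
    println!("{}", uses_stderr_print(clean)); // false
}
```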

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s telemetry (#2432)

Operator directive is ABSOLUTE: no silent deterministic fallback. This makes
every unrecoverable reasoner parse-miss an EXPLICIT, dashboard-visible error and
fixes the "zero active engineers" gauge. Implements the design in
docs/design/eliminate-deterministic-fallbacks.md and un-ignores the TDD contracts
in ooda_brain/zero_fallback_tests.rs.

No silent default on ladder exhaustion (Contracts 1 + 7):
- run_brain_ladder now emits a loud `brain_parse_error` self-metric + structured
  tracing::error! when the bounded escalation ladder is EXHAUSTED with no
  parseable decision. The deterministic floor still returns (it is the correct
  safety net when no LLM is configured — design §6), but it can no longer be
  silent or mistaken for a real decision. A genuine take-no-action
  (parsed continue_skipping) emits NO error metric, so the two paths are provably
  distinct. Threaded a `phase` label so each metric is attributed to its reasoner.
- The metric write is hermetic under cfg!(test): suppressed unless HOME is a
  build target/ dir, so `cargo test` never writes to the live ~/.simard.
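The distinction Contracts 1 and 7 draw, between a parsed take-no-action and an exhausted ladder that falls back to the deterministic floor, can be sketched as below. The `Decision` enum, the phase strings, and the in-memory counter standing in for the `brain_parse_error` self-metric are all hypothetical; the point is only that the floor is still returned but is always paired with a recorded, phase-labelled error, while a genuinely parsed decision records nothing.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Decision {
    ContinueSkipping,
    AdvanceGoal,
}

// Stand-in for the self-metric sink: counts `brain_parse_error` per phase.
#[derive(Default)]
struct Metrics {
    brain_parse_error: HashMap<&'static str, u32>,
}

// A parsed decision passes through untouched; exhaustion (None) returns the
// deterministic floor AND records a phase-labelled error.
fn resolve(phase: &'static str, parsed: Option<Decision>, floor: Decision, metrics: &mut Metrics) -> Decision {
    match parsed {
        Some(decision) => decision,
        None => {
            *metrics.brain_parse_error.entry(phase).or_insert(0) += 1;
            floor
        }
    }
}

fn main() {
    let mut metrics = Metrics::default();
    // Genuine take-no-action: parsed, so nothing is recorded.
    let genuine = resolve("engineer_lifecycle", Some(Decision::ContinueSkipping), Decision::ContinueSkipping, &mut metrics);
    // Exhausted ladder: the same floor value comes back, but the miss is counted.
    let floored = resolve("engineer_lifecycle", None, Decision::ContinueSkipping, &mut metrics);
    assert_eq!(genuine, floored); // identical values; only the metric tells them apart
    let decide = resolve("decide", None, Decision::AdvanceGoal, &mut metrics);
    println!("{:?} lifecycle_errors={:?}", decide, metrics.brain_parse_error.get("engineer_lifecycle"));
}
```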

Structured JSON-envelope decision contract (Contract 3):
- ROOT CAUSE: ooda_decide.md/ooda_brain.md mandated a `DECISION:` marker the
  RecipeBrain parsers reject (first word "DECISION:" → DefaultMalformed), a
  prompt/parser mismatch guaranteeing parse-fail→default.
- parse_action_outcome / parse_lifecycle_outcome now consume a fenced JSON
  envelope `{"decision": "<variant>", ...}` via the shared recipe_output
  chokepoint FIRST, falling back to the legacy first-word parse (backward
  compatible — the ladder GREEN locks still pass). The extractor reads the
  STRUCTURED decision field, not free-prose keyword-sniffing.
- The four reasoner prompts now mandate the fenced JSON envelope with a required
  `decision` field, preserving pinned sentences (STATUS: ACHIEVED gate, six
  lifecycle variants, DECISION marker, churn/stuck-loop).
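The root cause in the first bullet reproduces with a toy first-word classifier. This is a hypothetical model of the legacy parse, not `parse_action_outcome` itself, and its variant list is illustrative (only the `DefaultEmpty` / `DefaultMalformed` names come from the text): it lower-cases the first whitespace-separated word and matches it against known variants, so output that obeys the old prompt by starting with `DECISION:` is classified as malformed before the actual decision word is ever read.

```rust
#[derive(Debug, PartialEq)]
enum Legacy {
    Parsed(&'static str),
    DefaultEmpty,
    DefaultMalformed,
}

// Toy model of a legacy first-word parse; only the first word is inspected.
fn legacy_first_word(output: &str) -> Legacy {
    match output.split_whitespace().next() {
        None => Legacy::DefaultEmpty,
        Some(word) => match word.to_ascii_lowercase().as_str() {
            "advance_goal" => Legacy::Parsed("advance_goal"),
            "continue_skipping" => Legacy::Parsed("continue_skipping"),
            _ => Legacy::DefaultMalformed,
        },
    }
}

fn main() {
    println!("{:?}", legacy_first_word("continue_skipping because nothing is ready")); // Parsed("continue_skipping")
    println!("{:?}", legacy_first_word("DECISION: advance_goal")); // DefaultMalformed
    println!("{:?}", legacy_first_word("   ")); // DefaultEmpty
}
```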

Structured tracing only (Contract 8): removed all eprintln! from the reasoner
production paths (recipe_brain.rs, distillation.rs, parse_failure.rs); the paired
tracing calls already carry the same fields.

Zero active engineers — TELEMETRY defect, not a real stall (Contract 5):
- Read-only live-daemon evidence: 3 live worktree dispatch claims + real
  `simard engineer run single-process` subprocesses exist, yet status used a
  pgrep pattern `simard-engineer` (hyphen) that never matches the real
  `simard engineer` (space) argv (undercount), and the workboard gauge read only
  the subagent registry (which diverges from / is empty on cold-start).
- Promoted count_live_engineer_claims to the single source of truth (design G4):
  added live_engineer_claims(state_root); the workboard "Active Engineers" gauge
  now unions live dispatch claims with the subagent registry (dedup by PID) so an
  empty registry with a live engineer can never render 0; status
  resources.live_engineers now derives from the claim count and the buggy pgrep
  pattern is retired (design G5).
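The union-with-dedup rule for the gauge reduces to a set union keyed by PID. A minimal sketch, assuming both sources can be reduced to PID lists; `active_engineers` is a hypothetical name, not the workboard's actual function.

```rust
use std::collections::HashSet;

// Hypothetical gauge: engineers seen in either source, counted once per PID.
fn active_engineers(claim_pids: &[u32], registry_pids: &[u32]) -> usize {
    claim_pids
        .iter()
        .chain(registry_pids)
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    // Cold start: the registry is empty, but a live claim still counts.
    println!("{}", active_engineers(&[4242], &[])); // 1
    // Overlap: PID 4242 appears in both sources and is counted once.
    println!("{}", active_engineers(&[4242, 5151], &[4242, 6161])); // 3
}
```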

Tests: un-ignored all five zero_fallback_tests red contracts + the workboard
union red test; all pass. Full lib suite green except two pre-existing
base_type_copilot integration tests that require `copilot` off PATH (they skip in
CI). fmt + clippy --release -D warnings + memory_consolidation race-subset pass.

Operator redeploys after merge (this change does not touch the live daemon or
~/.simard).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#2591)

While this branch was in flight, main merged a comprehensive #2432 solution:
- #2588 eliminated deterministic-default reasoner outcomes on parse-failure
  (explicit Err + brain_parse_error at the caller; parsers consume
  {"decision":...}/{"adjusted_urgency":...}).
- #2591 reported the TRUE live-engineer set for the dashboard "Active Engineers"
  via a new operator_commands_dashboard::live_engineers module.

Those supersede this branch's parallel reasoner/dashboard implementation, so this
merge adopts main's version wholesale for the overlapping production code
(recipe_brain.rs, workboard.rs, distillation.rs, context.rs, ooda_brain/mod.rs,
recipe_merge_judge.rs) and drops this branch's now-redundant zero_fallback_tests.rs
(main ships its own zero_fallback_2580_tests).

This branch is reduced to the two genuine residual gaps the #2580 family left,
which merge cleanly (main touched neither file):
- status/provider.rs: `resources.live_engineers` still used the buggy
  `pgrep 'simard-engineer'` (hyphen) pattern that never matches the real
  `simard engineer` (space) argv (design G5). Now derives from the authoritative
  count_live_engineer_claims (design G4), matching the dashboard surface. Covered
  by a new test.
- The four .md embedded-fallback prompts still mandated the `DECISION:` marker /
  "Do NOT output JSON", contradicting #2588's parsers that now consume a
  {"decision":...} envelope. Refreshed to the structured JSON-envelope contract
  (Contract 3), preserving pinned sentences.
- parse_failure.rs: convert a residual eprintln! (gh-issue-filed branch) to
  structured tracing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rysweet changed the title from "fix(brain): make deterministic floors LOUD + fix zero-active-engineers telemetry (#2432)" to "fix(status,prompts): finish the #2432 residuals left by the merged #2580 family" on Jul 4, 2026
github-actions (bot) commented Jul 4, 2026

📊 Coverage Summary

Generated by `cargo llvm-cov --workspace --summary-only` (nightly, excluding test files)

| Module | Lines | Covered | Coverage |
|---|---|---|---|
| Total | 130505 | 108080 | 82.8% |

Coverage data from CI run. Test files matching `tests?/` are excluded from line counts.

rysweet merged commit f8ae3af into main on Jul 4, 2026 (17 checks passed)
rysweet deleted the fix/brain-eliminate-deterministic-fallbacks-1783182659 branch on July 4, 2026 at 20:19
rysweet added a commit that referenced this pull request Jul 4, 2026
The zero-fallback-reasoners narrative previously read as if every reasoner
already satisfied the no-deterministic-fallback contract. Verified against the
current tree, that is only partly true, so annotate each section with an honest
status marker (✅ enforced today / ⏳ required end state):

- Fix 1 (single sanitizing chokepoint, src/recipe_output/extract.rs): ✅
- Merge-judge verdict (src/stewardship/recipe_merge_judge.rs): parses
  {"verdict": …} JSON-first and fails closed to Verdict::Unclear, emitting
  brain_verdict_parsed_total{phase="merge_judge"}: ✅ (still a deterministic
  terminal outcome, not the propagated hard error the contract ultimately wants).
- OODA Decide / engineer-lifecycle (src/ooda_brain/recipe_brain.rs
  parse_action_outcome / parse_lifecycle_outcome): still first-word prose
  extraction that returns default_advance_goal() / default_continue_skipping()
  on DefaultEmpty / DefaultMalformed: ⏳
- DeterministicLifecycleBrain on-Err floor (src/ooda_brain/fallback.rs, selected
  by build_act_brain in daemon/brains.rs, logged "DEGRADED mode"): ⏳
- simard status count_live_engineers() (src/status/provider.rs) shells out to
  pgrep and is not registry-pinned, unlike the test-pinned Workboard gauge: ⏳
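The fail-closed merge-judge behaviour in the second bullet can be sketched as: map the extracted `verdict` string onto an enum, treat anything unrecognised or missing as `Unclear`, and count parsed verdicts under a `merge_judge` phase label. Only `Verdict::Unclear` and the `brain_verdict_parsed_total{phase="merge_judge"}` metric name come from the text; the `Ready` / `NotReady` variants, their string spellings, the rule that only recognised verdicts are counted, and the in-memory counter are assumptions. As the bullet notes, this is still a deterministic terminal outcome, not the propagated hard error the contract ultimately wants.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Verdict {
    Ready,    // hypothetical variant
    NotReady, // hypothetical variant
    Unclear,
}

// Fail closed: anything other than a recognised verdict string is Unclear.
// Assumption: only recognised verdicts increment the "parsed" counter.
fn judge(raw_verdict: Option<&str>, parsed_total: &mut HashMap<&'static str, u64>) -> Verdict {
    let verdict = match raw_verdict.map(str::trim) {
        Some("ready") => Verdict::Ready,
        Some("not_ready") => Verdict::NotReady,
        _ => return Verdict::Unclear,
    };
    *parsed_total.entry("merge_judge").or_insert(0) += 1;
    verdict
}

fn main() {
    let mut parsed_total = HashMap::new();
    println!("{:?}", judge(Some("ready"), &mut parsed_total)); // Ready
    println!("{:?}", judge(Some("looks fine to me"), &mut parsed_total)); // Unclear
    println!("{:?}", judge(None, &mut parsed_total)); // Unclear
    println!("{:?}", parsed_total.get("merge_judge")); // Some(1)
}
```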

Every claim is cited to the file that backs it. Reframes the page header as the
spec to build to, not a description of shipped behaviour, matching the zero-BS
stance. The residual code fixes are tracked separately in PR #2595.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>