fix(status,prompts): finish the #2432 residuals left by the merged #2580 family #2595
Merged
rysweet merged 4 commits on Jul 4, 2026
#2432 family) Step 5e: design consolidation.

Fold the four investigation threads (parse-fail flow, extract.rs chokepoint coverage, distillation parser, active-engineers telemetry) into one grounded target architecture for eliminating silent deterministic fallbacks across the reasoning + telemetry paths.

Grounded findings (file:line anchored in the doc):
- strip_recipe_noise sanitizer chokepoint is universally adopted (thread closed).
- The #2432 confidence-gated escalation ladder (run_brain_ladder, bounded by EscalationConfig: default 2 / hard cap 3) is wired for decide, orient, engineer-lifecycle, and merge-judge.
- Residual gaps: progress-checker is off the ladder (G1); distillation runs a parallel failure-class retry (G2); confidence.rs (verbalized confidence / self-consistency / ECE) is built but unwired (G3).
- Telemetry: three divergent live-engineer counts; count_live_engineers() greps "simard-engineer" (hyphen) but the real subprocess is "simard engineer" (space), so the live daemon showed 17 real vs 1 matched (G4/G5).

Design promotes count_live_engineer_claims() to the single source of truth.

Adds the doc to docs/index.md and the mkdocs nav; mkdocs --strict and scripts/verify-docs.sh both pass (15/0, 0 orphaned pages).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
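The hyphen/space mismatch in finding G4/G5 can be reproduced without a live daemon. The sketch below is illustrative only: it approximates a literal `pgrep -f` pattern as a substring test over the space-joined argv, and the helper names `cmdline_matches` and `count_matches` are hypothetical, not functions from the repository.

```rust
/// Approximates how a literal (metacharacter-free) `pgrep -f` pattern is
/// matched: against the full command line, i.e. argv joined by spaces.
/// Hypothetical helper for illustration; not part of the codebase.
fn cmdline_matches(pattern: &str, argv: &[&str]) -> bool {
    argv.join(" ").contains(pattern)
}

/// Counts how many of the given processes a pattern would select.
fn count_matches(pattern: &str, processes: &[Vec<&str>]) -> usize {
    processes
        .iter()
        .filter(|argv| cmdline_matches(pattern, argv.as_slice()))
        .count()
}
```

With argv vectors shaped like `["simard", "engineer", "run", "single-process"]`, the hyphenated pattern selects none of them while the space-separated pattern selects all, which is why a count derived from dispatch claims is less fragile than any process-name pattern.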
…2432)

Write failing tests FIRST that pin the target behaviour for eliminating deterministic fallbacks in the Brain reasoners and fixing the dashboard "zero active engineers" reading. Implementation lands in later steps; RED contracts are #[ignore]d and un-ignored as each fix arrives.

GREEN locks (guard current good behaviour):
- retry recovers a real decision on schema-repair; retry budget bounded; exhaustion stays classified as a parse-failure (not a clean success)
- shared recipe_output chokepoint strips copilot banner+ANSI+logs and recovers the JSON payload; extractor consumes a {"decision":...} envelope
- every reasoner capture path routes through recipe_output:: (no bypass)
- active engineers = live (un-ended) subagent sessions; live worktree dispatch claims are counted
- distillation parses a banner/ANSI-polluted facts object
- ladder core introduces no legacy phase-adapter naming

RED (#[ignore], TDD-red until fix lands):
- ladder exhaustion must emit a dashboard-visible brain_parse_error metric, never a silent deterministic default
- a genuine take-no-action must be observably distinct from a parse-failure
- reasoner prompts must mandate a fenced JSON envelope with a `decision` field
- changed reasoner code must use structured tracing only (no stderr/stdout print macros)
- workboard active-engineers gauge must union live dispatch claims with the subagent registry (roots the "zero active engineers" defect)
- distillation must survive observed ~78%-failing capture shapes

Test sources avoid the literal print-macro / legacy-adapter tokens (built via concat!) so the operator's git-grep contract cannot trip on the tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
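The first two RED contracts above (exhaustion must be loud, and a genuine take-no-action must be distinguishable from a parse-failure) can be pictured with a small model of a bounded ladder. This is a sketch under stated assumptions, not the real `run_brain_ladder`: the types `Decision` and `LadderOutcome`, the function `run_ladder`, and the use of a `Vec<String>` as a stand-in metric sink are all hypothetical. Only the hard cap of 3 and the metric name `brain_parse_error` come from the commit messages.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    AdvanceGoal,
    ContinueSkipping, // a genuine, parsed take-no-action
}

#[derive(Debug, PartialEq)]
enum LadderOutcome {
    /// A decision was actually parsed on attempt `attempts`.
    Parsed { decision: Decision, attempts: u32 },
    /// Every rung failed to parse; never conflated with `Parsed`.
    Exhausted { attempts: u32 },
}

/// Hard cap on escalation attempts (the source states default 2 / hard cap 3).
const HARD_CAP: u32 = 3;

/// Runs `attempt` at most `max_attempts` times, clamped to 1..=HARD_CAP.
/// On exhaustion it records a `brain_parse_error` entry in `metrics`
/// (a hypothetical stand-in for the real metric sink).
fn run_ladder<F>(max_attempts: u32, mut attempt: F, metrics: &mut Vec<String>) -> LadderOutcome
where
    F: FnMut(u32) -> Option<Decision>,
{
    let budget = max_attempts.clamp(1, HARD_CAP);
    for n in 1..=budget {
        if let Some(decision) = attempt(n) {
            return LadderOutcome::Parsed { decision, attempts: n };
        }
    }
    metrics.push("brain_parse_error".to_string());
    LadderOutcome::Exhausted { attempts: budget }
}
```

Because the two cases are separate variants, a caller cannot read an exhausted ladder as a clean `ContinueSkipping`, and the metric is written only on the exhausted path.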
…s telemetry (#2432)

Operator directive is ABSOLUTE: no silent deterministic fallback. This makes every unrecoverable reasoner parse-miss an EXPLICIT, dashboard-visible error and fixes the "zero active engineers" gauge. Implements the design in docs/design/eliminate-deterministic-fallbacks.md and un-ignores the TDD contracts in ooda_brain/zero_fallback_tests.rs.

No silent default on ladder exhaustion (Contracts 1 + 7):
- run_brain_ladder now emits a loud `brain_parse_error` self-metric + structured tracing::error! when the bounded escalation ladder is EXHAUSTED with no parseable decision. The deterministic floor still returns (it is the correct safety net when no LLM is configured, per design §6), but it can no longer be silent or mistaken for a real decision. A genuine take-no-action (parsed continue_skipping) emits NO error metric, so the two paths are provably distinct. Threaded a `phase` label so the metric attributes to its reasoner.
- The metric write is hermetic under cfg!(test): suppressed unless HOME is a build target/ dir, so `cargo test` never writes to the live ~/.simard.

Structured JSON-envelope decision contract (Contract 3):
- ROOT CAUSE: ooda_decide.md/ooda_brain.md mandated a `DECISION:` marker the RecipeBrain parsers reject (first word "DECISION:" → DefaultMalformed), a prompt/parser mismatch guaranteeing parse-fail→default.
- parse_action_outcome / parse_lifecycle_outcome now consume a fenced JSON envelope `{"decision": "<variant>", ...}` via the shared recipe_output chokepoint FIRST, falling back to the legacy first-word parse (backward compatible: the ladder GREEN locks still pass). The extractor reads the STRUCTURED decision field, not free-prose keyword-sniffing.
- The four reasoner prompts now mandate the fenced JSON envelope with a required `decision` field, preserving pinned sentences (STATUS: ACHIEVED gate, six lifecycle variants, DECISION marker, churn/stuck-loop).

Structured tracing only (Contract 8):
- Removed all eprintln! from the reasoner production paths (recipe_brain.rs, distillation.rs, parse_failure.rs); the paired tracing calls already carry the same fields.

Zero active engineers: a TELEMETRY defect, not a real stall (Contract 5):
- Read-only live-daemon evidence: 3 live worktree dispatch claims + real `simard engineer run single-process` subprocesses exist, yet status used a pgrep pattern `simard-engineer` (hyphen) that never matches the real `simard engineer` (space) argv (undercount), and the workboard gauge read only the subagent registry (which diverges from / is empty on cold-start).
- Promoted count_live_engineer_claims to the single source of truth (design G4): added live_engineer_claims(state_root); the workboard "Active Engineers" gauge now unions live dispatch claims with the subagent registry (dedup by PID) so an empty registry with a live engineer can never render 0; status resources.live_engineers now derives from the claim count and the buggy pgrep pattern is retired (design G5).

Tests: un-ignored all five zero_fallback_tests red contracts + the workboard union red test; all pass. Full lib suite green except two pre-existing base_type_copilot integration tests that require `copilot` off PATH (they skip in CI). fmt + clippy --release -D warnings + memory_consolidation race-subset pass.

Operator redeploys after merge (this change does not touch the live daemon or ~/.simard).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
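The gauge change described for Contract 5 (union live dispatch claims with the subagent registry, dedup by PID) reduces to a set union. A minimal sketch, assuming both sources can be flattened to lists of PIDs; the function name `active_engineer_pids` is hypothetical and the real `live_engineer_claims(state_root)` signature is not shown in this PR text.

```rust
use std::collections::BTreeSet;

/// Unions PIDs from live worktree dispatch claims with PIDs from the
/// subagent registry. A PID present in both sources is counted once.
/// Hypothetical helper for illustration; not the repository's API.
fn active_engineer_pids(claim_pids: &[u32], registry_pids: &[u32]) -> BTreeSet<u32> {
    claim_pids
        .iter()
        .chain(registry_pids.iter())
        .copied()
        .collect()
}
```

The property the commit relies on follows directly: if the registry is empty (for example on cold start) but a claim is live, the union is non-empty, so the gauge cannot render 0.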
…#2591)

While this branch was in flight, main merged a comprehensive #2432 solution:
- #2588 eliminated deterministic-default reasoner outcomes on parse-failure (explicit Err + brain_parse_error at the caller; parsers consume {"decision":...}/{"adjusted_urgency":...}).
- #2591 reported the TRUE live-engineer set for the dashboard "Active Engineers" via a new operator_commands_dashboard::live_engineers module.

Those supersede this branch's parallel reasoner/dashboard implementation, so this merge adopts main's version wholesale for the overlapping production code (recipe_brain.rs, workboard.rs, distillation.rs, context.rs, ooda_brain/mod.rs, recipe_merge_judge.rs) and drops this branch's now-redundant zero_fallback_tests.rs (main ships its own zero_fallback_2580_tests).

This branch is reduced to the two genuine residual gaps the #2580 family left, which merge cleanly (main touched neither file):
- status/provider.rs: `resources.live_engineers` still used the buggy `pgrep 'simard-engineer'` (hyphen) pattern that never matches the real `simard engineer` (space) argv (design G5). Now derives from the authoritative count_live_engineer_claims (design G4), matching the dashboard surface. Covered by a new test.
- The four .md embedded-fallback prompts still mandated the `DECISION:` marker / "Do NOT output JSON", contradicting #2588's parsers that now consume a {"decision":...} envelope. Refreshed to the structured JSON-envelope contract (Contract 3), preserving pinned sentences.
- parse_failure.rs: convert a residual eprintln! (gh-issue-filed branch) to structured tracing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
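To see why a prompt that still mandates the `DECISION:` marker contradicts an envelope-consuming parser, consider this deliberately naive model. It is an assumption-laden sketch, not the #2588 parser: `fenced_body`, `decision_field`, `parse_decision`, and `ParseError` are hypothetical names, the field lookup ignores JSON escapes and nesting (a real implementation would use a JSON library), and the fence marker is written with `\x60` escapes purely to keep backticks out of the snippet.

```rust
/// Three backticks, spelled with escapes.
const FENCE: &str = "\x60\x60\x60";

#[derive(Debug, PartialEq)]
enum ParseError {
    NoEnvelope,      // no fenced block found in the reasoner output
    MissingDecision, // fenced block present but no "decision" string field
}

/// Returns the body of the first fenced block, skipping its info string.
fn fenced_body(text: &str) -> Option<&str> {
    let open = text.find(FENCE)? + FENCE.len();
    let rest = &text[open..];
    let body_start = rest.find('\n')? + 1; // skip e.g. "json" on the opening line
    let body = &rest[body_start..];
    let close = body.find(FENCE)?;
    Some(body[..close].trim())
}

/// Naively reads the string value that follows a "decision" key.
fn decision_field(json: &str) -> Option<&str> {
    let key = "\"decision\"";
    let after_key = &json[json.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Envelope-only parse: anything else is an explicit error, never a default.
fn parse_decision(raw: &str) -> Result<String, ParseError> {
    let body = fenced_body(raw).ok_or(ParseError::NoEnvelope)?;
    let decision = decision_field(body).ok_or(ParseError::MissingDecision)?;
    Ok(decision.to_string())
}
```

Under this model, a reply of the form `DECISION: advance_goal` yields `Err(ParseError::NoEnvelope)`, while the same decision wrapped in a fenced `{"decision": "advance_goal"}` object parses. That is the mismatch the refreshed `.md` prompts remove.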
rysweet added a commit that referenced this pull request on Jul 4, 2026
The zero-fallback-reasoners narrative previously read as if every reasoner
already satisfied the no-deterministic-fallback contract. Verified against the
current tree, that is only partly true, so annotate each section with an honest
status marker (✅ enforced today / ⏳ required end state):
- Fix 1 (single sanitizing chokepoint, src/recipe_output/extract.rs): ✅
- Merge-judge verdict (src/stewardship/recipe_merge_judge.rs): parses
{"verdict": …} JSON-first and fails closed to Verdict::Unclear, emitting
brain_verdict_parsed_total{phase="merge_judge"}: ✅ (still a deterministic
terminal outcome, not the propagated hard error the contract ultimately wants).
- OODA Decide / engineer-lifecycle (src/ooda_brain/recipe_brain.rs
parse_action_outcome / parse_lifecycle_outcome): still first-word prose
extraction that returns default_advance_goal() / default_continue_skipping()
on DefaultEmpty / DefaultMalformed: ⏳
- DeterministicLifecycleBrain on-Err floor (src/ooda_brain/fallback.rs, selected
by build_act_brain in daemon/brains.rs, logged "DEGRADED mode"): ⏳
- simard status count_live_engineers() (src/status/provider.rs) shells out to
pgrep and is not registry-pinned, unlike the test-pinned Workboard gauge: ⏳
Every claim is cited to the file that backs it. Reframes the page header as the
spec to build to, not a description of shipped behaviour, matching the zero-BS
stance. The residual code fixes are tracked separately in PR #2595.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
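The merge-judge entry in the commit above describes a fail-closed parse: anything that is not a recognised verdict becomes `Verdict::Unclear`. A minimal sketch of that mapping follows. Only `Verdict::Unclear` is named in the source; the variants `Ready` and `NotReady`, their string spellings, and the function `verdict_from_field` are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Ready,    // hypothetical variant
    NotReady, // hypothetical variant
    Unclear,  // named in the source: the fail-closed outcome
}

/// Maps an already-extracted `verdict` field to a `Verdict`.
/// A missing field or an unrecognised value fails closed to `Unclear`.
fn verdict_from_field(field: Option<&str>) -> Verdict {
    match field.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("ready") => Verdict::Ready,
        Some("not_ready") => Verdict::NotReady,
        _ => Verdict::Unclear,
    }
}
```

As the commit message notes, this is still a deterministic terminal outcome rather than a propagated error; the sketch only shows why an unparseable reply can never be read as approval.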
Context: the core #2432 work was superseded by the merged #2580 family

This branch began as a full parallel implementation of #2432 (eliminate silent deterministic fallbacks + fix "zero active engineers"). While it was in flight, main merged a comprehensive solution to the same problem:
- #2588 eliminated deterministic-default reasoner outcomes on parse-failure (explicit `Err` + `brain_parse_error` at the caller; parsers now consume a `{"decision": …}` / `{"adjusted_urgency": …}` envelope).
- #2591 reported the TRUE live-engineer set for the dashboard "Active Engineers" panel via a new `operator_commands_dashboard::live_engineers` module.

Those supersede this branch's parallel reasoner/dashboard implementation, so the merge in this PR adopts main's version wholesale for all overlapping production code and drops this branch's now-redundant `zero_fallback_tests.rs` (main ships its own `zero_fallback_2580_tests`). No duplicate/competing implementation is shipped.
What this PR now contains (the residual gaps #2580 left)
Main touched neither of these files, so both merge cleanly and are purely
additive:
1. `status/provider.rs`: finish design G4/G5 for the `simard status` surface

#2591 fixed the dashboard engineer count but left the status snapshot (`resources.live_engineers`) on the buggy `pgrep -f 'simard-engineer'` (hyphen) pattern, which never matches the real `simard engineer run single-process …` (space) argv, so it undercounts. Read-only live-daemon evidence at reconcile time:
- `…/simard engineer run single-process …` subprocesses.
- … (`simard-engineer-claim` + live PID)
- `pgrep 'simard-engineer'` (hyphen)

`resources.live_engineers` now derives from the authoritative `count_live_engineer_claims` (the single source of truth, design G4), matching the dashboard surface, and the fragile pgrep pattern is retired (design G5). Covered by a new test (`live_engineers_derives_from_live_worktree_claims`).

2. The four `.md` embedded-fallback prompts: consistency with #2588's parsers

#2588 updated the recipe YAMLs to emit structured output but left the `.md` embedded fallbacks (used when the on-disk prompt is absent; also the RustyClawd path) still mandating the `DECISION:` marker / "Do NOT output JSON", which now contradicts #2588's parsers that consume a `{"decision": …}` envelope. Refreshed `ooda_decide.md`, `ooda_brain.md`, `ooda_orient.md`, and `merge_readiness_judge.md` to mandate the fenced JSON envelope with a required `decision` field, preserving all pinned sentences (STATUS: ACHIEVED gate, six lifecycle variants, DECISION marker, churn/stuck-loop).

3. `parse_failure.rs`: structured tracing

Converted a residual `eprintln!` (the gh-issue-filed success branch) to `tracing::info!`, per the structured-tracing-only directive.

Validation
- `cargo test --lib` on the merged tree: 7031 passed, 0 failed (7 ignored are main's; the `base_type_copilot` integration tests skip when `copilot` is off PATH, as in CI).
- `cargo fmt --check`, pre-commit `clippy --release -D warnings`, and pre-push `clippy --all-targets --all-features --locked -D warnings` + the `memory_consolidation` race-subset all pass, with no `--no-verify` and no `--admin`.
- `git grep` confirms no `println!`/`eprintln!` and no `Bridge` naming introduced.

Deploy

Does not touch the live daemon or `~/.simard` (read-only inspection only). The operator redeploys after merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>