Commit e3190c9

feat: add context recovery workflows and diagnostics (br-dzv)
1 parent 8a46a2a commit e3190c9

5 files changed: +143 -6 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ All notable changes to this project are documented in this file.
 - Added `instructions/rules_team_pack_examples.md` with practical team rule-pack layout and sample rule files.
 - Added `instructions/context_resilience_policy_schema.md` defining truncation modes, protected artifacts, and resilience notification levels for Epic 11 Task 11.1.
 - Added `scripts/context_resilience.py` implementing resilience policy resolution and deterministic context pruning primitives.
+- Added recovery workflow planning in `scripts/context_resilience.py` with resume hints, safe fallback steps, and diagnostics payloads.
 
 ### Changes
 - Documented extension evaluation outcomes and when each tool is the better fit.
@@ -65,6 +66,7 @@ All notable changes to this project are documented in this file.
 - Added `/doctor` rules subsystem integration and expanded install/selftest coverage for rules command workflows.
 - Expanded rules verification to cover always-apply behavior, equal-priority lexical ordering, and richer discovery scenarios.
 - Expanded selftest coverage for context resilience policy validation and pruning behavior (dedupe, superseded writes, stale error purge, protected evidence retention).
+- Expanded selftest coverage for context recovery outcomes, including resume hints and fallback-path diagnostics.
 
 ## v0.2.0 - 2026-02-12
 

IMPLEMENTATION_ROADMAP.md

Lines changed: 6 additions & 5 deletions
@@ -47,7 +47,7 @@ Use this map to avoid overlapping implementations.
 | E8 | Keyword-Triggered Execution Modes | done | High | E1, E4 | bd-302, bd-2fb, bd-2zq, bd-3dp | Fast power-mode activation from prompt text |
 | E9 | Conditional Rules Injector | done | High | E1 | bd-1q8, bd-3rj, bd-fo8, bd-2ik | Enforce project conventions with scoped rules |
 | E10 | Auto Slash Command Detector | paused | Medium | E1, E8 | TBD | Resume only if intent precision stays high in prototypes |
-| E11 | Context-Window Resilience Toolkit | in_progress | High | E4 | bd-2tj, bd-n9y | Improve long-session stability and recovery |
+| E11 | Context-Window Resilience Toolkit | in_progress | High | E4 | bd-2tj, bd-n9y, bd-2t0 | Improve long-session stability and recovery |
 | E12 | Provider/Model Fallback Visibility | planned | Medium | E5 | TBD | Explain why model routing decisions happen |
 | E13 | Browser Automation Profile Switching | planned | Medium | E1 | TBD | Toggle Playwright/agent-browser with checks |
 | E14 | Plan-to-Execution Bridge Command | planned | Medium | E2, E3 | TBD | Execute validated plans with progress tracking |
@@ -482,10 +482,11 @@ Every command-oriented epic must ship all of the following:
 - [x] Subtask 11.2.2: Add old-error input purge with turn thresholds
 - [x] Subtask 11.2.3: Preserve critical evidence and command outcomes
 - [x] Notes: Added `scripts/context_resilience.py` with policy resolution plus deterministic pruning (dedupe, superseded writes, stale error purge, budget trim) while preserving protected artifacts and latest command outcomes.
-- [ ] Task 11.3: Recovery workflows
-- [ ] Subtask 11.3.1: Add automatic resume hints after successful recovery
-- [ ] Subtask 11.3.2: Add safe fallback when recovery cannot proceed
-- [ ] Subtask 11.3.3: Add diagnostics for pruning/recovery actions
+- [x] Task 11.3: Recovery workflows
+- [x] Subtask 11.3.1: Add automatic resume hints after successful recovery
+- [x] Subtask 11.3.2: Add safe fallback when recovery cannot proceed
+- [x] Subtask 11.3.3: Add diagnostics for pruning/recovery actions
+- [x] Notes: Added recovery-plan generation in `scripts/context_resilience.py` with resume hints, safe fallback actions, and structured pruning/recovery diagnostics.
 - [ ] Task 11.4: Validation and docs
 - [ ] Subtask 11.4.1: Add stress tests for long-session behavior
 - [ ] Subtask 11.4.2: Add docs for tuning resilience settings

README.md

Lines changed: 1 addition & 0 deletions
@@ -550,6 +550,7 @@ Engine behavior currently includes:
 - superseded write pruning (older writes to same target path)
 - stale error purging once newer successful command outcomes exist beyond threshold
 - preservation of protected artifacts and latest command outcomes as critical evidence
+- recovery planning with automatic resume hints, safe fallback steps, and pruning diagnostics
 
 ## Background jobs inside OpenCode 🧵
 
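
The "superseded write pruning" bullet in the README hunk above can be illustrated with a minimal sketch. This is not the code in `scripts/context_resilience.py`, which this commit page does not show for that step. The function name `drop_superseded_writes`, the `"write"` kind value, and the `path` field are assumptions made for illustration only.

```python
from typing import Any


def drop_superseded_writes(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the newest write per target path; other messages pass through.

    Hypothetical helper: assumes writes are tagged kind == "write" and carry
    a "path" field, which the commit does not confirm.
    """
    # First pass: remember the index of the last write seen for each path.
    last_write_index: dict[str, int] = {}
    for idx, message in enumerate(messages):
        if message.get("kind") == "write" and message.get("path"):
            last_write_index[message["path"]] = idx

    # Second pass: drop any write that is not the last one for its path.
    kept: list[dict[str, Any]] = []
    for idx, message in enumerate(messages):
        is_write = message.get("kind") == "write" and message.get("path")
        if is_write and last_write_index[message["path"]] != idx:
            continue  # a newer write to the same path exists
        kept.append(message)
    return kept


sample_messages = [
    {"kind": "write", "path": "a.py", "content": "v1"},
    {"kind": "analysis", "content": "note"},
    {"kind": "write", "path": "a.py", "content": "v2"},
    {"kind": "write", "path": "b.py", "content": "v1"},
]
kept_messages = drop_superseded_writes(sample_messages)
print([m["content"] for m in kept_messages])  # → ['note', 'v2', 'v1']
```

The two-pass form keeps message order stable, which matters if later steps (such as the recovery planning added in this commit) rely on list position.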

scripts/context_resilience.py

Lines changed: 75 additions & 0 deletions
@@ -241,3 +241,78 @@ def prune_context(
         "kept_count": len(kept_messages),
         "dropped_count": len(dropped),
     }
+
+
+def build_recovery_plan(
+    original_messages: list[dict[str, Any]],
+    pruned_report: dict[str, Any],
+    policy: dict[str, Any],
+) -> dict[str, Any]:
+    kept_messages = list(pruned_report.get("messages", []))
+    drop_counts = dict(pruned_report.get("drop_counts", {}))
+
+    latest_success: dict[str, Any] | None = None
+    latest_error: dict[str, Any] | None = None
+    for idx, message in enumerate(kept_messages):
+        exit_code = message.get("exit_code")
+        if isinstance(exit_code, int) and exit_code == 0:
+            latest_success = {
+                "index": idx,
+                "command": str(message.get("command", "")).strip(),
+                "tool_name": str(message.get("tool_name", "")).strip(),
+            }
+        kind = str(message.get("kind", "")).strip().lower()
+        if kind == "error":
+            latest_error = {
+                "index": idx,
+                "command": str(message.get("command", "")).strip(),
+                "content": str(message.get("content", "")).strip(),
+            }
+
+    protected_retained = sum(
+        1 for message in kept_messages if _is_protected(message, policy)
+    )
+
+    if latest_success:
+        command = latest_success.get("command") or "latest successful command"
+        plan = {
+            "can_resume": True,
+            "recovery_action": "resume_hint",
+            "resume_hint": f"Resume from the last successful step: `{command}`.",
+            "fallback": None,
+        }
+    elif latest_error:
+        failed_command = latest_error.get("command") or "last failed command"
+        plan = {
+            "can_resume": False,
+            "recovery_action": "safe_fallback",
+            "resume_hint": None,
+            "fallback": {
+                "reason": "no_successful_recovery_anchor",
+                "steps": [
+                    "restore full context snapshot for the current workflow",
+                    f"re-run `{failed_command}` in isolation with explicit logging",
+                    "request operator review before applying any destructive edits",
+                ],
+            },
+        }
+    else:
+        plan = {
+            "can_resume": True,
+            "recovery_action": "resume_hint",
+            "resume_hint": "Resume from the latest retained decision and rerun validation.",
+            "fallback": None,
+        }
+
+    return {
+        **plan,
+        "diagnostics": {
+            "original_count": len(original_messages),
+            "kept_count": len(kept_messages),
+            "dropped_count": int(pruned_report.get("dropped_count", 0)),
+            "drop_counts": drop_counts,
+            "protected_retained_count": protected_retained,
+            "notification_level": str(policy.get("notification_level", "normal")),
+            "truncation_mode": str(policy.get("truncation_mode", "default")),
+        },
+    }
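
A caller might consume the dict returned by `build_recovery_plan` roughly as follows. This is a sketch only: `render_recovery_notice` is a hypothetical helper that is not part of this commit, and it relies solely on the keys visible in the hunk above (`can_resume`, `resume_hint`, `fallback`, `diagnostics`).

```python
from typing import Any


def render_recovery_notice(plan: dict[str, Any]) -> str:
    """Turn a recovery-plan dict into a short human-readable notice.

    Hypothetical helper: it only assumes the key names returned by
    build_recovery_plan in the diff above.
    """
    diagnostics = plan.get("diagnostics") or {}
    summary = (
        f"kept {diagnostics.get('kept_count', 0)} of "
        f"{diagnostics.get('original_count', 0)} messages"
    )
    if plan.get("can_resume"):
        return f"{plan.get('resume_hint')} ({summary})"

    # `fallback` is None on the resume branches, so guard before reading it.
    fallback = plan.get("fallback") or {}
    steps = fallback.get("steps") or []
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    reason = fallback.get("reason", "unknown")
    return f"Cannot resume: {reason} ({summary})\n{numbered}"


# Sample payload shaped like the safe_fallback branch of the diff.
sample_plan = {
    "can_resume": False,
    "recovery_action": "safe_fallback",
    "resume_hint": None,
    "fallback": {
        "reason": "no_successful_recovery_anchor",
        "steps": ["restore full context snapshot for the current workflow"],
    },
    "diagnostics": {"original_count": 2, "kept_count": 2},
}
print(render_recovery_notice(sample_plan))
```

Branching on `can_resume` rather than on `recovery_action` keeps the caller working for both resume branches, since the third branch (no success and no error retained) also reports `can_resume: True`.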

scripts/selftest.py

Lines changed: 59 additions & 1 deletion
@@ -24,7 +24,11 @@
     validate_schema,
 )
 from keyword_mode_schema import resolve_prompt_modes  # type: ignore
-from context_resilience import prune_context, resolve_policy  # type: ignore
+from context_resilience import (  # type: ignore
+    build_recovery_plan,
+    prune_context,
+    resolve_policy,
+)
 from rules_engine import (  # type: ignore
     discover_rules,
     parse_frontmatter,
@@ -1955,6 +1959,60 @@ def run_bg(*args: str) -> subprocess.CompletedProcess[str]:
         "context pruning should preserve latest command outcomes as critical evidence",
     )
 
+    recovery_plan = build_recovery_plan(
+        context_messages, pruned_context, resilience_policy
+    )
+    expect(
+        recovery_plan.get("can_resume") is True,
+        "recovery plan should allow resume when success anchor exists",
+    )
+    expect(
+        recovery_plan.get("recovery_action") == "resume_hint",
+        "recovery plan should emit resume hints after successful recovery",
+    )
+    expect(
+        "make validate" in str(recovery_plan.get("resume_hint", "")),
+        "resume hint should reference latest successful command",
+    )
+    expect(
+        isinstance(recovery_plan.get("diagnostics", {}).get("drop_counts"), dict),
+        "recovery diagnostics should include pruning reason counts",
+    )
+
+    failed_only_messages = [
+        {
+            "role": "tool",
+            "tool_name": "bash",
+            "kind": "error",
+            "command": "make install-test",
+            "exit_code": 2,
+            "content": "missing dependency",
+            "turn": 1,
+        },
+        {
+            "role": "assistant",
+            "kind": "analysis",
+            "content": "investigate dependency mismatch",
+            "turn": 2,
+        },
+    ]
+    failed_pruned = prune_context(failed_only_messages, resilience_policy)
+    failed_plan = build_recovery_plan(
+        failed_only_messages, failed_pruned, resilience_policy
+    )
+    expect(
+        failed_plan.get("can_resume") is False,
+        "recovery plan should block resume when no success anchor is available",
+    )
+    expect(
+        failed_plan.get("recovery_action") == "safe_fallback",
+        "recovery plan should provide safe fallback path for unrecoverable contexts",
+    )
+    expect(
+        bool(failed_plan.get("fallback", {}).get("steps")),
+        "safe fallback should include actionable recovery steps",
+    )
+
     wizard_state_path = (
         home / ".config" / "opencode" / "my_opencode-install-state.json"
     )
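
One detail worth noting about the last assertion in the hunk above: `failed_plan.get("fallback", {}).get("steps")` works there because the fallback branch stores a dict, but the resume branches of `build_recovery_plan` store `"fallback": None`, and `dict.get` only applies its default when the key is absent. The sketch below shows the difference. `fallback_steps` is a hypothetical accessor, not something this commit adds.

```python
from typing import Any

# Shaped like the resume branches in the diff: the key exists, the value is None.
resumable_plan: dict[str, Any] = {"can_resume": True, "fallback": None}

# The default is ignored because the key is present, so this yields None,
# and chaining .get("steps") onto it would raise AttributeError.
assert resumable_plan.get("fallback", {}) is None


def fallback_steps(plan: dict[str, Any]) -> list[str]:
    """Return fallback steps, tolerating a missing or None `fallback`."""
    return list((plan.get("fallback") or {}).get("steps") or [])


print(fallback_steps(resumable_plan))  # → []
```

Using `or {}` instead of a `.get` default covers both the missing-key case and the explicit-`None` case with one expression.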
