feat: Error Classification & Recovery + OODA-Optimized Memory Prefetch Skills

## Overview

Two skill proposals to improve agent resilience and responsiveness, designed to work with **both Copilot CLI and Claude Code**.

---

## Skill 1: Error Classification & Recovery

### Problem

Agents fail in predictable ways but lack systematic recovery strategies. Failures cascade instead of self-correcting.

### Research Basis

- 39% loop recovery rate with classification + hint injection
- 85% precision in failure type detection
- Antifragility: systems that get stronger from failures

### Error Taxonomy

| Type | Detection Signal | Recovery Strategy |
|------|------------------|-------------------|
| **Tool Failure** | Non-zero exit, API error | Retry with backoff, fallback tool |
| **Reasoning Drift** | Output diverges from intent | Re-anchor with original prompt |
| **Infinite Loop** | Repeated tool calls (3+) | Break loop, summarize progress, escalate |
| **Scope Creep** | Task expands beyond original | Checkpoint, confirm with user |
| **Context Overflow** | Token limit warnings | Compress context, archive old turns |
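The "Retry with backoff" strategy for **Tool Failure** can be sketched as a small wrapper. This is a minimal illustration, not a settled design: `retry_with_backoff` and its parameters are hypothetical names. The `sleep` argument is injectable so the delays can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on exception wait base_delay * 2**attempt (plus jitter) and retry.

    Re-raises the last exception once `retries` retries are exhausted, so the
    caller can switch to a fallback tool or escalate.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel agents from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable unless retries < 0")
```

In practice the wrapper should only be applied to errors classified as transient; retrying a config or auth error just burns time.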

### Implementation

#### 1. Error Observer Hook

Wrap tool execution to classify failures:

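```python
# .agents/hooks/error_observer.py
from collections import Counter
from enum import Enum

ErrorType = Enum("ErrorType", ["TRANSIENT", "CONFIG", "LOOP", "LOGIC"])
_call_counts: Counter = Counter()  # per-tool count of failures reaching the loop check

def is_repeated_call(tool_name: str, count: int = 3) -> bool:
    _call_counts[tool_name] += 1
    return _call_counts[tool_name] >= count

def classify_error(tool_name: str, exit_code: int, stderr: str) -> ErrorType:
    if "rate limit" in stderr.lower():
        return ErrorType.TRANSIENT  # safe to retry with backoff
    if exit_code == 2:  # Config error (ADR-035)
        return ErrorType.CONFIG
    if is_repeated_call(tool_name, count=3):
        return ErrorType.LOOP
    return ErrorType.LOGIC
```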
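The `is_repeated_call(tool_name, count=3)` check above takes only the tool name, so three unrelated `gh` calls would look like a loop. A tighter detector for the **Infinite Loop** signal keys on the tool *and* its arguments within a sliding window. A minimal sketch; `LoopDetector`, `threshold`, and `window` are hypothetical names:

```python
import json
from collections import deque


class LoopDetector:
    """Flags a loop when the same (tool, args) call appears `threshold` times
    within the last `window` recorded calls."""

    def __init__(self, threshold: int = 3, window: int = 10) -> None:
        self.threshold = threshold
        self.recent: deque = deque(maxlen=window)  # oldest calls fall off the left

    def record(self, tool_name: str, args: dict) -> bool:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
        key = tool_name + ":" + json.dumps(args, sort_keys=True)
        self.recent.append(key)
        return self.recent.count(key) >= self.threshold
```

The window keeps a call that legitimately recurs across a long session (e.g. `git status`) from eventually tripping the detector.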

#### 2. Recovery Hint Injection

Store failure→recovery mappings:

```yaml
# .agents/recovery-hints.yaml
tool_failures:
  gh:
    - pattern: "GraphQL: Could not resolve"
      hint: "Issue/PR number may not exist. Verify with `gh issue list`."
    - pattern: "HTTP 403"
      hint: "Possibly rate limited (403 can also mean missing permissions). Wait 60s, or cache responses with `gh api --cache 1h`."

reasoning_drift:
  - signal: "Let me also add..."
    hint: "STOP. Check if this is in scope. Original task was: {original_task}"
```
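Hint lookup against these mappings can be a plain substring match. The sketch below assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`, a third-party package); the dict is inlined so the example runs without it. `tool_hint` and `drift_hint` are hypothetical helpers.

```python
from typing import Optional

# Mirrors part of .agents/recovery-hints.yaml.
RECOVERY_HINTS = {
    "tool_failures": {
        "gh": [
            {"pattern": "GraphQL: Could not resolve",
             "hint": "Issue/PR number may not exist. Verify with `gh issue list`."},
        ],
    },
    "reasoning_drift": [
        {"signal": "Let me also add...",
         "hint": "STOP. Check if this is in scope. Original task was: {original_task}"},
    ],
}


def tool_hint(tool: str, stderr: str) -> Optional[str]:
    """First hint whose pattern occurs in stderr for this tool, else None."""
    for entry in RECOVERY_HINTS["tool_failures"].get(tool, []):
        if entry["pattern"] in stderr:
            return entry["hint"]
    return None


def drift_hint(agent_output: str, original_task: str) -> Optional[str]:
    """Hint for the first drift signal found in the agent's output, else None."""
    for entry in RECOVERY_HINTS["reasoning_drift"]:
        # Strip the trailing "..." so "Let me also add a cache" still matches.
        if entry["signal"].rstrip(".") in agent_output:
            return entry["hint"].format(original_task=original_task)
    return None
```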

#### 3. Integration Points

**Copilot CLI:**
- Hook into `copilot-cli` wrapper script
- Log errors to `.agents/sessions/errors.jsonl`
- Inject hints via system prompt modification

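**Claude Code:**
- Register a `PostToolUse` hook in `.claude/settings.json` (hooks are configured in settings; `CLAUDE.md` holds instructions)
- Error classification runs post-tool-execution, inside that hook
- Recovery hints added to next turn context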

#### 4. Pattern Learning

After recovery, log what worked:

```jsonl
{"timestamp": "2026-02-26T22:00:00Z", "error_type": "LOOP", "tool": "gh", "recovery": "break_and_summarize", "success": true}
```

Graduate patterns with 3+ successful recoveries to `MEMORY.md`.
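Logging and graduation can share the JSONL format shown above. A minimal sketch; `log_recovery` and `graduation_candidates` are hypothetical names, and the graduation step only *selects* candidates. Writing them into `MEMORY.md` is left to the agent or a reviewer.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path(".agents/sessions/errors.jsonl")


def log_recovery(error_type: str, tool: str, recovery: str, success: bool,
                 path: Path = LOG_PATH) -> None:
    """Append one JSON record per line in the format shown above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "error_type": error_type, "tool": tool,
        "recovery": recovery, "success": success,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def graduation_candidates(path: Path = LOG_PATH,
                          min_successes: int = 3) -> list:
    """Sorted (error_type, tool, recovery) triples with >= min_successes successes."""
    if not path.exists():
        return []
    counts: Counter = Counter()
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        if rec.get("success"):
            counts[(rec["error_type"], rec["tool"], rec["recovery"])] += 1
    return sorted(k for k, n in counts.items() if n >= min_successes)
```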

---

## Skill 2: OODA-Optimized Memory Prefetch

### Problem

Agents spend the "Orient" phase of the OODA loop gathering context that could be predicted in advance. Prior research notes (see References) suggest an 88-99% cycle time reduction is possible.

### Research Basis

- OODA loop optimization: fastest cycle wins
- Edge deployment principle: precompute, don't fetch on demand
- Predictable context: git status, open PRs, recent sessions are almost always needed

### Prefetch Targets

| Context | Trigger | Cache Duration |
|---------|---------|----------------|
| Git status + branch | Session start | 5 min |
| Open PRs (assigned) | Session start | 15 min |
| Recent commits (5) | Session start | 15 min |
| CI status (last run) | Session start | 10 min |
| Open issues (assigned) | Session start | 30 min |
| Last session summary | Session start | Until next session |
| HEARTBEAT.md tasks | Heartbeat poll | 1 min |
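The cache durations in the table translate directly into a freshness check. This sketch assumes each cached value is stored with the Unix time it was fetched; the key names beyond those in the prefetch script (`open_issues`, `last_session_summary`, `heartbeat_tasks`) are hypothetical.

```python
import time
from typing import Optional

# Cache durations from the table, in seconds.
# None = valid for the rest of the session (discarded when a new one starts).
CACHE_TTL_SECONDS = {
    "git_branch": 5 * 60,
    "git_status": 5 * 60,
    "open_prs": 15 * 60,
    "recent_commits": 15 * 60,
    "ci_status": 10 * 60,
    "open_issues": 30 * 60,
    "last_session_summary": None,
    "heartbeat_tasks": 60,
}


def is_fresh(key: str, fetched_at: float, now: Optional[float] = None) -> bool:
    """True if the cached value for `key`, fetched at `fetched_at`, is still usable."""
    ttl = CACHE_TTL_SECONDS.get(key, 0)  # unknown keys are never served from cache
    if ttl is None:
        return True
    now = time.time() if now is None else now
    return now - fetched_at < ttl
```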

### Implementation

#### 1. Prefetch Script

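```python
# .agents/hooks/session_prefetch.py
import json
import shlex
import subprocess
import time
from pathlib import Path

CACHE_DIR = Path(".agents/cache")


def run(cmd: str) -> str:
    """Run a command; return stripped stdout, or "" if it fails or is missing."""
    try:
        result = subprocess.run(
            shlex.split(cmd), capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def prefetch_context() -> dict:
    """Run at session start, cache results."""
    context = {
        "fetched_at": time.time(),  # lets readers enforce the cache durations above
        "git_branch": run("git branch --show-current"),
        "git_status": run("git status --short"),
        "open_prs": run("gh pr list --assignee @me --json number,title,url --limit 5"),
        "recent_commits": run("git log --oneline -5"),
        "ci_status": run("gh run list --limit 1 --json status,conclusion,name"),
    }

    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_path = CACHE_DIR / "session_context.json"
    cache_path.write_text(json.dumps(context, indent=2))
    return context
```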
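The prefetch script above runs its commands one after another, so session-start latency is the *sum* of every `git` and `gh` call. Because these are independent reads, a thread pool bounds the wait by roughly the slowest call instead. A minimal sketch; `prefetch_parallel` and `_run` are hypothetical helpers, and commands are passed as argv lists:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def _run(argv: list, timeout: float = 10.0) -> str:
    """Stripped stdout on success; "" on non-zero exit, timeout, or missing binary."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout.strip() if proc.returncode == 0 else ""


def prefetch_parallel(commands: dict, max_workers: int = 8) -> dict:
    """Run independent prefetch commands concurrently; returns {key: stdout}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(_run, argv) for key, argv in commands.items()}
        return {key: fut.result() for key, fut in futures.items()}


if __name__ == "__main__":
    # Example: in real use these would be the git/gh commands from the script above.
    print(prefetch_parallel({"python_version": [sys.executable, "--version"]}))
```

Threads are sufficient here because each worker just waits on a subprocess; no Python-level CPU work is being parallelized.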

#### 2. Context Injection

**Copilot CLI:**
- Prefetch runs in `.copilot/hooks/pre-session.sh`
- Results injected via environment variable or temp file
- CLI reads cache before first prompt

**Claude Code:**
- Prefetch runs in a `SessionStart` hook (configured in `.claude/settings.json`, not `CLAUDE.md`)
- Results added to passive context (high in prompt)
- Cache invalidation on git operations

#### 3. Smart Invalidation

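```python
# Invalidate on relevant operations (keys may be glob patterns, e.g. "git_*")
INVALIDATION_TRIGGERS = {
    "git_*": ["git commit", "git push", "git pull", "git checkout"],
    "recent_commits": ["git commit", "git pull", "git checkout"],  # `git log` output changes
    "open_prs": ["gh pr create", "gh pr merge", "gh pr close"],
    "ci_status": ["gh workflow run", "git push"],
}
```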
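Applying the trigger table means mapping each executed command to the cache keys it makes stale. The sketch below assumes glob-style keys (matched with `fnmatchcase`) and a token-prefix match on the command, so `git commit -m x` triggers `git commit` while a hypothetical `git commits` would not. `keys_to_invalidate` is a hypothetical name, and the trigger table is repeated so the block runs on its own.

```python
import shlex
from fnmatch import fnmatchcase

INVALIDATION_TRIGGERS = {
    "git_*": ["git commit", "git push", "git pull", "git checkout"],
    "recent_commits": ["git commit", "git pull", "git checkout"],
    "open_prs": ["gh pr create", "gh pr merge", "gh pr close"],
    "ci_status": ["gh workflow run", "git push"],
}


def keys_to_invalidate(command: str, cached_keys: list) -> set:
    """Return the cached keys made stale by running `command`."""
    tokens = shlex.split(command)
    stale = set()
    for pattern, triggers in INVALIDATION_TRIGGERS.items():
        # Compare the command's leading tokens against each trigger's tokens.
        if any(tokens[:len(t.split())] == t.split() for t in triggers):
            stale.update(k for k in cached_keys if fnmatchcase(k, pattern))
    return stale
```

Note that `git status` also changes on plain file edits, which no command trigger catches; its short cache duration covers that case.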

#### 4. Latency Targets

| Metric | Before | After (target) | Improvement |
|--------|--------|-------|--------|
| Time to first tool call | 3-5s | <1s | 80% reduction |
| Context gathering turns | 2-3 | 0 | Eliminate |
| Redundant API calls | 5+/session | 1 (prefetch) | 80% reduction |
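Verifying these targets (and the "50%+ reduction" success criterion) needs a consistent way to compare runs. One option, sketched here as an assumption rather than a decided method, is to compare medians over several sessions, which is less sensitive to a single slow outlier such as a cold `gh` auth check. `percent_reduction` is a hypothetical helper.

```python
from statistics import median


def percent_reduction(before_s: list, after_s: list) -> float:
    """Median-based percent reduction between two sets of timings (seconds)."""
    b, a = median(before_s), median(after_s)
    if b <= 0:
        raise ValueError("baseline median must be positive")
    return 100.0 * (b - a) / b
```

For example, timings of 3, 4, and 5 s before and 0.8 s after give roughly an 80% reduction.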

---

## Shared Infrastructure

Both skills share:

1. **`.agents/cache/`** - Ephemeral context cache
2. **`.agents/hooks/`** - Pre/post execution hooks
3. **`.agents/sessions/errors.jsonl`** - Error pattern log
4. **`MEMORY.md` graduation** - Patterns with 3+ successful recoveries

---

## Success Criteria

### Error Classification & Recovery
- [ ] Error taxonomy implemented with 5 types
- [ ] Recovery hints for top 10 failure patterns
- [ ] Loop detection breaks 80% of infinite loops
- [ ] Pattern graduation to MEMORY.md working

### OODA-Optimized Prefetch
- [ ] Session start prefetches 5 context types
- [ ] Cache invalidation on relevant git/gh ops
- [ ] Measured 50%+ reduction in "Orient" phase time
- [ ] Works in both Copilot CLI and Claude Code

---

## Open Questions

1. Should error classification run synchronously (blocking) or async (background)?
2. How to handle prefetch in offline/air-gapped environments?
3. Should recovery hints be agent-specific or shared across all agents?

---

## References

- MEMORY.md: OODA Loop Optimization (2026-02-23)
- MEMORY.md: Multi-Agent Self-Correction (2026-02-25)
- MEMORY.md: Behavioral Drift Detection (2026-02-25)
- ADR-035: Exit codes (0=success, 1=logic, 2=config, 3=external, 4=auth)