
[Phase 4] Implement Procedural Memory for Agent Learning #73

@frankbria

Description


Summary

CodeFRAME has semantic memory (project facts) and episodic memory (task history), but lacks procedural memory—"how I learned to do things." When agents discover effective patterns, that learning should persist.

Background: Memory Types

From Philipp Schmid's "Memory in Agents":

Semantic Memory ("What"): Retaining specific facts, concepts, and structured knowledge about users, e.g. a user prefers Python over JavaScript.

Episodic Memory ("When" and "Where"): Recalling past events or specific experiences to accomplish tasks by looking at past interactions.

Procedural Memory ("How"): Internalized rules and instructions for how an agent performs tasks, e.g. learning "my summaries are too long" after multiple users give feedback asking for shorter ones.

CodeFRAME likely has:

  • Semantic: Project architecture, tech stack, conventions (in CLAUDE.md, context tiers)
  • Episodic: Task history, blocker resolutions, test results
  • Procedural: ???

What Procedural Memory Looks Like

Examples of procedural learning:

| Discovery During Task | Procedural Memory Entry |
| --- | --- |
| "This test kept failing until I mocked the database" | "When testing DB-dependent code in this project, always mock the connection" |
| "The API rate limits at 100 req/s" | "Add exponential backoff when calling external API X" |
| "TypeScript strict mode catches errors early" | "Run `tsc --strict` before committing TypeScript changes" |
| "Human preferred shorter summaries in blockers" | "Keep blocker descriptions under 3 sentences" |

This isn't just "what happened" (episodic)—it's "how to do things better" extracted from experience.
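Entries like those in the table could be stored under a small schema. A minimal sketch, with hypothetical field names, using the IF/THEN/BECAUSE form proposed below:

```python
from dataclasses import dataclass


@dataclass
class ProceduralRule:
    """One learned 'how-to' rule in IF/THEN/BECAUSE form (hypothetical schema)."""
    condition: str            # IF: when does this rule apply?
    action: str               # THEN: what should the agent do?
    reason: str               # BECAUSE: why does this work?
    source: str = "task"      # where it was learned: "task", "blocker", or "human"
    confidence: float = 0.5   # could be reinforced when the rule proves useful again

    def render(self) -> str:
        """Render the rule as a single line for inclusion in agent context."""
        return f"IF {self.condition} THEN {self.action} BECAUSE {self.reason}"


rule = ProceduralRule(
    condition="testing DB-dependent code in this project",
    action="mock the database connection",
    reason="tests kept failing against the real DB",
)
print(rule.render())
```

A `confidence` field like this would also give the reinforcement mechanism ("patterns that work are reinforced") something concrete to update.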

Why This Matters

Without procedural memory:

  • Agents repeat the same mistakes across sessions
  • Successful patterns aren't codified
  • Human guidance (blocker resolutions) is forgotten
  • Each task starts from zero knowledge of "how this codebase works"

With procedural memory:

  • Agent gets better over time
  • Patterns that work are reinforced
  • Human teaching persists
  • Compound improvement across sessions

Implementation Approaches

Option A: Explicit Procedure Extraction

After each task or blocker resolution, extract procedural learning:

def extract_procedure(task_result, blocker_resolutions):
    prompt = f"""
    Task completed: {task_result.summary}
    Challenges encountered: {task_result.challenges}
    Solutions that worked: {task_result.solutions}
    Human guidance received: {blocker_resolutions}
    
    Extract 0-3 procedural rules that should guide future similar tasks.
    Format: IF [condition] THEN [action] BECAUSE [reason]
    Only extract genuinely reusable patterns, not task-specific details.
    """
    return llm.extract(prompt)

Option B: Pattern Detection from History

Periodically analyze episodic memory for patterns:

def detect_patterns(episodic_memory):
    # Find repeated failure → success patterns
    # Identify common blocker types and their resolutions
    # Extract recurring human guidance themes
    ...
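One minimal way to fill in that sketch, assuming episodic entries carry hypothetical `blocker_type` and `resolution` fields: count recurring blocker types and surface those frequent enough to warrant a procedure.

```python
from collections import Counter


def detect_patterns(episodic_memory, min_occurrences=3):
    """Surface blocker types that recur often enough to justify a procedure."""
    blocker_counts = Counter(
        e["blocker_type"] for e in episodic_memory if e.get("blocker_type")
    )
    candidates = []
    for blocker_type, count in blocker_counts.items():
        if count >= min_occurrences:
            # Collect the resolutions that worked, as raw material for a rule.
            resolutions = [
                e["resolution"] for e in episodic_memory
                if e.get("blocker_type") == blocker_type and e.get("resolution")
            ]
            candidates.append({
                "blocker_type": blocker_type,
                "occurrences": count,
                "resolutions": resolutions,
            })
    return candidates


history = [
    {"blocker_type": "db_test_failure", "resolution": "mock the connection"},
    {"blocker_type": "db_test_failure", "resolution": "mock the connection"},
    {"blocker_type": "db_test_failure", "resolution": "use an in-memory DB"},
    {"blocker_type": "rate_limit", "resolution": "add backoff"},
]
print(detect_patterns(history, min_occurrences=3))
```

The candidates could then be handed to the Option A extraction prompt to be turned into IF/THEN/BECAUSE rules.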

Option C: Human-Labeled Procedures

When humans resolve blockers, option to mark as "remember this":

Blocker: "How should I handle API authentication?"
Resolution: "Use the refresh token pattern in auth_utils.py"
☑️ Remember this for future similar situations
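The "remember this" checkbox could feed a simple append-only store. A sketch, assuming the `human_guidance_log.json` layout proposed under Storage and Retrieval below (function name hypothetical):

```python
import json
from pathlib import Path


def remember_resolution(blocker_question, resolution,
                        store_path="procedural_memory/human_guidance_log.json"):
    """Persist a blocker resolution the human flagged as 'remember this'."""
    path = Path(store_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Load existing entries, append, and write back.
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"question": blocker_question, "resolution": resolution})
    path.write_text(json.dumps(entries, indent=2))
```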

Option D: Procedure Library Integration

Integrate with external procedure library (like Claude Code's CLAUDE.md pattern):

  • Procedures stored in version-controlled file
  • Agents can read and propose additions
  • Human reviews procedure additions
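The propose-then-review loop could be as simple as appending to a pending section of the version-controlled file, so additions show up in normal code review. A sketch with hypothetical names:

```python
from pathlib import Path


def propose_procedure(procedure, path="procedural_memory/project_procedures.md"):
    """Append an agent-proposed procedure under a 'Pending review' section,
    so a human can accept or reject it in code review."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    existing = p.read_text() if p.exists() else "# Project Procedures\n"
    if "## Pending review" not in existing:
        existing += "\n## Pending review\n"
    # Checkbox list items make accept/reject explicit in the diff.
    p.write_text(existing + f"- [ ] {procedure}\n")
```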

Storage and Retrieval

procedural_memory/
├── project_procedures.md      # Project-specific learned procedures
├── codebase_patterns.json     # Detected code patterns
└── human_guidance_log.json    # Extracted from blocker resolutions

# Retrieval: Include relevant procedures in context based on task type
def get_procedures_for_task(task):
    task_type = classify_task(task)  # "testing", "api_integration", "frontend", etc.
    return procedure_store.query(task_type)
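`classify_task` and `procedure_store` are left unspecified above; one minimal interpretation is keyword classification over a dict-backed store (all names and keywords hypothetical):

```python
TASK_KEYWORDS = {
    "testing": ["test", "pytest", "mock"],
    "api_integration": ["api", "endpoint", "rate limit"],
    "frontend": ["ui", "component", "css"],
}

# Toy in-memory stand-in for the procedure store.
PROCEDURE_STORE = {
    "testing": ["When testing DB-dependent code, mock the connection"],
    "api_integration": ["Add exponential backoff when calling external APIs"],
}


def classify_task(task_description: str) -> str:
    """First keyword match wins; fall back to a catch-all type."""
    text = task_description.lower()
    for task_type, keywords in TASK_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return task_type
    return "general"


def get_procedures_for_task(task_description: str) -> list[str]:
    return PROCEDURE_STORE.get(classify_task(task_description), [])


print(get_procedures_for_task("Fix the flaky pytest suite"))
```

In practice an embedding-based lookup would likely retrieve better than keyword matching, but this keeps the retrieval path inspectable.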

Success Criteria

  • Defined procedural memory schema
  • Implemented extraction mechanism (Option A, B, or C)
  • Procedures included in relevant task contexts
  • Measured: fewer repeated mistakes across sessions
  • Measured: faster task completion for similar task types

Metrics

  • Procedure extraction rate: Procedures learned per N tasks
  • Procedure utilization: % of tasks that receive relevant procedural context
  • Repeat mistake rate: Same failure pattern occurring across sessions (should decrease)
  • Human re-guidance rate: Humans answering similar blockers repeatedly (should decrease)
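The repeat mistake rate could be computed directly from per-session failure patterns. A sketch, assuming failures are tagged with a stable pattern identifier (hypothetical representation):

```python
def repeat_mistake_rate(failure_patterns_by_session):
    """Fraction of failures whose pattern already appeared in an earlier session.
    Should trend toward 0 as procedural memory takes effect."""
    seen = set()
    repeats = 0
    total = 0
    for session_failures in failure_patterns_by_session:
        for pattern in session_failures:
            total += 1
            if pattern in seen:
                repeats += 1
            seen.add(pattern)
    return repeats / total if total else 0.0


sessions = [["db_timeout"], ["db_timeout", "lint_error"], ["db_timeout"]]
print(repeat_mistake_rate(sessions))  # 2 of 4 failures repeat earlier patterns
```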

Integration with Existing Systems

  • Blocker system: Rich source of human guidance → procedural extraction
  • Quality gates: Failures could trigger procedure review ("did we have a procedure for this?")
  • Tiered memory: Procedures could be a distinct tier (always HOT when relevant)
  • Checkpoints: Procedure library should be part of checkpoint snapshots

References

  • Memory in Agents - Philipp Schmid
  • Learning from experience patterns in reinforcement learning
  • CLAUDE.md as a form of human-curated procedural memory
  • Mem0, Letta frameworks for memory implementation patterns

Metadata

Labels: Future (deferred beyond v1/v2 scope), architecture (system architecture and design patterns), context-engineering (context window management and optimization), enhancement (new feature or request), priority:low
