Summary
CodeFRAME has semantic memory (project facts) and episodic memory (task history), but lacks procedural memory—"how I learned to do things." When agents discover effective patterns, that learning should persist.
Background: Memory Types
From Philipp Schmid's "Memory in Agents":
Semantic Memory ("What"): Retaining specific facts, concepts, and structured knowledge about users, e.g. user prefers Python over JavaScript.
Episodic Memory ("When" and "Where"): Recall past events or specific experiences to accomplish tasks by looking at past interactions.
Procedural Memory ("How"): Internalized rules and instructions for how an agent performs tasks, e.g. learning "make my summaries shorter" after multiple users give that feedback.
CodeFRAME likely has:
- ✅ Semantic: Project architecture, tech stack, conventions (in CLAUDE.md, context tiers)
- ✅ Episodic: Task history, blocker resolutions, test results
- ❓ Procedural: ???
What Procedural Memory Looks Like
Examples of procedural learning:
| Discovery During Task | Procedural Memory Entry |
|---|---|
| "This test kept failing until I mocked the database" | "When testing DB-dependent code in this project, always mock the connection" |
| "The API rate limits at 100 req/s" | "Add exponential backoff when calling external API X" |
| "TypeScript strict mode catches errors early" | "Run tsc --strict before committing TypeScript changes" |
| "Human preferred shorter summaries in blockers" | "Keep blocker descriptions under 3 sentences" |
This isn't just "what happened" (episodic)—it's "how to do things better" extracted from experience.
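A minimal sketch of what one such entry could look like as a data structure. The field names (`trigger`, `rule`, `source`, `tags`) are illustrative assumptions, not an existing CodeFRAME schema:

```python
from dataclasses import dataclass, field

@dataclass
class ProceduralEntry:
    """One learned 'how to' rule extracted from experience.

    Field names are illustrative, not an existing CodeFRAME schema.
    """
    trigger: str   # the situation that prompted the learning
    rule: str      # the reusable instruction for future tasks
    source: str    # e.g. "task", "blocker_resolution", "human_feedback"
    tags: list[str] = field(default_factory=list)  # for retrieval, e.g. ["testing", "db"]

# First row of the table above, expressed as an entry:
entry = ProceduralEntry(
    trigger="Test kept failing until the database was mocked",
    rule="When testing DB-dependent code, always mock the connection",
    source="task",
    tags=["testing", "db"],
)
```

Keeping `trigger` alongside `rule` preserves the episodic origin, so a human (or the agent) can later audit why a rule exists.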
Why This Matters
Without procedural memory:
- Agents repeat the same mistakes across sessions
- Successful patterns aren't codified
- Human guidance (blocker resolutions) is forgotten
- Each task starts from zero knowledge of "how this codebase works"
With procedural memory:
- Agent gets better over time
- Patterns that work are reinforced
- Human teaching persists
- Compound improvement across sessions
Implementation Approaches
Option A: Explicit Procedure Extraction
After each task or blocker resolution, extract procedural learning:
def extract_procedure(task_result, blocker_resolutions):
    prompt = f"""
    Task completed: {task_result.summary}
    Challenges encountered: {task_result.challenges}
    Solutions that worked: {task_result.solutions}
    Human guidance received: {blocker_resolutions}

    Extract 0-3 procedural rules that should guide future similar tasks.
    Format: IF [condition] THEN [action] BECAUSE [reason]
    Only extract genuinely reusable patterns, not task-specific details.
    """
    return llm.extract(prompt)

Option B: Pattern Detection from History
Periodically analyze episodic memory for patterns:
def detect_patterns(episodic_memory):
    # Find repeated failure → success patterns
    # Identify common blocker types and their resolutions
    # Extract recurring human guidance themes

Option C: Human-Labeled Procedures
When humans resolve blockers, option to mark as "remember this":
Blocker: "How should I handle API authentication?"
Resolution: "Use the refresh token pattern in auth_utils.py"
☑️ Remember this for future similar situations
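A sketch of what persisting a human-labeled resolution could look like, assuming the `human_guidance_log.json` file from the storage layout below; the function name and JSON shape are assumptions:

```python
import json
from pathlib import Path

def remember_resolution(blocker: str, resolution: str,
                        path: Path = Path("procedural_memory/human_guidance_log.json")) -> None:
    """Append a human-approved blocker resolution to the guidance log.

    Sketch only: the log path and entry shape are assumptions, not an
    existing CodeFRAME API.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"blocker": blocker, "resolution": resolution})
    path.write_text(json.dumps(entries, indent=2))
```

The function would only be called when the human ticks "Remember this", keeping the log high-signal.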
Option D: Procedure Library Integration
Integrate with external procedure library (like Claude Code's CLAUDE.md pattern):
- Procedures stored in version-controlled file
- Agents can read and propose additions
- Human reviews procedure additions
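One way to sketch the "agent proposes, human reviews" step: the agent appends a proposed rule as an unchecked markdown task item, and a human ticks it to accept. The function name and checkbox convention are assumptions:

```python
from pathlib import Path

def propose_procedure(rule: str,
                      path: Path = Path("procedural_memory/project_procedures.md")) -> None:
    """Append a proposed procedure as an unchecked task item for human review.

    Sketch only: the file path and '- [ ] (proposed)' convention are
    assumptions, not an existing CodeFRAME mechanism.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- [ ] (proposed) {rule}\n")
```

Because the file is version-controlled, acceptance of a proposal shows up as an ordinary reviewed diff.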
Storage and Retrieval
procedural_memory/
├── project_procedures.md # Project-specific learned procedures
├── codebase_patterns.json # Detected code patterns
└── human_guidance_log.json # Extracted from blocker resolutions
# Retrieval: Include relevant procedures in context based on task type
def get_procedures_for_task(task):
    task_type = classify_task(task)  # "testing", "api_integration", "frontend", etc.
    return procedure_store.query(task_type)
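The `procedure_store` above could be as simple as an in-memory index keyed by task type; a minimal sketch (the class and its methods are assumptions, not existing CodeFRAME code):

```python
class ProcedureStore:
    """Minimal in-memory procedure store, keyed by task type.

    Sketch only: real storage/retrieval in CodeFRAME is an open design
    question (file-backed, vector search, etc.).
    """

    def __init__(self) -> None:
        self._by_type: dict[str, list[str]] = {}

    def add(self, task_type: str, procedure: str) -> None:
        self._by_type.setdefault(task_type, []).append(procedure)

    def query(self, task_type: str, limit: int = 5) -> list[str]:
        # Cap results so procedures don't crowd out the task context window.
        return self._by_type.get(task_type, [])[:limit]
```

The `limit` cap matters: procedural context should inform a task, not dominate its prompt budget.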
Success Criteria
- Defined procedural memory schema
- Implemented extraction mechanism (Option A, B, or C)
- Procedures included in relevant task contexts
- Measured: fewer repeated mistakes across sessions
- Measured: faster task completion for similar task types
Metrics
- Procedure extraction rate: Procedures learned per N tasks
- Procedure utilization: % of tasks that receive relevant procedural context
- Repeat mistake rate: Same failure pattern occurring across sessions (should decrease)
- Human re-guidance rate: Humans answering similar blockers repeatedly (should decrease)
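The repeat mistake rate could be computed from failure events tagged with a session ID and a failure signature. How a signature is derived (e.g. a normalized error message) is an assumption left open here:

```python
from collections import defaultdict

def repeat_mistake_rate(failures: list[tuple[str, str]]) -> float:
    """Fraction of distinct failure signatures seen in more than one session.

    `failures` is a list of (session_id, failure_signature) pairs; how a
    signature is derived is an assumption, not defined in this sketch.
    """
    sessions_per_sig: dict[str, set[str]] = defaultdict(set)
    for session, sig in failures:
        sessions_per_sig[sig].add(session)
    if not sessions_per_sig:
        return 0.0
    repeated = sum(1 for s in sessions_per_sig.values() if len(s) > 1)
    return repeated / len(sessions_per_sig)
```

A working procedural memory should drive this number down over time, since a rule learned in one session prevents the same failure signature in the next.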
Integration with Existing Systems
- Blocker system: Rich source of human guidance → procedural extraction
- Quality gates: Failures could trigger procedure review ("did we have a procedure for this?")
- Tiered memory: Procedures could be a distinct tier (always HOT when relevant)
- Checkpoints: Procedure library should be part of checkpoint snapshots
References
- Memory in Agents - Philipp Schmid
- Learning from experience patterns in reinforcement learning
- CLAUDE.md as a form of human-curated procedural memory
- Mem0, Letta frameworks for memory implementation patterns