
[Phase 4] Implement Procedural Memory for Agent Learning #73

@frankbria

Description


Summary

CodeFRAME has semantic memory (project facts) and episodic memory (task history), but lacks procedural memory—"how I learned to do things." When agents discover effective patterns, that learning should persist.

Background: Memory Types

From Philipp Schmid's "Memory in Agents":

Semantic Memory ("What"): Retaining specific facts, concepts, and structured knowledge about users, e.g. a user prefers Python over JavaScript.

Episodic Memory ("When" and "Where"): Recalling past events or specific experiences to accomplish tasks by looking at past interactions.

Procedural Memory ("How"): Internalized rules and instructions for how an agent performs tasks, e.g. learning "my summaries are too long" after multiple users give feedback asking for shorter ones.

CodeFRAME likely has:

  • Semantic: Project architecture, tech stack, conventions (in CLAUDE.md, context tiers)
  • Episodic: Task history, blocker resolutions, test results
  • Procedural: ???

What Procedural Memory Looks Like

Examples of procedural learning:

| Discovery During Task | Procedural Memory Entry |
| --- | --- |
| "This test kept failing until I mocked the database" | "When testing DB-dependent code in this project, always mock the connection" |
| "The API rate limits at 100 req/s" | "Add exponential backoff when calling external API X" |
| "TypeScript strict mode catches errors early" | "Run `tsc --strict` before committing TypeScript changes" |
| "Human preferred shorter summaries in blockers" | "Keep blocker descriptions under 3 sentences" |

This isn't just "what happened" (episodic)—it's "how to do things better" extracted from experience.
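Entries like those in the table could be stored under a small schema. A minimal sketch, with hypothetical field names, using the IF/THEN/BECAUSE form proposed below:

```python
from dataclasses import dataclass


@dataclass
class ProceduralRule:
    """One learned 'how-to' rule in IF/THEN/BECAUSE form (hypothetical schema)."""
    condition: str            # IF: when does this rule apply?
    action: str               # THEN: what should the agent do?
    reason: str               # BECAUSE: why does this work?
    source: str = "task"      # where it was learned: "task", "blocker", or "human"
    confidence: float = 0.5   # could be reinforced when the rule proves useful again

    def render(self) -> str:
        """Render the rule as a single line for inclusion in agent context."""
        return f"IF {self.condition} THEN {self.action} BECAUSE {self.reason}"


rule = ProceduralRule(
    condition="testing DB-dependent code in this project",
    action="mock the database connection",
    reason="tests kept failing against the real DB",
)
print(rule.render())
```

A `confidence` field like this would also give the reinforcement mechanism ("patterns that work are reinforced") something concrete to update.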

Why This Matters

Without procedural memory:

  • Agents repeat the same mistakes across sessions
  • Successful patterns aren't codified
  • Human guidance (blocker resolutions) is forgotten
  • Each task starts from zero knowledge of "how this codebase works"

With procedural memory:

  • Agent gets better over time
  • Patterns that work are reinforced
  • Human teaching persists
  • Compound improvement across sessions

Implementation Approaches

Option A: Explicit Procedure Extraction

After each task or blocker resolution, extract procedural learning:

def extract_procedure(task_result, blocker_resolutions):
    prompt = f"""
    Task completed: {task_result.summary}
    Challenges encountered: {task_result.challenges}
    Solutions that worked: {task_result.solutions}
    Human guidance received: {blocker_resolutions}
    
    Extract 0-3 procedural rules that should guide future similar tasks.
    Format: IF [condition] THEN [action] BECAUSE [reason]
    Only extract genuinely reusable patterns, not task-specific details.
    """
    return llm.extract(prompt)

Option B: Pattern Detection from History

Periodically analyze episodic memory for patterns:

def detect_patterns(episodic_memory):
    # Find repeated failure → success patterns
    # Identify common blocker types and their resolutions
    # Extract recurring human guidance themes
    ...
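One minimal way to fill in that sketch, assuming episodic entries carry hypothetical `blocker_type` and `resolution` fields: count recurring blocker types and surface those frequent enough to warrant a procedure.

```python
from collections import Counter


def detect_patterns(episodic_memory, min_occurrences=3):
    """Surface blocker types that recur often enough to justify a procedure."""
    blocker_counts = Counter(
        e["blocker_type"] for e in episodic_memory if e.get("blocker_type")
    )
    candidates = []
    for blocker_type, count in blocker_counts.items():
        if count >= min_occurrences:
            # Collect the resolutions that worked, as raw material for a rule.
            resolutions = [
                e["resolution"] for e in episodic_memory
                if e.get("blocker_type") == blocker_type and e.get("resolution")
            ]
            candidates.append({
                "blocker_type": blocker_type,
                "occurrences": count,
                "resolutions": resolutions,
            })
    return candidates


history = [
    {"blocker_type": "db_test_failure", "resolution": "mock the connection"},
    {"blocker_type": "db_test_failure", "resolution": "mock the connection"},
    {"blocker_type": "db_test_failure", "resolution": "use an in-memory DB"},
    {"blocker_type": "rate_limit", "resolution": "add backoff"},
]
print(detect_patterns(history, min_occurrences=3))
```

The candidates could then be handed to the Option A extraction prompt to be turned into IF/THEN/BECAUSE rules.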

Option C: Human-Labeled Procedures

When humans resolve blockers, option to mark as "remember this":

Blocker: "How should I handle API authentication?"
Resolution: "Use the refresh token pattern in auth_utils.py"
☑️ Remember this for future similar situations
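The "remember this" checkbox could feed a simple append-only store. A sketch, assuming the `human_guidance_log.json` layout proposed under Storage and Retrieval below (function name hypothetical):

```python
import json
from pathlib import Path


def remember_resolution(blocker_question, resolution,
                        store_path="procedural_memory/human_guidance_log.json"):
    """Persist a blocker resolution the human flagged as 'remember this'."""
    path = Path(store_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Load existing entries, append, and write back.
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append({"question": blocker_question, "resolution": resolution})
    path.write_text(json.dumps(entries, indent=2))
```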

Option D: Procedure Library Integration

Integrate with external procedure library (like Claude Code's CLAUDE.md pattern):

  • Procedures stored in version-controlled file
  • Agents can read and propose additions
  • Human reviews procedure additions
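The propose-then-review loop could be as simple as appending to a pending section of the version-controlled file, so additions show up in normal code review. A sketch with hypothetical names:

```python
from pathlib import Path


def propose_procedure(procedure, path="procedural_memory/project_procedures.md"):
    """Append an agent-proposed procedure under a 'Pending review' section,
    so a human can accept or reject it in code review."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    existing = p.read_text() if p.exists() else "# Project Procedures\n"
    if "## Pending review" not in existing:
        existing += "\n## Pending review\n"
    # Checkbox list items make accept/reject explicit in the diff.
    p.write_text(existing + f"- [ ] {procedure}\n")
```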

Storage and Retrieval

procedural_memory/
├── project_procedures.md      # Project-specific learned procedures
├── codebase_patterns.json     # Detected code patterns
└── human_guidance_log.json    # Extracted from blocker resolutions

# Retrieval: Include relevant procedures in context based on task type
def get_procedures_for_task(task):
    task_type = classify_task(task)  # "testing", "api_integration", "frontend", etc.
    return procedure_store.query(task_type)
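`classify_task` and `procedure_store` are left unspecified above; one minimal interpretation is keyword classification over a dict-backed store (all names and keywords hypothetical):

```python
TASK_KEYWORDS = {
    "testing": ["test", "pytest", "mock"],
    "api_integration": ["api", "endpoint", "rate limit"],
    "frontend": ["ui", "component", "css"],
}

# Toy in-memory stand-in for the procedure store.
PROCEDURE_STORE = {
    "testing": ["When testing DB-dependent code, mock the connection"],
    "api_integration": ["Add exponential backoff when calling external APIs"],
}


def classify_task(task_description: str) -> str:
    """First keyword match wins; fall back to a catch-all type."""
    text = task_description.lower()
    for task_type, keywords in TASK_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return task_type
    return "general"


def get_procedures_for_task(task_description: str) -> list[str]:
    return PROCEDURE_STORE.get(classify_task(task_description), [])


print(get_procedures_for_task("Fix the flaky pytest suite"))
```

In practice an embedding-based lookup would likely retrieve better than keyword matching, but this keeps the retrieval path inspectable.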

Success Criteria

  • Defined procedural memory schema
  • Implemented extraction mechanism (Option A, B, or C)
  • Procedures included in relevant task contexts
  • Measured: fewer repeated mistakes across sessions
  • Measured: faster task completion for similar task types

Metrics

  • Procedure extraction rate: Procedures learned per N tasks
  • Procedure utilization: % of tasks that receive relevant procedural context
  • Repeat mistake rate: Same failure pattern occurring across sessions (should decrease)
  • Human re-guidance rate: Humans answering similar blockers repeatedly (should decrease)
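The repeat mistake rate could be computed directly from per-session failure patterns. A sketch, assuming failures are tagged with a stable pattern identifier (hypothetical representation):

```python
def repeat_mistake_rate(failure_patterns_by_session):
    """Fraction of failures whose pattern already appeared in an earlier session.
    Should trend toward 0 as procedural memory takes effect."""
    seen = set()
    repeats = 0
    total = 0
    for session_failures in failure_patterns_by_session:
        for pattern in session_failures:
            total += 1
            if pattern in seen:
                repeats += 1
            seen.add(pattern)
    return repeats / total if total else 0.0


sessions = [["db_timeout"], ["db_timeout", "lint_error"], ["db_timeout"]]
print(repeat_mistake_rate(sessions))  # 2 of 4 failures repeat earlier patterns
```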

Integration with Existing Systems

  • Blocker system: Rich source of human guidance → procedural extraction
  • Quality gates: Failures could trigger procedure review ("did we have a procedure for this?")
  • Tiered memory: Procedures could be a distinct tier (always HOT when relevant)
  • Checkpoints: Procedure library should be part of checkpoint snapshots

References

  • Memory in Agents - Philipp Schmid
  • Learning from experience patterns in reinforcement learning
  • CLAUDE.md as a form of human-curated procedural memory
  • Mem0, Letta frameworks for memory implementation patterns

Metadata

Labels: Future (deferred beyond v1/v2 scope), architecture (system architecture and design patterns), context-engineering (context window management and optimization), enhancement (new feature or request), priority:low
