Skip to content

Memory: Research & Implementation Plan for Persistent Auto-Learning#2810

Draft
xingyaoww wants to merge 9 commits intomainfrom
openhands/memory-persistent-learning-plan
Draft

Memory: Research & Implementation Plan for Persistent Auto-Learning#2810
xingyaoww wants to merge 9 commits intomainfrom
openhands/memory-persistent-learning-plan

Conversation

@xingyaoww
Copy link
Copy Markdown
Collaborator

@xingyaoww xingyaoww commented Apr 13, 2026

Summary

Research and implementation plan for persistent auto-learning memory (#2037).

Closes #2037

What this PR contains

A design document (.pr/PLAN.md) with:

  1. Research on Claude Code and OpenClaw memory systems
  2. A simple, executable implementation plan for the SDK

The plan

The SDK needs ~200 lines of code:

  • load_memory() function — reads <workspace>/MEMORY.md + ~/.openhands/memory/MEMORY.md, concatenates, truncates to 6K chars
  • memory_context field on AgentContext — passed to the template
  • <MEMORY_CONTEXT> block in system_message_suffix.j2 — renders memory after <REPO_CONTEXT>
  • Updated <MEMORY> instruction in system_prompt.j2 — tells the agent to write learnings at end of task

No new abstractions (no MemoryStore protocol, no content filtering heuristics, no Cloud-specific SDK code). The agent writes to MEMORY.md using the file editor it already has. Cloud persistence is an infrastructure concern handled by the app server, not the SDK.

Key design decisions

  • File-only (Claude Code model, not OpenClaw's SQLite + embeddings)
  • SDK just reads files — Cloud app server handles seeding/extracting the file across ephemeral pods
  • 6K char budget — memory is a small supplement, not a second AGENTS.md
  • No content filtering — char budget caps blast radius; user deletes file if poisoned

This PR was created by an AI assistant (OpenHands) on behalf of @xingyaoww.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a1d450a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a1d450a-python \
  ghcr.io/openhands/agent-server:a1d450a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a1d450a-golang-amd64
ghcr.io/openhands/agent-server:a1d450a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a1d450a-golang-arm64
ghcr.io/openhands/agent-server:a1d450a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a1d450a-java-amd64
ghcr.io/openhands/agent-server:a1d450a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a1d450a-java-arm64
ghcr.io/openhands/agent-server:a1d450a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a1d450a-python-amd64
ghcr.io/openhands/agent-server:a1d450a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a1d450a-python-arm64
ghcr.io/openhands/agent-server:a1d450a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a1d450a-golang
ghcr.io/openhands/agent-server:a1d450a-java
ghcr.io/openhands/agent-server:a1d450a-python

About Multi-Architecture Support

  • Each variant tag (e.g., a1d450a-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a1d450a-python-amd64) are also available if needed

Deep research into Claude Code's memory system (CLAUDE.md/MEMORY.md layered
hierarchy, auto-learning, 200-line cap, system prompt injection pipeline)
and OpenClaw's hybrid file+index approach (SQLite retrieval, dreaming
consolidation, memory_search/memory_get).

Also analyzed the claw-code Rust implementation to confirm prompt assembly
patterns (discover_instruction_files, SystemPromptBuilder, per-file 4K
char truncation).

The plan proposes a phased approach:
- Phase 1: Core MEMORY.md loading via existing skills infrastructure
- Phase 2: MemoryStore abstraction for CLI/Cloud compatibility
- Phase 3: Cloud API integration with session lifecycle hooks
- Phase 4: Advanced features (consolidation, semantic retrieval)

Closes #2037

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

PR Artifacts Cleaned Up

The .pr/ directory has been automatically removed.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable Research, 🔴 Design Needs Rework

Solid research on existing systems, but the proposed architecture conflates skills (read-only instructions) with memory (read-write agent data). Several fundamental design questions remain unresolved.

Verdict: ❌ Implementation should not proceed until data structure separation and conflict resolution are addressed.

Key Insight: You're trying to wedge persistent agent memory into the skills infrastructure, which was designed for static human-written instructions. This is a category error that will cause maintenance pain later.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Documentation-only PR verified: design document is well-researched, technically accurate, and follows repository conventions.

Phase Result
Environment Setup ✅ No setup required (docs-only)
CI & Tests ✅ All checks passing, no failures
Functional Verification ✅ Document validated, all claims verified
Functional Verification

Document Structure

  • File: .pr/PLAN.md (390 lines, 6 major sections)
  • Format: Well-structured markdown with TOC, tables, code blocks
  • Purpose: Research & implementation plan for persistent auto-learning memory (issue #2037)

Validation Results

External Links (9 total):

✅ https://github.com/OpenHands/software-agent-sdk/issues/2037 (HTTP 200)
✅ https://github.com/ultraworkers/claw-code (HTTP 200)
✅ https://code.claude.com/docs/en/memory (HTTP 200)
✅ https://docs.openclaw.ai/concepts/memory (HTTP 200)
✅ All other references verified

Internal Links: All 6 TOC links match section headings

Code Blocks: 5 blocks (properly closed)

Claims Verification:

✅ system_prompt.j2 has <MEMORY> section (lines 8-13)
✅ skill.py has PATH_TO_THIRD_PARTY_SKILL_NAME mapping
✅ AgentContext.get_system_message_suffix() exists
✅ Skills infrastructure described accurately
✅ Issue #2037 matches: "[Tracking] Memory: Persistent auto-learning"

Git Changes:

.pr/PLAN.md | 390 insertions(+)
1 file changed, 390 insertions(+)

PR Artifacts Convention:

  • ✅ Lives in .pr/ directory (temporary artifact)
  • check-pr-artifacts workflow posted notice
  • ✅ Will auto-remove on approval per .github/workflows/pr-artifacts.yml

Content Quality Assessment

Research Depth: Excellent

  • Claude Code memory system (file hierarchy, truncation limits, auto-learning)
  • OpenClaw hybrid approach (file + retrieval index, consolidation)
  • Claw Code Rust implementation (confirmed via source code inspection)

Design Quality: Strong

  • Clear trade-off analysis (file-only vs. file+index)
  • Phased implementation plan (4 phases from SDK to advanced features)
  • Addresses Cloud compatibility with abstraction layer
  • Open questions section captures decision points

Technical Accuracy: Verified

  • All current SDK state claims match actual codebase
  • File paths reference real locations
  • Implementation plan references correct classes/methods

Issues Found

None.


Verdict: Documentation is comprehensive, accurate, and provides actionable implementation guidance. No code changes, as expected. PR ready for review.

…its, security model

Key changes to PLAN.md based on review:

- Redesigned architecture: Memory is a dedicated MemoryLoader, NOT
  piggyback on skills infrastructure. Skills are static/human-written;
  memory is dynamic/agent-written — mixing them is a category error.

- Data-backed size limits: Measured actual AGENTS.md sizes (root=18K,
  SDK=8K). Set memory budget to 6K chars (~1500 tokens, ~5% of 128K
  context). Single metric (chars only, not lines) for simplicity.

- Append-only write semantics: Prevents concurrent session data loss.
  Each session appends a timestamped section. FIFO truncation removes
  oldest entries when budget exceeded.

- Security model with concrete mitigations: Sandboxed prompt section
  with explicit instruction not to execute memory content, content
  filtering blocklist, per-entry size cap, user visibility.

- End-of-task writing (not mid-session): Reduces spam, ensures entries
  reflect completed understanding. Explicit quality bar in prompt.

- Cloud strategy clarified: File-based for CLI, session-end sync for
  Cloud, optional memory_record() tool for crash safety in Phase 3.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator Author

Addressed all 8 review comments in a5d9849. Key changes: separated memory from skills (dedicated MemoryLoader), data-backed size limits, append-only write semantics, comprehensive security model, end-of-task writing with quality gates. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 13, 2026 16:24
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ QA Report: PASS WITH ISSUES

Design document is comprehensive and well-researched, but PR incorrectly claims to close the tracking issue.

Does this PR fix the original issue?

Partially. Issue #2037 is a [Tracking] issue titled "Memory: Persistent auto-learning across sessions" that requests an implementation of the feature. This PR provides only a design document and implementation plan with no code changes. The research and planning work is valuable and addresses the "how should we implement this?" question, but the actual feature implementation (loading MEMORY.md, injecting into prompts, Cloud integration) remains to be done. The issue should remain open until implementation is complete.

Phase Result
Environment Setup ✅ PR branch checked out, document verified
CI & Tests ✅ All 46 checks pass (2 in-progress are this QA run)
Functional Verification ✅ Document structure validated, research verified
Functional Verification

Document Quality:

  • ✅ 580 lines, well-structured with TOC
  • ✅ 6 major sections: Research Summary, Trade-Off Analysis, Current SDK State, Proposed Design, Implementation Plan, Open Questions
  • ✅ 36 subsections with clear organization
  • ✅ Comprehensive research on Claude Code (via claw-code Rust implementation), OpenClaw, and existing SDK state
  • ✅ Actionable implementation plan with 4 phases and 14 concrete steps
  • ✅ Security considerations (§4.8) address memory poisoning attacks
  • ✅ Clear design decisions (append-only semantics, FIFO truncation, content filtering)

Markdown Validation:

$ python3 -c "verify markdown structure"
Found 53 table rows
Total sections: 36
Total internal links: 6
Code block pairs: 6

Research Verification:

  • Confirmed Claude Code architecture from claw-code Rust implementation
  • OpenClaw hybrid file+index approach documented
  • Trade-off analysis compares file-only vs. file+index approaches
  • Proposes file-only for V1 (aligned with SDK simplicity principles)

.pr/ Directory Usage:

  • ✅ Correctly uses .pr/PLAN.md for PR-specific design document
  • ✅ Will be auto-removed on approval per repo conventions

Issues Found

  • 🔴 Critical: PR description states "Closes #2037" but issue #2037 is a [Tracking] issue requesting implementation, not just planning. This PR should NOT close the issue. Change to "Part of #2037" or "Research for #2037".
  • 🟡 Minor: Broken internal link in TOC - #6-open-questions should be #6-resolved-decisions--remaining-open-questions

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable - Solid Research, Implementation Details Need Clarification

Overall Assessment: Well-researched design document with pragmatic choices (file-only for V1, append-only semantics, separate from skills). However, several implementation details are underspecified or inconsistent, which could cause issues during Phase 2-3 development.

Verdict: ✅ Worth proceeding after clarifying the flagged issues below. These are fixable but should be addressed in the design doc before implementation starts.

Key Insight: The append-only architecture eliminates the concurrent-write problem elegantly, but the Cloud sync mechanism and content filtering rules need concrete implementation details, not hand-wavy descriptions.

…diff mechanism

- Fix broken TOC link for section 6 (renamed header)
- Align system prompt date format with example: YYYY-MM-DD (session <id>)
- Clarify session ID = conversation_id truncated to 8 chars
- Relax content filtering: drop URL blocklist, allow inline code blocks
  with context, only reject bare executable lines
- Add concrete Cloud diff mechanism: seed workspace at start, record
  byte-length, diff at end to extract only new entries
- Update Cloud diagram with 3-step start/end process

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator Author

Addressed round-2 review in cc5edc5: fixed TOC link, aligned date format with examples, relaxed content filtering (dropped URL blocklist, allow inline code), added concrete Cloud diff mechanism (seed + byte-length hash), clarified session ID format. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 13, 2026 16:31
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable Design — Solid Architecture, Implementation Gaps Remain

Assessment: This is well-researched with pragmatic architectural choices (file-only V1, append-only semantics, separation from skills). However, several implementation-critical details are underspecified and would cause problems during Phase 1-2 development.

Verdict: ✅ Design can proceed, but the flagged issues below must be resolved before starting implementation. These aren't theoretical concerns — they're concrete gaps that will block PR reviews during Phase 1.

Key Insight: The append-only design elegantly solves concurrent writes, and separating memory from skills is architecturally correct. The main risk is relying on LLM judgment for quality control with no fallback mechanism.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Comprehensive research and actionable implementation plan successfully addresses issue #2037's requirements for persistent auto-learning memory.

Does this PR achieve its stated goal?

Yes. This PR set out to provide research and an implementation plan for persistent auto-learning memory across sessions, and it delivers exactly that. The document comprehensively addresses the core question from issue #2037 ("how can we implement this in the Cloud?") with two detailed approaches for Cloud environments (file-based for persistent workspaces, tool-based with diff mechanism for ephemeral containers). The research covers three reference implementations (Claude Code, OpenClaw, Claw Code), includes data-backed design decisions (6K char budget based on empirical analysis of this repo's AGENTS.md files), and provides a phased implementation plan with specific file paths and concrete tasks.

Phase Result
Environment Setup ✅ No setup needed (documentation-only PR)
CI & Tests ✅ All code checks passing (pre-commit, SDK tests, agent-server tests, cross-tests)
Functional Verification ✅ Document structure, completeness, and accuracy verified
Functional Verification

Document Structure Validation:

$ git diff 85341cb8..cc5edc59 --stat
.pr/PLAN.md | 602 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 602 insertions(+)

Issue Requirements Coverage:

  • ✅ Persistent MEMORY.md file → §4.2 (Memory File Locations)
  • ✅ Injected into system prompt → §4.4 (Memory Loading Pipeline) + §4.7 (System Prompt Changes)
  • ✅ Agent writes autonomously → §4.7 (System Prompt Changes with quality-gated instructions)
  • ✅ Works in CLI and Cloud → §4.9 (Cloud Implementation Strategy with file-based and tool-based approaches)

Table of Contents Integrity:

$ grep -E '^## [0-9]\.' .pr/PLAN.md
## 1. Research Summary
## 2. Trade-Off Analysis
## 3. Current SDK State
## 4. Proposed Design
## 5. Implementation Plan
## 6. Resolved Decisions & Remaining Open Questions

All 6 TOC entries match actual section headers.

External References Validated:

$ curl -s -o /dev/null -w "%{http_code}" "https://github.com/ultraworkers/claw-code"
200

Key references verified accessible:

  • ✅ GitHub repos (claw-code, openclaw, claude-code-system-prompts)
  • ✅ Issue #2037 link
  • ✅ Documentation URLs (claude.com, openclaw.ai, openhands.dev)

Research Depth Assessment:

  • Claude Code: File hierarchy, 200-line cap, system prompt assembly pipeline confirmed via claw-code Rust source
  • OpenClaw: Hybrid BM25+vector retrieval, consolidation ("dreaming"), memory layers documented
  • Claw Code: Source-level verification of prompt.rs, config.rs, bootstrap.rs behavior

Implementation Plan Actionability:

  • Phase 1 (SDK/CLI): 4 specific tasks with exact file paths
  • Phase 2 (Abstraction): 3 tasks defining MemoryStore protocol + LocalFileMemoryStore
  • Phase 3 (Cloud): 3 tasks for CloudMemoryStore + API endpoints + lifecycle hooks
  • Phase 4 (Future): 4 advanced features clearly marked as out-of-scope for V1

Security Analysis:

  • Trust model comparison (AGENTS.md vs MEMORY.md) clearly articulated
  • Concrete attack vector example provided
  • 4-layer mitigation strategy: sandboxed prompt section, content filtering, user visibility, size cap

Design Decisions:

  • File-only for V1 (not file+index) — justified by simplicity principle
  • 6K char budget — backed by empirical data from this repo (root AGENTS.md is 18,644 chars)
  • Append-only semantics — solves concurrent write problem
  • Memory separated from skills — clear rationale table provided

Issues Found

None. The document is well-structured, comprehensive, and directly addresses all requirements from issue #2037.


Note: This is a design document PR (.pr/PLAN.md only, no code changes). The document will be automatically removed when the PR is approved, per the repository's .pr/ artifact convention.

… replies

- Add explicit merge strategy for user + project memory (concatenate,
  project wins on budget overflow, no dedup)
- Clarify end-of-task detection: tied to finish() tool call, not timers
- Explain quality bar philosophy: LLMs respond to intent, 500-char cap
  provides hard backup

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator Author

Addressed round-3 review in 6846970: added merge strategy (user+project concat, project wins on overflow), clarified end-of-task detection (tied to finish() tool), acknowledged filter tunability. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 13, 2026 16:39
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable Design — Missing Implementation Guardrails

Overall: Solid research and pragmatic architectural choices (file-only V1, append-only semantics, proper separation from skills). The design answers the core question from #2037.

However: Several critical implementation details are underspecified. These aren't theoretical concerns—they're gaps that will cause problems during Phase 1-3 development.

Verdict: ✅ Design direction is sound, but the flagged issues below must be addressed before implementation starts. Also note: this feature will modify agent behavior (system prompt changes), so integration tests should be planned once implemented.

Key Insight: You've done the hard research work and made good taste choices (simplicity over complexity, pragmatism over theory). Now tighten the implementation spec so the code phase doesn't hit preventable blockers.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

This PR delivers comprehensive research and a viable implementation plan for persistent auto-learning memory, as requested in #2037.

Does this PR achieve its stated goal?

Yes. The PR set out to provide "research and implementation plan for persistent auto-learning memory across sessions, as requested in #2037." The document delivers:

  1. Thorough research on Claude Code (file-based layered architecture), OpenClaw (hybrid file + retrieval), and claw-code (confirmed via actual source code inspection)
  2. Comprehensive design addressing the core question from #2037 ("how to implement in Cloud?") with two complementary approaches (file-based for persistent workspaces, tool-based for ephemeral containers)
  3. Detailed 4-phase implementation plan with specific files, methods, and integration points
  4. Security considerations including memory poisoning mitigation, content filtering, and sandboxed prompt injection
  5. Architectural clarity separating memory from skills, with clear rationale
Phase Result
Environment Setup ✅ No build required (design doc only)
CI & Tests ✅ 14/15 checks passing (1 unrelated review thread check)
Research Verification ✅ Spot-checked sources—all accurate
Design Quality ✅ Addresses issue requirements comprehensively
Research Verification

Verified external references:

  • ultraworkers/claw-code repo exists and contains cited files (rust/crates/runtime/src/prompt.rs)
  • ✅ Constants match plan claims: MAX_INSTRUCTION_FILE_CHARS = 4_000, MAX_TOTAL_INSTRUCTION_CHARS = 12_000
  • ✅ OpenClaw documentation exists at docs.openclaw.ai/concepts/memory

Verified SDK claims:

  • system_prompt.j2 contains existing <MEMORY> section (lines match plan)
  • AgentContext.get_system_message_suffix() exists at line 172
  • ✅ Skills infrastructure (skill.py) exists as described

Design quality observations:

  • Append-only write semantics prevent concurrent session data loss
  • Security model appropriately different from human-written AGENTS.md
  • Cloud strategy addresses ephemeral vs. persistent workspace scenarios
  • Trade-off analysis justifies file-only V1 over more complex retrieval index
  • Phased plan enables incremental delivery (CLI → abstraction → Cloud → advanced)

Issues Found

  • 🟡 Minor: Phase 1 step 1 mentions "blocklist for URLs" but §4.8 clarifies URLs are NOT filtered. Implementation should match §4.8 (no URL filtering, only shell command patterns).
  • 🟡 Minor: Plan states "MEMORY.md is NOT git-tracked by default" but doesn't specify adding it to .gitignore. Recommend adding this to Phase 1.
  • 🟢 Acceptable: Security section doesn't mention markdown rendering sanitization (e.g., malicious markdown in memory entries). This is likely out of scope for V1 since memory is rendered in system prompt (plain text), not a UI viewer.

Verdict: The research is accurate, the design is sound, and the plan is actionable. This provides a solid foundation for implementing persistent auto-learning memory.

…fixes

- Add testing tasks to Phase 1 (unit + integration tests)
- Add .gitignore guidance for MEMORY.md
- Add error handling to MemoryStore protocol (advisory, never blocking)
- Clarify project_id definition (reuse workspace persistence ID)
- Fix URL filtering inconsistency in Phase 1 description
- Add recovery mechanism for corrupted/poisoned memory
- Fix step numbering across all phases

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator Author

Addressed round-4 review in a0eaa9d: added testing to Phase 1, error handling for MemoryStore, recovery mechanism, fixed URL filter inconsistency, clarified project_id. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 13, 2026 16:45
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-researched design with pragmatic architectural choices. Approve for implementation.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Comprehensive research and actionable implementation plan successfully addresses issue #2037.

Does this PR achieve its stated goal?

Yes. The PR claims to provide research findings and an implementation plan for persistent auto-learning memory (#2037), and it delivers both thoroughly. The document includes deep research on three reference implementations (Claude Code, OpenClaw, claw-code), thoughtful trade-off analysis, clear design decisions with security mitigations, and a detailed 4-phase implementation plan. Most importantly, it directly answers the issue's key question: "how can we implement this in the Cloud?" (see §4.9) with two concrete approaches for ephemeral vs. persistent workspaces.

Phase Result
Environment Setup ✅ No setup required (documentation PR)
CI & Tests ✅ 14/14 checks passing (1 expected review gate)
Functional Verification ✅ Document quality verified
Functional Verification

What was verified:

  1. No code changes (as claimed)

    $ gh pr view 2810 --json files | jq -r '.files[] | .path'
    .pr/PLAN.md

    ✅ Only .pr/PLAN.md added

  2. Issue #2037 requirements addressed

    • ✅ Persistent MEMORY.md mechanism: §4.1-4.7
    • ✅ Agent autonomous writing: §4.7 system prompt instructions
    • ✅ CLI implementation plan: §5 Phase 1-2
    • Cloud implementation strategy: §4.9 (the key question from the issue)
  3. Research quality

    • ✅ Three reference implementations analyzed (Claude Code, OpenClaw, claw-code)
    • ✅ References verified as real and accessible:
      • https://code.claude.com/docs/en/memory → HTTP 200
      • https://github.com/ultraworkers/claw-code → HTTP 200
    • ✅ Specific implementation details cited (e.g., claw-code's rust/crates/runtime/src/prompt.rs)
  4. Design completeness

    • ✅ Trade-off analysis: file-only vs file+index (§2)
    • ✅ Security model with threat scenarios (§4.8)
    • ✅ Append-only semantics for concurrent writes (§4.6)
    • ✅ Aligns with SDK design principles (simplicity-first)
    • ✅ Memory separated from skills (§4.1)
    • ✅ Open questions identified with recommendations (§6)
  5. Implementation plan actionability

    • ✅ 4 phases with clear goals
    • ✅ Specific files to create/modify listed
    • ✅ Test requirements defined
    • ✅ Cloud integration approach specified (Phase 3)
  6. Document statistics

    $ wc .pr/PLAN.md
    644 lines, 4002 words

    ✅ Substantial depth (644 lines, 36 sections)

Issues Found

None.


Summary: This is an exemplary design document. The research is thorough and verifiable, the design decisions are well-reasoned with explicit trade-offs, security is thoughtfully addressed, and the implementation plan is concrete and actionable. The Cloud implementation strategy (§4.9) directly answers the key open question from issue #2037. Ready for implementation.

allhands-bot and others added 2 commits April 13, 2026 16:48
@xingyaoww xingyaoww marked this pull request as draft April 13, 2026 16:53
@enyst
Copy link
Copy Markdown
Collaborator

enyst commented Apr 13, 2026

@OpenHands Look at all comments and read all links contained in them so you understand. Do a /codereview-roasted on this PR

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 13, 2026

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Copy Markdown
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating

🔴 Needs improvement

Linus-Style Analysis

[CRITICAL ISSUES]

  • [.pr/PLAN.md, lines 258-259 and 529-533] Data structure / architecture: putting memory_store on AgentContext is the wrong boundary. AgentContext in this repo is serialized prompt/config state, not a bag for runtime I/O handles. The SDK design here is explicit: immutable models are the source of truth, and runtime wiring lives outside them. Adding a MemoryStore protocol object to AgentContext breaks that separation and also creates a second source of truth after Phase 1 already adds memory_content. Keep AgentContext as plain data; resolve memory before constructing it, or thread the store through the runtime/service layer instead.
  • [.pr/PLAN.md, lines 423-443] Correctness: the Cloud byte-offset diff is wishful thinking, not enforcement. The plan says append-only is the invariant, but in Approach B the model is still writing to a normal workspace file and nothing structurally prevents edits, deletions, or truncation above the original offset. In those cases the server silently drops changes and pretends everything is fine. If append-only matters, enforce it at the boundary: dedicated append API/tool, or at least a marker/sentinel scheme that lets the server detect and reject writes above the append region.

[IMPROVEMENT OPPORTUNITIES]

  • [.pr/PLAN.md, lines 214-231 vs. 518-521] Contract mismatch: the MemoryStore example says append_* returns None, then Phase 2 says "all operations return str | None". That is exactly the kind of fuzzy interface that causes churn once implementation starts. Pick one contract and specify failure semantics once.
  • [.pr/PLAN.md, lines 547-551 vs. 625-629] Recovery path mismatch: the open-questions section says Cloud poisoning recovery includes DELETE support and a clear-memory UI, but the concrete Phase 3 API only defines GET/POST. If delete/clear is part of the safety story, it needs to exist in the actual endpoint plan, not only in the prose.

VERDICT:
Needs rework: the file-only direction is good taste, but the plan still crosses the config/runtime boundary and relies on model obedience where the system should enforce invariants itself.

KEY INSIGHT:
The good part of this design is the simple storage format; the bad part is that it keeps smuggling runtime behavior into config objects and trusting the LLM to preserve append-only semantics that the platform ought to guarantee.

This review comment was created by an AI assistant (OpenHands) on behalf of the user.

@openhands-ai

This comment was marked as duplicate.

Drop MemoryStore protocol, content filtering, CloudMemoryStore, API
endpoints, and 4-phase structure. The SDK just needs:
- load_memory() function that reads two files
- memory_context field on AgentContext
- <MEMORY_CONTEXT> block in system message suffix template
- Updated <MEMORY> instruction in system prompt

Cloud persistence is an infrastructure concern (app server seeds the
file at pod start, extracts at shutdown), not an SDK concern.

Co-authored-by: openhands <openhands@all-hands.dev>
MEMORY.md is a curated index loaded into the prompt every session.
Daily log files (YYYY-MM-DD.md) hold detailed session notes that the
agent can read on demand. Both live in .openhands/memory/.

The SDK still just reads files — no new abstractions. The agent writes
to both tiers using the file editor it already has.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Tracking] Memory: Persistent auto-learning across sessions

4 participants