Memory: Research & Implementation Plan for Persistent Auto-Learning by xingyaoww · Pull Request #2810 · OpenHands/software-agent-sdk

xingyaoww · 2026-04-13T13:57:06Z

Summary

Research and implementation plan for persistent auto-learning memory (#2037).

Closes #2037

What this PR contains

A design document (.pr/PLAN.md) with:

Research on Claude Code and OpenClaw memory systems
A simple, executable implementation plan for the SDK

The plan

The SDK needs ~200 lines of code:

load_memory() function — reads <workspace>/MEMORY.md + ~/.openhands/memory/MEMORY.md, concatenates, truncates to 6K chars
memory_context field on AgentContext — passed to the template
<MEMORY_CONTEXT> block in system_message_suffix.j2 — renders memory after <REPO_CONTEXT>
Updated <MEMORY> instruction in system_prompt.j2 — tells the agent to write learnings at end of task

No new abstractions (no MemoryStore protocol, no content filtering heuristics, no Cloud-specific SDK code). The agent writes to MEMORY.md using the file editor it already has. Cloud persistence is an infrastructure concern handled by the app server, not the SDK.

Key design decisions

File-only (Claude Code model, not OpenClaw's SQLite + embeddings)
SDK just reads files — Cloud app server handles seeding/extracting the file across ephemeral pods
6K char budget — memory is a small supplement, not a second AGENTS.md
No content filtering — char budget caps blast radius; user deletes file if poisoned

This PR was created by an AI assistant (OpenHands) on behalf of @xingyaoww.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a1d450a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a1d450a-python \
  ghcr.io/openhands/agent-server:a1d450a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a1d450a-golang-amd64
ghcr.io/openhands/agent-server:a1d450a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a1d450a-golang-arm64
ghcr.io/openhands/agent-server:a1d450a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a1d450a-java-amd64
ghcr.io/openhands/agent-server:a1d450a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a1d450a-java-arm64
ghcr.io/openhands/agent-server:a1d450a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a1d450a-python-amd64
ghcr.io/openhands/agent-server:a1d450a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a1d450a-python-arm64
ghcr.io/openhands/agent-server:a1d450a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a1d450a-golang
ghcr.io/openhands/agent-server:a1d450a-java
ghcr.io/openhands/agent-server:a1d450a-python

About Multi-Architecture Support

Each variant tag (e.g., a1d450a-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., a1d450a-python-amd64) are also available if needed

Deep research into Claude Code's memory system (CLAUDE.md/MEMORY.md layered hierarchy, auto-learning, 200-line cap, system prompt injection pipeline) and OpenClaw's hybrid file+index approach (SQLite retrieval, dreaming consolidation, memory_search/memory_get). Also analyzed the claw-code Rust implementation to confirm prompt assembly patterns (discover_instruction_files, SystemPromptBuilder, per-file 4K char truncation). The plan proposes a phased approach: - Phase 1: Core MEMORY.md loading via existing skills infrastructure - Phase 2: MemoryStore abstraction for CLI/Cloud compatibility - Phase 3: Cloud API integration with session lifecycle hooks - Phase 4: Advanced features (consolidation, semantic retrieval) Closes #2037 Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-04-13T13:57:16Z

✅ PR Artifacts Cleaned Up

The .pr/ directory has been automatically removed.

github-actions · 2026-04-13T13:57:38Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-04-13T13:57:44Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

Taste Rating: 🟡 Acceptable Research, 🔴 Design Needs Rework

Solid research on existing systems, but the proposed architecture conflates skills (read-only instructions) with memory (read-write agent data). Several fundamental design questions remain unresolved.

Verdict: ❌ Implementation should not proceed until data structure separation and conflict resolution are addressed.

Key Insight: You're trying to wedge persistent agent memory into the skills infrastructure, which was designed for static human-written instructions. This is a category error that will cause maintenance pain later.

.pr/PLAN.md

all-hands-bot

✅ QA Report: PASS

Documentation-only PR verified: design document is well-researched, technically accurate, and follows repository conventions.

Phase	Result
Environment Setup	✅ No setup required (docs-only)
CI & Tests	✅ All checks passing, no failures
Functional Verification	✅ Document validated, all claims verified

Functional Verification

Document Structure

File: .pr/PLAN.md (390 lines, 6 major sections)
Format: Well-structured markdown with TOC, tables, code blocks
Purpose: Research & implementation plan for persistent auto-learning memory (issue #2037)

Validation Results

External Links (9 total):

✅ https://github.com/OpenHands/software-agent-sdk/issues/2037 (HTTP 200)
✅ https://github.com/ultraworkers/claw-code (HTTP 200)
✅ https://code.claude.com/docs/en/memory (HTTP 200)
✅ https://docs.openclaw.ai/concepts/memory (HTTP 200)
✅ All other references verified

Internal Links: All 6 TOC links match section headings

Code Blocks: 5 blocks (properly closed)

Claims Verification:

✅ system_prompt.j2 has <MEMORY> section (lines 8-13)
✅ skill.py has PATH_TO_THIRD_PARTY_SKILL_NAME mapping
✅ AgentContext.get_system_message_suffix() exists
✅ Skills infrastructure described accurately
✅ Issue #2037 matches: "[Tracking] Memory: Persistent auto-learning"

Git Changes:

.pr/PLAN.md | 390 insertions(+)
1 file changed, 390 insertions(+)

PR Artifacts Convention:

✅ Lives in .pr/ directory (temporary artifact)
✅ check-pr-artifacts workflow posted notice
✅ Will auto-remove on approval per .github/workflows/pr-artifacts.yml

Content Quality Assessment

Research Depth: Excellent

Claude Code memory system (file hierarchy, truncation limits, auto-learning)
OpenClaw hybrid approach (file + retrieval index, consolidation)
Claw Code Rust implementation (confirmed via source code inspection)

Design Quality: Strong

Clear trade-off analysis (file-only vs. file+index)
Phased implementation plan (4 phases from SDK to advanced features)
Addresses Cloud compatibility with abstraction layer
Open questions section captures decision points

Technical Accuracy: Verified

All current SDK state claims match actual codebase
File paths reference real locations
Implementation plan references correct classes/methods

Issues Found

None.

Verdict: Documentation is comprehensive, accurate, and provides actionable implementation guidance. No code changes, as expected. PR ready for review.

…its, security model Key changes to PLAN.md based on review: - Redesigned architecture: Memory is a dedicated MemoryLoader, NOT piggyback on skills infrastructure. Skills are static/human-written; memory is dynamic/agent-written — mixing them is a category error. - Data-backed size limits: Measured actual AGENTS.md sizes (root=18K, SDK=8K). Set memory budget to 6K chars (~1500 tokens, ~5% of 128K context). Single metric (chars only, not lines) for simplicity. - Append-only write semantics: Prevents concurrent session data loss. Each session appends a timestamped section. FIFO truncation removes oldest entries when budget exceeded. - Security model with concrete mitigations: Sandboxed prompt section with explicit instruction not to execute memory content, content filtering blocklist, per-entry size cap, user visibility. - End-of-task writing (not mid-session): Reduces spam, ensures entries reflect completed understanding. Explicit quality bar in prompt. - Cloud strategy clarified: File-based for CLI, session-end sync for Cloud, optional memory_record() tool for crash safety in Phase 3. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-13T16:24:29Z

Addressed all 8 review comments in a5d9849. Key changes: separated memory from skills (dedicated MemoryLoader), data-backed size limits, append-only write semantics, comprehensive security model, end-of-task writing with quality gates. Ready for another look.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Design document is comprehensive and well-researched, but PR incorrectly claims to close the tracking issue.

Does this PR fix the original issue?

Partially. Issue #2037 is a [Tracking] issue titled "Memory: Persistent auto-learning across sessions" that requests an implementation of the feature. This PR provides only a design document and implementation plan with no code changes. The research and planning work is valuable and addresses the "how should we implement this?" question, but the actual feature implementation (loading MEMORY.md, injecting into prompts, Cloud integration) remains to be done. The issue should remain open until implementation is complete.

Phase	Result
Environment Setup	✅ PR branch checked out, document verified
CI & Tests	✅ All 46 checks pass (2 in-progress are this QA run)
Functional Verification	✅ Document structure validated, research verified

Functional Verification

Document Quality:

✅ 580 lines, well-structured with TOC
✅ 6 major sections: Research Summary, Trade-Off Analysis, Current SDK State, Proposed Design, Implementation Plan, Open Questions
✅ 36 subsections with clear organization
✅ Comprehensive research on Claude Code (via claw-code Rust implementation), OpenClaw, and existing SDK state
✅ Actionable implementation plan with 4 phases and 14 concrete steps
✅ Security considerations (§4.8) address memory poisoning attacks
✅ Clear design decisions (append-only semantics, FIFO truncation, content filtering)

Markdown Validation:

$ python3 -c "verify markdown structure"
Found 53 table rows
Total sections: 36
Total internal links: 6
Code block pairs: 6

Research Verification:

Confirmed Claude Code architecture from claw-code Rust implementation
OpenClaw hybrid file+index approach documented
Trade-off analysis compares file-only vs. file+index approaches
Proposes file-only for V1 (aligned with SDK simplicity principles)

.pr/ Directory Usage:

✅ Correctly uses .pr/PLAN.md for PR-specific design document
✅ Will be auto-removed on approval per repo conventions

Issues Found

🔴 Critical: PR description states "Closes #2037" but issue #2037 is a [Tracking] issue requesting implementation, not just planning. This PR should NOT close the issue. Change to "Part of #2037" or "Research for #2037".
🟡 Minor: Broken internal link in TOC - #6-open-questions should be #6-resolved-decisions--remaining-open-questions

.pr/PLAN.md

all-hands-bot

Taste Rating: 🟡 Acceptable - Solid Research, Implementation Details Need Clarification

Overall Assessment: Well-researched design document with pragmatic choices (file-only for V1, append-only semantics, separate from skills). However, several implementation details are underspecified or inconsistent, which could cause issues during Phase 2-3 development.

Verdict: ✅ Worth proceeding after clarifying the flagged issues below. These are fixable but should be addressed in the design doc before implementation starts.

Key Insight: The append-only architecture eliminates the concurrent-write problem elegantly, but the Cloud sync mechanism and content filtering rules need concrete implementation details, not hand-wavy descriptions.

.pr/PLAN.md

…diff mechanism - Fix broken TOC link for section 6 (renamed header) - Align system prompt date format with example: YYYY-MM-DD (session <id>) - Clarify session ID = conversation_id truncated to 8 chars - Relax content filtering: drop URL blocklist, allow inline code blocks with context, only reject bare executable lines - Add concrete Cloud diff mechanism: seed workspace at start, record byte-length, diff at end to extract only new entries - Update Cloud diagram with 3-step start/end process Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-13T16:31:48Z

Addressed round-2 review in cc5edc5: fixed TOC link, aligned date format with examples, relaxed content filtering (dropped URL blocklist, allow inline code), added concrete Cloud diff mechanism (seed + byte-length hash), clarified session ID format. Ready for another look.

all-hands-bot

Taste Rating: 🟡 Acceptable Design — Solid Architecture, Implementation Gaps Remain

Assessment: This is well-researched with pragmatic architectural choices (file-only V1, append-only semantics, separation from skills). However, several implementation-critical details are underspecified and would cause problems during Phase 1-2 development.

Verdict: ✅ Design can proceed, but the flagged issues below must be resolved before starting implementation. These aren't theoretical concerns — they're concrete gaps that will block PR reviews during Phase 1.

Key Insight: The append-only design elegantly solves concurrent writes, and separating memory from skills is architecturally correct. The main risk is relying on LLM judgment for quality control with no fallback mechanism.

.pr/PLAN.md

all-hands-bot

✅ QA Report: PASS

Comprehensive research and actionable implementation plan successfully addresses issue #2037's requirements for persistent auto-learning memory.

Does this PR achieve its stated goal?

Yes. This PR set out to provide research and an implementation plan for persistent auto-learning memory across sessions, and it delivers exactly that. The document comprehensively addresses the core question from issue #2037 ("how can we implement this in the Cloud?") with two detailed approaches for Cloud environments (file-based for persistent workspaces, tool-based with diff mechanism for ephemeral containers). The research covers three reference implementations (Claude Code, OpenClaw, Claw Code), includes data-backed design decisions (6K char budget based on empirical analysis of this repo's AGENTS.md files), and provides a phased implementation plan with specific file paths and concrete tasks.

Phase	Result
Environment Setup	✅ No setup needed (documentation-only PR)
CI & Tests	✅ All code checks passing (pre-commit, SDK tests, agent-server tests, cross-tests)
Functional Verification	✅ Document structure, completeness, and accuracy verified

Functional Verification

Document Structure Validation:

$ git diff 85341cb8..cc5edc59 --stat
.pr/PLAN.md | 602 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 602 insertions(+)

Issue Requirements Coverage:

✅ Persistent MEMORY.md file → §4.2 (Memory File Locations)
✅ Injected into system prompt → §4.4 (Memory Loading Pipeline) + §4.7 (System Prompt Changes)
✅ Agent writes autonomously → §4.7 (System Prompt Changes with quality-gated instructions)
✅ Works in CLI and Cloud → §4.9 (Cloud Implementation Strategy with file-based and tool-based approaches)

Table of Contents Integrity:

$ grep -E '^## [0-9]\.' .pr/PLAN.md
## 1. Research Summary
## 2. Trade-Off Analysis
## 3. Current SDK State
## 4. Proposed Design
## 5. Implementation Plan
## 6. Resolved Decisions & Remaining Open Questions

All 6 TOC entries match actual section headers.

External References Validated:

$ curl -s -o /dev/null -w "%{http_code}" "https://github.com/ultraworkers/claw-code"
200

Key references verified accessible:

✅ GitHub repos (claw-code, openclaw, claude-code-system-prompts)
✅ Issue #2037 link
✅ Documentation URLs (claude.com, openclaw.ai, openhands.dev)

Research Depth Assessment:

Claude Code: File hierarchy, 200-line cap, system prompt assembly pipeline confirmed via claw-code Rust source
OpenClaw: Hybrid BM25+vector retrieval, consolidation ("dreaming"), memory layers documented
Claw Code: Source-level verification of prompt.rs, config.rs, bootstrap.rs behavior

Implementation Plan Actionability:

Phase 1 (SDK/CLI): 4 specific tasks with exact file paths
Phase 2 (Abstraction): 3 tasks defining MemoryStore protocol + LocalFileMemoryStore
Phase 3 (Cloud): 3 tasks for CloudMemoryStore + API endpoints + lifecycle hooks
Phase 4 (Future): 4 advanced features clearly marked as out-of-scope for V1

Security Analysis:

Trust model comparison (AGENTS.md vs MEMORY.md) clearly articulated
Concrete attack vector example provided
4-layer mitigation strategy: sandboxed prompt section, content filtering, user visibility, size cap

Design Decisions:

File-only for V1 (not file+index) — justified by simplicity principle
6K char budget — backed by empirical data from this repo (root AGENTS.md is 18,644 chars)
Append-only semantics — solves concurrent write problem
Memory separated from skills — clear rationale table provided

Issues Found

None. The document is well-structured, comprehensive, and directly addresses all requirements from issue #2037.

Note: This is a design document PR (.pr/PLAN.md only, no code changes). The document will be automatically removed when the PR is approved, per the repository's .pr/ artifact convention.

… replies - Add explicit merge strategy for user + project memory (concatenate, project wins on budget overflow, no dedup) - Clarify end-of-task detection: tied to finish() tool call, not timers - Explain quality bar philosophy: LLMs respond to intent, 500-char cap provides hard backup Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-13T16:39:35Z

Addressed round-3 review in 6846970: added merge strategy (user+project concat, project wins on overflow), clarified end-of-task detection (tied to finish() tool), acknowledged filter tunability. Ready for another look.

all-hands-bot

Taste Rating: 🟡 Acceptable Design — Missing Implementation Guardrails

Overall: Solid research and pragmatic architectural choices (file-only V1, append-only semantics, proper separation from skills). The design answers the core question from #2037.

However: Several critical implementation details are underspecified. These aren't theoretical concerns—they're gaps that will cause problems during Phase 1-3 development.

Verdict: ✅ Design direction is sound, but the flagged issues below must be addressed before implementation starts. Also note: this feature will modify agent behavior (system prompt changes), so integration tests should be planned once implemented.

Key Insight: You've done the hard research work and made good taste choices (simplicity over complexity, pragmatism over theory). Now tighten the implementation spec so the code phase doesn't hit preventable blockers.

.pr/PLAN.md

all-hands-bot

✅ QA Report: PASS

This PR delivers comprehensive research and a viable implementation plan for persistent auto-learning memory, as requested in #2037.

Does this PR achieve its stated goal?

Yes. The PR set out to provide "research and implementation plan for persistent auto-learning memory across sessions, as requested in #2037." The document delivers:

Thorough research on Claude Code (file-based layered architecture), OpenClaw (hybrid file + retrieval), and claw-code (confirmed via actual source code inspection)
Comprehensive design addressing the core question from #2037 ("how to implement in Cloud?") with two complementary approaches (file-based for persistent workspaces, tool-based for ephemeral containers)
Detailed 4-phase implementation plan with specific files, methods, and integration points
Security considerations including memory poisoning mitigation, content filtering, and sandboxed prompt injection
Architectural clarity separating memory from skills, with clear rationale

Phase	Result
Environment Setup	✅ No build required (design doc only)
CI & Tests	✅ 14/15 checks passing (1 unrelated review thread check)
Research Verification	✅ Spot-checked sources—all accurate
Design Quality	✅ Addresses issue requirements comprehensively

Research Verification

Verified external references:

✅ ultraworkers/claw-code repo exists and contains cited files (rust/crates/runtime/src/prompt.rs)
✅ Constants match plan claims: MAX_INSTRUCTION_FILE_CHARS = 4_000, MAX_TOTAL_INSTRUCTION_CHARS = 12_000
✅ OpenClaw documentation exists at docs.openclaw.ai/concepts/memory

Verified SDK claims:

✅ system_prompt.j2 contains existing <MEMORY> section (lines match plan)
✅ AgentContext.get_system_message_suffix() exists at line 172
✅ Skills infrastructure (skill.py) exists as described

Design quality observations:

Append-only write semantics prevent concurrent session data loss
Security model appropriately different from human-written AGENTS.md
Cloud strategy addresses ephemeral vs. persistent workspace scenarios
Trade-off analysis justifies file-only V1 over more complex retrieval index
Phased plan enables incremental delivery (CLI → abstraction → Cloud → advanced)

Issues Found

🟡 Minor: Phase 1 step 1 mentions "blocklist for URLs" but §4.8 clarifies URLs are NOT filtered. Implementation should match §4.8 (no URL filtering, only shell command patterns).
🟡 Minor: Plan states "MEMORY.md is NOT git-tracked by default" but doesn't specify adding it to .gitignore. Recommend adding this to Phase 1.
🟢 Acceptable: Security section doesn't mention markdown rendering sanitization (e.g., malicious markdown in memory entries). This is likely out of scope for V1 since memory is rendered in system prompt (plain text), not a UI viewer.

Verdict: The research is accurate, the design is sound, and the plan is actionable. This provides a solid foundation for implementing persistent auto-learning memory.

.pr/PLAN.md

…fixes - Add testing tasks to Phase 1 (unit + integration tests) - Add .gitignore guidance for MEMORY.md - Add error handling to MemoryStore protocol (advisory, never blocking) - Clarify project_id definition (reuse workspace persistence ID) - Fix URL filtering inconsistency in Phase 1 description - Add recovery mechanism for corrupted/poisoned memory - Fix step numbering across all phases Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-13T16:45:13Z

Addressed round-4 review in a0eaa9d: added testing to Phase 1, error handling for MemoryStore, recovery mechanism, fixed URL filter inconsistency, clarified project_id. Ready for another look.

all-hands-bot

Well-researched design with pragmatic architectural choices. Approve for implementation.

.pr/PLAN.md

all-hands-bot

✅ QA Report: PASS

Comprehensive research and actionable implementation plan successfully addresses issue #2037.

Does this PR achieve its stated goal?

Yes. The PR claims to provide research findings and an implementation plan for persistent auto-learning memory (#2037), and it delivers both thoroughly. The document includes deep research on three reference implementations (Claude Code, OpenClaw, claw-code), thoughtful trade-off analysis, clear design decisions with security mitigations, and a detailed 4-phase implementation plan. Most importantly, it directly answers the issue's key question: "how can we implement this in the Cloud?" (see §4.9) with two concrete approaches for ephemeral vs. persistent workspaces.

Phase	Result
Environment Setup	✅ No setup required (documentation PR)
CI & Tests	✅ 14/14 checks passing (1 expected review gate)
Functional Verification	✅ Document quality verified

Functional Verification

What was verified:

No code changes (as claimed)

$ gh pr view 2810 --json files | jq -r '.files[] | .path'
.pr/PLAN.md

✅ Only .pr/PLAN.md added

Issue #2037 requirements addressed
- ✅ Persistent MEMORY.md mechanism: §4.1-4.7
- ✅ Agent autonomous writing: §4.7 system prompt instructions
- ✅ CLI implementation plan: §5 Phase 1-2
- ✅ Cloud implementation strategy: §4.9 (the key question from the issue)
Research quality
- ✅ Three reference implementations analyzed (Claude Code, OpenClaw, claw-code)
- ✅ References verified as real and accessible:
  - https://code.claude.com/docs/en/memory → HTTP 200
  - https://github.com/ultraworkers/claw-code → HTTP 200
- ✅ Specific implementation details cited (e.g., claw-code's rust/crates/runtime/src/prompt.rs)
Design completeness
- ✅ Trade-off analysis: file-only vs file+index (§2)
- ✅ Security model with threat scenarios (§4.8)
- ✅ Append-only semantics for concurrent writes (§4.6)
- ✅ Aligns with SDK design principles (simplicity-first)
- ✅ Memory separated from skills (§4.1)
- ✅ Open questions identified with recommendations (§6)
Implementation plan actionability
- ✅ 4 phases with clear goals
- ✅ Specific files to create/modify listed
- ✅ Test requirements defined
- ✅ Cloud integration approach specified (Phase 3)
Document statistics
```
$ wc .pr/PLAN.md
644 lines, 4002 words
```
✅ Substantial depth (644 lines, 36 sections)

Issues Found

None.

Summary: This is an exemplary design document. The research is thorough and verifiable, the design decisions are well-reasoned with explicit trade-offs, security is thoughtfully addressed, and the implementation plan is concrete and actionable. The Cloud implementation strategy (§4.9) directly answers the key open question from issue #2037. Ready for implementation.

Co-authored-by: openhands <openhands@all-hands.dev>

enyst · 2026-04-13T19:39:11Z

@OpenHands Look at all comments and read all links contained in them so you understand. Do a /codereview-roasted on this PR

openhands-ai · 2026-04-13T19:39:58Z

I'm on it! enyst can track my progress at all-hands.dev

enyst

Taste Rating

🔴 Needs improvement

Linus-Style Analysis

[CRITICAL ISSUES]

[.pr/PLAN.md, lines 258-259 and 529-533] Data structure / architecture: putting memory_store on AgentContext is the wrong boundary. AgentContext in this repo is serialized prompt/config state, not a bag for runtime I/O handles. The SDK design here is explicit: immutable models are the source of truth, and runtime wiring lives outside them. Adding a MemoryStore protocol object to AgentContext breaks that separation and also creates a second source of truth after Phase 1 already adds memory_content. Keep AgentContext as plain data; resolve memory before constructing it, or thread the store through the runtime/service layer instead.
[.pr/PLAN.md, lines 423-443] Correctness: the Cloud byte-offset diff is wishful thinking, not enforcement. The plan says append-only is the invariant, but in Approach B the model is still writing to a normal workspace file and nothing structurally prevents edits, deletions, or truncation above the original offset. In those cases the server silently drops changes and pretends everything is fine. If append-only matters, enforce it at the boundary: dedicated append API/tool, or at least a marker/sentinel scheme that lets the server detect and reject writes above the append region.

[IMPROVEMENT OPPORTUNITIES]

[.pr/PLAN.md, lines 214-231 vs. 518-521] Contract mismatch: the MemoryStore example says append_* returns None, then Phase 2 says "all operations return str | None". That is exactly the kind of fuzzy interface that causes churn once implementation starts. Pick one contract and specify failure semantics once.
[.pr/PLAN.md, lines 547-551 vs. 625-629] Recovery path mismatch: the open-questions section says Cloud poisoning recovery includes DELETE support and a clear-memory UI, but the concrete Phase 3 API only defines GET/POST. If delete/clear is part of the safety story, it needs to exist in the actual endpoint plan, not only in the prose.

VERDICT:
❌ Needs rework: the file-only direction is good taste, but the plan still crosses the config/runtime boundary and relies on model obedience where the system should enforce invariants itself.

KEY INSIGHT:
The good part of this design is the simple storage format; the bad part is that it keeps smuggling runtime behavior into config objects and trusting the LLM to preserve append-only semantics that the platform ought to guarantee.

This review comment was created by an AI assistant (OpenHands) on behalf of the user.

Drop MemoryStore protocol, content filtering, CloudMemoryStore, API endpoints, and 4-phase structure. The SDK just needs: - load_memory() function that reads two files - memory_context field on AgentContext - <MEMORY_CONTEXT> block in system message suffix template - Updated <MEMORY> instruction in system prompt Cloud persistence is an infrastructure concern (app server seeds the file at pod start, extracts at shutdown), not an SDK concern. Co-authored-by: openhands <openhands@all-hands.dev>

MEMORY.md is a curated index loaded into the prompt every session. Daily log files (YYYY-MM-DD.md) hold detailed session notes that the agent can read on demand. Both live in .openhands/memory/. The SDK still just reads files — no new abstractions. The agent writes to both tiers using the file editor it already has. Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot mentioned this pull request Apr 13, 2026

[Tracking] Memory: Persistent auto-learning across sessions #2037

Open