Skip to content

feat(memory): path-aware episode/skill provenance + human-gated promote#10

Merged
jkyberneees merged 3 commits into
mainfrom
feat/memory-path-aware-provenance
Jun 6, 2026
Merged

feat(memory): path-aware episode/skill provenance + human-gated promote#10
jkyberneees merged 3 commits into
mainfrom
feat/memory-path-aware-provenance

Conversation

@jkyberneees
Copy link
Copy Markdown
Contributor

Summary

Fixes the episode-memory dead-end: provenance tainted an episode if the session used read_file / search_files / multi_grep — by tool name alone. Those run in essentially every coding session, so almost every episode was born Untrusted, untrusted episodes are filtered out of recall, and nothing ever set UserApproved. Net result: episodes were extracted (an LLM call per session) and stored, but for normal work never recalled — the tier was dead weight.

This makes taint argument-aware and adds the missing human-gated escape hatch.

What changed

1. Path-aware taint (shared single source of truth)

New memory.ToolCallTaints(name, argsJSON) decides per call:

  • Always untrusted: browser, http_batch, transcribe, any MCP tool (server__tool).
  • Path-scoped: read_file / search_files / multi_grep taint only when their path argument resolves to a sensitive location — reusing the existing, tested danger.ClassifyPath (system_write/destructive: /etc, ~/.ssh, ~/.aws, …). Workspace and other non-sensitive local reads stay trusted. Malformed/absent path → conservative taint.

This precisely re-targets the original "covers /etc/, $HOME/.ssh" concern while letting ordinary sessions remain recallable. Because ClassifyPath resolves paths itself, DeriveProvenance keeps its (messages) signature — no cwd plumbing into the async session-end call sites.

2. Skills mirror

internal/skills/selfimprove.go now delegates to the same memory.ToolCallTaints, so learned skills stop being over-flagged for review too. Trust model stays unified.

3. Human-gated promote (CLI only)

odek memory list                    # episodes excluded from recall, with sources
odek memory promote <session_id>    # approve one after reviewing its summary

Deliberately not an agent tool — a prompt-injected agent must not be able to self-approve its own poisoned memory. Mirrors the existing skill promote pattern.

4. Doc/comment drift fixed

  • docs/CONFIG.md: extract_on_end ("durable facts" → it extracts an episode summary).
  • memory.go: FormatEpisodeContext comment ("recency-based ranker, no LLM" → it uses the configured, LLM-by-default ranker).
  • docs/SECURITY.md §5/§6 updated for the new model.

Security rationale

The control still closes the persistence-injection vector: anything that ingests genuinely external content (web/http/MCP/audio) or reads sensitive system/credential paths is still untrusted and non-recallable until a human promotes it. Only the false-positive (workspace reads) is removed.

Verification

  • go build ./..., go vet, gofmt clean.
  • go test ./internal/memory/... ./internal/skills/... ./cmd/odek/... — all pass. New coverage:
    • path-aware taint: workspace read trusted; /etc/passwd & ~/.ssh/id_rsa tainted; default-path search trusted; malformed args conservatively taint; http_batch/transcribe always taint; ToolCallTaints table.
    • episode promote/pending lifecycle + on-disk persistence; promote error cases.
    • skills mirror test doubles as a guard that the llm.Message → LlmMessage conversion carries Arguments.
    • CLI command end to end.
  • Manual binary smoke (temp HOME, seeded index): untrusted web episode listed as pending, clean workspace episode not flagged, promote makes it recallable, error cases exit 1 cleanly.

Note

Only new episodes get the refined provenance; existing on-disk tainted episodes stay as-is — odek memory promote is exactly how a user recovers those.

🤖 Generated with Claude Code

jkyberneees and others added 2 commits June 6, 2026 09:37
Fix the "episode dead-end": provenance tainted episodes by tool *name* only,
so read_file/search_files/multi_grep — used in essentially every coding
session — always produced an untrusted episode. Untrusted episodes are
excluded from recall, and nothing ever set UserApproved, so for normal work
the episode tier was written but never replayed (dead weight).

Provenance is now argument-aware via a single shared predicate
`memory.ToolCallTaints(name, argsJSON)`:

- Always untrusted: browser, http_batch, transcribe, and any MCP tool (__).
- Path-scoped (read_file, search_files, multi_grep): taint only when the
  `path` argument resolves to a sensitive location (danger.ClassifyPath →
  system_write/destructive, e.g. /etc, ~/.ssh, ~/.aws). Workspace and other
  non-sensitive local reads stay trusted. Malformed/absent path → conservative
  taint. This precisely re-targets the original "covers /etc/, $HOME/.ssh"
  concern while letting ordinary sessions stay recallable.

The skills provenance (internal/skills/selfimprove.go) now delegates to the
same predicate, so learned skills stop being over-flagged for review too.

Reusing danger.ClassifyPath means DeriveProvenance keeps its (messages)
signature — no cwd plumbing into the async session-end call sites.

Escape hatch for genuinely-external episodes: new human-gated CLI
`odek memory list` / `odek memory promote <session_id>` sets UserApproved.
It is intentionally NOT an agent tool — a prompt-injected agent must not be
able to self-approve its own poisoned memory.

Also fixes two stale descriptions found while mapping the system: the
extract_on_end docs ("durable facts" → episode summary) and the
FormatEpisodeContext comment ("recency-based ranker, no LLM" → it uses the
configured, LLM-by-default ranker).

Tests: path-aware taint (workspace trusted, sensitive/system tainted, malformed
conservative, always-external), the ToolCallTaints table, episode
promote/pending lifecycle + persistence, skills mirror (guards the
llm.Message→LlmMessage Arguments plumbing), and the CLI command end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…02/D-03/D-05)

Remediates the findings from the AI Verification Protocol run on this PR. The
v1 taint check used a sensitive-path *denylist* (danger.ClassifyPath) over only
three tools, which left several bypasses:

- D-03 (regression): macOS /etc and /var are symlinks to /private/etc, /private
  /var. ClassifyPath only matched /etc,/var, so read_file("/private/etc/...")
  classified local_write and did NOT taint — a sensitive read that tainted
  before this PR. Confirmed live.
- D-01: sibling whole-file readers (batch_read, json_query, head_tail, sort,
  tr, diff, count_lines, checksum, word_count, file_info, glob, tree, base64)
  were in no taint set at all, so reading /etc/shadow via batch_read produced a
  TRUSTED, recallable episode.
- D-02: home credential files outside ClassifyPath's hardcoded subdir list
  (e.g. ~/secrets.env) classified local_write and did not taint.
- D-05: session_search re-surfaces prior-session transcripts but never tainted.

Fix — flip the model from a sensitive-path denylist to a workspace-containment
allowlist, and broaden coverage:

- ToolCallTaints now taints a path-reading tool when ANY of its path arguments
  (path / path_a / path_b / files[].path) resolves OUTSIDE the workspace trust
  zone (cwd, the sandbox /workspace mount, or ~/.odek). Symlinks are resolved
  so /etc -> /private/etc can't disguise an escape. Workspace reads stay
  trusted (the original goal); everything else taints. Malformed args still
  taint conservatively; absent/empty path defaults to the workspace.
- PathReadingTools expanded to every file-reading tool; session_search added to
  AlwaysExternalTools. Provenance no longer depends on ClassifyPath.
- Defense in depth: ClassifyPath now strips the macOS /private prefix so
  /private/etc|var classify the same as /etc|var for its other callers
  (transcribe risk, write confinement).

Also adds the missing manager-wrapper tests (PromoteEpisode /
PendingReviewEpisodes were at 0% coverage — the report's coverage finding;
now 100%) and updates docs/SECURITY.md to describe the containment model
accurately (the report's doc-accuracy finding).

Verified: batch_read/json_query/diff of /etc, read_file of /private/etc and
~/secrets.env, and session_search all taint; workspace reads (relative + abs)
and the /workspace mount stay trusted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees
Copy link
Copy Markdown
Contributor Author

Verification Protocol — remediation pass (commit c550d2d)

All findings from the prior certificate are closed and verified.

Finding Sev Status
D-03 macOS /private/etc|var read no longer tainted (regression) HIGH ✅ Fixed — containment check resolves symlinks; ClassifyPath also strips /private. Verified read_file("/private/etc/master.passwd") → taints.
D-01 sibling whole-file readers (batch_read, json_query, head_tail, diff, …) bypass taint HIGH ✅ Fixed — PathReadingTools broadened to all 16 file-readers; multi-path args (path_a/b, files[].path) extracted. batch_read{/etc/shadow} → taints.
D-02 home creds outside the hardcoded list (~/secrets.env) not tainted MED ✅ Fixed — any read outside the workspace taints now.
D-05 session_search recalls prior transcripts untainted MED ✅ Fixed — added to AlwaysExternalTools.
Coverage PromoteEpisode/PendingReviewEpisodes at 0% LOW ✅ Fixed — now 100%.
Docs (2.9) SECURITY.md overstated coverage ✅ Rewritten to the containment model.

Design change: the taint check flipped from a sensitive-path denylist (ClassifyPath) to a workspace-containment allowlist — a path-reading call taints when any path argument resolves outside the workspace / sandbox /workspace / ~/.odek (symlink-resolved). Workspace reads stay trusted (original goal preserved); everything else taints. This closes the whole class of denylist-incompleteness bypasses rather than patching them one by one.

Re-scored axes

2.1 ✅ · 2.2 ✅ · 2.3 ✅ (was 🔴) · 2.4 ✅ · 2.5 ✅ · 2.6 ✅ · 2.7 ⚠️ (monoculture) · 2.8 ✅ (was 🔴) · 2.9 ✅

η ≈ 0.68 (signals improved: o≈0.75, b≈0.92, s≈0.90, t=1, d=1, f=1, m skipped/redistributed) · ρ = 0.21 (unchanged — single-model pipeline).

Verdict: 🟡 HumanReviewRequired — but now only on the correlation gate

No axis is 🔴 anymore; every substantive finding is fixed. The verdict stays at HumanReviewRequired solely because the pipeline is a monoculture (the author and all verifier roles are the same model/family → ρ=0.21 ∈ (0.20, 0.30], which also drags η below 0.80). Per §3.8 that is a hard cap a single model cannot clear by improving the code — a human reviewer or a genuinely independent second-provider verification is the remaining gate. That's the protocol working as designed: a model may not self-certify to AutoApprove.

Verification artifacts: deterministic tests now cover each closed bypass (TestToolCallTaints, broadened path-tool cases) and the manager wrappers.

Add `memory.auto_approve_episodes` (default false). When enabled, untrusted
episodes are stamped approved at session end so they are recalled without a
manual `odek memory promote` — an explicit opt-in that trades the human review
gate for convenience in trusted/sandboxed deployments.

- New EpisodeProvenance.AutoApproved field, kept DISTINCT from UserApproved so
  the audit trail shows the approval was automatic (policy), while Untrusted +
  Sources are still recorded. Episode Search and PendingReview treat
  AutoApproved as approved.
- The stamp is applied at write time in OnSessionEndWithProvenance only when
  the flag is on and the episode is untrusted; trusted episodes are unaffected.
- Config plumbed through MemoryConfig (default false), NewMemoryManager merge,
  and resolveMemory.
- docs/CONFIG.md and docs/SECURITY.md document it as a security trade-off that
  disables the persistence-injection protection for episodes when on.

Tests: store-level recall of an AutoApproved episode + exclusion from pending;
the secure default is false; and the full OnSessionEnd stamping path with the
flag on (AutoApproved set, recallable, not pending) vs off (excluded, pending).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit 4ffdb7d into main Jun 6, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant