feat(memory): path-aware episode/skill provenance + human-gated promote by jkyberneees · Pull Request #10 · BackendStack21/odek

jkyberneees · 2026-06-06T07:37:43Z

Summary

Fixes the episode-memory dead-end: provenance tainted an episode if the session used read_file / search_files / multi_grep — by tool name alone. Those run in essentially every coding session, so almost every episode was born Untrusted, untrusted episodes are filtered out of recall, and nothing ever set UserApproved. Net result: episodes were extracted (an LLM call per session) and stored, but for normal work never recalled — the tier was dead weight.

This makes taint argument-aware and adds the missing human-gated escape hatch.

What changed

1. Path-aware taint (shared single source of truth)

New memory.ToolCallTaints(name, argsJSON) decides per call:

Always untrusted: browser, http_batch, transcribe, any MCP tool (server__tool).
Path-scoped: read_file / search_files / multi_grep taint only when their path argument resolves to a sensitive location — reusing the existing, tested danger.ClassifyPath (system_write/destructive: /etc, ~/.ssh, ~/.aws, …). Workspace and other non-sensitive local reads stay trusted. Malformed/absent path → conservative taint.

This precisely re-targets the original "covers /etc/, $HOME/.ssh" concern while letting ordinary sessions remain recallable. Because ClassifyPath resolves paths itself, DeriveProvenance keeps its (messages) signature — no cwd plumbing into the async session-end call sites.

2. Skills mirror

internal/skills/selfimprove.go now delegates to the same memory.ToolCallTaints, so learned skills stop being over-flagged for review too. Trust model stays unified.

3. Human-gated promote (CLI only)

odek memory list                    # episodes excluded from recall, with sources
odek memory promote <session_id>    # approve one after reviewing its summary

Deliberately not an agent tool — a prompt-injected agent must not be able to self-approve its own poisoned memory. Mirrors the existing skill promote pattern.

4. Doc/comment drift fixed

docs/CONFIG.md: extract_on_end ("durable facts" → it extracts an episode summary).
memory.go: FormatEpisodeContext comment ("recency-based ranker, no LLM" → it uses the configured, LLM-by-default ranker).
docs/SECURITY.md §5/§6 updated for the new model.

Security rationale

The control still closes the persistence-injection vector: anything that ingests genuinely external content (web/http/MCP/audio) or reads sensitive system/credential paths is still untrusted and non-recallable until a human promotes it. Only the false-positive (workspace reads) is removed.

Verification

go build ./..., go vet, gofmt clean.
go test ./internal/memory/... ./internal/skills/... ./cmd/odek/... — all pass. New coverage:
- path-aware taint: workspace read trusted; /etc/passwd & ~/.ssh/id_rsa tainted; default-path search trusted; malformed args conservatively taint; http_batch/transcribe always taint; ToolCallTaints table.
- episode promote/pending lifecycle + on-disk persistence; promote error cases.
- skills mirror test doubles as a guard that the llm.Message → LlmMessage conversion carries Arguments.
- CLI command end to end.
Manual binary smoke (temp HOME, seeded index): untrusted web episode listed as pending, clean workspace episode not flagged, promote makes it recallable, error cases exit 1 cleanly.

Note

Only new episodes get the refined provenance; existing on-disk tainted episodes stay as-is — odek memory promote is exactly how a user recovers those.

🤖 Generated with Claude Code

Fix the "episode dead-end": provenance tainted episodes by tool *name* only, so read_file/search_files/multi_grep — used in essentially every coding session — always produced an untrusted episode. Untrusted episodes are excluded from recall, and nothing ever set UserApproved, so for normal work the episode tier was written but never replayed (dead weight). Provenance is now argument-aware via a single shared predicate `memory.ToolCallTaints(name, argsJSON)`: - Always untrusted: browser, http_batch, transcribe, and any MCP tool (__). - Path-scoped (read_file, search_files, multi_grep): taint only when the `path` argument resolves to a sensitive location (danger.ClassifyPath → system_write/destructive, e.g. /etc, ~/.ssh, ~/.aws). Workspace and other non-sensitive local reads stay trusted. Malformed/absent path → conservative taint. This precisely re-targets the original "covers /etc/, $HOME/.ssh" concern while letting ordinary sessions stay recallable. The skills provenance (internal/skills/selfimprove.go) now delegates to the same predicate, so learned skills stop being over-flagged for review too. Reusing danger.ClassifyPath means DeriveProvenance keeps its (messages) signature — no cwd plumbing into the async session-end call sites. Escape hatch for genuinely-external episodes: new human-gated CLI `odek memory list` / `odek memory promote <session_id>` sets UserApproved. It is intentionally NOT an agent tool — a prompt-injected agent must not be able to self-approve its own poisoned memory. Also fixes two stale descriptions found while mapping the system: the extract_on_end docs ("durable facts" → episode summary) and the FormatEpisodeContext comment ("recency-based ranker, no LLM" → it uses the configured, LLM-by-default ranker). Tests: path-aware taint (workspace trusted, sensitive/system tainted, malformed conservative, always-external), the ToolCallTaints table, episode promote/pending lifecycle + persistence, skills mirror (guards the llm.Message→LlmMessage Arguments plumbing), and the CLI command end to end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…02/D-03/D-05) Remediates the findings from the AI Verification Protocol run on this PR. The v1 taint check used a sensitive-path *denylist* (danger.ClassifyPath) over only three tools, which left several bypasses: - D-03 (regression): macOS /etc and /var are symlinks to /private/etc, /private /var. ClassifyPath only matched /etc,/var, so read_file("/private/etc/...") classified local_write and did NOT taint — a sensitive read that tainted before this PR. Confirmed live. - D-01: sibling whole-file readers (batch_read, json_query, head_tail, sort, tr, diff, count_lines, checksum, word_count, file_info, glob, tree, base64) were in no taint set at all, so reading /etc/shadow via batch_read produced a TRUSTED, recallable episode. - D-02: home credential files outside ClassifyPath's hardcoded subdir list (e.g. ~/secrets.env) classified local_write and did not taint. - D-05: session_search re-surfaces prior-session transcripts but never tainted. Fix — flip the model from a sensitive-path denylist to a workspace-containment allowlist, and broaden coverage: - ToolCallTaints now taints a path-reading tool when ANY of its path arguments (path / path_a / path_b / files[].path) resolves OUTSIDE the workspace trust zone (cwd, the sandbox /workspace mount, or ~/.odek). Symlinks are resolved so /etc -> /private/etc can't disguise an escape. Workspace reads stay trusted (the original goal); everything else taints. Malformed args still taint conservatively; absent/empty path defaults to the workspace. - PathReadingTools expanded to every file-reading tool; session_search added to AlwaysExternalTools. Provenance no longer depends on ClassifyPath. - Defense in depth: ClassifyPath now strips the macOS /private prefix so /private/etc|var classify the same as /etc|var for its other callers (transcribe risk, write confinement). Also adds the missing manager-wrapper tests (PromoteEpisode / PendingReviewEpisodes were at 0% coverage — the report's coverage finding; now 100%) and updates docs/SECURITY.md to describe the containment model accurately (the report's doc-accuracy finding). Verified: batch_read/json_query/diff of /etc, read_file of /private/etc and ~/secrets.env, and session_search all taint; workspace reads (relative + abs) and the /workspace mount stay trusted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jkyberneees · 2026-06-06T07:58:05Z

Verification Protocol — remediation pass (commit `c550d2d`)

All findings from the prior certificate are closed and verified.

Finding	Sev	Status
D-03 macOS `/private/etc\|var` read no longer tainted (regression)	HIGH	✅ Fixed — containment check resolves symlinks; `ClassifyPath` also strips `/private`. Verified `read_file("/private/etc/master.passwd")` → taints.
D-01 sibling whole-file readers (`batch_read`, `json_query`, `head_tail`, `diff`, …) bypass taint	HIGH	✅ Fixed — `PathReadingTools` broadened to all 16 file-readers; multi-path args (`path_a/b`, `files[].path`) extracted. `batch_read{/etc/shadow}` → taints.
D-02 home creds outside the hardcoded list (`~/secrets.env`) not tainted	MED	✅ Fixed — any read outside the workspace taints now.
D-05 `session_search` recalls prior transcripts untainted	MED	✅ Fixed — added to `AlwaysExternalTools`.
Coverage `PromoteEpisode`/`PendingReviewEpisodes` at 0%	LOW	✅ Fixed — now 100%.
Docs (2.9) `SECURITY.md` overstated coverage	—	✅ Rewritten to the containment model.

Design change: the taint check flipped from a sensitive-path denylist (ClassifyPath) to a workspace-containment allowlist — a path-reading call taints when any path argument resolves outside the workspace / sandbox /workspace / ~/.odek (symlink-resolved). Workspace reads stay trusted (original goal preserved); everything else taints. This closes the whole class of denylist-incompleteness bypasses rather than patching them one by one.

Re-scored axes

2.1 ✅ · 2.2 ✅ · 2.3 ✅ (was 🔴) · 2.4 ✅ · 2.5 ✅ · 2.6 ✅ · 2.7 ⚠️ (monoculture) · 2.8 ✅ (was 🔴) · 2.9 ✅

η ≈ 0.68 (signals improved: o≈0.75, b≈0.92, s≈0.90, t=1, d=1, f=1, m skipped/redistributed) · ρ = 0.21 (unchanged — single-model pipeline).

Verdict: 🟡 HumanReviewRequired — but now only on the correlation gate

No axis is 🔴 anymore; every substantive finding is fixed. The verdict stays at HumanReviewRequired solely because the pipeline is a monoculture (the author and all verifier roles are the same model/family → ρ=0.21 ∈ (0.20, 0.30], which also drags η below 0.80). Per §3.8 that is a hard cap a single model cannot clear by improving the code — a human reviewer or a genuinely independent second-provider verification is the remaining gate. That's the protocol working as designed: a model may not self-certify to AutoApprove.

Verification artifacts: deterministic tests now cover each closed bypass (TestToolCallTaints, broadened path-tool cases) and the manager wrappers.

Add `memory.auto_approve_episodes` (default false). When enabled, untrusted episodes are stamped approved at session end so they are recalled without a manual `odek memory promote` — an explicit opt-in that trades the human review gate for convenience in trusted/sandboxed deployments. - New EpisodeProvenance.AutoApproved field, kept DISTINCT from UserApproved so the audit trail shows the approval was automatic (policy), while Untrusted + Sources are still recorded. Episode Search and PendingReview treat AutoApproved as approved. - The stamp is applied at write time in OnSessionEndWithProvenance only when the flag is on and the episode is untrusted; trusted episodes are unaffected. - Config plumbed through MemoryConfig (default false), NewMemoryManager merge, and resolveMemory. - docs/CONFIG.md and docs/SECURITY.md document it as a security trade-off that disables the persistence-injection protection for episodes when on. Tests: store-level recall of an AutoApproved episode + exclusion from pending; the secure default is false; and the full OnSessionEnd stamping path with the flag on (AutoApproved set, recallable, not pending) vs off (excluded, pending). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jkyberneees and others added 2 commits June 6, 2026 09:37

jkyberneees merged commit 4ffdb7d into main Jun 6, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(memory): path-aware episode/skill provenance + human-gated promote#10

feat(memory): path-aware episode/skill provenance + human-gated promote#10
jkyberneees merged 3 commits into
mainfrom
feat/memory-path-aware-provenance

jkyberneees commented Jun 6, 2026

Uh oh!

jkyberneees commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jkyberneees commented Jun 6, 2026

Summary

What changed

1. Path-aware taint (shared single source of truth)

2. Skills mirror

3. Human-gated promote (CLI only)

4. Doc/comment drift fixed

Security rationale

Verification

Note

Uh oh!

jkyberneees commented Jun 6, 2026

Verification Protocol — remediation pass (commit c550d2d)

Re-scored axes

Verdict: 🟡 HumanReviewRequired — but now only on the correlation gate

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Verification Protocol — remediation pass (commit `c550d2d`)