ContextAssembler — Dynamic Per-Turn Context Retrieval #867
Replies: 7 comments
Just a short update after several sessions: it works well in two respects. The deduplication does its job, and sessions noticeably benefit from the added memory in their context. Cross-project recall seems especially helpful.
## Update: ContextAssembler Dedup and Prompt Caching Alignment

After studying how Claude Code's prompt caching works internally (based on Abhishek Ray's deep dive and Thariq Shihipar's architectural notes), I realized the ContextAssembler's dedup mechanism has an additional benefit beyond saving context space.

### How caching interacts with per-turn context injection

Claude Code caches the entire prompt prefix — system prompt, tool definitions, CLAUDE.md, and all prior conversation messages. The cache breakpoint slides forward each turn via auto-caching, so turn N's user message (including any hook-injected context) becomes part of the cached prefix from turn N+1 onward.

### Practical implication for anyone building similar hooks

If you're injecting context via `UserPromptSubmit`, dedup pays off twice: it saves context-window space, and it keeps repeated content out of the uncached portion of each new turn, since anything injected on a previous turn is already sitting in the cached prefix.
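A minimal sketch of per-session dedup along these lines: the manifest maps each file path to its mtime at injection time. The helper names and manifest layout here are my assumptions, not the gist's actual API.

```typescript
import * as fs from "node:fs";

// Per-session dedup manifest: file path -> mtime observed when injected.
type Manifest = Record<string, number>;

function loadManifest(file: string): Manifest {
  return fs.existsSync(file)
    ? JSON.parse(fs.readFileSync(file, "utf8"))
    : {};
}

// Keep only files that are new or modified since their last injection,
// and record them in the manifest for the rest of the session.
function filterUnseen(files: string[], manifest: Manifest): string[] {
  return files.filter((f) => {
    const mtime = fs.statSync(f).mtimeMs;
    if (manifest[f] === mtime) return false; // unchanged: already in context
    manifest[f] = mtime;
    return true;
  });
}
```

An unchanged file is skipped on every later turn, so the same bytes are never re-sent outside the cached prefix.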
## Update: Semantic Search — Normalization Bug Found & Fixed

After running semantic search (Phase 2, nomic-embed-text + LanceDB) for about a week, I noticed the system was injecting irrelevant files on every turn. Even a simple operational prompt like "check if my servers are reachable" would pull in unrelated relationship notes, empty template files, and random session summaries.

Ran an investigation with a 4-agent Council debate and some hands-on cosine similarity testing. Found two compounding problems.

### 1. Units mismatch in normalization constants

The original empirical test (35 queries, all rated STRONG discrimination) measured cosine distance values (0.28–0.44 range). But the normalization constants were set as if those were the score range:

```ts
const SEMANTIC_MIN = 0.30; // ← distance values
const SEMANTIC_MAX = 0.70;
```

The ContextAssembler, however, converts LanceDB results to similarity (1 − distance) before scoring. Fixed by measuring the actual similarity distribution across the corpus:

```ts
const SEMANTIC_MIN = 0.52; // corpus floor
const SEMANTIC_MAX = 0.82; // best specific match ceiling
```

### 2. Index pollution — 22% boilerplate

174 out of 785 indexed files were empty THREAD.md scaffolds (just frontmatter and placeholder text). The indexer had a content gate of only 20 chars. Raised it to 100 chars of stripped content (after removing frontmatter, headers, and placeholder text). Index dropped from 785 → 613 files. Zero false positives on real content.

### Results

The system works as intended now. Specific queries surface exactly the right files. Vague conversational follow-ups don't pull in noise.

### Lesson learned

When calibrating normalization ranges for embedding models, always verify which metric your test harness reports (distance vs similarity) and which metric your consumer uses. In my case the test was correct; the constants were just copied without the conversion. A classic units bug.

### Updated code

The ContextAssembler gist needs updating with these fixes — I'll push the corrected files there.
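The conversion and clamped normalization can be sketched as follows. The constants are the corrected values from above; the helper names and the clamp are mine.

```typescript
const SEMANTIC_MIN = 0.52; // corpus floor (similarity)
const SEMANTIC_MAX = 0.82; // best specific match ceiling (similarity)

// LanceDB reports cosine distance; the consumer scores on similarity.
function toSimilarity(distance: number): number {
  return 1 - distance;
}

// Map a similarity into [0, 1] relative to the measured corpus range.
function normalizeSimilarity(similarity: number): number {
  const t = (similarity - SEMANTIC_MIN) / (SEMANTIC_MAX - SEMANTIC_MIN);
  return Math.min(1, Math.max(0, t));
}
```

The units bug is visible here: feeding a raw distance of 0.30 into the old constants looked like a mid-range score, even though it was never a similarity at all.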
## Update: Compaction Detection — Flag File Replaces CLAUDE.md Counting

Found and fixed a significant bug in how the ContextAssembler detects compaction events.

### The Bug

The original approach counted CLAUDE.md loads in the transcript to infer that compaction had occurred. In sessions that load more than one CLAUDE.md, the count was always off. Result: every turn was treated as post-compaction, resetting the dedup manifest and re-injecting ~4K tokens of already-present context. This is the "context blowup" I mentioned earlier — it was this bug, not the semantic search.

### The Fix

Instead of inferring compaction from CLAUDE.md load counts, use a flag file written by the hook that fires when compaction actually happens. This is more reliable than any heuristic counting. PostCompactRecovery only fires on actual compaction — no false positives from multi-CLAUDE.md setups.

### Impact

Before fix: ~4K tokens of irrelevant context injected every turn in multi-project sessions. The ContextAssembler gist has been updated with the fixed hook.
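In Claude Code terms, this can be sketched with a PreCompact hook writing the flag and the UserPromptSubmit hook consuming it. The flag path and function names are my assumptions, not the fixed hook's actual code.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical flag location; the real hook's file name may differ.
const FLAG = path.join(os.tmpdir(), "context-assembler-compaction.flag");

// PreCompact side: record that a compaction is happening.
function markCompaction(): void {
  fs.writeFileSync(FLAG, String(Date.now()));
}

// UserPromptSubmit side: returns true exactly once per compaction event.
function consumeCompactionFlag(): boolean {
  if (!fs.existsSync(FLAG)) return false;
  fs.unlinkSync(FLAG); // consume so later turns don't re-trigger recovery
  return true;
}
```

Because the flag is written only by the compaction-time hook and deleted on first read, there is nothing to miscount: no flag, no recovery.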
## Update: Observability — Two-Line Status Display

Small but useful quality-of-life improvement. The original status line was opaque — you could see how many files were injected this turn, but couldn't tell what was happening across the session.

### The Problem

The old format tells you what happened this turn, but nothing about the cumulative session picture.

### The Fix

Two-line format with cumulative totals: line 1 gives the session-wide picture, line 2 what happened this turn. There are variants for an all-dedup turn (nothing new to inject) and for when semantic search is down, but the structure stays identical across all turns — only the numbers change, and abbreviations keep it compact.

The gist has been updated with both files.
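A two-line formatter along these lines might look as follows. The field names, abbreviations, and layout here are illustrative assumptions, not the gist's actual output.

```typescript
// Session-cumulative totals vs. per-turn results.
interface SessionTotals { turns: number; injected: number; deduped: number; tokens: number; }
interface TurnResult { injected: number; deduped: number; tokens: number; semOk: boolean; }

function formatStatus(s: SessionTotals, t: TurnResult): string {
  // Line 1: session-wide picture. Line 2: this turn only.
  const line1 = `ctx session | turns ${s.turns} | inj ${s.injected} | dedup ${s.deduped} | ${s.tokens} tok`;
  const sem = t.semOk ? "" : " | sem:down";
  const line2 = `ctx turn    | inj ${t.injected} | dedup ${t.deduped} | ${t.tokens} tok${sem}`;
  return `${line1}\n${line2}`;
}
```

Keeping the structure fixed and varying only the numbers is what makes the display scannable turn after turn.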
## Update: Sub-File Chunking — Section-Level Precision in Semantic Search

After running the normalization fix and compaction detection improvements for a few days, I evaluated MemSearch (Zilliz's new open-source memory system for AI agents, extracted from the MCP ecosystem). Not a replacement for what we've built here, but it surfaced one idea that turned out to be the biggest improvement yet: sub-file chunking.

### The Problem

The semantic index was embedding whole files — one vector per file. A 400-line PRD with sections on very different topics gets averaged into a single vector, so a strong match in one section is diluted by everything else in the file.

### The Fix

Split markdown files at heading boundaries, so each section gets its own vector. The LanceDB schema gained three new columns for chunk metadata.

### Results

Search results now show heading-level labels instead of just file names.

### The 500-Character Flaw

Here's the part that stung. While implementing chunking, I asked a simple question about what actually gets injected into the AI's context. The answer exposed a 500-character constant in the injection path that neither of us had ever questioned.

That line had been in the code since day one. My AI assistant had modified the surrounding code multiple times over three weeks. I never reviewed that specific line.

The fix was obvious once we saw it: when a specific section matched, extract and inject that section's content. Token counts dropped dramatically. One file went from 2,445 tokens down to 197. Budget now fits 18 candidates instead of 12.

The question that found it wasn't "is this function correct?" It was "what actually gets shown to the AI, and is it the right thing?" A question about purpose, not implementation. That's the kind of question humans need to keep asking when working with AI on systems — the code-level view was fine, the system-level view was broken.

### Updated Code

The ContextAssembler gist has been updated with both files.

Credit to MemSearch for the chunking idea. Their approach (dense + BM25 hybrid via RRF) is interesting but heavier than what we need here.
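Heading-boundary chunking like the above can be sketched as follows. Splitting at `##` and `###`, and the size and schema details, are my assumptions about the implementation.

```typescript
interface Chunk { heading: string; content: string; }

// Split a markdown document at ##/### heading boundaries, one chunk per
// section, so each section can be embedded as its own vector.
function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "(preamble)";
  let buf: string[] = [];
  const flush = () => {
    const content = buf.join("\n").trim();
    if (content) chunks.push({ heading, content }); // skip empty sections
    buf = [];
  };
  for (const line of markdown.split("\n")) {
    const m = /^#{2,3}\s+(.*)$/.exec(line);
    if (m) { flush(); heading = m[1]; } else { buf.push(line); }
  }
  flush();
  return chunks;
}
```

At query time the chunk's heading doubles as the display label, which is what makes section-level result labels possible.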
This is really cool jlacour. I don't have anything meaningful to contribute here (you're way more technical than me), other than affirmation of the problem you are trying to fix. I have noticed this same context problem in my own use of PAI and I've been keeping up with your discussion thread here as part of my research on how to fix it. You are very brilliant.
Related: #690 (closed — addressed startup context size in v4.0), Discussion #731 (context window optimization)
## The Problem
v4.0's lean bootstrap is a huge improvement. Startup context dropped from ~38% to ~19%. But there's a second problem that the bootstrap reduction doesn't address: sessions start context-blind about what you've been working on.
Every new session begins the same way — the AI has no idea what you did yesterday, what failed last week, or which project you're continuing. You either re-explain context manually, or the AI discovers it the hard way (reading files, asking questions, wasting turns).
The MEMORY system captures all of this. Work directories, failure analyses, project CONCEPT.md files, memory files. But nothing surfaces it automatically at the right moment.
## What We Built

A ContextAssembler — a UserPromptSubmit hook that reads your prompt, finds relevant prior context, and injects it as `additionalContext` on every turn. No semantic search, no vector DB, no network calls. Pure filesystem, under 3 seconds.
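The injection mechanism can be sketched as follows: UserPromptSubmit hooks in Claude Code can emit JSON whose `hookSpecificOutput.additionalContext` is added to the turn. `assembleContext` here is a hypothetical stand-in for the real assembler, not its actual signature.

```typescript
// Hypothetical stand-in for the real assembler logic.
function assembleContext(prompt: string, sessionId: string): string {
  return `# Prior context for session ${sessionId}\n(relevant to: ${prompt})`;
}

// Build the JSON payload a UserPromptSubmit hook prints to stdout.
function buildHookOutput(prompt: string, sessionId: string): string {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: "UserPromptSubmit",
      additionalContext: assembleContext(prompt, sessionId),
    },
  });
}
```

The hook stays a thin shell: read the prompt, call the assembler, print the payload.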
## Architecture

Two files:

- The hook (`ContextAssembler.hook.ts`, ~200 lines) — fires on UserPromptSubmit, calls the assembler, handles per-session dedup, outputs the injection payload
- The assembler (`ContextAssembler.ts`, ~690 lines) — the actual logic. Gathers candidates from multiple sources, scores them, assembles within a token budget

## How It Works
Candidate sources:

- Current work (`STATE/current-work.json` → active PRD)
- Recent work (`WORK/` directories, configurable recency window — default 7 days)
- Failure analyses (`LEARNING/FAILURES/` — keyword match)
- Project concepts (`*/CONCEPT.md` by keyword match)

Scoring (per candidate):
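A hypothetical sketch of how type-weighted scoring might combine these signals. The weight values mirror the `typeWeights` defaults in the configuration below, but the formula, baseline, and field names are my assumptions.

```typescript
interface Candidate { type: string; ageDays: number; keywordHits: number; }

// Defaults taken from the typeWeights config block.
const TYPE_WEIGHTS: Record<string, number> = {
  "current-work": 1.0, "recent-work": 0.8, "memory-file": 0.7,
  "project-concept": 0.6, "archived-prd": 0.5, "failure": 0.4,
};

function scoreCandidate(c: Candidate, recencyDays = 7): number {
  const typeWeight = TYPE_WEIGHTS[c.type] ?? 0.3;            // unknown types rank low
  const recency = Math.max(0, 1 - c.ageDays / recencyDays);  // linear decay over the window
  const keywords = Math.min(1, c.keywordHits / 5);           // saturate at five hits
  return typeWeight * (0.5 + 0.3 * recency + 0.2 * keywords);
}
```

Candidates are then sorted by score and packed greedily until the token budget is exhausted.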
All scoring weights are configurable via `settings.json`.

**Budget:** Configurable token budget per injection source — default 5K for hook injection (lightweight), 15K for direct calls (e.g., Algorithm OBSERVE). Both adjustable in `settings.json`.

**Dedup:** Per-session manifest (`context-seen-{session_id}.json`) tracks which files have been injected. On subsequent turns, only new or modified files are injected. This prevents the ~4K/turn accumulation that would cause mid-stream compactions.

## Configuration
All settings live under `settings.json → contextAssembler`:

```json
{
  "contextAssembler": {
    "projectsDirectory": "~/Projects",
    "hookBudget": 5000,
    "directBudget": 15000,
    "recencyDays": 7,
    "maxRecentWork": 10,
    "typeWeights": {
      "current-work": 1.0,
      "recent-work": 0.8,
      "memory-file": 0.7,
      "project-concept": 0.6,
      "archived-prd": 0.5,
      "failure": 0.4
    }
  }
}
```

Every value has a sensible default. Works with zero configuration.
## What It Looks Like in Practice

On each turn, the AI receives an `additionalContext` block assembled from the highest-scoring candidates.
## Field Results

Running in production for multiple days across dozens of sessions.
## Remaining Portability Considerations

The implementation is fully configurable (paths, budgets, weights). One area in particular could use community input.
## Proposal

What could be SYSTEM-tier (upstream): the two files plus the config schema under `settings.json → contextAssembler`.

What should stay USER-tier: personal paths, weights, and budget overrides.

Phase 2 direction (not for initial contribution): semantic search, which the updates above describe shipping later with nomic-embed-text + LanceDB.
## Questions for the Community

Happy to share the full implementation if there's interest. The two files are self-contained and only depend on `hooks/lib/paths.ts`.