ContextAssembler — Dynamic Per-Turn Context Retrieval #867

jlacour-git · 2026-03-02T13:03:21Z

jlacour-git
Mar 2, 2026

Related: #690 (closed — addressed startup context size in v4.0), Discussion #731 (context window optimization)

The Problem

v4.0's lean bootstrap is a huge improvement. Startup context dropped from ~38% to ~19%. But there's a second problem that the bootstrap reduction doesn't address: sessions start context-blind about what you've been working on.

Every new session begins the same way — the AI has no idea what you did yesterday, what failed last week, or which project you're continuing. You either re-explain context manually, or the AI discovers it the hard way (reading files, asking questions, wasting turns).

The MEMORY system captures all of this. Work directories, failure analyses, project CONCEPT.md files, memory files. But nothing surfaces it automatically at the right moment.

What We Built

A ContextAssembler — a UserPromptSubmit hook that reads your prompt, finds relevant prior context, and injects it as additionalContext on every turn. No semantic search, no vector DB, no network calls. Pure filesystem, under 3 seconds.

Architecture

Two files:

Hook (ContextAssembler.hook.ts, ~200 lines) — fires on UserPromptSubmit, calls the assembler, handles per-session dedup, outputs the injection payload
Tool (ContextAssembler.ts, ~690 lines) — the actual logic. Gathers candidates from multiple sources, scores them, assembles within a token budget

How It Works

Candidate sources:

Current work (STATE/current-work.json → active PRD)
Recent work (WORK/ directories, configurable recency window — default 7 days)
Archived PRDs (keyword match on filename and content)
Failure analyses (LEARNING/FAILURES/ — keyword match)
Project context (configurable project directory, scans */CONCEPT.md by keyword match)
Memory files (Claude Code's project memory directory, auto-derived from CWD — keyword match)

Scoring (per candidate):

Recency (40%): exponential decay over the recency window
Relevance (30%): keyword match density against the prompt
Type weight (30%): current-work > recent-work > memory > project > archived > failure

All scoring weights are configurable via settings.json.

Budget: Configurable token budget per injection source — default 5K for hook injection (lightweight), 15K for direct calls (e.g., Algorithm OBSERVE). Both adjustable in settings.json.

Dedup: Per-session manifest (context-seen-{session_id}.json) tracks which files have been injected. On subsequent turns, only new or modified files are injected. This prevents the ~4K/turn accumulation that would cause mid-stream compactions.

Configuration

All settings live under settings.json → contextAssembler:

{
  "contextAssembler": {
    "projectsDirectory": "~/Projects",
    "hookBudget": 5000,
    "directBudget": 15000,
    "recencyDays": 7,
    "maxRecentWork": 10,
    "typeWeights": {
      "current-work": 1.0,
      "recent-work": 0.8,
      "memory-file": 0.7,
      "project-concept": 0.6,
      "archived-prd": 0.5,
      "failure": 0.4
    }
  }
}

Every value has a sensible default. Works with zero configuration.

What It Looks Like in Practice

On each turn, the AI receives a block like:

## Retrieved Context (ContextAssembler — Phase 1)

> Context: 5 new files (3 work, 1 failures, 1 memory), 2,847/5,000 tokens (3 already loaded, skipped)

### Work Context
**Current work: statusline v4.0 fixes** (312 tokens)
[PRD summary...]

### Relevant Failures
**Failure: confident-diagnosis-contradicted-by-evidence** (890 tokens)
[Failure context...]

### Memory Files
**Memory: github-issues** (1,234 tokens)
[Issue tracking summary...]

Field Results

Running in production for multiple days across dozens of sessions:

Consistent delivery. Context injection fires reliably on every turn.
Dedup works. After the first turn loads 8-12 files, subsequent turns typically inject 0-2 new files. No accumulation problem.
Budget respected. Never exceeded the configured hook budget.
Relevant results. When working on a specific project, it surfaces that project's context and related prior sessions. When debugging infrastructure, it surfaces related failures and prior work.

Remaining Portability Considerations

The implementation is fully configurable (paths, budgets, weights). One area that could use community input:

Keyword extraction — currently uses English stopwords. Works fine for mixed-language prompts in practice, but a proper multilingual approach would be better for non-English-primary users.

Proposal

What could be SYSTEM-tier (upstream):

The hook framework (UserPromptSubmit → assemble → inject)
Candidate gathering from WORK/, LEARNING/FAILURES/, STATE/
Scoring and budget assembly logic
Per-session dedup mechanism
Configuration via settings.json → contextAssembler

What should stay USER-tier:

Custom candidate sources beyond the defaults
Scoring weight overrides
Budget tuning

Phase 2 direction (not for initial contribution):

Semantic search via local embeddings (Ollama + LanceDB or similar)
Cross-session relevance learning (which context actually helped?)
Configurable candidate source plugins

Questions for the Community

Does this belong as a SYSTEM hook, or is it too opinionated for the default install?
Should the candidate sources be plugin-based from the start, or is a fixed set with configuration sufficient?
What's the right default token budget? 5K feels right for per-turn injection, but others may have different context pressure.

Happy to share the full implementation if there's interest. The two files are self-contained and only depend on hooks/lib/paths.ts.

jlacour-git · 2026-03-03T08:27:15Z

jlacour-git
Mar 3, 2026
Author

just a short update, after several sessions: it seems to work well in two aspects: the deduplication does the job, and I have the feeling that sessions benefit from better memory in their context. Especially cross-project seems helpful.

0 replies

jlacour-git · 2026-03-11T18:56:48Z

jlacour-git
Mar 11, 2026
Author

Update: ContextAssembler Dedup and Prompt Caching Alignment

After studying how Claude Code's prompt caching works internally (based on Abhishek Ray's deep dive and Thariq Shihipar's architectural notes), I realized the ContextAssembler's dedup mechanism has an additional benefit beyond saving context space.

How caching interacts with per-turn context injection

Claude Code caches the entire prompt prefix — system prompt, tool definitions, CLAUDE.md, and all prior conversation messages. The cache breakpoint slides forward each turn via auto-caching, so turn N's user message (including any hook-injected additionalContext) becomes part of the cached prefix on turn N+1.

This means:

ContextAssembler output from prior turns is cached for free. Content injected on turn 3 sits in the cached prefix on turns 4, 5, 6, etc. — read at 10% of normal input cost.
Without dedup, you'd pay full price twice for the same content. If the same file were re-injected on turn 4, it would be in both the cached prefix (turn 3's message) and the new message (turn 4) — redundant tokens at full input price.
Dedup turns caching into a feature, not just a cost saver. The session-level manifest (context-seen-{session_id}.json) ensures each file is injected once, then lives in the cached prefix for the rest of the session. This is the optimal pattern for prompt caching: inject once, cache forever.

Practical implication for anyone building similar hooks

If you're injecting context via UserPromptSubmit hooks, a dedup/manifest approach isn't just about context window management — it directly improves cache hit rates and reduces per-turn cost. Without dedup, every re-injection is fresh tokens at full price even though the same content already exists in the cached conversation history.

0 replies

jlacour-git · 2026-03-12T18:07:48Z

jlacour-git
Mar 12, 2026
Author

Update: Semantic Search — Normalization Bug Found & Fixed

After running semantic search (Phase 2, nomic-embed-text + LanceDB) for about a week, I noticed the system was injecting irrelevant files on every turn. Even a simple operational prompt like "check if my servers are reachable" would pull in unrelated relationship notes, empty template files, and random session summaries.

Ran an investigation with a 4-agent Council debate and some hands-on cosine similarity testing. Found two compounding problems.

1. Units mismatch in normalization constants

The original empirical test (35 queries, all rated STRONG discrimination) measured cosine distance values (0.28–0.44 range). But the normalization constants were set as if those were the score range:

const SEMANTIC_MIN = 0.30; // ← distance values
const SEMANTIC_MAX = 0.70;

The ContextAssembler converts LanceDB results to similarity (1 - distance), so actual values are in the 0.52–0.82 range. The normalization compressed everything into a useless 0.55–1.00 band. Nothing got filtered.

Fixed by measuring the actual similarity distribution across the corpus:

const SEMANTIC_MIN = 0.52; // corpus floor
const SEMANTIC_MAX = 0.82; // best specific match ceiling

2. Index pollution — 22% boilerplate

174 out of 785 indexed files were empty THREAD.md scaffolds (just _Pending..._ placeholder text). These act as "similarity attractors" — their generic markdown tokens match vaguely against everything.

The indexer had a content gate of only 20 chars. Raised it to 100 chars of stripped content (after removing frontmatter, headers, and placeholder text). Index dropped from 785 → 613 files. Zero false positives on real content.

Results

Metric	Before	After
Vague query semantic scores	0.50–0.75 (irrelevant passes)	0.00–0.09 (suppressed)
Specific query top match	S=0.62	S=0.96
Discrimination gap	0.10	0.93
Empty templates in index	174 (22%)	0

The system works as intended now. Specific queries surface exactly the right files. Vague conversational follow-ups don't pull in noise.

Lesson learned

When calibrating normalization ranges for embedding models, always verify which metric your test harness reports (distance vs similarity) and which metric your consumer uses. In my case the test was correct, the constants were just copied without the conversion. A classic units bug.

Updated code

The ContextAssembler gist needs updating with these fixes — I'll push the corrected files there.

0 replies

jlacour-git · 2026-03-16T09:04:38Z

jlacour-git
Mar 16, 2026
Author

Update: Compaction Detection — Flag File Replaces CLAUDE.md Counting

Found and fixed a significant bug in how the ContextAssembler detects compaction events.

The Bug

The original approach counted is_claude_md: true entries in instructions-loaded.jsonl. If count > 1, it assumed compaction had occurred (since CLAUDE.md gets reloaded after compaction). The problem: sessions with a project-level CLAUDE.md (e.g., ~/AI-Projects/home-it/CLAUDE.md) start with count=2 because both the global and project CLAUDE.md files are loaded at session start.

Result: every turn was treated as post-compaction, resetting the dedup manifest and re-injecting ~4K tokens of already-present context. This is the "context blowup" I mentioned earlier — it was this bug, not the semantic search.

The Fix

Instead of inferring compaction from CLAUDE.md load counts, use a flag file written by the PostCompactRecovery hook (which already fires reliably on actual compaction events):

PostCompactRecovery.hook.ts writes compacted-{session_id}.flag to MEMORY/STATE/
ContextAssembler.hook.ts checks for the flag at manifest load time
If present: reset manifest, delete flag (one-shot consumption)
If absent: normal dedup behavior

This is more reliable than any heuristic counting. PostCompactRecovery only fires on actual compaction — no false positives from multi-CLAUDE.md setups.

Impact

Before fix: ~4K tokens of irrelevant context injected every turn in multi-project sessions.
After fix: dedup works correctly, context only re-injected after real compaction.

The ContextAssembler gist has been updated with the fixed hook.

0 replies

jlacour-git · 2026-03-16T10:48:16Z

jlacour-git
Mar 16, 2026
Author

Update: Observability — Two-Line Status Display

Small but useful quality-of-life improvement. The original status line was opaque — you could see how many files were injected this turn, but couldn't tell what was happening across the session.

The Problem

The old format:

📦 Context assembled: 11 new files (3 memory, 3 work, 5 summaries), 4,856/5,000 tokens [2 dedup-skipped]

This tells you what happened this turn, but not:

How many files are tracked across all turns (cumulative total)
What types make up the full session context
How many candidates were below the relevance threshold
Whether semantic search or keyword fallback was active

The Fix

Two-line format with cumulative totals:

📦 Context: 16 total (5 mem, 6 wrk, 5 sum)
   ↳ 2 new (1 wrk, 1 sum), 8 seen | 1 low | 820/2,000 tok

Line 1: session-wide picture. Line 2: what happened this turn.

All-dedup turn (nothing new to inject):

📦 Context: 14 total (4 mem, 5 wrk, 5 sum)
   ↳ 0 new, 8 seen | 0 low | 0/2,000 tok

When semantic search is down:

📦 Context: 8 total (3 mem, 5 wrk) ⚠ kw-fallback
   ↳ 8 new (3 mem, 5 wrk), 0 seen | 2 low | 4,200/5,000 tok

Structure stays identical across all turns — only numbers change. Abbreviations keep it compact: mem, wrk, sum, lrn, fail, arc, prj.

What the numbers mean

total — cumulative unique files tracked in the session manifest
new — injected this turn (with type breakdown)
seen — dedup-skipped (already in context from prior turns)
low — candidates that scored below minAssemblyScore threshold
tok — tokens used / budget

The total comes from the dedup manifest, so it accurately reflects the full session state. Type breakdown is inferred from file paths.

Gist updated with both files.

0 replies

jlacour-git · 2026-03-16T12:19:05Z

jlacour-git
Mar 16, 2026
Author

Update: Sub-File Chunking — Section-Level Precision in Semantic Search

After running the normalization fix and compaction detection improvements for a few days, I evaluated MemSearch (Zilliz's new open-source memory system for AI agents, extracted from the MCP ecosystem). Not a replacement for what we've built here, but it surfaced one idea that turned out to be the biggest improvement yet: sub-file chunking.

The Problem

The semantic index was embedding whole files — one vector per file. A 400-line PRD with sections like ## Context, ## Criteria, ## Decisions got a single averaged embedding. When you search for "rule adherence findings," the specific ## Decisions section can't surface independently. The whole file either matches vaguely or doesn't match at all.

The Fix

Split markdown files at heading boundaries (^#{1,6}\s) before embedding. Each section becomes its own vector in LanceDB.

A few details that matter:

Preamble handling: Content before the first heading becomes its own chunk
Oversized sections: Anything over 4000 chars gets split at paragraph breaks (\n\n+)
Empty sections: Heading-only chunks (no content after the heading) are skipped
Content-hash dedup: SHA-256 per chunk. On incremental rebuilds, unchanged chunks aren't re-embedded. This makes rebuilds fast even with 3500+ vectors
Backward compatible: Files without headings still get indexed as a single chunk

LanceDB schema gained three columns: heading (string), contentHash (string), chunkIndex (number).

Results

Metric	Before	After
Vectors in index	634	3,518
Files processed	661	661
Avg chunks/file	1.0	5.3
Build time (full)	~40s	~64s
Incremental (no changes)	~8s	~12s

Search results now show heading-level labels: PRD.md (Decisions), science-output.md (H4: Rule Placement...) instead of just filenames. The ContextAssembler uses the best chunk score for file-level ranking, then labels candidates with which section matched.

The 500-Character Flaw

Here's the part that stung. While implementing chunking, I asked a simple question about what actually gets injected into the AI's context. The answer: contentBody.substring(0, 500).

The system could now identify that ## Decisions on line 80 was the most relevant section. Then it threw that away and showed the first 500 characters of the file. Usually the ## Context preamble. The exact wrong part.

This line had been in the code since day one. My AI assistant had modified the surrounding code multiple times over three weeks. I never reviewed that specific line. Neither of us questioned it because substring(0, 500) is correct code. It does exactly what it says. It just does the wrong thing.

The fix was obvious once we saw it: when a specific section matched, extract and inject that section's content. Token counts dropped dramatically. One file went from 2,445 tokens down to 197. Budget now fits 18 candidates instead of 12.

The question that found it wasn't "is this function correct?" It was "what actually gets shown to the AI, and is it the right thing?" A question about purpose, not implementation. That's the kind of question humans need to keep asking when working with AI on systems — the code-level view was fine, the system-level view was broken.

Updated Code

The ContextAssembler gist has been updated with both files:

BuildSemanticIndex.ts — chunking, content-hash dedup, heading extraction
ContextAssembler.ts — section-level injection, chunk-aware scoring

Credit to MemSearch for the chunking idea. Their approach (dense + BM25 hybrid via RRF) is interesting but heavier than what we need here.

0 replies

jarednwilson · 2026-03-16T18:10:08Z

jarednwilson
Mar 16, 2026

This is really cool jlacour. I don't have anything meaningful to contribute here (you're way more technical than me), other than affirmation of the the problem you are trying to fix. I have noticed this same context problem in my own use of PAI and I've been keeping up with your discussion thread here as part of my research on how to fix it. You are very brilliant.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ContextAssembler — Dynamic Per-Turn Context Retrieval #867

Uh oh!

{{title}}

Uh oh!

Replies: 7 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

ContextAssembler — Dynamic Per-Turn Context Retrieval #867

Uh oh!

jlacour-git Mar 2, 2026

The Problem

What We Built

Architecture

How It Works

Configuration

What It Looks Like in Practice

Field Results

Remaining Portability Considerations

Proposal

Questions for the Community

Replies: 7 comments

Uh oh!

jlacour-git Mar 3, 2026 Author

Uh oh!

jlacour-git Mar 11, 2026 Author

Update: ContextAssembler Dedup and Prompt Caching Alignment

How caching interacts with per-turn context injection

Practical implication for anyone building similar hooks

Uh oh!

jlacour-git Mar 12, 2026 Author

Update: Semantic Search — Normalization Bug Found & Fixed

1. Units mismatch in normalization constants

2. Index pollution — 22% boilerplate

Results

Lesson learned

Updated code

Uh oh!

jlacour-git Mar 16, 2026 Author

Update: Compaction Detection — Flag File Replaces CLAUDE.md Counting

The Bug

The Fix

Impact

Uh oh!

jlacour-git Mar 16, 2026 Author

Update: Observability — Two-Line Status Display

The Problem

The Fix

What the numbers mean

Uh oh!

jlacour-git Mar 16, 2026 Author

Update: Sub-File Chunking — Section-Level Precision in Semantic Search

The Problem

The Fix

Results

The 500-Character Flaw

Updated Code

Uh oh!

jarednwilson Mar 16, 2026

jlacour-git
Mar 2, 2026

jlacour-git
Mar 3, 2026
Author

jlacour-git
Mar 11, 2026
Author

jlacour-git
Mar 12, 2026
Author

jlacour-git
Mar 16, 2026
Author

jlacour-git
Mar 16, 2026
Author

jlacour-git
Mar 16, 2026
Author

jarednwilson
Mar 16, 2026