The Session API provides persistent conversation storage for agents, with tree-structured messages, context blocks, compaction, full-text search, and AI-controllable tools. It runs entirely on Durable Object SQLite — no external database needed.
Experimental. The Session API is under agents/experimental/memory/session. The API surface is stable but may evolve before graduating to the main package.
import { Agent } from "agents";
import { Session } from "agents/experimental/memory/session";
class MyAgent extends Agent {
session = Session.create(this)
.withContext("soul", {
provider: { get: async () => "You are a helpful assistant." }
})
.withContext("memory", {
description: "Learned facts about the user",
maxTokens: 1100
})
.withCachedPrompt();
async onMessage(message) {
await this.session.appendMessage(message);
const history = this.session.getHistory();
const system = await this.session.freezeSystemPrompt();
const tools = await this.session.tools();
// Pass history, system prompt, and tools to your LLM
}
}

Session manages a single conversation's messages, context blocks, and compaction state.
There are two ways to create a Session:
Builder API (recommended): uses Session.create(agent) with a chainable builder. Context blocks without an explicit provider option are auto-wired to SQLite.
const session = Session.create(this)
.withContext("soul", { provider: { get: async () => "You are helpful." } })
.withContext("memory", { description: "Learned facts", maxTokens: 1100 })
.withCachedPrompt()
.onCompaction(myCompactFn)
.compactAfter(100_000);

Direct constructor: takes a SessionProvider and options directly. Use it when you want full control over providers.
import {
AgentSessionProvider,
AgentContextProvider
} from "agents/experimental/memory/session";
const session = new Session(new AgentSessionProvider(this), {
context: [
{
label: "memory",
description: "Notes",
maxTokens: 500,
provider: new AgentContextProvider(this, "memory")
},
{ label: "soul", provider: { get: async () => "You are helpful." } }
]
});

All builder methods return this for chaining. Order doesn't matter because providers are resolved lazily on first use.
| Method | Description |
|---|---|
| Session.create(agent) | Static factory. agent is any object with a sql tagged template method (i.e. your Agent/DO). |
| .forSession(sessionId) | Namespace this session by ID. Required for multi-session isolation when not using SessionManager. Context provider keys and storage are scoped to this ID. |
| .withContext(label, options?) | Add a context block. See Context Blocks. |
| .withCachedPrompt(provider?) | Enable system prompt persistence. The prompt is frozen on first use and survives DO hibernation/eviction. Without an explicit provider, auto-wires to SQLite. |
| .onCompaction(fn) | Register a compaction function. See Compaction. |
| .compactAfter(tokenThreshold) | Auto-compact when estimated token count exceeds the threshold. Checked after each appendMessage(). Requires .onCompaction(). |
Messages use the SessionMessage type — a minimal shape with id, role, parts, and optional createdAt. The Vercel AI SDK's UIMessage is structurally compatible and can be passed directly without conversion. The session stores messages in a tree structure via parent_id, enabling branching conversations.
// Append — auto-parents to the latest leaf unless parentId is specified
await session.appendMessage(message);
await session.appendMessage(message, parentId);
// Update an existing message (matched by message.id)
session.updateMessage(message);
// Delete specific messages
session.deleteMessages(["msg-1", "msg-2"]);
// Clear all messages and skill state
session.clearMessages();

Note: appendMessage() is async because it may trigger auto-compaction. The underlying storage write is synchronous (SQLite), but the compaction step involves an LLM call. All other write methods (updateMessage, deleteMessages, clearMessages) are synchronous.
// Linear history from root to the latest leaf
const messages = session.getHistory();
// History to a specific leaf (for branching)
const branch = session.getHistory(leafId);
// Get a single message
const msg = session.getMessage("msg-1");
// Get the newest message
const latest = session.getLatestLeaf();
// Count messages in path
const count = session.getPathLength();

Messages form a tree. When you call appendMessage with a parentId that already has children, you create a branch. Use getBranches() to get all child messages branching from a given point:
// Get all child messages that branch from messageId (e.g. multiple responses to a user message)
const branches = session.getBranches(messageId);

This powers features like response regeneration: pass the user message ID to get both the original and regenerated responses. getHistory(leafId) walks the chosen path.
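To make the tree model concrete, here is a minimal in-memory sketch. It is not the library's implementation (which stores rows in SQLite); the TreeMessage type and the pathTo and childrenOf helpers are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory model of a message tree; names are illustrative only.
interface TreeMessage {
  id: string;
  parentId: string | null; // null marks a root message
  text: string;
}

// Walk parent links from a leaf up to the root, then reverse to get root-to-leaf order.
function pathTo(messages: Map<string, TreeMessage>, leafId: string): TreeMessage[] {
  const path: TreeMessage[] = [];
  let current = messages.get(leafId);
  while (current) {
    path.push(current);
    current = current.parentId === null ? undefined : messages.get(current.parentId);
  }
  return path.reverse();
}

// Direct children of a message; more than one child means the conversation branched.
function childrenOf(messages: Map<string, TreeMessage>, parentId: string): TreeMessage[] {
  return Array.from(messages.values()).filter((m) => m.parentId === parentId);
}

const seed: TreeMessage[] = [
  { id: "u1", parentId: null, text: "What is a Durable Object?" },
  { id: "a1", parentId: "u1", text: "First answer" },
  { id: "a2", parentId: "u1", text: "Regenerated answer" }
];
const tree = new Map<string, TreeMessage>();
for (const m of seed) tree.set(m.id, m);

const regeneratedPath = pathTo(tree, "a2").map((m) => m.id); // ["u1", "a2"]
const siblingIds = childrenOf(tree, "u1").map((m) => m.id); // ["a1", "a2"]
```

In this picture, childrenOf plays the role of getBranches(messageId) and pathTo plays the role of getHistory(leafId).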
Full-text search over the conversation history using SQLite FTS5:
const results = session.search("deployment Friday", { limit: 10 });
// Returns: Array<{ id, role, content, createdAt? }>

Search uses porter stemming and unicode tokenization, and covers all messages in the session.

Note: search() throws if the session provider doesn't support search. The built-in AgentSessionProvider supports it.
When the Session's agent object has a broadcast() method (all Agent subclasses do), the Session automatically broadcasts status events over WebSocket after each write operation:
- CF_AGENT_SESSION: phase ("idle" or "compacting"), tokenEstimate, tokenThreshold
- CF_AGENT_SESSION_ERROR: emitted on compaction failure
This allows connected clients to display real-time token usage and compaction status.
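A client might turn these events into a status label roughly as follows. This is a sketch under assumptions: the wire format is not specified here, so the code assumes each event arrives as a JSON string with a type field plus the fields listed above. The describeStatus helper is hypothetical.

```typescript
// Assumed event payload; the actual wire format may differ.
interface SessionStatus {
  phase: "idle" | "compacting";
  tokenEstimate: number;
  tokenThreshold?: number | null;
}

// Convert one raw WebSocket message into a display string, or null if it is unrelated.
function describeStatus(raw: string): string | null {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON, so not a session event
  }
  if (!data || typeof data !== "object") return null;
  if (data.type === "CF_AGENT_SESSION_ERROR") return "Compaction failed";
  if (data.type !== "CF_AGENT_SESSION") return null;

  const status = data as SessionStatus;
  if (status.phase === "compacting") return "Compacting conversation";
  if (!status.tokenThreshold) return `${status.tokenEstimate} tokens`;
  const pct = Math.round((status.tokenEstimate / status.tokenThreshold) * 100);
  return `${status.tokenEstimate}/${status.tokenThreshold} tokens (${pct}%)`;
}

const idleLabel = describeStatus(
  JSON.stringify({ type: "CF_AGENT_SESSION", phase: "idle", tokenEstimate: 25000, tokenThreshold: 100000 })
);
```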
Context blocks are persistent key-value sections injected into the system prompt. Each block has a label, optional description, and a provider that determines its behavior.
There are four provider types, detected by duck-typing:
| Provider | Interface | Behavior | AI Tool |
|---|---|---|---|
| ContextProvider | get() | Read-only block in system prompt | none |
| WritableContextProvider | get() + set() | Writable via AI | set_context |
| SkillProvider | get() + load() + set?() | On-demand keyed documents. get() returns a metadata listing; load(key) fetches full content. | load_context, unload_context, set_context |
| SearchProvider | get() + search() + set?() | Full-text searchable entries. get() returns a summary; search(query) runs FTS5. | search_context, set_context |
All providers also support an optional init(label) method, called before first use with the block's label.
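Duck-typed detection of this kind can be sketched as below. The library exports its own guards (isWritableProvider, isSkillProvider, isSearchProvider); this example does not use them. AnyProvider, classifyProvider, and toolsFor are hypothetical local names, and the order of the checks is an assumption.

```typescript
// Hypothetical local shape mirroring the table above; not the library's own types.
type AnyProvider = {
  get: () => Promise<string | null>;
  set?: (...args: string[]) => Promise<void>;
  load?: (key: string) => Promise<string | null>;
  search?: (query: string) => Promise<unknown>;
};

type ProviderKind = "readonly" | "writable" | "skill" | "search";

// Classify by which methods exist. Checking load() and search() before set() is an
// assumption, made so that skill and search providers are not reported as plain writable.
function classifyProvider(p: AnyProvider): ProviderKind {
  if (typeof p.load === "function") return "skill";
  if (typeof p.search === "function") return "search";
  if (typeof p.set === "function") return "writable";
  return "readonly";
}

// Tools the table associates with each kind; set_context only when set() is present.
function toolsFor(p: AnyProvider): string[] {
  const writeTool = typeof p.set === "function" ? ["set_context"] : [];
  switch (classifyProvider(p)) {
    case "skill":
      return ["load_context", "unload_context"].concat(writeTool);
    case "search":
      return ["search_context"].concat(writeTool);
    default:
      return writeTool; // "writable" yields set_context, "readonly" yields nothing
  }
}

const readOnlyTools = toolsFor({ get: async () => "You are helpful." }); // []
```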
AgentContextProvider — SQLite-backed writable context. This is what you get by default when using the builder without an explicit provider.
import { AgentContextProvider } from "agents/experimental/memory/session";
// Explicit usage — key determines the SQLite row
new AgentContextProvider(this, "memory");

R2SkillProvider: Cloudflare R2 bucket for on-demand document loading. Skills are listed in the system prompt as metadata; the model loads full content on demand via load_context.
import { R2SkillProvider } from "agents/experimental/memory/session";
Session.create(this).withContext("skills", {
provider: new R2SkillProvider(env.SKILLS_BUCKET, { prefix: "skills/" })
});

Descriptions are stored in R2 custom metadata (description key).
AgentSearchProvider — SQLite FTS5 searchable context. Entries are indexed and searchable by the model via search_context.
import { AgentSearchProvider } from "agents/experimental/memory/session";
Session.create(this).withContext("knowledge", {
description: "Searchable knowledge base",
provider: new AgentSearchProvider(this)
});

Blocks can be added and removed dynamically after initialization, which is useful for extensions:
// Add a new block (auto-wires to SQLite if no provider given)
await session.addContext("extension-notes", {
description: "From extension X",
maxTokens: 500
});
// Remove it
session.removeContext("extension-notes");
// Rebuild the system prompt to reflect changes
await session.refreshSystemPrompt();

Note: addContext and removeContext do NOT automatically update the frozen system prompt. You must call refreshSystemPrompt() afterward.
// Single block
const block = session.getContextBlock("memory");
// block: { label, description?, content, tokens, maxTokens?, writable, isSkill, isSearchable }
// All blocks
const blocks = session.getContextBlocks();

// Replace content entirely
await session.replaceContextBlock("memory", "User likes coffee.");
// Append content
await session.appendContextBlock("memory", "\nUser prefers dark roast.");

Note: Writing to a context block updates the provider immediately but does NOT update the frozen system prompt snapshot. This is intentional: it preserves the LLM prefix cache. Call refreshSystemPrompt() when you want changes reflected in the prompt.
The system prompt is built from all context blocks with headers and metadata:
══════════════════════════════════════════════
SOUL (Identity) [readonly]
══════════════════════════════════════════════
You are a helpful assistant.
══════════════════════════════════════════════
MEMORY (Learned facts) [45% — 495/1100 tokens]
══════════════════════════════════════════════
User likes coffee.
User prefers dark roast.
// Freeze — first call renders and persists, subsequent calls return the cached value
const prompt = await session.freezeSystemPrompt();
// Refresh — re-render from current block state and persist
const updated = await session.refreshSystemPrompt();

The frozen prompt survives DO hibernation and eviction when withCachedPrompt() is enabled. After eviction, the next freezeSystemPrompt() call loads from SQLite rather than re-rendering.
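The freeze and refresh semantics can be illustrated with a small stand-in. This is a simplified, synchronous, in-memory sketch: the real methods are async and persist to SQLite. PromptSnapshot is a hypothetical class, not part of the API.

```typescript
// Hypothetical stand-in for the frozen-prompt behavior described above.
class PromptSnapshot {
  private cached: string | null = null;
  private render: () => string;

  constructor(render: () => string) {
    this.render = render;
  }

  // First call renders and stores; later calls return the stored text unchanged.
  freeze(): string {
    if (this.cached === null) this.cached = this.render();
    return this.cached;
  }

  // Re-render from current state and overwrite the stored text.
  refresh(): string {
    this.cached = this.render();
    return this.cached;
  }
}

let memoryBlock = "User likes coffee.";
const snapshot = new PromptSnapshot(() => `MEMORY\n${memoryBlock}`);

const firstFreeze = snapshot.freeze();
memoryBlock += "\nUser prefers dark roast."; // the block changes after the freeze
const stillFrozen = snapshot.freeze(); // identical to firstFreeze
const refreshed = snapshot.refresh(); // now includes the new line
```

Keeping the frozen text byte-identical between turns is what lets an LLM provider reuse its prefix cache, which is why block writes do not touch the snapshot.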
Skills are on-demand documents stored in a SkillProvider (e.g. R2). The model sees a metadata listing in the system prompt and can load full content on demand:
// Unload a skill to free context space (rewrites the tool result in history)
session.unloadSkill("skills", "api-reference");
// Check what's currently loaded
const loaded = session.getLoadedSkillKeys(); // Set<"skills:api-reference">

After hibernation/eviction, loaded skills are reconstructed by scanning conversation history for load_context tool results. This means skill state survives restarts without additional storage.

Weird: Skill restoration scans the entire conversation history for load_context tool invocations in assistant messages with state: "output-available". When you unload a skill, the tool result is not deleted; its output field is rewritten in place to "[skill unloaded: key]". The original loaded content is therefore permanently lost from history after unload.
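The scan-and-rewrite behavior can be sketched as follows. The ToolPart shape (including the "tool-load_context" type string) is an assumption made for illustration; the real SessionMessage part layout may differ. loadedSkillKeys and unloadSkillInPlace are hypothetical helpers.

```typescript
// Assumed shape of a tool part; the real message part layout may differ.
interface ToolPart {
  type: string; // assumed to be "tool-load_context" for skill loads
  state: string;
  input: { label: string; key: string };
  output: string;
}
interface HistoryMessage {
  role: "user" | "assistant";
  parts: ToolPart[];
}

const UNLOADED_PREFIX = "[skill unloaded: ";

// Rebuild the set of loaded skills by scanning completed load_context results.
function loadedSkillKeys(history: HistoryMessage[]): Set<string> {
  const loaded = new Set<string>();
  for (const msg of history) {
    if (msg.role !== "assistant") continue;
    for (const part of msg.parts) {
      const isLoad = part.type === "tool-load_context" && part.state === "output-available";
      if (isLoad && part.output.indexOf(UNLOADED_PREFIX) !== 0) {
        loaded.add(`${part.input.label}:${part.input.key}`);
      }
    }
  }
  return loaded;
}

// Unload by overwriting the tool output in place; the original content is gone afterwards.
function unloadSkillInPlace(history: HistoryMessage[], label: string, key: string): void {
  for (const msg of history) {
    for (const part of msg.parts) {
      if (part.type === "tool-load_context" && part.input.label === label && part.input.key === key) {
        part.output = `${UNLOADED_PREFIX}${key}]`;
      }
    }
  }
}

const skillHistory: HistoryMessage[] = [
  {
    role: "assistant",
    parts: [
      {
        type: "tool-load_context",
        state: "output-available",
        input: { label: "skills", key: "api-reference" },
        output: "Full API reference text"
      }
    ]
  }
];

const loadedBefore = loadedSkillKeys(skillHistory).has("skills:api-reference"); // true
unloadSkillInPlace(skillHistory, "skills", "api-reference");
const loadedAfter = loadedSkillKeys(skillHistory).has("skills:api-reference"); // false
```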
Session automatically generates tools based on the provider types of your context blocks. Pass these to your LLM alongside your own tools.
const tools = await session.tools();
// Merge with your own tools:
const allTools = { ...tools, ...myTools };

set_context is generated when any writable block exists. It writes to regular blocks, skill blocks (keyed), or search blocks (keyed).
- For regular blocks: { label, content, action: "replace" | "append" }
- For skill blocks: { label, key, content, description? }
- For search blocks: { label, key, content }

It enforces maxTokens limits and returns a usage string like "Written to memory. Usage: 45% (495/1100 tokens)".
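A write path with a token limit might look like the sketch below. Several details are assumptions: the four-characters-per-token estimator, the choice to reject oversized writes rather than truncate them, and the wording of the rejection message. Only the success string format comes from the text above. writeBlock and BlockState are hypothetical names.

```typescript
interface BlockState {
  label: string;
  content: string;
  maxTokens?: number;
}

// Stand-in estimator (assumption): roughly four characters per token.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

// Apply a replace or append, refusing writes that would exceed maxTokens.
function writeBlock(block: BlockState, content: string, action: "replace" | "append"): string {
  const next = action === "append" ? block.content + content : content;
  const tokens = roughTokens(next);
  if (block.maxTokens !== undefined && tokens > block.maxTokens) {
    // Rejection wording is invented for this sketch; the block is left unchanged.
    return `Rejected: ${tokens} tokens exceeds the ${block.maxTokens}-token limit for ${block.label}.`;
  }
  block.content = next;
  if (block.maxTokens === undefined) return `Written to ${block.label}.`;
  const pct = Math.round((tokens / block.maxTokens) * 100);
  return `Written to ${block.label}. Usage: ${pct}% (${tokens}/${block.maxTokens} tokens)`;
}

const memoryState: BlockState = { label: "memory", content: "", maxTokens: 1100 };
const okResult = writeBlock(memoryState, "x".repeat(1980), "replace"); // 1980 chars is 495 tokens here
const tooBig = writeBlock(memoryState, "y".repeat(4404), "replace"); // 1101 tokens, over the limit
```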
load_context is generated when any skill block exists. It loads full content by key from a SkillProvider.
- Input: { label, key }
- Returns the document content, or "Not found: key"
unload_context is generated alongside load_context. It frees context space by unloading a previously loaded skill.
- Input: { label, key }
- Rewrites the tool result in conversation history to a short marker
- The skill remains available for re-loading

The tool's description dynamically lists currently loaded skills.
search_context is generated when any search block exists. It runs a full-text search within a searchable context block.
- Input: { label, query }
- Returns top 10 results by FTS5 rank, or "No results found."
session_search is available on SessionManager only (not on individual sessions). It searches across all sessions.
- Input: { query }
- Returns results from all sessions, or "No results found."
Use { ...sessionTools, ...manager.tools() } to give the model both per-session and cross-session tools.
Compaction summarizes older messages to keep conversations within token limits. Original messages are preserved in SQLite — the summary is a non-destructive overlay applied at read time.
import { createCompactFunction } from "agents/experimental/memory/utils/compaction-helpers";
const session = Session.create(this)
.withContext("memory", { maxTokens: 1100 })
.onCompaction(
createCompactFunction({
summarize: (prompt) =>
generateText({ model: myModel, prompt }).then((r) => r.text),
protectHead: 3, // Keep first 3 messages (default: 3)
tailTokenBudget: 20000, // Protect ~20K tokens at the tail (default: 20000)
minTailMessages: 2 // Always keep at least 2 tail messages (default: 2)
})
)
.compactAfter(100_000); // Auto-compact at 100K estimated tokens

- Protect head: the first N messages are never compacted (default 3)
- Protect tail: walk backward from the end, accumulating tokens up to a budget (default 20K tokens)
- Align boundaries: shift boundaries to avoid splitting tool call/result pairs
- Summarize middle: send the middle section to an LLM with a structured format (Topic, Key Points, Current State, Open Items)
- Store overlay: saved in the assistant_compactions table, keyed by fromMessageId and toMessageId
- Iterative: on subsequent compactions, the existing summary is passed to the LLM to update rather than replace
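The head and tail protection steps can be sketched as a function that picks the middle range to summarize. This is not the code behind createCompactFunction: the boundary alignment step is omitted, the token estimate is a rough stand-in, and middleRange, Msg, and SplitOptions are hypothetical names.

```typescript
interface Msg {
  id: string;
  text: string;
}
interface SplitOptions {
  protectHead: number;
  tailTokenBudget: number;
  minTailMessages: number;
}

// Rough per-message estimate (assumption): four characters per token plus a fixed overhead.
const approxTokens = (m: Msg): number => Math.ceil(m.text.length / 4) + 4;

// Returns the [start, end) index range of the middle section to summarize,
// or null when the protected head and tail already cover every message.
function middleRange(messages: Msg[], opts: SplitOptions): [number, number] | null {
  const headEnd = Math.min(opts.protectHead, messages.length);
  let tailStart = messages.length;
  let used = 0;
  // Walk backward, keeping messages while the budget allows or the minimum is not yet met.
  while (tailStart > headEnd) {
    const cost = approxTokens(messages[tailStart - 1]);
    const kept = messages.length - tailStart;
    if (kept >= opts.minTailMessages && used + cost > opts.tailTokenBudget) break;
    used += cost;
    tailStart--;
  }
  return tailStart > headEnd ? [headEnd, tailStart] : null;
}

// Ten messages of 40 characters each cost 14 tokens apiece under this estimate.
const tenMessages: Msg[] = [];
for (let i = 0; i < 10; i++) tenMessages.push({ id: `m${i}`, text: "x".repeat(40) });

const range = middleRange(tenMessages, { protectHead: 3, tailTokenBudget: 30, minTailMessages: 2 });
```

With a 30-token budget, two tail messages (28 tokens) fit and a third would not, so indices 3 through 7 form the middle section.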
When getHistory() is called, compaction overlays are applied transparently — the compacted range is replaced by a synthetic message with id compaction_<id>.
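Applying an overlay at read time can be pictured like this. The sketch works on a plain array rather than SQLite rows, and the role given to the synthetic message is an assumption. applyOverlay, PlainMessage, and Overlay are hypothetical names.

```typescript
interface PlainMessage {
  id: string;
  role: string;
  text: string;
}
interface Overlay {
  id: string;
  summary: string;
  fromMessageId: string;
  toMessageId: string;
}

// Replace the inclusive range [fromMessageId, toMessageId] with one synthetic summary message.
// The input array is not modified; only the returned view differs.
function applyOverlay(history: PlainMessage[], overlay: Overlay): PlainMessage[] {
  const from = history.findIndex((m) => m.id === overlay.fromMessageId);
  const to = history.findIndex((m) => m.id === overlay.toMessageId);
  if (from === -1 || to === -1 || to < from) return history; // range is not on this path
  const synthetic: PlainMessage = {
    id: `compaction_${overlay.id}`,
    role: "assistant", // assumed role for the summary message
    text: overlay.summary
  };
  return history.slice(0, from).concat([synthetic], history.slice(to + 1));
}

const stored: PlainMessage[] = ["m1", "m2", "m3", "m4", "m5"].map((id) => ({
  id,
  role: "user",
  text: `text of ${id}`
}));

const view = applyOverlay(stored, {
  id: "abc",
  summary: "Summary of m2 to m4",
  fromMessageId: "m2",
  toMessageId: "m4"
});
const viewIds = view.map((m) => m.id); // ["m1", "compaction_abc", "m5"]
```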
// Run registered compaction function
const result = await session.compact();
// Or manage overlays directly
session.addCompaction("Summary of messages 1-50", "msg-1", "msg-50");
const overlays = session.getCompactions();

When .compactAfter(threshold) is set, appendMessage() checks the estimated token count after each write. If it exceeds the threshold, compact() is called automatically. Auto-compaction failure is non-fatal: the message is already saved.
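The write-then-compact ordering can be sketched as below. SketchSession is a hypothetical class, the token estimate is a rough stand-in, and errors are collected in an array where the real Session broadcasts them.

```typescript
type CompactFn = () => Promise<void>;

// Hypothetical model of appendMessage(): save first, then attempt compaction.
class SketchSession {
  readonly saved: string[] = [];
  readonly errors: string[] = [];
  private threshold: number;
  private compactFn: CompactFn;

  constructor(threshold: number, compactFn: CompactFn) {
    this.threshold = threshold;
    this.compactFn = compactFn;
  }

  async appendMessage(text: string): Promise<void> {
    this.saved.push(text); // the write happens first and is never rolled back
    const estimate = this.saved.reduce((sum, t) => sum + Math.ceil(t.length / 4) + 4, 0);
    if (estimate <= this.threshold) return;
    try {
      await this.compactFn();
    } catch (err) {
      // Record and continue; the append itself still resolves successfully.
      this.errors.push(err instanceof Error ? err.message : String(err));
    }
  }
}
```

A caller awaiting appendMessage() therefore never sees a compaction error as a rejection; it has to watch for the error event instead.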
Note: Token estimation is heuristic (not tiktoken). It uses max(chars/4, words*1.3) with 4 tokens per-message overhead. This is intentional: tiktoken would add 80-120MB of heap overhead, which leaves little or no room under Cloudflare Workers' 128MB memory limit.
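The stated formula can be reproduced in a few lines. This is a re-implementation for illustration, not the code behind estimateStringTokens; rounding up to a whole token and splitting words on whitespace are assumptions, and the sketch* names are hypothetical.

```typescript
// Sketch of max(chars/4, words*1.3); rounding up is an assumption.
function sketchStringTokens(text: string): number {
  if (text.length === 0) return 0;
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.ceil(Math.max(text.length / 4, words * 1.3));
}

const PER_MESSAGE_OVERHEAD = 4;

// Sum of per-message estimates plus the fixed overhead for each message.
function sketchMessageTokens(texts: string[]): number {
  return texts.reduce((sum, t) => sum + sketchStringTokens(t) + PER_MESSAGE_OVERHEAD, 0);
}

const helloTokens = sketchStringTokens("Hello world"); // max(11/4, 2*1.3) = 2.75, rounded up to 3
const shortWordTokens = sketchStringTokens("a b c d"); // max(7/4, 4*1.3) = 5.2, rounded up to 6
const totalTokens = sketchMessageTokens(["Hello world", "a b c d"]); // 3 + 4 + 6 + 4 = 17
```

The word-count term matters for text made of many short words, where the character-based estimate alone would undercount.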
Weird: Compaction is iterative but single-overlay. Each new compaction extends from the earliest existing compaction's fromMessageId to the new end. So you always have at most one active compaction overlay per session, and it keeps growing. The previous compaction rows remain in the database but are superseded by the latest one (which covers a wider range). getCompactions() returns all of them, but getHistory() applies the latest one.
SessionManager is a registry for multiple named sessions within a single Durable Object. It provides lifecycle management, convenience methods, and cross-session search.
import { SessionManager } from "agents/experimental/memory/session";
const manager = SessionManager.create(this)
.withContext("soul", { provider: { get: async () => "You are helpful." } })
.withContext("memory", { description: "Learned facts", maxTokens: 1100 })
.withCachedPrompt()
.onCompaction(myCompactFn)
.compactAfter(100_000)
.withSearchableHistory("history");

Context blocks, prompt caching, and compaction settings are propagated to all sessions created through the manager. Provider keys are automatically namespaced by session ID (e.g. memory_<sessionId>).
| Method | Description |
|---|---|
| SessionManager.create(agent) | Static factory. |
| .withContext(label, options?) | Add context block template for all sessions. |
| .withCachedPrompt(provider?) | Enable prompt persistence for all sessions. |
| .onCompaction(fn) | Register compaction function for all sessions. |
| .compactAfter(tokenThreshold) | Auto-compact threshold for all sessions. |
| .withSearchableHistory(label) | Add a cross-session searchable history block to every session. The model can search past conversations from any session. |
// Create a new session
const info = manager.create("My Chat");
// info: { id, name, parent_session_id, model, source, input_tokens, output_tokens, estimated_cost, end_reason, created_at, updated_at }
// Create with metadata
const info2 = manager.create("My Chat", {
parentSessionId: "parent-id",
model: "claude-sonnet-4-20250514",
source: "web"
});
// Get session metadata (null if not found)
const session = manager.get(sessionId);
// List all sessions (ordered by updated_at DESC)
const sessions = manager.list();
// Rename
manager.rename(sessionId, "New Name");
// Delete (clears messages too)
manager.delete(sessionId);

// Get or create the Session instance for an ID
// Lazy — creates on first access, caches for subsequent calls
const session = manager.getSession(sessionId);

These delegate to the underlying Session but also update the session's updated_at timestamp:
// Append a single message (parentId is optional)
await manager.append(sessionId, message, parentId);
// Add or update (upsert); parentId is optional
await manager.upsert(sessionId, message, parentId);
// Batch append (auto-chains parent IDs); parentId is optional
await manager.appendAll(sessionId, messages, parentId);
// Read history (leafId is optional)
const history = manager.getHistory(sessionId, leafId);
// Message count
const count = manager.getMessageCount(sessionId);
// Clear messages
manager.clearMessages(sessionId);
// Delete specific messages
manager.deleteMessages(sessionId, ["msg-1"]);

Fork a session at a specific message. This copies history up to that point into a new session:
const forked = await manager.fork(sessionId, atMessageId, "Forked Chat");
// forked.parent_session_id === sessionId

Weird: Fork copies messages with new UUIDs, not the original IDs, so message IDs in the forked session won't match the original. The fork also doesn't copy compaction overlays; the forked session starts clean with the materialized history.
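The effect of copying with fresh IDs can be sketched as follows. forkPath is a hypothetical helper, not manager.fork(): it takes an injected ID generator so the example is deterministic, and it returns an old-to-new ID map, which the real fork is not documented to provide.

```typescript
interface ForkMessage {
  id: string;
  parentId: string | null;
  text: string;
}

// Copy a root-to-leaf path into a new chain, assigning fresh IDs and re-linking parents.
function forkPath(
  path: ForkMessage[],
  newId: () => string
): { messages: ForkMessage[]; idMap: Map<string, string> } {
  const idMap = new Map<string, string>();
  const messages: ForkMessage[] = [];
  let previous: string | null = null;
  for (const original of path) {
    const id = newId();
    idMap.set(original.id, id); // lets callers translate externally stored IDs
    messages.push({ id, parentId: previous, text: original.text });
    previous = id;
  }
  return { messages, idMap };
}

// Deterministic generator for the example; a real fork would use random UUIDs.
let forkCounter = 0;
const nextForkId = (): string => `f${++forkCounter}`;

const forkResult = forkPath(
  [
    { id: "u1", parentId: null, text: "Question" },
    { id: "a1", parentId: "u1", text: "Answer" }
  ],
  nextForkId
);
```

If you keep message IDs outside the session (bookmarks, analytics), some mapping of this kind has to be built by the caller, since the original IDs do not carry over.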
// Add a compaction overlay
manager.addCompaction(sessionId, summary, fromId, toId);
// Get overlays
const compactions = manager.getCompactions(sessionId);
// Compact and split — marks old session as ended, creates a continuation
const continuation = await manager.compactAndSplit(
sessionId,
summary,
"Continued Chat"
);
// continuation.parent_session_id === sessionId
// Old session gets end_reason = "compaction"

compactAndSplit is different from regular compaction: it creates a new session with a summary message instead of an in-place overlay. The original session is marked with end_reason: "compaction".
manager.addUsage(sessionId, inputTokens, outputTokens, cost);
// Increments input_tokens, output_tokens, and estimated_cost on the session row

// Search across all sessions (FTS5)
const results = manager.search("deployment Friday", { limit: 20 });
// Returns: Array<{ id, role, content, createdAt }>
// Get tools for the model (includes session_search)
const tools = manager.tools();

Note: manager.search() uses a separate FTS5 index (assistant_fts) from per-session search. Messages are indexed into this table by the AgentSessionProvider when appended. The session_search tool limits results to 10.
Weird: manager.search() silently returns an empty array on FTS5 query errors (malformed queries, etc.) rather than throwing.
All storage is in Durable Object SQLite. Tables are created lazily on first use.
assistant_messages — Tree-structured messages.
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | Message ID |
| session_id | TEXT | Empty string for single-session; set for multi-session |
| parent_id | TEXT | Parent message ID (null for roots) |
| role | TEXT | user, assistant, system |
| content | TEXT | JSON-serialized SessionMessage |
| created_at | DATETIME | Auto-set |
assistant_compactions — Compaction overlays.
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | Random UUID |
| session_id | TEXT | Scoped to session |
| summary | TEXT | LLM-generated summary |
| from_message_id | TEXT | Start of compacted range |
| to_message_id | TEXT | End of compacted range |
| created_at | DATETIME | Auto-set |
assistant_fts — FTS5 virtual table for message search. Tokenizer: porter unicode61.
assistant_sessions — Session registry (SessionManager only).
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | Random UUID |
| name | TEXT | Display name |
| parent_session_id | TEXT | For forks/splits |
| model | TEXT | Optional model identifier |
| source | TEXT | Optional source identifier |
| input_tokens | INTEGER | Cumulative input tokens |
| output_tokens | INTEGER | Cumulative output tokens |
| estimated_cost | REAL | Cumulative cost |
| end_reason | TEXT | "compaction" when split |
| created_at | DATETIME | Auto-set |
| updated_at | DATETIME | Updated on message ops |
cf_agents_context_blocks — Persistent context block storage (AgentContextProvider).
cf_agents_search_entries + cf_agents_search_fts — Searchable context entries and FTS5 index (AgentSearchProvider).
You can implement any of the four provider interfaces to plug in your own storage:
// Read-only context
const myProvider: ContextProvider = {
get: async () => "Static content here"
};
// Writable context (enables set_context tool)
const myWritable: WritableContextProvider = {
get: async () => fetchFromMyDB(),
set: async (content) => saveToMyDB(content)
};
// Skill provider (enables load_context tool)
const mySkills: SkillProvider = {
get: async () => "- api-ref: API Reference\n- guide: User Guide",
load: async (key) => fetchDocument(key),
set: async (key, content, description) =>
saveDocument(key, content, description) // optional
};
// Search provider (enables search_context tool)
const mySearch: SearchProvider = {
get: async () => "42 entries indexed",
search: async (query) => searchMyIndex(query),
set: async (key, content) => indexContent(key, content) // optional
};

You can also implement SessionProvider to replace the SQLite storage entirely:
const myStorage: SessionProvider = {
getMessage(id) { ... },
getHistory(leafId?) { ... },
getLatestLeaf() { ... },
getBranches(messageId) { ... },
getPathLength(leafId?) { ... },
appendMessage(message, parentId?) { ... },
updateMessage(message) { ... },
deleteMessages(messageIds) { ... },
clearMessages() { ... },
addCompaction(summary, fromId, toId) { ... },
getCompactions() { ... },
searchMessages(query, limit) { ... } // optional
};

Exported from agents/experimental/memory/utils:
import {
estimateStringTokens,
estimateMessageTokens
} from "agents/experimental/memory/utils/tokens";
estimateStringTokens("Hello world"); // heuristic: max(chars/4, words*1.3)
estimateMessageTokens(messages); // sum with 4 tokens per-message overhead

import {
createCompactFunction,
isCompactionMessage,
sanitizeToolPairs,
alignBoundaryForward,
alignBoundaryBackward,
findTailCutByTokens,
computeSummaryBudget,
buildSummaryPrompt,
COMPACTION_PREFIX
} from "agents/experimental/memory/utils/compaction-helpers";

- createCompactFunction(options): Full compaction implementation. See Compaction.
- isCompactionMessage(msg): Check if a message is a compaction overlay (id starts with compaction_).
- sanitizeToolPairs(messages): Fix orphaned tool call/result pairs after compaction. Removes orphaned results and adds stub results for calls whose results were dropped.
- alignBoundaryForward/Backward(messages, idx): Shift a boundary index to avoid splitting tool call/result groups.
- findTailCutByTokens(messages, headEnd, budget, minMessages): Find where to stop compressing using a token budget.
- computeSummaryBudget(messages): 20% of compressed content tokens (minimum 100).
- buildSummaryPrompt(messages, previousSummary, budget): Structured prompt for LLM summarization.
Everything is exported from agents/experimental/memory/session:
import {
// Core
Session,
SessionManager,
// Providers
AgentSessionProvider,
AgentContextProvider,
AgentSearchProvider,
R2SkillProvider,
// Type guards
isWritableProvider,
isSkillProvider,
isSearchProvider,
// Types
type SessionMessage,
type SessionMessagePart,
type SessionContextOptions,
type SessionInfo,
type SessionManagerOptions,
type SessionOptions,
type ContextBlock,
type ContextConfig,
type ContextProvider,
type WritableContextProvider,
type SkillProvider,
type SearchProvider,
type SearchResult,
type SessionProvider,
type StoredCompaction,
type SqlProvider
} from "agents/experimental/memory/session";

Compaction utilities from agents/experimental/memory/utils/compaction-helpers:
import {
createCompactFunction,
isCompactionMessage,
sanitizeToolPairs,
COMPACTION_PREFIX,
type CompactResult,
type CompactOptions
} from "agents/experimental/memory/utils/compaction-helpers";

Token utilities from agents/experimental/memory/utils/tokens:
import {
estimateStringTokens,
estimateMessageTokens
} from "agents/experimental/memory/utils/tokens";

Things that might surprise you:
- Lazy initialization. Sessions created with the builder don't initialize until first use. The first call to any method (e.g. getHistory()) triggers _ensureReady(), which creates SQLite tables, resolves providers, loads context blocks, and restores skill state from history. This means the first operation is slower than subsequent ones.
- Snapshot freezing is sticky. freezeSystemPrompt() caches the result. Writing to a context block does NOT update the cached snapshot; you must explicitly call refreshSystemPrompt(). This is deliberate (LLM prefix cache optimization), but easy to miss.
- appendMessage is async, other writes are sync. appendMessage is async only because it may trigger auto-compaction (which calls an LLM). The actual SQLite write is synchronous. updateMessage, deleteMessages, and clearMessages are all synchronous.
- Skills survive hibernation via history scanning. On initialization, the session scans the entire conversation history looking for load_context tool results to reconstruct which skills are loaded. This is clever but means initialization cost scales with conversation length.
- Compaction overlays are superseding, not stacking. Each compaction extends from the earliest existing fromMessageId. So you always have one effective overlay that keeps growing. Old compaction rows remain in the database but are unused. getCompactions() returns all rows, which can be confusing.
- Search is silently absent. session.search() throws if the provider doesn't support search, but manager.search() swallows FTS5 errors and returns []. The searchMessages method on SessionProvider is optional (searchMessages?).
- Fork copies with new IDs. When forking via SessionManager.fork(), all messages get new UUIDs. If you're storing message IDs externally (e.g. for bookmarks), they won't survive a fork.
- removeContext doesn't fire skill unload callbacks. If you remove a context block that had loaded skills, the skill tracking is cleaned up but the conversation history is NOT rewritten. The tool results from those skills remain in history with their full content.
- FTS5 query sanitization. Both AgentSearchProvider.search() and SessionManager.search() quote individual words to prevent FTS5 syntax injection. This means you can't use FTS5 operators like OR, NOT, or NEAR; they'll be treated as literal search terms.
- Auto-compaction failure is silent. When compactAfter triggers and the compaction function throws, the error is emitted via WebSocket broadcast but the appendMessage call still succeeds. The message is saved; only the compaction is skipped.
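The per-word quoting mentioned in the FTS5 sanitization item can be sketched as below. This shows the general technique, not the library's exact sanitizer; quoteFtsTerms is a hypothetical helper. It relies on the FTS5 rule that a double quote inside a quoted string is escaped by doubling it.

```typescript
// Quote every whitespace-separated term so FTS5 treats it as a literal string.
function quoteFtsTerms(query: string): string {
  return query
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" ");
}

const quotedOperators = quoteFtsTerms("deployment OR Friday"); // "deployment" "OR" "Friday"
const quotedQuotes = quoteFtsTerms('say "hi"'); // "say" """hi"""
```

Once quoted, OR is matched as an ordinary word, which is why boolean operators stop working in these search methods.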