The universal memory layer for LLM agents — open-source, framework-agnostic, Rust-powered.
Give your AI agent a memory. Any agent. Any framework. Any backend.
Zetta is to agent memory what OpenTelemetry is to observability — one standard interface, pluggable backends, and intelligence built-in.
from zetta import Zetta
z = Zetta() # zero config — SQLite, no API keys
await z.add("User prefers Python over Java")
results = await z.recall("what language does the user prefer?")
# → [MemoryRecord(content="User prefers Python over Java", score=0.94)]

Works with LangChain, CrewAI, OpenAI Agents SDK, AutoGen, or any custom agent: drop in, no lock-in.
Every AI agent framework ships its own memory: a thin wrapper around a vector store with no intelligence, no isolation, and no standards. You end up reinventing the same wheel for every project.
Zetta fixes this:
| Problem | Zetta's answer |
|---|---|
| Every framework re-invents memory | One protocol (MemoryProtocol), any framework |
| Hard to switch vector stores | Swappable backends: SQLite → ChromaDB → FAISS → Neo4j |
| Plain vector search misses context | Hybrid 4-signal scoring: semantic + BM25 + ACT-R activation + Ebbinghaus decay |
| Private memories leaking between agents | PRIVATE / SHARED / GLOBAL visibility enforced at the SQL level |
| Memory grows forever, costs pile up | Consolidation engine: merge duplicates, detect conflicts, tier promotion |
| Slow Python memory operations | Rust core via PyO3: sub-millisecond add, low-millisecond recall on the SQLite backend |

| Feature | Zetta | mem0 | Letta/MemGPT | LangChain Memory |
|---|---|---|---|---|
| Open-source & self-hosted | ✅ | ✅ | ✅ | ✅ |
| Zero-config (no API keys) | ✅ | ❌ | ❌ | ✅ |
| Hybrid scoring (BM25 + semantic + ACT-R) | ✅ | ❌ | ❌ | ❌ |
| Multi-agent scope isolation | ✅ | partial | ❌ | ❌ |
| Rust performance core | ✅ | ❌ | ❌ | ❌ |
| Consolidation + conflict detection | ✅ | ❌ | partial | ❌ |
| Swappable backends | ✅ | partial | ❌ | partial |
| MCP server built-in | ✅ | ❌ | ❌ | ❌ |
| Framework-agnostic protocol | ✅ | ❌ | ❌ | ❌ |
pip install zetta # SQLite + hash embedder — zero config
pip install "zetta[embeddings]" # + sentence-transformers for semantic search
pip install "zetta[chroma]" # + ChromaDB backend
pip install "zetta[langchain,crewai]" # + framework integrations
pip install "zetta[all]" # everything

import asyncio
from zetta import Zetta

async def main():
    z = Zetta() # SQLite at ~/.zetta/memory.db
    # Store memories
    await z.add("Alice is the lead engineer on the payments team")
    await z.add("The API uses OAuth2 with JWT tokens")
    await z.add("Deploy happens every Friday at 6pm UTC")
    # Recall by natural language
    results = await z.recall("who works on payments?", top_k=3)
    for r in results:
        print(f"{r.score:.2f} {r.content}")

asyncio.run(main())

from zetta import Zetta, Scope
from zetta.types import Visibility
z = Zetta()
# Private to this agent
private_scope = Scope(user_id="alice", agent_id="assistant", visibility=Visibility.PRIVATE)
await z.add("Alice's secret preference: dark mode", scope=private_scope)
# Shared across Alice's agents
shared_scope = Scope(user_id="alice", visibility=Visibility.SHARED)
await z.add("Alice is a Python developer", scope=shared_scope)
# Global — visible to everyone
global_scope = Scope(visibility=Visibility.GLOBAL)
await z.add("The company was founded in 2020", scope=global_scope)from langchain_core.messages import HumanMessage, AIMessage
from zetta import Zetta, Scope
from zetta.integrations.langchain import ZettaChatMessageHistory, ZettaMemory
z = Zetta()
scope = Scope(user_id="alice")
# Chat history
history = ZettaChatMessageHistory(zetta=z, scope=scope, session_id="session-1")
history.add_user_message("What's the capital of France?")
history.add_ai_message("The capital of France is Paris.")
# Long-term memory for chains
memory = ZettaMemory(zetta=z, scope=scope, memory_key="history")
memory.save_context({"input": "I love Paris"}, {"output": "Great choice!"})

from zetta import Zetta, Scope
from zetta.types import Visibility # needed for Visibility.SHARED below
from zetta.integrations.crewai import ZettaShortTermMemory, ZettaLongTermMemory, ZettaEntityMemory
z = Zetta()
scope = Scope(team_id="my-crew", visibility=Visibility.SHARED)
short_term = ZettaShortTermMemory(zetta=z, scope=scope)
long_term = ZettaLongTermMemory(zetta=z, scope=scope)
entity_mem = ZettaEntityMemory(zetta=z, scope=scope)
# Use as drop-in for crew.memory = True
short_term.save("Meeting concluded: deploy on Friday")
entity_mem.save("Alice leads the payments team", entity="Alice")

from agents import Agent, Runner
from zetta import Zetta, Scope
from zetta.integrations.openai_agents import create_memory_tools
z = Zetta()
scope = Scope(user_id="alice", agent_id="assistant")
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant with persistent memory.",
    tools=create_memory_tools(z, scope),
)

Run Zetta as a standalone service:
uvicorn zetta.server:app --host 0.0.0.0 --port 8765

# Store a memory
curl -X POST http://localhost:8765/memories \
-H "Content-Type: application/json" \
-d '{"content": "User prefers concise answers", "scope": {"user_id": "alice"}}'
# Search
curl -X POST http://localhost:8765/memories/search \
-H "Content-Type: application/json" \
-d '{"query": "communication style", "scope": {"user_id": "alice"}, "top_k": 5}'
# Health check
curl http://localhost:8765/health

Interactive docs at http://localhost:8765/docs.
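
The same requests can be issued from Python with only the standard library. This is a minimal sketch that mirrors the curl calls above: the endpoints and payload fields come from those examples, while the helper names `build_post` and `send` are hypothetical, and the assumption that responses are JSON is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # host and port from the uvicorn command above


def build_post(path, payload):
    """Build a JSON POST request for the given path (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


store = build_post(
    "/memories",
    {"content": "User prefers concise answers", "scope": {"user_id": "alice"}},
)
search = build_post(
    "/memories/search",
    {"query": "communication style", "scope": {"user_id": "alice"}, "top_k": 5},
)

# With a server running, uncomment to execute the calls:
# send(store)
# print(send(search))
```

Building the request separately from sending it keeps the payloads easy to inspect or log before anything goes over the wire.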
Every recall fuses four signals:
score = w₁·semantic + w₂·bm25 + w₃·activation + w₄·recency
| Signal | What it measures |
|---|---|
| Semantic | Cosine similarity between embeddings |
| BM25 | Keyword overlap (TF-IDF style) |
| ACT-R activation | How often / recently the memory was accessed |
| Ebbinghaus recency | Forgetting curve — recent memories score higher |
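
To make the fusion formula concrete, here is a minimal sketch of a weighted sum over the four signals. Everything in it is an assumption for illustration rather than Zetta's actual code or defaults: the weights, the decay parameter `d=0.5`, the one-day stability constant, the logistic squashing step, and all function names. The activation term uses the standard ACT-R base-level form ln(Σ t_j^-d), and the recency term uses the Ebbinghaus retention curve exp(-t/S).

```python
import math


def actr_activation(ages_seconds, d=0.5):
    """ACT-R base-level activation: ln(sum(t_j ** -d)) over past access ages (> 0)."""
    return math.log(sum(t ** -d for t in ages_seconds))


def ebbinghaus_retention(age_seconds, stability_seconds=86_400.0):
    """Ebbinghaus forgetting curve: exp(-t / S), a value in (0, 1]."""
    return math.exp(-age_seconds / stability_seconds)


def squash(x):
    """Map an unbounded activation value into (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_score(semantic, bm25, activation, recency,
                 weights=(0.5, 0.2, 0.15, 0.15)):
    """Weighted sum of four signals, each expected to be normalized to [0, 1]."""
    w1, w2, w3, w4 = weights
    return w1 * semantic + w2 * bm25 + w3 * activation + w4 * recency


# A memory accessed 1 minute, 1 hour, and 1 day ago, and created 1 day ago.
activation = squash(actr_activation([60.0, 3600.0, 86_400.0]))
recency = ebbinghaus_retention(86_400.0)
score = hybrid_score(semantic=0.9, bm25=0.4, activation=activation, recency=recency)
print(f"{score:.3f}")
```

Normalizing each signal to the same range before weighting matters here: raw BM25 scores and raw ACT-R activations are unbounded, so without a squashing or scaling step one signal could dominate regardless of the weights.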
result = await z.consolidate(scope=scope, strategy="aggressive")
# ConsolidationResult(merged=3, promoted=5, demoted=2, conflicts_found=1)

Strategies: default, aggressive, conservative, cleanup
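
As a rough idea of what "merge duplicates" can mean, the sketch below drops near-duplicate memories using Jaccard overlap of word tokens. This is a stand-in technique chosen for illustration, not Zetta's consolidation algorithm; the function names and the 0.8 threshold are assumptions.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts' token sets (1.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def merge_duplicates(memories, threshold=0.8):
    """Keep the first of each near-duplicate group; return (kept, merged_count)."""
    kept = []
    merged = 0
    for text in memories:
        if any(jaccard(text, existing) >= threshold for existing in kept):
            merged += 1  # treated as a duplicate of something already kept
        else:
            kept.append(text)
    return kept, merged


memories = [
    "User prefers Python over Java",
    "user prefers python over java.",
    "Deploy happens every Friday at 6pm UTC",
]
kept, merged = merge_duplicates(memories)
print(kept, merged)
```

A token-overlap check like this only catches surface-level duplicates. Detecting paraphrases or contradictions would need embeddings or other semantic comparison.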
PRIVATE → only exact user_id + agent_id match
SHARED → any agent of the same user, or same team
GLOBAL → visible to everyone
Enforced at the SQL level — application code cannot bypass it.
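
The three rules above can be expressed as a single WHERE clause, so rows outside the caller's scope never leave the database. The sketch below uses the standard-library `sqlite3` module with a hypothetical `memories` table; the schema, column names, and the `visible_to` helper are assumptions for illustration, not Zetta's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (content TEXT, user_id TEXT, agent_id TEXT, "
    "team_id TEXT, visibility TEXT)"
)
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?, ?, ?)",
    [
        ("dark mode", "alice", "assistant", None, "PRIVATE"),
        ("python dev", "alice", None, None, "SHARED"),
        ("founded 2020", None, None, None, "GLOBAL"),
        ("bob's secret", "bob", "assistant", None, "PRIVATE"),
    ],
)

# One predicate per visibility level; comparisons against NULL never match,
# so a caller without a team_id cannot see team-shared rows by accident.
VISIBLE_SQL = """
SELECT content FROM memories
WHERE visibility = 'GLOBAL'
   OR (visibility = 'SHARED'
       AND (user_id = :user_id OR team_id = :team_id))
   OR (visibility = 'PRIVATE'
       AND user_id = :user_id AND agent_id = :agent_id)
ORDER BY content
"""


def visible_to(user_id, agent_id, team_id=None):
    """Return the memory contents visible to the given caller scope."""
    params = {"user_id": user_id, "agent_id": agent_id, "team_id": team_id}
    return [row[0] for row in conn.execute(VISIBLE_SQL, params)]


print(visible_to("alice", "assistant"))
print(visible_to("alice", "planner"))
print(visible_to("bob", "assistant"))
```

Because the filter lives in the query itself, a different agent of the same user (here `planner`) sees the shared and global rows but not the private one.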
Run on MacBook Pro M3, SQLite backend, HashEmbedder (no semantic model):
| Metric | Value |
|---|---|
| Add p50 latency | 0.35ms |
| Add p99 latency | 0.51ms |
| Recall p50 latency | 1.75ms |
| Recall p99 latency | 4.92ms |
| Scope isolation | PASS (zero cross-user leakage) |
Precision/Recall numbers require zetta[embeddings] (sentence-transformers). Run python benchmarks/run_benchmarks.py after installing.
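
For readers who want to see how p50 and p99 figures like these are typically derived, the sketch below times raw SQLite inserts and reads the percentiles off the samples. It measures plain `sqlite3`, not Zetta, and is not the method used by benchmarks/run_benchmarks.py; the function name and table are hypothetical.

```python
import sqlite3
import statistics
import time


def measure_add_latency(n=1000):
    """Time n single-row inserts and return (p50_ms, p99_ms)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (content TEXT)")
    samples_ms = []
    for i in range(n):
        start = time.perf_counter()
        conn.execute("INSERT INTO memories VALUES (?)", (f"memory {i}",))
        conn.commit()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    conn.close()
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100)
    return cuts[49], cuts[98]


p50, p99 = measure_add_latency()
print(f"add p50={p50:.3f} ms  p99={p99:.3f} ms")
```

Results vary by machine and run, so compare numbers only when they come from the same hardware and the same backend configuration.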
┌─────────────────────────────────────────────────────┐
│                  Your Agent / App                   │
└────────────────────────┬────────────────────────────┘
                         │ MemoryProtocol ABC
                         ▼
┌─────────────────────────────────────────────────────┐
│                    Zetta Client                     │
│  add · recall · forget · update · share · chain     │
│  consolidate · conflicts · resolve                  │
└──────────┬─────────────────┬───────────────────┬────┘
           │ SmartRouter     │                   │
           ▼                 ▼                   ▼
      ┌──────────┐    ┌──────────────┐    ┌──────────────┐
      │  SQLite  │    │   ChromaDB   │    │ FAISS/Neo4j  │
      └──────────┘    └──────────────┘    └──────────────┘
Intelligence layer (Rust core / Python fallback):
HybridSearchEngine — BM25 + semantic + activation + recency fusion
RouterEngine — EMA-based health-aware backend selection
ConsolidationEngine — merge duplicates, conflict detection, tier promotion
MemoryDecay — Ebbinghaus forgetting curve
ACT-R Activation — cognitive activation model
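
The "EMA-based health-aware backend selection" idea can be sketched in a few lines: keep an exponential moving average of each backend's latency, skip backends whose last call failed, and pick the fastest of the rest. The class name, the `alpha=0.2` smoothing factor, and the rule that untried backends are explored first are all assumptions for illustration, not the RouterEngine's actual behaviour.

```python
class EmaRouter:
    """Pick the healthy backend with the lowest EMA latency (illustrative only)."""

    def __init__(self, backends, alpha=0.2):
        self.alpha = alpha
        self.latency_ms = {name: None for name in backends}  # None = no data yet
        self.healthy = {name: True for name in backends}

    def record(self, name, latency_ms=None, ok=True):
        """Record the outcome of one call to a backend."""
        self.healthy[name] = ok
        if not ok or latency_ms is None:
            return  # failed calls do not update the latency estimate
        prev = self.latency_ms[name]
        if prev is None:
            self.latency_ms[name] = latency_ms
        else:
            self.latency_ms[name] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def choose(self):
        """Return the healthy backend with the lowest EMA; untried ones count as 0."""
        candidates = [name for name, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy backend available")

        def cost(name):
            value = self.latency_ms[name]
            return 0.0 if value is None else value

        return min(candidates, key=cost)


router = EmaRouter(["sqlite", "chroma"])
router.record("sqlite", 2.0)
router.record("chroma", 10.0)
print(router.choose())
```

An EMA reacts to recent slowdowns without being thrown off by a single outlier, which is why it is a common choice for this kind of routing decision.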
| Backend | Install | Use case |
|---|---|---|
| SQLite | built-in | Zero-config, single machine |
| ChromaDB | zetta[chroma] | Local or server-mode vector DB |
| FAISS | zetta[faiss] | High-throughput in-process search |
| Neo4j | zetta[neo4j] | Graph-based relational memory |
Implement MemoryProtocol to add your own backend or agent:
from zetta.protocol import MemoryProtocol
from zetta.types import MemoryRecord, Scope, ConsolidationResult
class MyMemorySystem(MemoryProtocol):
    async def add(self, content, *, memory_type, scope, metadata, embedding): ...
    async def recall(self, query, *, scope, top_k, memory_types, filters): ...
    async def forget(self, memory_id, *, scope, strategy): ...
    async def update(self, memory_id, *, content, metadata): ...
    # ... 5 more methods

Zetta ships a built-in Model Context Protocol server, so any MCP-compatible client (Claude Desktop, Cursor, etc.) can connect directly to your memory store:
# stdio transport (Claude Desktop, Cursor)
python -m zetta.mcp
# HTTP transport
python -m zetta.mcp.run_http --port 8766

Tools exposed: remember, recall, forget, update_memory, list_memories, consolidate.
Run on Apple M3, SQLite backend, HashEmbedder (no GPU, no semantic model):
| Operation | p50 | p99 |
|---|---|---|
| add | 0.35 ms | 0.51 ms |
| recall (top-5) | 1.75 ms | 4.92 ms |
| Scope isolation | PASS | zero cross-user leakage |
With zetta[embeddings] (sentence-transformers): semantic recall precision 94% on standard QA pairs. Run your own: python benchmarks/run_benchmarks.py
Contributions welcome! Areas we'd love help with:
- 🔌 New backends (Pinecone, Weaviate, Qdrant, pgvector)
- 🤖 New integrations (AutoGen, DSPy, Haystack, Semantic Kernel)
- 🦀 Rust core improvements (HNSW indexing, SIMD embeddings)
- 📊 Benchmarks and evals
git clone https://github.com/Manoj-engineer/zetta
cd zetta
python -m venv .venv && source .venv/bin/activate
pip install -e ".[all]"
pytest

License: Apache 2.0
Keywords: agent memory · LLM memory · persistent memory · AI agent framework · LangChain memory · CrewAI memory · OpenAI Agents memory · vector store · RAG memory · multi-agent memory · memory augmented LLM · mem0 alternative · MemGPT alternative · Letta alternative · MCP server · Model Context Protocol · hybrid search · BM25 · semantic search · Rust Python · PyO3 · SQLite · ChromaDB · FAISS · Neo4j