Your AI forgets everything when the conversation ends. soul.py fixes that.
📖 NEW: The book is out! Soul: Building AI Agents That Remember Who They Are — everything here + deep dives on identity, memory patterns, multi-agent coordination, and the philosophy of persistent AI. Get it on Amazon →
from hybrid_agent import HybridAgent
agent = HybridAgent()
agent.ask("My name is Prahlad and I'm building an AI research lab.")
# New process. New session. Memory persists.
agent = HybridAgent()
result = agent.ask("What do you know about me?")
print(result["answer"])
# → "You're Prahlad, building an AI research lab."
No database. No server. Just markdown files and smart retrieval.
| Version | Demo | What it shows |
|---|---|---|
| v0.1 | soul.themenonlab.com | Memory persists across sessions |
| v1.0 | soulv1.themenonlab.com | Semantic RAG retrieval |
| v2.0 | soulv2.themenonlab.com | Auto query routing: RAG + RLM |
| Ask Darwin | soul-book.themenonlab.com | 📖 Book companion — watch routing decisions live |
Soul: Building AI Agents That Remember Who They Are
The complete guide to persistent AI memory. Covers:
- Why agents forget (and the architectural fix)
- Identity vs Memory (SOUL.md vs MEMORY.md)
- RAG vs RLM (when to use each)
- Multi-agent memory sharing
- Darwinian evolution of agent identity
- Working code in every chapter
pip install soul-agent
pip install soul-agent[anthropic]
pip install soul-agent[openai]
pip install soul-agent[gemini]  # ✅ Now available!
soul init  # creates SOUL.md and MEMORY.md
# v0.1: simple markdown memory (great starting point)
from soul import Agent
agent = Agent(provider="anthropic")
agent.ask("Remember this.")
# v2.0 — automatic RAG + RLM routing (this repo's default)
from hybrid_agent import HybridAgent
agent = HybridAgent() # auto-detects best retrieval per query
result = agent.ask("What do you know about me?")
print(result["answer"])
print(result["route"])  # "RAG" or "RLM"
soul.py works with any LLM provider, with no SDK lock-in:
# Anthropic (default)
agent = HybridAgent(provider="anthropic") # Uses ANTHROPIC_API_KEY
# Google Gemini
agent = HybridAgent(
    provider="gemini",
    chat_model="gemini-2.5-pro",      # or gemini-2.0-flash, gemini-2.5-flash
    router_model="gemini-2.0-flash",  # keep router cheap
)  # Uses GEMINI_API_KEY
# OpenAI
agent = HybridAgent(provider="openai") # Uses OPENAI_API_KEY
# Local via Ollama
agent = HybridAgent(
    provider="openai-compatible",
    base_url="http://localhost:11434/v1",
    chat_model="llama3.2",
)
| Provider | Default Model | Env Var |
|---|---|---|
| anthropic | claude-haiku-4-5 | ANTHROPIC_API_KEY |
| gemini | gemini-2.0-flash | GEMINI_API_KEY |
| openai | gpt-4o-mini | OPENAI_API_KEY |
| openai-compatible | llama3.2 | OPENAI_API_KEY (optional) |
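The provider table can be mirrored as a small lookup for a pre-flight check before constructing an agent. This is a sketch only: `PROVIDER_DEFAULTS` and `resolve_provider` are hypothetical names introduced here, not part of soul.py, and the values are copied from the table above.

```python
import os

# Values copied from the provider table; the structure itself is an assumption.
PROVIDER_DEFAULTS = {
    "anthropic": ("claude-haiku-4-5", "ANTHROPIC_API_KEY"),
    "gemini": ("gemini-2.0-flash", "GEMINI_API_KEY"),
    "openai": ("gpt-4o-mini", "OPENAI_API_KEY"),
    "openai-compatible": ("llama3.2", "OPENAI_API_KEY"),
}


def resolve_provider(provider, env=None):
    """Return the default model and API key for a provider (hypothetical helper)."""
    env = os.environ if env is None else env
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"unknown provider: {provider!r}")
    model, env_var = PROVIDER_DEFAULTS[provider]
    api_key = env.get(env_var)
    # The key is optional only for openai-compatible endpoints such as Ollama.
    if api_key is None and provider != "openai-compatible":
        raise RuntimeError(f"{env_var} is not set")
    return {"provider": provider, "model": model, "api_key": api_key}
```

Passing a plain dict as `env` keeps the check easy to test without touching real environment variables.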
Don't want to manage local files? SoulMate API gives you persistent memory as a service:
from soulmate import SoulMateClient
# Sign up at soulmate-api.themenonlab.com/docs
client = SoulMateClient(
    api_key="sm_live_...",
    anthropic_key="sk-ant-...",  # BYOK: your own Anthropic key
)
# That's it. Memory persists in the cloud.
response = client.ask("My name is Prahlad.")
response = client.ask("What's my name?")  # → "Prahlad"
| Local (soul.py) | Cloud (SoulMate API) |
|---|---|
| Files on your machine | Managed cloud storage |
| You control everything | Zero infrastructure |
| Git-versioned memory | API-based, instant setup |
| Free forever | Free tier available |
Get started: soulmate-api.themenonlab.com/docs
soul.py uses two markdown files as persistent state:
| File | Purpose |
|---|---|
| SOUL.md | Identity: who the agent is, how it behaves |
| MEMORY.md | Memory: timestamped log of every exchange |
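To make the two-file idea concrete, here is a minimal sketch that appends a timestamped exchange to a memory file and assembles a prompt prefix from both files. The helper names (`append_exchange`, `build_context`, `demo`) and the entry layout are assumptions for illustration, not soul.py's actual internals.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_exchange(memory_path: Path, user: str, agent: str) -> None:
    """Append one timestamped user/agent exchange to the memory file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    entry = f"\n## {stamp}\n**User:** {user}\n**Agent:** {agent}\n"
    with memory_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)


def build_context(soul_path: Path, memory_path: Path) -> str:
    """Concatenate identity and memory into one prompt prefix."""
    soul = soul_path.read_text(encoding="utf-8") if soul_path.exists() else ""
    memory = memory_path.read_text(encoding="utf-8") if memory_path.exists() else ""
    return f"# Identity\n{soul}\n\n# Memory\n{memory}"


def demo() -> str:
    """Write both files to a temporary directory and return the assembled context."""
    workdir = Path(tempfile.mkdtemp())
    soul, memory = workdir / "SOUL.md", workdir / "MEMORY.md"
    soul.write_text("You are a concise research assistant.\n", encoding="utf-8")
    append_exchange(memory, "My name is Prahlad.", "Nice to meet you, Prahlad.")
    return build_context(soul, memory)
```

Because both files are plain text, a second process that calls `build_context` on the same paths sees everything the first one wrote.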
v2.0 adds a query router that automatically dispatches to the right retrieval strategy:
Your query
↓
Router (fast LLM call)
├── FOCUSED (~90%) → RAG — vector search, sub-second
└── EXHAUSTIVE (~10%) → RLM — recursive synthesis, thorough
Architecture based on: RAG + RLM: The Complete Knowledge Base Architecture
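The dispatch step in the diagram can be sketched as follows. In soul.py the router is a fast LLM call; the keyword heuristic in `classify` is only a stand-in so the example runs offline, and `classify`, `route`, and `EXHAUSTIVE_HINTS` are hypothetical names.

```python
import re

# Assumed cue words for queries that need the whole memory, not a few chunks.
EXHAUSTIVE_HINTS = {"all", "every", "everything", "summarize", "list", "entire"}


def classify(query: str) -> str:
    """Stand-in for the router's LLM call: label a query FOCUSED or EXHAUSTIVE."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "EXHAUSTIVE" if words & EXHAUSTIVE_HINTS else "FOCUSED"


def route(query: str, rag, rlm) -> dict:
    """Send the query to the RAG or RLM handler and record which path ran."""
    label = classify(query)
    handler = rlm if label == "EXHAUSTIVE" else rag
    return {"route": "RLM" if label == "EXHAUSTIVE" else "RAG", "answer": handler(query)}


def demo() -> list:
    """Route two sample queries through dummy handlers."""
    rag = lambda q: f"rag:{q}"
    rlm = lambda q: f"rlm:{q}"
    return [route(q, rag, rlm) for q in ("What is my name?", "Summarize everything we discussed.")]
```

Keeping the classifier separate from the handlers means the heuristic can be swapped for a real model call without changing the dispatch logic.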
| Branch | Description | Best for |
|---|---|---|
| main | v2.0: RAG + RLM hybrid (default) | Production use |
| v2.0-rag-rlm | Same as main, versioned | Pinning to v2 |
| v1.0-rag | RAG only, no RLM | Simpler setup |
| v0.1-stable | Pure markdown, zero deps | Learning / prototyping |
result = agent.ask("What is my name?")
result["answer"] # the response
result["route"] # "RAG" or "RLM"
result["router_ms"] # router latency
result["retrieval_ms"] # retrieval latency
result["total_ms"] # total latency
result["rag_context"] # retrieved chunks (RAG path)
result["rlm_meta"] # chunk stats (RLM path)
agent = HybridAgent(
    soul_path="SOUL.md",
    memory_path="MEMORY.md",
    mode="auto",                     # "auto" | "rag" | "rlm"
    qdrant_url="...",                # or set QDRANT_URL env var
    qdrant_api_key="...",            # or QDRANT_API_KEY
    azure_embedding_endpoint="...",  # or AZURE_EMBEDDING_ENDPOINT
    azure_embedding_key="...",       # or AZURE_EMBEDDING_KEY
    k=5,                             # RAG retrieval count
)
Falls back to BM25 (keyword) if Qdrant/Azure not configured.
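For readers unfamiliar with the keyword fallback, the sketch below ranks memory chunks with the standard Okapi BM25 formula using only the standard library. It illustrates the technique and is not soul.py's implementation; `tokenize`, `bm25_rank`, and the `k1`/`b` defaults are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: list, k: int = 5, k1: float = 1.5, b: float = 0.75) -> list:
    """Return up to k (score, doc) pairs with a positive BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: in how many docs each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scored = []
    for doc, toks in zip(docs, tokenized):
        counts = Counter(toks)
        score = 0.0
        for term in query_terms:
            f = counts[term]
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avg_len))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pair for pair in scored[:k] if pair[0] > 0]
```

The `k` argument plays the same role as the `k=5` retrieval count in the configuration above.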
soul.py isn't just for personal memory — the same architecture works for custom knowledge bases. Combine both in a single agent:
agent = HybridAgent(
    soul_path="SOUL.md",
    memory_path="MEMORY.md",      # Per-user memory
    knowledge_dir="./knowledge",  # Your corpus (docs, products, policies)
)
# Index your knowledge base once
agent.index_knowledge()
# Now the agent searches both pools
agent.ask("What's the return policy?") # → Knowledge base
agent.ask("What was I asking about earlier?") # → User memory
agent.ask("Which product fits my needs?")  # → Both
Example use cases:
| Agent Type | Knowledge Base | Memory |
|---|---|---|
| Support Bot | Product docs, policies, FAQs | Customer history, preferences |
| Research Assistant | Paper corpus, methodologies | User's focus, papers read |
| Onboarding Buddy | Company handbook, org chart | New hire's role, questions |
| Book Companion | Full book content | Reader's interests, progress |
Darwin (the AI companion for the Soul book) uses exactly this pattern — the entire book indexed as knowledge, plus per-reader conversation memory.
See the Memory Architecture Patterns guide for detailed implementation patterns.
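One way to picture the two-pool pattern: search the knowledge base and the user memory together, and tag each hit with the pool it came from. The sketch uses simple term overlap in place of real retrieval. `search_pools` and the sample chunks are hypothetical, not soul.py's API or data.

```python
import re


def _terms(text: str) -> set:
    """Lowercased alphanumeric terms in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_pools(query: str, pools: dict, k: int = 3) -> list:
    """Return up to k (overlap, source, chunk) hits across all pools, best first."""
    query_terms = _terms(query)
    hits = []
    for source, chunks in pools.items():
        for chunk in chunks:
            overlap = len(query_terms & _terms(chunk))
            if overlap:
                hits.append((overlap, source, chunk))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:k]


# Illustrative data only.
POOLS = {
    "knowledge": ["The return policy allows refunds within 30 days."],
    "memory": ["User asked about hiking boots yesterday."],
}
```

A query such as "Which boots fit the return policy?" draws hits from both pools, which matches the "Both" case in the example above.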
Already using a framework? Drop in soul.py memory with one line:
| Framework | Package | Install |
|---|---|---|
| LangChain | langchain-soul | pip install langchain-soul |
| LlamaIndex | llamaindex-soul | pip install llamaindex-soul |
| CrewAI | crewai-soul | pip install crewai-soul |
# LangChain
from langchain_soul import SoulChatMessageHistory
history = SoulChatMessageHistory(session_id="user-123")
# LlamaIndex
from llamaindex_soul import SoulChatStore
chat_store = SoulChatStore()
# CrewAI
from crewai_soul import SoulMemory
memory = SoulMemory()
Each integration includes:
- soul-agent — RAG + RLM hybrid retrieval
- soul-schema — Database semantic layer (auto-document your tables)
- SoulMate client — Managed cloud option
LangChain, LlamaIndex, and CrewAI are orchestration frameworks. soul.py is a primitive: persistent identity and memory you can drop into anything you're building.
- No framework lock-in — works with any LLM provider, or with your favorite framework via integrations above
- Human-readable — SOUL.md and MEMORY.md are plain text
- Version-controllable — git diff your agent's memories
- Composable — use just the parts you need
See ROADMAP.md for planned features and how to contribute.
MIT
@software{menon2026soul,
author = {Menon, Prahlad G.},
title = {soul.py: Persistent Identity and Memory for LLM Agents},
year = {2026},
url = {https://github.com/menonpg/soul.py}
}