MNEMOS v4.1.1 is the memory operating system for serious agentic work: a packaged FastAPI runtime, multi-backend persistence layer, GRAEAE reasoning bus, operator-audited compression stack, and CLI-first deployment surface.
MNEMOS is not just a place to put bytes. It is a runtime of named subsystems that manage the full lifecycle of agent memory across providers, agents, and time horizons: write, embed, search, compress, version, reason-over, audit, federate, export, import, and operate.
- MNEMOS is the memory kernel and the overall system name. Storage, versioning, operator-batched compression, portability, and lifecycle sit here.
- GRAEAE is the reasoning bus: a multi-provider routing and consensus layer with routing-strategy modes (`auto`, `local`, `external`, `all`) and reasoning shapes (`single`, `debate`, `majority`).
- THE MOIRAI is the compression subsystem. The current built-in stack is APOLLO + ARTEMIS, with a written receipt on every transformation.
- A self-maintaining model registry, when the shipped sync timer is enabled, keeps itself current from provider APIs and Arena.ai Elo rankings.
- Federation, webhooks, OAuth, RLS, DAG versioning, MCP, and the `/v1/*` REST surface are services built on top of the kernel, not retrofits onto a vector store.
The v4.1 codebase is a coherent `mnemos/` package: api/routes, core, db, domain, persistence, mcp, webhooks, workers, hooks, installer, tools, and cli. The old top-level script sprawl is gone; operators use the single `mnemos` command.
- A FastAPI service on port 5002 with three deployment profiles: `server`, `edge`, and `dev`.
- A `PersistenceBackend` abstraction with Postgres (asyncpg, pgvector, RLS, LISTEN/NOTIFY) and SQLite (aiosqlite, sqlite-vec, FTS5, JSON1, WAL) implementations.
- Multi-worker production support when Redis backs shared rate-limit, circuit-breaker, and concurrency state. The in-process fallback remains for single-worker dev and edge installs and logs a warning if used with multiple workers.
- A single `/v1/*` REST surface covering memories, consultations, providers, sessions, webhooks, federation, portability, and an OpenAI-compatible chat-completions gateway.
- Git-like DAG versioning on memory: `log`, `branch`, `merge`, `revert`. Every mutation snapshots; history reads are filtered per snapshot and branch writers refuse cross-memory parent edges.
- Operator-batched compression contests through the `CompressionEngine` ABC, competitive APOLLO/ARTEMIS selection, and persisted winner/loser audit rows.
- Single-binary builds for `linux-x86_64`, `linux-aarch64`, and `macos-aarch64` with sqlite-vec bundled.
- Seven import-linter contracts in CI enforcing the package architecture, plus a Pydantic Settings singleton that replaces ad-hoc environment reads outside the config and installer layers.
- 1055 unit tests passing, with GitLab integration tiers for cross-namespace isolation, multi-worker smoke, and Postgres/SQLite persistence parity.
MNEMOS has been in daily production use since December 2025. The current release line is v4.1.1, shipped on 2026-04-29, and it is the first release where the package layout, persistence layer, deployment profiles, CLI, single-binary distribution, and multi-worker coordination all match the intended production shape.
```shell
pip install mnemos-os==4.1.1
mnemos serve --profile dev
```

Or with Docker:

```shell
docker pull ghcr.io/mnemos-os/mnemos:4.1.1
```

For a single binary with no host Python:

```shell
curl -L https://github.com/mnemos-os/mnemos/releases/download/v4.1.1/mnemos-linux-x86_64 -o mnemos
chmod +x mnemos
./mnemos install --profile edge
./mnemos serve --profile edge
```

See docs/SINGLE_BINARY.md for platform-specific filenames, macOS quarantine notes, limitations, and build-from-source instructions.
| Profile | Backend | Coordination | Use it for |
|---|---|---|---|
| `server` | Postgres + pgvector | Redis-backed shared state | Production, teams, multi-worker services |
| `edge` | SQLite + sqlite-vec | In-process single-worker state | Laptops, Pi-class hosts, local appliances, Termux-style installs |
| `dev` | SQLite + sqlite-vec | In-process single-worker state + DEBUG logging | Local development and tests |
MNEMOS is designed to be the memory layer for the agentic tooling you already use — not a replacement for it. We interoperate on purpose, over three mechanisms, so there is no language lock-in and no pressure to rewrite your agent around us.
- MCP (Model Context Protocol). MNEMOS ships stdio and HTTP/SSE MCP transports (`mnemos.mcp.stdio`, `mnemos.mcp.http`) that expose memory operations — search, create, update, delete, DAG versioning, model optimizer — as first-class tool calls. Register it in any MCP-aware client (Claude Code, OpenClaw, ZeroClaw, Hermes) and the agent gets persistent memory without your framework having to know MNEMOS exists at the code level.
- OpenAI-compatible gateway. `POST /v1/chat/completions` and `GET /v1/models` are drop-in for the OpenAI SDK. Point `OPENAI_BASE_URL` at your MNEMOS instance and any client that already speaks OpenAI gets memory injection, multi-provider routing, propagated generation controls (`temperature`, `max_tokens`, `top_p`), streaming SSE, and explicit 400s when the selected provider cannot honor tools, response formats, penalties, or multimodal content. This is the path for LangChain, LlamaIndex, CrewAI, AutoGen, and anything else that was written against the OpenAI wire protocol.
- Native `/v1/*` REST surface. For integrations that want to speak to MNEMOS directly: `/v1/memories`, `/v1/consultations`, `/v1/providers`, `/v1/sessions`, `/v1/webhooks`, `/v1/federation`, `/v1/kg/triples`. The full API is language-agnostic; pick your HTTP client and go.
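The gateway path can be exercised with nothing but the standard library. A minimal sketch, assuming a local instance on port 5002; the base URL, API key, and model name are placeholders, not values shipped by MNEMOS:

```python
import json
import urllib.request

MNEMOS_BASE = "http://localhost:5002/v1"  # assumed local instance; adjust for your deploy
API_KEY = "sk-your-mnemos-key"            # placeholder credential

def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build an OpenAI-wire chat-completions request aimed at the MNEMOS gateway."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,   # propagated generation controls
        "max_tokens": 512,
    }).encode()
    return urllib.request.Request(
        f"{MNEMOS_BASE}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarize what we decided about the cache layer.")
```

Send it with `urllib.request.urlopen(req)` — or skip all of this and just set `OPENAI_BASE_URL` for any existing OpenAI SDK client.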
Current MCP tools come from one registry shared by stdio and HTTP/SSE: search_memories, list_memories, get_memory, create_memory, update_memory, delete_memory, bulk_create_memories, get_stats, kg_create_triple, kg_search, kg_timeline, update_triple, delete_triple, log_memory, branch_memory, diff_memory_commits, checkout_memory, and recommend_model.
- Claude Code — drop-in hooks (session-start / user-prompt-submit / stop), skill config, and MCP server. See `integrations/claude-code/`. MCP.
- OpenClaw — AGENTS.md skill snippet + MCP registration. See `integrations/openclaw/`. MCP.
- ZeroClaw — memory skill over MCP. Works without adding any Python dependency to ZeroClaw's Rust runtime — memory ops cross the wire to a MNEMOS instance running wherever. See `integrations/zeroclaw/`. MCP.
- Hermes Agent — optional persistence backend for team / multi-tenant / compliance-regulated Hermes deployments. See `integrations/hermes/`. MCP + REST.
- MemPalace — graduation path, not a replacement. A portability schema + importer lets a MemPalace user who grows into a team preserve their drawers and palaces rather than start over. See RFC #1112 on MemPalace. REST bulk import.
- Mem0 / Letta / Zep — one-shot bulk consolidation via `POST /v1/memories/bulk`. If you already have a running memory store elsewhere and need to converge, MNEMOS is where they converge to. REST bulk import.
- LangChain / LlamaIndex — works today via the OpenAI-compatible gateway: point `OPENAI_BASE_URL` at MNEMOS and memory injection + multi-provider routing land automatically. OpenAI-compat.
- CrewAI / AutoGen — shared memory across agents in a crew / group. Works today via the OpenAI-compatible gateway. OpenAI-compat.
The integrations bundle under integrations/ is the living inventory. New integrations ship as SKILL.md + MCP config + enforcement snippet per framework, plus idempotent install/uninstall scripts where the target framework supports them.
MNEMOS runs as a network service. In the server profile you deploy it alongside PostgreSQL and Redis so every agent in your stack shares the same memory kernel over REST; in the edge profile it runs as an all-in-one SQLite-backed node for laptops, Pi-class systems, and phone-adjacent installs. It is not an in-process helper or a framework you import.
MNEMOS was built out of a very practical frustration: serious agentic systems keep losing context at exactly the moment reliability starts to matter.
In most AI tooling, memory is still treated like a convenience feature. A session ends, context evaporates, and the next run has to reconstruct the same decisions, assumptions, architecture tradeoffs, and operating knowledge from scratch. That may be tolerable for hobby projects. It is not good enough for professional users building production systems.
The first version of the problem looked simple. Keep a large context file, inject it into the prompt, and move on. That works until the context becomes expensive, stale, opaque, and impossible to selectively trust. When you compress it, you no longer know exactly what was removed. When multiple agents need it, the whole approach collapses into duplication and drift.
The second version of the problem was operational. Real agentic development means multiple models, multiple providers, failure modes, cost pressure, and different classes of tasks. Memory that cannot survive provider failure, cannot be shared across agents, or cannot explain its own transformations is not really infrastructure.
MNEMOS was built to solve those problems in a way that reflects real platform experience: provenance matters, compression should be inspectable, shared systems need access controls, and memory should behave like a service you can operate, not a feature you hope keeps working.
Its design is informed by years of enterprise platform work, large-vendor systems thinking, open-source infrastructure experience, and current work in the AI industry, without assuming that professional users want marketing language where they really need operational clarity.
MNEMOS has been in daily production use since December 2025, backing multiple active agentic systems simultaneously. By early 2026 the running install was holding thousands of memories and had performed thousands of compressions, each with a written quality manifest. The v3.0 release line unified that production codebase into the single-service FastAPI shape; v4.1.1 is the current shipped GA line, adding the coherent package layout, persistence abstraction, SQLite profiles, single-binary artifacts, and multi-worker production support. See CHANGELOG.md for the release history.
For the longer story — the original catalyzing moment, the architectural decisions (and mistakes) that took MNEMOS from a single-file prototype to a unified runtime, and the scrubs, refactors, and release-gate audits that landed the public cut — see EVOLUTION.md. Written for future contributors as much as for future readers who want to know what they're inheriting.
MNEMOS is built for the teams and operators who have already outgrown the prototype memory layer.
You probably want MNEMOS if:
- You run multiple agents, or multiple LLM providers, and they need to share a consistent memory pool that survives process restarts and provider outages.
- Your agents produce outputs someone downstream has to trust — an auditor, a regulator, a customer, a compliance team, yourself in six months.
- You care whether your memory layer can corrupt, silently swallow writes, or quietly truncate things you wanted to keep.
- You need real auth (API keys and OAuth/OIDC) and real multi-tenant isolation, not a bearer-token sticker over a single-user SQLite file.
- You have regulatory pressure around reasoning traceability — EMIR Article 57, SOC-2 evidence, GDPR right-to-explanation, or internal model-governance review boards.
- You need a memory substrate that survives schema migrations, provider circuit-breakers, and federation failures without hand-holding.
Who this is actually serving, concretely:
- Agentic-tooling teams running multi-agent stacks (crews, swarms, orchestrators) that keep losing shared context at the process boundary.
- Platform teams inside larger orgs wiring LLM routing + memory into an internal developer platform and needing a substrate they can operate, not babysit.
- Regulated-industry AI teams (finance, healthcare, legal, public sector) that need a cryptographic audit trail on every reasoning step and cannot ship without one.
- Research labs exploring consensus-reasoning, long-horizon agent memory, and memory-poisoning defenses — MNEMOS ships DAG versioning and an anti-poisoning guide precisely because those problems are real.
- Founders who've already hit the ceiling of in-process memory libraries and need something that survives process restarts, schema changes, and multi-agent concurrency.
- The 56-year-old former IBM / Microsoft veteran who has been thoroughly indoctrinated into architectural thinking and mission-critical design, and physically cringes at a memory layer that doesn't have the "-isms" and "-itabilities" thought through — atomicity, idempotency, referential integrity, ACIDism on the write path; durability, recoverability, observability, auditability, testability on the operate path; the things your old DBA would have red-pen'd in a review twenty years ago and your old SRE would red-pen now. This is for you. We know.
You probably don't need MNEMOS if:
- You are building a single-user chatbot for personal note-taking and raw similarity search over ChromaDB is fine.
- You only need short-term conversation history within a single session and your SDK already handles that.
- You don't care whether compressed context is faithful to the original — the "toy" solutions are honest about not providing that guarantee.
- The phrases audit trail, tamper evidence, multi-tenant isolation, and compression manifest don't mean anything to your use case and never will.
If you're in the first list, MNEMOS is designed specifically for you. If you're in the second list, something lighter will serve you better — use Mem0, Zep, or in-process summary buffers. They exist because those use cases are real. MNEMOS is the answer to a different question.
The field of agent memory systems is crowded and getting more so. Here is the honest case for MNEMOS.
Most memory systems answer one question badly: "What did this agent say before?"
That is conversation history. It is useful, but it is not memory infrastructure. Conversation history dies when the session ends, scales to one agent, and tells you nothing about whether the information it contains is still accurate, still complete, or safe to rely on.
MNEMOS answers a different set of questions:
- When I compressed that memory to fit in a context window, what did I throw away — and was it safe to throw away?
- If three of my LLM providers go down, does my reasoning layer fail or degrade gracefully?
- Can multiple agents share one memory pool without rebuilding context from scratch each session?
If none of those questions matter for your use case, a simpler tool is probably the right choice. If they do matter, read on.
| System | What it does | What it cannot tell you |
|---|---|---|
| MemGPT / Letta | Hierarchical paging within a single agent session | What was lost in compression; what happens when the LLM provider fails |
| Mem0 | Store and retrieve memories via API | Compression quality; reasoning consensus |
| Zep | Conversation history + entity extraction | Compression manifests; multi-provider reasoning |
| LangChain / LlamaIndex memory | In-process buffer or summary | Anything after the process exits |
| MemPalace | Desktop-library long-horizon memory with spatial retrieval and AAAK compression; single-user, in-process | Multi-process deployment; multi-user isolation; network-service semantics |
| CrewAI / AutoGen memory | Per-crew or per-agent embedded memory | Cross-session persistence; compression quality |
Quality contracts on compression. When MNEMOS compresses a memory, it produces a manifest: what was removed, what was preserved, the quality rating, and which use cases the compressed version is and is not safe for. No other memory system treats compression as something that requires a receipt. The compression pipeline runs in the background distillation worker through the pluggable `CompressionEngine` ABC, a competitive per-memory contest, and a persisted audit log recording every winner and loser with its score and disqualification reason.
A reasoning layer that degrades gracefully. GRAEAE distributes queries across multiple LLM providers simultaneously, scores responses on relevance, coherence, completeness, and toxicity, and returns the best result. Per-provider circuit breakers prevent a failing provider from degrading the pool. A semantic cache means identical questions skip inference entirely. This is not a load balancer — it is a quality-gated reasoning bus.
A knowledge graph alongside free-text memory. MNEMOS stores structured triples (subject → predicate → object) with temporal validity windows alongside unstructured memories, and exposes a timeline API per subject. Most memory systems are text-only.
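The temporal-validity idea can be sketched in a few lines. This is an illustrative in-memory model, not MNEMOS's storage schema — the field names (`valid_from`, `valid_to`) and dataclass shape are assumptions for the example:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None = still valid

def timeline(triples, subject, as_of=None):
    """Triples for one subject in temporal order, optionally restricted
    to those valid as of a given date."""
    rows = [t for t in triples if t.subject == subject]
    if as_of is not None:
        rows = [t for t in rows
                if t.valid_from <= as_of and (t.valid_to is None or as_of <= t.valid_to)]
    return sorted(rows, key=lambda t: t.valid_from)

facts = [
    Triple("acme-api", "uses_db", "mysql", date(2024, 1, 1), date(2025, 6, 1)),
    Triple("acme-api", "uses_db", "postgres", date(2025, 6, 1)),
]
current = timeline(facts, "acme-api", as_of=date(2026, 1, 1))
```

The point of the validity window: "what does this subject use *now*" and "what did it use *then*" are different queries over the same triples, which flat text memory cannot answer.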
MemPalace, created by Mila Jovanovic, has pushed long-horizon agent memory into the ecosystem in a way few other projects have. The LongMemEval benchmark attention, the AAAK abbreviation research, and the Palace spatial-memory metaphor are real contributions to a problem — keeping agent memory useful across long time horizons without context explosion — that is genuinely unsolved and genuinely hard. It's work worth taking seriously, and MNEMOS has been influenced by several of its ideas.
In particular, MNEMOS shares MemPalace's bets that:
- Memory deserves first-class treatment as a data structure, not as a side-effect of conversation history.
- Compression is a design axis, not an afterthought: if you keep everything raw you lose the context-window fight, and if you compress naively you lose fidelity.
- Long-horizon memory needs structure. Whether you call it a "palace", a DAG, or a temporal knowledge graph, the point is that flat vector similarity runs out of answers fast.
MNEMOS is not trying to replace MemPalace. The two projects are solving adjacent problems with different shapes, for different users:
| | MemPalace | MNEMOS |
|---|---|---|
| Form factor | Desktop library, embedded in-process | Network service (FastAPI on port 5002), runs as a daemon |
| Deployment | `pip install`, runs inside your agent | Deployed alongside your stack the way you'd run PostgreSQL or Redis; many agents and processes connect over REST |
| Storage | ChromaDB (SQLite-backed vector store) | Postgres + pgvector for server deployments, or SQLite + sqlite-vec for edge/dev profiles |
| Primary user | Individual developer on a single machine | Teams / platforms operating shared infrastructure |
| Concurrency model | Single-process, single-user | Multi-tenant with per-owner isolation, multi-process clients |
| Audit surface | Local logs | SHA-256 hash-chained audit chain, tamper-evident and externally reviewable |
| Reasoning | Storage + retrieval | GRAEAE multi-LLM consensus with quality scoring and provider failover |
If you are one developer building a personal agent that runs on your laptop and you want it to work offline with no infrastructure overhead, MemPalace is designed exactly for that and is a legitimate, well-constructed choice.
If you are a team or a platform deploying shared agent memory that multiple processes need to access concurrently, with an audit trail that stands up to external review, a DB backend that survives crashes and schema migrations, and a reasoning layer you can point a regulator or auditor at, MNEMOS is designed for that.
The shared premise — that agent memory deserves first-class treatment — is the same. The deployment target is not. Please don't read this section as a takedown; it's a map.
This is the current state of the v4.1.1 release line. Features described here are implemented unless explicitly called out as forward-looking in ROADMAP.md.
The API surface is namespaced under /v1/*.
| Endpoint | What it does |
|---|---|
| `POST /v1/memories` | Store a memory with category, subcategory, content, and optional provenance |
| `GET /v1/memories` | List memories, filterable by category and subcategory |
| `GET /v1/memories/{id}` | Retrieve a single memory |
| `POST /v1/memories/search` | Full-text or semantic search with category/score filters |
| `POST /v1/memories/bulk` | Bulk create memories |
| `PATCH /v1/memories/{id}` | Update memory content or metadata |
| `DELETE /v1/memories/{id}` | Delete a memory |
| `POST /v1/memories/rehydrate` | Token-budgeted context load for prompt injection; uses the compression contest's winning variant when present |
| `POST /ingest/session` | Ingest a session transcript |
| `GET /v1/memories/{id}/log` | DAG commit history for a memory |
| `POST /v1/memories/{id}/branch` | Create a branch from a specific commit |
| `POST /v1/memories/{id}/merge` | Merge a branch back to main |
| `GET /v1/memories/{id}/versions` | Version history |
| `GET /v1/memories/{id}/compression-manifests` | v3.1 contest audit: current winning variant + every historical contest's candidates with scoring fields and reject reasons. `?include_content=true` for full content; default is a 200-char preview |
| `GET /health` | Health check (not namespaced) |
| `GET /stats` | Memory counts by category, compression statistics |
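As an illustration of what a token-budgeted rehydrate has to do, here is a greedy sketch. This is not the server's actual selection algorithm, and the `(relevance, token_count, content)` tuple shape is invented for the example:

```python
def rehydrate(memories, token_budget):
    """Greedy token-budgeted selection, highest-relevance first.

    Each memory is (relevance, token_count, content). The real endpoint
    also prefers a compression contest's winning variant when one exists,
    which effectively shrinks token_count for contested memories.
    """
    chosen, used = [], 0
    for rel, tokens, content in sorted(memories, key=lambda m: m[0], reverse=True):
        if used + tokens <= token_budget:
            chosen.append(content)
            used += tokens
    return chosen, used

pool = [(0.91, 120, "decision: use pgvector"),
        (0.72, 300, "meeting notes, long"),
        (0.88, 80,  "constraint: p95 < 50ms")]
ctx, used = rehydrate(pool, token_budget=250)
```

The budget is the contract: the caller asks for "as much relevant context as fits in N tokens" and injects the result into the prompt.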
Read access for `GET /v1/memories`, `GET /v1/memories/{id}`, search, rehydrate, and gateway context now flows through the same application predicate: owner, federated, world-readable, or Unix group-readable (`mnemos/core/visibility.py`). Writes remain owner-scoped; being able to read a world/group/federated row does not grant update or delete rights.
Each memory carries full ownership and LLM provenance. Since v3.2, tenancy is two-dimensional: `owner_id` names the account and `namespace` names the caller's logical workspace. Non-root writes require both to match the caller context; root can override for import, repair, and audit work.
- `owner_id` — which user owns this memory
- `group_id` — optional group for shared access
- `namespace` — logical partition (e.g. `myapp/analyst`)
- `permission_mode` — UNIX-style octal (600 = owner only, 640 = group readable, 644 = world readable)
- `source_model` — the LLM model that produced this memory
- `source_provider` — the provider (openai, groq, ollama, etc.)
- `source_session` — session ID at time of creation
- `source_agent` — agent name or identifier
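A create-memory payload carrying those fields might look like the following. The shape is a sketch assembled from the field list above, with placeholder values; consult the API reference for the exact required fields:

```python
import json

# Illustrative POST /v1/memories body combining content, ownership, and
# LLM provenance. All values are placeholders.
memory = {
    "category": "architecture",
    "subcategory": "decisions",
    "content": "We standardized on pgvector for server deployments.",
    "owner_id": "user-ada",
    "namespace": "myapp/analyst",
    "permission_mode": 640,            # owner read/write, group readable
    "source_model": "gpt-4o",
    "source_provider": "openai",
    "source_session": "sess-20260429-01",
    "source_agent": "planner",
}
body = json.dumps(memory)
```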
Row Level Security is defined in PostgreSQL and remains opt-in for multi-user server deployments. The application layer mirrors the RLS read contract so SQLite/edge installs and RLS-backed server installs behave consistently. The v3.5 hardening work closed task #25 with `db/migrations_v3_5_rls_group_select_unix_bits.sql`: `read_visibility_predicate` and the `mnemos_group_select` RLS policy now use identical Unix group-bit math, `((permission_mode / 10) % 10) >= 4`.
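The shared Unix group-bit math is small enough to show directly. A minimal Python mirror of the documented predicate — illustrative only, not the shipped `read_visibility_predicate` source:

```python
def read_visibility(permission_mode: int, is_owner: bool, in_group: bool) -> bool:
    """Illustrative mirror of the documented Unix-bit read predicate.

    permission_mode is octal-style decimal: 600, 640, 644.
    Group read bit: ((permission_mode / 10) % 10) >= 4 — the same math the
    mnemos_group_select RLS policy uses. World read: (permission_mode % 10) >= 4.
    """
    if is_owner:
        return True                               # owner always reads
    if in_group and (permission_mode // 10) % 10 >= 4:
        return True                               # group read bit set
    return permission_mode % 10 >= 4              # world read bit set
```

With identical arithmetic on both sides, an edge install without RLS and a Postgres install with RLS reach the same read decision for the same row.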
Deployment topologies — selected with `mnemos install --profile ...`, `mnemos serve --profile ...`, or `MNEMOS_PROFILE`:
| Profile | Backend | Shared state | Use case |
|---|---|---|---|
| `server` | PostgreSQL | Redis | Production service, shared agents, multi-worker capable |
| `edge` | SQLite | In-process | Laptop, Pi/edge appliance, Consumer MNEMOS, Termux on S21 |
| `dev` | SQLite | In-process | Local development with debug logging |
`personal` is the legacy v3.x profile name and now resolves to `edge`.
Search hits update recall telemetry in the background: `recall_count` increments and `last_recalled_at` is set for the returned memory IDs. The counters are observability and future archival input, not authorization state; failures are logged and do not block the user-visible search response.
| Endpoint | What it does |
|---|---|
| `POST /admin/users` | Create a user |
| `GET /admin/users` | List all users |
| `POST /admin/users/{id}/apikeys` | Generate an API key (raw key returned once) |
| `GET /admin/users/{id}/apikeys` | List API keys for a user |
| `DELETE /admin/apikeys/{id}` | Revoke an API key (soft-delete) |
| `POST /admin/compression/enqueue` | v3.1: enqueue specific memories into `memory_compression_queue` for the contest path. Body: `{memory_ids, reason, scoring_profile, priority}`. Silently skips unknown IDs |
| `POST /admin/compression/enqueue-all` | v3.1: bulk-enqueue up to `limit` (default 500, max 10,000) memories. `only_uncompressed=true` (default) skips memories that already have a variant; set `false` to re-contest under new rules |
All admin endpoints require the root role. On personal installs (no auth), they are accessible without a key.
| Endpoint | What it does |
|---|---|
| `POST /v1/kg/triples` | Create a subject → predicate → object triple |
| `GET /v1/kg/triples` | List triples with filters |
| `GET /v1/kg/timeline/{subject}` | All triples for a subject in temporal order |
| `PATCH /v1/kg/triples/{id}` | Update a triple |
| `DELETE /v1/kg/triples/{id}` | Delete a triple |
Multi-LLM consensus reasoning with cited memory artifacts and cryptographic audit chain.
| Endpoint | What it does |
|---|---|
| `POST /v1/consultations` | Create a consultation (prompt + task_type) |
| `GET /v1/consultations/{id}` | Retrieve a consultation record |
| `GET /v1/consultations/{id}/artifacts` | Cited memories used to answer |
| `GET /v1/consultations/audit` | Hash-chained audit log, owner-scoped for non-root callers |
| `GET /v1/consultations/audit/verify` | Verify audit chain integrity; root verifies globally, non-root verifies their own consultation rows |
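Hash-chained audit verification follows a standard pattern: each row's hash commits to its predecessor, so editing any row breaks every later link. A sketch under the assumption that a link is SHA-256 over `prev_hash + payload` — MNEMOS's exact hashed field layout is not specified here:

```python
import hashlib

def chain_hash(prev_hash: str, payload: str) -> str:
    """One plausible link construction: SHA-256 over previous hash + payload."""
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

def verify_chain(rows) -> bool:
    """rows: list of (payload, stored_hash). True iff every stored hash
    matches recomputation from its predecessor (genesis prev = '')."""
    prev = ""
    for payload, stored in rows:
        if chain_hash(prev, payload) != stored:
            return False
        prev = stored
    return True

# Build a valid three-link chain, then tamper with the middle payload.
rows, prev = [], ""
for payload in ["consultation-1", "consultation-2", "consultation-3"]:
    prev = chain_hash(prev, payload)
    rows.append((payload, prev))
```

Because each hash folds in the previous one, an auditor only needs the final hash from a trusted location to detect any historical rewrite.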
Unified provider catalog with health tracking and task-aware recommendation.
| Endpoint | What it does |
|---|---|
| `GET /v1/providers` | List all configured providers with metadata |
| `GET /v1/providers/health` | Per-provider availability + circuit-breaker state |
| `GET /v1/providers/recommend` | Recommend a model for a task-type + budget |
Drop-in replacement for the OpenAI Chat Completions API — so any SDK that speaks OpenAI can speak to MNEMOS.
| Endpoint | What it does |
|---|---|
| `GET /v1/models` | List registry-backed models only |
| `GET /v1/models/{model_id}` | Registry model details; unregistered IDs return 404 |
| `POST /v1/chat/completions` | Chat completion; routes to the appropriate provider; optional memory injection; supports generation controls, SSE streaming, tools/tool_choice and response_format where provider-supported |
Gateway field support is pass-or-reject: OpenAI-style providers receive `temperature`, `max_tokens`, `top_p`, `stop`, `n`, presence/frequency penalties, and `response_format`; OpenAI and Anthropic receive tool schemas where supported; Gemini maps generation controls and JSON response format to its native fields. Providers that cannot honor a requested tool call, response format, penalty, or multimodal content block return a clear HTTP 400 instead of silently dropping it.
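The pass-or-reject contract is easy to model: compare the requested features against a provider capability set and fail loudly on any gap. The capability map below is invented for illustration and does not reflect the real registry:

```python
UNSUPPORTED = 400  # documented gateway behavior: explicit 400, never a silent drop

# Illustrative capability map — placeholder data, not the shipped registry.
PROVIDER_CAPS = {
    "openai":    {"tools", "response_format", "penalties"},
    "anthropic": {"tools"},
    "gemini":    {"response_format"},
}

def validate_request(provider: str, requested: set):
    """Return (ok, error): reject any requested feature the provider can't honor."""
    missing = requested - PROVIDER_CAPS.get(provider, set())
    if missing:
        return False, {"status": UNSUPPORTED,
                       "detail": f"provider '{provider}' cannot honor: {sorted(missing)}"}
    return True, None
```

The design choice this encodes: a dropped `response_format` produces subtly wrong output downstream, while a 400 produces an immediate, debuggable failure.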
Multi-turn conversation state with memory injection at turn boundaries. Sessions carry accumulated context across requests and are scoped by the same owner+namespace pair as memories. v3.5 removed the legacy per-session compression columns; session history is ordered by deterministic pinned system rows plus recent turns, and compression output comes from memory_compressed_variants when a read path explicitly asks for compressed memory.
| Endpoint | What it does |
|---|---|
| `POST /v1/sessions` | Start a new session |
| `GET /v1/sessions/{id}` | Retrieve session state |
| `POST /v1/sessions/{id}/messages` | Post a turn; memory injection at turn boundary |
| `GET /v1/sessions/{id}/history` | Full message history |
| `DELETE /v1/sessions/{id}` | End a session |
MORPHEUS is the append-only dream-state subsystem: it scans a time window of existing memories, clusters related records, synthesizes per-cluster summary memories, tags every generated memory with `morpheus_run_id`, and records run state in `morpheus_runs`. It is operator-triggered, not an automatic mutation daemon; rollback deletes memories generated by a run and marks the run `rolled_back`.
| Endpoint | What it does |
|---|---|
| `GET /v1/morpheus/runs` | List runs newest-first |
| `GET /v1/morpheus/runs/{id}` | Inspect one run |
| `GET /v1/morpheus/runs/{id}/clusters` | Inspect cluster membership and synthesized memory IDs |
| `POST /admin/morpheus/runs` | Trigger a run (root only, synchronous in this release) |
| `DELETE /admin/morpheus/runs/{id}` | Roll back a run (root only) |
Outbound notifications when events happen. Receivers verify an HMAC-SHA256 signature to trust the payload.
| Endpoint | What it does |
|---|---|
| `POST /v1/webhooks` | Subscribe; secret returned once |
| `GET /v1/webhooks` | List the caller's subscriptions |
| `GET /v1/webhooks/{id}` | Retrieve a subscription |
| `DELETE /v1/webhooks/{id}` | Revoke (soft-delete) |
| `GET /v1/webhooks/{id}/deliveries` | Recent delivery attempts |
Events: `memory.created`, `memory.updated`, `memory.deleted`, `consultation.completed`.

Delivery is durable: every attempt is logged to `webhook_deliveries`, retried 4 times with exponential backoff (1m / 5m / 30m / 2h), and replayed from disk on restart by the delivery recovery worker. Recovery claims due current-writer rows (`writer_revision=1`) with `lease_token` / `lease_expires_at` in the dequeue transaction before scheduling lifecycle-tracked delivery attempt tasks, and sizes each claim batch to the send semaphore's current free capacity.

A live partial unique index on `(subscription_id, event_type, payload_hash, attempt_num)` prevents duplicate pending/retrying successor attempts for the same chain, and a terminal partial unique index on `(subscription_id, event_type, payload_hash)` enforces exactly one `status='succeeded'` row per chain.

Workers release the database connection before DNS validation or HTTP POST, and use the lease to bound whether they are allowed to send. Failure terminal updates require an unexpired live lease; 2xx finalization after response headers requires the matching lease token, a still-live row, and no succeeded chain peer. Success finalization writes `status='succeeded'` and abandons any free live successors as one chain-locked database transaction. Recovery-preclaimed sends re-check for live successors under the pre-POST chain lock and abandon/supersede the older attempt before any outbound POST.

Active sends are capped per process with `WEBHOOK_MAX_CONCURRENT_SENDS` (default 64); lease duration is tunable with `WEBHOOK_LEASE_SECONDS` (default minimum 90s). The claim UPDATE uses PostgreSQL `clock_timestamp()` for both `lease_expires_at` and the returned `claim_db_now`, so advisory-lock waits cannot backdate the lease window.

The dispatcher subtracts elapsed time since the pre-claim monotonic anchor plus `WEBHOOK_FINALIZE_BUFFER_SECONDS` (default 5s) before starting DNS validation or HTTP POST, and releases the lease without consuming a retry if the remaining send window is gone. A separate repair worker runs the burst-then-periodic sweep independently of delivery sends, skips rows with an unexpired lease, and idempotently normalizes out-of-order pending/retrying overwrites of already superseded attempts whenever a newer successor exists or any chain peer has succeeded. Graceful shutdown cancels perpetual webhook workers first, then waits up to `WEBHOOK_SHUTDOWN_DRAIN_SECONDS` for in-flight delivery attempts to finish finalization before last-resort cancellation.

Webhook POST acknowledgement is determined by the HTTP status code as soon as response headers arrive: 2xx succeeds and non-2xx remains retryable regardless of response-body completeness. Response-body capture is best-effort audit data, bounded by `WEBHOOK_RESPONSE_BODY_MAX_BYTES` (default 2048) and the capture timeout.
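The documented backoff ladder can be written down directly. The attempt numbering here (attempt 1 fails → retry due after 1 minute) is an assumption about how the schedule is indexed:

```python
from datetime import datetime, timedelta

# Documented retry schedule: 4 retries with exponential backoff 1m / 5m / 30m / 2h.
BACKOFF = [timedelta(minutes=1), timedelta(minutes=5),
           timedelta(minutes=30), timedelta(hours=2)]

def next_attempt_at(failed_at: datetime, attempt_num: int):
    """When the next delivery attempt is due after failure #attempt_num;
    None once the retry budget is exhausted (chain goes terminal)."""
    if attempt_num > len(BACKOFF):
        return None
    return failed_at + BACKOFF[attempt_num - 1]

t0 = datetime(2026, 4, 29, 12, 0, 0)
```

A receiver that is down for an hour therefore still gets the event: the 2h final retry covers multi-hour outages before the chain is abandoned.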
Round 24 closure: `db/migrations_v3_5_webhook_succeeded_terminal_trigger.sql` makes `status='succeeded'` terminal at the database layer. Stale writers are structurally prevented from reverting an ACK back to pending or retrying, while response-body audit updates and lease clearing still work.

Round 20 closes the post-header ACK race by finalizing the status code before response-body capture or stream/client cleanup. The first terminal UPDATE stores `response_status` with `response_body=NULL`; post-finalize body capture fills `response_body` with a no-token audit UPDATE only if it completes within its own timeout, and cleanup remains post-finalize best effort. Round 21 narrowed the 2xx ACK durability window to the chain advisory lock plus the committed success UPDATE. Round 22 moved free-successor cleanup back into that same success transaction. Round 23 drops the per-successor savepoints and removes the post-commit fallback, so the success commit and all free-successor `status='abandoned'` updates are truly atomic. Cleanup failure or cancellation rolls back the whole success transaction and retries from the lease-owned state; in the rare failure path, that can resend one ACKed POST up to the retry limit, with warnings logged for observability. Recovery-preclaimed attempts also re-check for live successors under the pre-POST chain lock and abandon the older attempt instead of sending a duplicate.
Signature header: `X-MNEMOS-Signature: sha256=<hex>`. Verify with `hmac.new(secret, body, sha256).hexdigest()`.
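Expanded to a receiving-side check, that one-liner looks like this (the secret and payload below are made-up placeholders):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check X-MNEMOS-Signature: sha256=<hex> against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # constant-time compare so the digest can't be guessed byte-by-byte
    return hmac.compare_digest(expected, header_value)

secret = b"whsec_example"                      # placeholder secret
body = b'{"event": "memory.created"}'          # raw bytes, before JSON parsing
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
assert verify_signature(secret, body, header)
```

Verify against the raw bytes as delivered; re-serializing parsed JSON can change whitespace and break the digest.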
Browser-based login via external identity providers. Coexists with API-key auth — the same user can have both a key and an OIDC identity.
| Endpoint | What it does |
|---|---|
| `GET /auth/oauth/providers` | List enabled providers (public, no secrets) |
| `GET /auth/oauth/{provider}/login` | Start authorization-code + PKCE flow |
| `GET /auth/oauth/{provider}/callback` | Handle provider redirect; sets `mnemos_session` cookie |
| `POST /auth/oauth/logout` | Revoke session (optionally `?all_devices=true`) |
| `GET /auth/oauth/me` | Who am I (works with either auth method) |
Admin side (`/admin/oauth/*` — root only):

| Endpoint | What it does |
|---|---|
| `POST /admin/oauth/providers` | Register a provider (Google, GitHub, Azure AD, or generic OIDC) |
| `GET /admin/oauth/providers` | List configured providers (`client_secret` redacted) |
| `PATCH /admin/oauth/providers/{name}` | Update provider config |
| `DELETE /admin/oauth/providers/{name}` | Remove a provider |
| `GET /admin/oauth/identities` | List all OAuth identities (optionally filter by user) |
Sessions are DB-backed, revocable, and expire after 30 days by default. User provisioning: same external-id reuses the user; matching email links to an existing user; otherwise a fresh user is created.
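The provisioning order above is a three-way match. A sketch under assumed record shapes (the real implementation works against DB rows, not dicts):

```python
def provision(users: list, external_id: str, email: str) -> dict:
    """Resolve an OAuth callback to a user: reuse, link, or create."""
    for u in users:                      # 1) same external id -> reuse the user
        if u.get("external_id") == external_id:
            return u
    for u in users:                      # 2) matching email -> link to existing user
        if u.get("email") == email:
            u["external_id"] = external_id
            return u
    user = {"external_id": external_id, "email": email}   # 3) fresh user
    users.append(user)
    return user

users = [{"email": "alice@example.com"}]
u1 = provision(users, "google:123", "alice@example.com")  # links identity to Alice
u2 = provision(users, "google:123", "alice@new.example")  # same id -> same user
u3 = provision(users, "github:9", "bob@example.com")      # no match -> new user
```

The external-id check runs first so that a later email change at the provider cannot detach an already-linked identity.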
For single-site HA, use PostgreSQL streaming replication: one writable primary,
read-only standbys, and a stable writer endpoint for MNEMOS. Federation remains
first-class, but it is for genuinely remote data flows: multi-site deployments,
multi-org curated feeds, and developer laptop or edge replicas with
intermittent connectivity using the v4 SQLite-backed edge profile.
Do not use federation between same-LAN MNEMOS nodes for HA; it creates
application-level dedup work that PostgreSQL WAL streaming already solves below
the app. See DEPLOYMENT.md
and docs/STREAMING_REPLICATION.md.
MNEMOS Portability Format (MPF) is the native envelope for moving memories between MNEMOS instances and across compatible external systems. GET /v1/export writes an MPF v0.1.x envelope; POST /v1/import reads the envelope back. Root callers may use preserve_owner=true for authoritative restore and migration work; non-root imports are scoped to the caller's owner+namespace.
CLI helpers live in mnemos/tools/memory_export.py, mnemos/tools/memory_import.py, and mnemos/tools/mpf_validate.py. The schema is documented in docs/MEMORY_EXPORT_FORMAT.md, and v3.4.1 added CHARON schema-compat preflight so federating peers can refuse incompatible migration sets before sync.
Pull-based one-way federation between genuinely remote MNEMOS instances. Remote peer exposes /v1/federation/feed; local instance pulls on a configurable interval, storing remote memories with ids of the form fed:{peer_name}:{remote_id} and federation_source = peer_name. Federated memories are read-only by application convention.
| Endpoint | What it does |
|---|---|
| `POST /v1/federation/peers` | Register a remote peer (root only) |
| `GET /v1/federation/peers` | List registered peers |
| `GET /v1/federation/peers/{id}` | Peer detail |
| `PATCH /v1/federation/peers/{id}` | Update (enable/disable, filters, interval) |
| `DELETE /v1/federation/peers/{id}` | Unregister |
| `POST /v1/federation/peers/{id}/sync` | Manual sync trigger (blocks on completion) |
| `GET /v1/federation/peers/{id}/log` | Sync history for a peer |
| `GET /v1/federation/status` | Aggregate status across all peers |
| `GET /v1/federation/feed` | Serve memories to remote peers (`role=federation` or root) |
Trust model: mutual — each side registers the other. Side A issues Side B a Bearer token by creating a MNEMOS user with role='federation' and minting an API key via the admin API. Side B stores that token in its own federation_peers.auth_token. Side A's feed endpoint validates the token and role IN ('federation', 'root').
Dedup: re-pulls are safe. Local id fed:{peer}:{remote_id} is stable; only rows with a newer federation_remote_updated overwrite existing ones.
Cursoring: /v1/federation/feed uses an opaque compound cursor over (updated, id) and orders by the same pair, so pagination cannot skip memories when many rows share one updated timestamp. Malformed or missing cursors start an initial fetch from the beginning.
Filters: namespace_filter and category_filter (both arrays) restrict what gets pulled from a peer; NULL = pull everything the peer will serve.
Loop prevention: the feed endpoint excludes memories where federation_source IS NOT NULL, so federated memories don't propagate hop-by-hop through a chain of peers.
The reasoning engine behind /v1/consultations provides:
- Circuit breaker — per-provider CLOSED/OPEN/HALF_OPEN state machine, 5-minute cooldown
- Consensus response cache — per-process LRU keyed on `sha256(task_type + normalized_prompt)`, 1-hour TTL, 500-entry cap. Exact-match dedup, not embedding similarity (an embedding round-trip would negate the win for the less-common near-duplicate case).
- Quality scorer — success / failure / latency tracking per provider, combined with Arena.ai Elo weights from the model registry (see next section) for dynamic consensus weighting
- Rate limiter — single-level request rate limit with graceful backoff
- Audit chain — SHA-256 hash-chained prompt/response log for compliance
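The cache key is exact-match by construction. A sketch of the key and the LRU discipline — the normalization rules (case-fold, whitespace collapse) and the class name are assumptions, and the 1-hour TTL is elided:

```python
import hashlib
from collections import OrderedDict

def cache_key(task_type: str, prompt: str) -> str:
    # Normalization here is an assumption about what "normalized_prompt" means.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256((task_type + normalized).encode()).hexdigest()

class ConsensusCache:
    """Per-process LRU with a 500-entry cap (TTL omitted in this sketch)."""
    def __init__(self, cap: int = 500):
        self.cap, self.data = cap, OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)      # refresh recency on hit
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.cap:
            self.data.popitem(last=False)   # evict least-recently-used

cache = ConsensusCache()
cache.put(cache_key("reasoning", "Why   is the sky blue?"), "consensus answer")
```

A whitespace or case variant of the same prompt hits the same key; anything semantically similar but textually different misses, which is exactly the trade the text describes.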
Most multi-LLM routers hardcode a provider list. MNEMOS ships a self-populating persistence-backed model registry that keeps itself current when the sync job is installed.
What's in the registry. Every known model from every configured provider — OpenAI, Groq, xAI, Together, Nvidia, Gemini, Anthropic — with per-model metadata: provider + model_id, display name, family (grok-4, gpt-5, gemini-3, …), capabilities (chat, vision, code, reasoning, web_search), context window, max output tokens, input / output / cache pricing (USD per million tokens), availability, deprecation flag, Arena.ai Elo score, Arena.ai rank, and the normalized graeae_weight (0.50–1.00) actually used by the consensus scorer.
How it populates.

- Daily provider sync — `scripts/sync_provider_models.py` (systemd: `graeae-model-sync.timer`) calls `graeae.provider_sync`, hits each provider's model-list endpoint where one exists (Gemini uses `/v1beta/models`; Anthropic uses a static list because Anthropic does not expose a public `/models` surface), and upserts into `model_registry`. New models appear automatically when the timer is installed and provider keys are configured; deprecated ones get flagged.
- Arena.ai ranking sync — the same sync script refreshes Arena.ai/LMArena ranking data unless run with `--skip-arena`, maps ranks back to providers + models, and writes `arena_score`, `arena_rank`, and `graeae_weight` into the registry. `scripts/refresh_elo_weights.py` remains available for an Elo-cache-only refresh.
- Online quality signals — the GRAEAE engine tracks per-provider success / failure / latency in memory; those signals combine with the registry's Elo-derived `graeae_weight` to pick the winning response on each consultation.
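One plausible reading of the 0.50–1.00 weight band is a min–max rescale of Elo into it. This is purely illustrative; the actual mapping inside `provider_sync` is not specified here:

```python
def normalize_graeae_weight(elo: float, elo_min: float, elo_max: float) -> float:
    """Min-max rescale an Arena Elo score into the 0.50-1.00 band.

    Illustrative assumption only; the real sync job's formula may differ.
    """
    if elo_max <= elo_min:
        return 1.0  # degenerate registry: a single ranked model
    frac = (elo - elo_min) / (elo_max - elo_min)
    return 0.50 + 0.50 * frac

# lowest-ranked model floors at 0.50, top model at 1.00
assert normalize_graeae_weight(1100, 1100, 1300) == 0.50
assert normalize_graeae_weight(1300, 1100, 1300) == 1.00
```

The floor at 0.50 matters: even the weakest ranked provider still contributes half-weight to consensus rather than being zeroed out.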
What it's used for.

- `/v1/providers/recommend?task_type=...&budget=...` — returns the cheapest available model that meets the task's capability + quality floor. Uses `graeae_weight` as the quality signal and `input_cost_per_mtok + output_cost_per_mtok` as the cost signal.
- GRAEAE consensus scoring — provider responses are weighted by `graeae_weight` before the consensus pick. A provider that drops on Arena.ai also drops in MNEMOS's internal routing after the sync timer updates the registry, without code changes.
- OpenAI-compatible gateway model routing — when a caller passes `model="auto"`, `model="best-coding"`, `model="best-reasoning"`, `model="fastest"`, or `model="cheapest"`, the gateway resolves against the registry rather than a hardcoded alias table.
- OpenAI-compatible model discovery — `/v1/models` and `/v1/models/{model_id}` expose registry rows only; chat routing can still fall back to provider-name heuristics for explicit requests, but discovery does not invent synthetic models.
Fresh-install behavior. If the registry is empty (first boot, no sync run yet), /v1/providers/recommend falls back to the static GRAEAE provider config so new deployments don't 404. The first provider_sync run typically populates 30–50 models depending on which provider API keys you've configured.
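The recommend logic reduces to filter-then-min. A sketch using field names from the registry description above; the row values are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRow:                       # field names follow the registry description
    provider: str
    model_id: str
    capabilities: set
    graeae_weight: float              # quality signal, 0.50-1.00
    input_cost_per_mtok: float        # USD per million tokens
    output_cost_per_mtok: float
    available: bool = True

def recommend(rows, capability: str, quality_floor: float):
    """Cheapest available model meeting the capability + quality floor."""
    eligible = [r for r in rows
                if r.available and capability in r.capabilities
                and r.graeae_weight >= quality_floor]
    if not eligible:
        return None  # caller falls back to the static GRAEAE provider config
    return min(eligible,
               key=lambda r: r.input_cost_per_mtok + r.output_cost_per_mtok)

rows = [
    ModelRow("openai", "gpt-5-mini", {"chat", "code"}, 0.82, 0.25, 2.0),
    ModelRow("groq", "fast-chat", {"chat"}, 0.70, 0.05, 0.10),
    ModelRow("xai", "grok-4", {"chat", "code", "reasoning"}, 0.95, 3.0, 15.0),
]
best = recommend(rows, "code", quality_floor=0.80)
```

With these invented numbers, both code-capable models clear the 0.80 floor and the cheaper one wins; quality acts as a gate, not a tiebreaker.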
A lot of the v3.x surface is held up by background work that doesn't show up in the route table but does show up in the failure modes it prevents. For anyone who wants to know what's there:
- Webhook retry repair and delivery recovery workers — the repair worker runs its startup burst and periodic sweep as its own lifespan task, marking lease-free `pending` or `retrying` rows that already have a successor or any other succeeded chain peer as `abandoned` with `superseded=TRUE`, even when an out-of-order writer has overwritten an already superseded row's status. The delivery worker separately claims due current-writer `pending` rows plus due `retrying` rows only when no successor attempt exists, then schedules tracked delivery-attempt tasks instead of owning the HTTP send itself. Recovery writes the short persisted lease in the dequeue transaction, caps each claim batch to currently free send slots, and releases the DB connection before DNS/HTTP; the outbound send runs under a wall-clock deadline derived from that lease so semaphore slots and external POSTs cannot outlive the lease, and a pre-send lease-window miss releases the lease without burning a retry. A 2xx response header is the external ACK, so success finalization can record it with matching token ownership even if the lease expires before finalize commit, provided free-successor cleanup commits in the same transaction. Non-2xx failure paths still require an unexpired lease. Per-chain advisory locks serialize direct claims, preclaimed-send guards, successor inserts, and succeeded-chain convergence checks; preclaimed-send guards also reject older attempts when a live successor appears after recovery claim but before POST. A live partial unique index prevents duplicate successor rows, and a terminal partial unique index enforces one succeeded row per chain.
- Distillation worker supervision — the compression worker runs under an exponential-backoff supervisor (1s → 2s → 4s → … capped at 5 min). A crash is logged and retried; the worker does not silently die and leave memories un-compressed for the rest of the process lifetime.
- Search stays out of the compression control plane — large `/v1/memories/search` result sets keep the standard raw response shape and no longer probe the retired legacy distillation backend. Compression health and output come from the queue-driven APOLLO/ARTEMIS worker, `memory_compression_queue`, `memory_compressed_variants`, and the APOLLO GPU guard path.
- OAuth session garbage collector — hourly sweep of expired and long-revoked sessions. Bounds the `oauth_sessions` table so a long-running install doesn't accumulate dead rows forever.
- Federation sync worker — for genuinely remote peers, iterates enabled peers on their individual sync intervals, pulls batches, reconciles local + remote timestamps before overwriting, and logs per-sync results to `federation_sync_log`. Single-site HA uses PostgreSQL streaming replication instead.
- Advisory-lock-serialized audit chain writer — the hash chain writer takes `pg_advisory_xact_lock` before reading the chain tip, so concurrent consultations cannot compute against the same stale previous hash. Closes a TOCTOU window in tamper-evident logging that most implementations leave open.
- Advisory-lock-serialized DAG writers — merges and feature-branch reverts share `_branch_advisory_lock_key` in `mnemos/api/routes/dag.py`, then take row locks in the same order. Concurrent writers on the same `(memory_id, branch)` serialize instead of orphaning branch heads.
- Trigger-level DAG parent guard — `db/migrations_v3_5_trigger_same_memory_parent.sql` replaces `mnemos_version_snapshot()` so UPDATE/DELETE resolve branch HEADs under lock, reject missing/NULL/foreign heads with SQLSTATE `MN001`, and keep delete snapshots live for deployments that still attach the delete trigger.
- ASGI body-size middleware — native ASGI (not `BaseHTTPMiddleware`), so it rejects chunked uploads whose running byte count exceeds `MAX_BODY_BYTES` as they arrive, before the full body lands in memory. Content-Length–declared uploads are rejected before the app is even invoked.
- SSRF-hardened webhook dispatch — URLs are re-validated at send time (not just at subscription time); DNS resolves asynchronously so a slow resolver can't freeze the event loop; cloud metadata hostnames (AWS IMDS, Google `metadata.google.internal`, Tencent, Alibaba, IPv6 variants) are on a deny list alongside the RFC1918 / loopback / link-local filter.
- Rate limiter with X-Forwarded-For trust — default keys on the direct socket peer (safe behind no proxy); set `RATE_LIMIT_TRUST_PROXY=true` only when you run behind a proxy you control. Prevents clients from blowing out the global limit via spoofed headers.
- pgvector query sanitization — embedding vectors returned by the embedder are `float()`-cast before being stringified into the query. A poisoned embedder cannot inject SQL via a non-numeric vector "component".
- Full-text search operator filtering — `/v1/memories/search` uses `plainto_tsquery` rather than `to_tsquery`, so `|`, `&`, `!` and friends get treated as literal text instead of tsquery operators. User input cannot construct adversarial FTS queries.
- Federation size caps — an abusive peer cannot fill your disk: pulled content is capped at 1 MB per memory, metadata at 64 KB, name fields at 256 chars.
- Rate-limited audit endpoints — `/v1/consultations/audit/verify` walks the entire chain from genesis for root callers and verifies only the caller's consultation audit rows for non-root callers; capped at 5/min so an authenticated caller cannot force O(N) scans on a large log. The `/audit` list is owner-scoped for non-root callers and capped at 30/min.
- Quality manifest on every compression — every compression engine in the active stack writes a receipt: `{what_was_removed, what_was_preserved, quality_rating, risk_factors, safe_for, not_safe_for}`. Compression-as-data, not compression-as-side-effect.
Every cross-table reference in the schema is a real PostgreSQL foreign key with an explicit ON DELETE semantic — not a loose string column you have to trust the application layer to honour. Twenty-two FK edges across the system, and every one carries a deliberate decision about what happens when the thing it points at goes away. The schema has opinions.
Two patterns, picked per edge:
ON DELETE CASCADE — when lifecycle is genuinely owned:
- `api_keys.user_id → users(id)` — delete a user, their keys go with them.
- `sessions.user_id → users(id)`, `session_messages.session_id → sessions(id)` — close a session, its messages go.
- `user_groups.user_id → users(id)`, `user_groups.group_id → groups(id)` — membership is owned by both endpoints.
- `webhook_subscriptions.owner_id → users(id)`, `webhook_deliveries.subscription_id → webhook_subscriptions(id)` — subscriber deletion collapses the whole delivery subtree (soft-delete via `revoked=true` is the normal path; CASCADE only matters on hard deletes).
- `oauth_identities.user_id → users(id)`, `oauth_identities.provider → oauth_providers(name)`, `oauth_sessions.user_id → users(id)` — OAuth bindings follow their owner.
- `federation_sync_log.peer_id → federation_peers(id)` — unregister a peer, its sync history goes.
- `graeae_audit_log.consultation_id → graeae_consultations(id)`, `consultation_memory_refs.consultation_id → graeae_consultations(id)` — audit rows are owned by the consultation they describe. Chain integrity comes from the SHA-256 hash chain, not from the FK, so deleting a consultation does not break the chain's verifiability.
ON DELETE SET NULL — when audit history has to survive the referenced row's deletion:
- `memory_versions.parent_version_id → memory_versions(id)` — admin-path deletion of a mid-history commit leaves the DAG with a gap rather than cascading through every descendant.
- `memory_branches.head_version_id → memory_versions(id)` — same reasoning; branches get re-pointed, not destroyed.
- `session_memory_injections.memory_id → memories(id)` — if a memory is deleted later, the record that we once injected it into a session stays. The audit outlives the artifact.
- `compression_quality_log.memory_id → memories(id)` — the quality manifest survives the thing it was a manifest for. Compliance cares that the transformation happened, not that the output still exists.
- `consultation_memory_refs.memory_id → memories(id)` — a consultation's cited memory may be deleted; the record of the citation is an audit artifact and must not vanish.
- `oauth_sessions.identity_id → oauth_identities(id)` — rotating an identity doesn't invalidate a session row that was already in flight.
The FK graph prevents accidental loss. The application and trigger layer add the same-memory checks the FK alone cannot express: branch HEAD JOINs are scoped by memory_id, recursive logs only walk same-memory parents, and the v3.5 trigger raises MN001 instead of writing a cross-memory parent edge.
This is the part most projects that call themselves "memory" skip, because if the whole point is "store a blob, retrieve a blob", the relationships between blobs are out of scope. MNEMOS's design asserts the opposite: memories relate to consultations relate to audit entries relate to sessions relate to users, and the system has strong opinions about which of those relationships is load-bearing and which is historical.
The constraints are enforced at the database level. Application bugs cannot violate them. Migration bugs cannot silently create orphan rows. The constraint travels with the row.
Compression is operator-batched in v4.0. It is not an automatic session-column flag and it no longer uses the retired LETHE / ANAMNESIS / ALETHEIA engines. Operators enqueue work through the admin endpoints; the distillation worker runs a competitive contest over the active engines and persists the winner plus the losing candidates for audit.
- ARTEMIS — CPU-only extractive compression with identifier preservation, labeled-block handling, and evidence-based self-scoring.
- APOLLO — schema-aware dense encoding for LLM-to-LLM wire use. Rule-based schema detection with optional LLM fallback for fact-shaped content that misses a known schema.
- Quality manifest on every compression: what was removed, what was preserved, risk factors, safe/unsafe use cases.
- Original content always retained; compressed and original stored independently.
- Configurable quality thresholds per task type (security review: 95%, architecture: 90%, general: 80%).
- The plugin `CompressionEngine` ABC is open to operator-registered engines. The contest logs the winner plus every loser with its score and rejection reason.
- `/v1/memories/search` still carries reserved `compression_applied` / `compression_metadata` response fields from the v3.2 API shape; v4.0 search responses set `compression_applied=false`. Use `/v1/memories/rehydrate` or `/v1/memories/{id}/compression-manifests` when you need to know whether a compressed variant was used.
The mnemos/domain/memory_categorization package still exposes a hot/warm/cold/archive selector for hook-side prompt budgeting. It is advisory metadata used by integrations; it is not the removed sessions.compression_tier database column and does not drive automatic background compression.
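The contest described above is an argmax with full loser bookkeeping. A sketch with assumed record shapes (the worker persists the real equivalents to `memory_compression_candidates` / `memory_compressed_variants`):

```python
def run_contest(candidates):
    """Pick the highest composite_score; record every loser with a reject_reason.

    `candidates` is a list of dicts with `engine` and `composite_score` keys;
    the shapes are assumptions for illustration, not the worker's actual rows.
    """
    if not candidates:
        return None, []
    ranked = sorted(candidates, key=lambda c: c["composite_score"], reverse=True)
    winner, losers = ranked[0], ranked[1:]
    for loser in losers:
        # every loser keeps its score plus an explicit rejection reason
        loser["reject_reason"] = (
            f"composite_score {loser['composite_score']:.2f} < "
            f"winner {winner['composite_score']:.2f}"
        )
    return winner, losers

winner, losers = run_contest([
    {"engine": "ARTEMIS", "composite_score": 0.81},
    {"engine": "APOLLO", "composite_score": 0.74},
])
```

The point of keeping losers is auditability: "why did APOLLO not win for this memory" is answerable from the stored candidates, not just from the winning variant.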
- Memory version history (`memory_versions` table) — every mutation auto-snapshots previous state
- Diff and revert API: `GET /v1/memories/{id}/versions`, `GET /v1/memories/{id}/versions/{n}`, `GET /v1/memories/{id}/diff`, `POST /v1/memories/{id}/revert/{n}`. Non-root callers see only snapshots whose own `owner_id` / `namespace` / `permission_mode` pass `version_visibility_predicate` (`mnemos/core/visibility.py`).
- DAG (git-like) versioning: `GET /v1/memories/{id}/log`, `POST /v1/memories/{id}/branch`, `POST /v1/memories/{id}/merge`, `GET /v1/memories/{id}/commits/{commit}`. Logs do not bridge across invisible snapshots; a visible child whose immediate parent is hidden reports `parent_hash=null`.
- SHA-256 hash-chained audit log for consultations: `GET /v1/consultations/audit`, `GET /v1/consultations/audit/verify`
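A hash-chained log verifies the way any such chain does: recompute from genesis and compare. A minimal sketch — the exact field serialization MNEMOS hashes is an assumption here:

```python
import hashlib

def entry_hash(prev_hash: str, prompt: str, response: str) -> str:
    # Chain each entry to its predecessor; the field layout is illustrative.
    return hashlib.sha256(f"{prev_hash}|{prompt}|{response}".encode()).hexdigest()

def verify_chain(entries) -> bool:
    """Walk from genesis; tampering with any row breaks every hash after it."""
    prev = "0" * 64  # genesis sentinel
    for e in entries:
        if e["hash"] != entry_hash(prev, e["prompt"], e["response"]):
            return False
        prev = e["hash"]
    return True

# build a two-entry chain, then tamper with the first response
log, prev = [], "0" * 64
for prompt, resp in [("q1", "a1"), ("q2", "a2")]:
    h = entry_hash(prev, prompt, resp)
    log.append({"prompt": prompt, "response": resp, "hash": h})
    prev = h
assert verify_chain(log)
log[0]["response"] = "tampered"
```

This is also why the full-chain walk is O(N) and why `/v1/consultations/audit/verify` is rate-limited: verification cost grows with log length.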
Landed with the v3.0 release line:
- ✅ Webhook subscriptions — outbound notifications on memory write, consultation completion. HMAC-signed delivery, retry with exponential backoff.
- ✅ OAuth/OIDC authentication — browser-based login via Google, GitHub, Azure AD, or custom OIDC providers. Coexists with existing API-key auth.
- ✅ Cross-instance memory federation — pull-based peer sync with Bearer-authenticated peers. Federated memories stored locally with `federation_source` metadata, a `fed:{peer}:{remote_id}` id prefix, and a background worker that respects per-peer sync intervals.
- ✅ Plugin `CompressionEngine` ABC — open extension point; operators register additional engines alongside the built-ins (APOLLO + ARTEMIS).
- ✅ Competitive-selection compression contest — every eligible engine runs per memory; the highest `composite_score` wins; every loser is recorded with its `reject_reason`. Scoring profile is operator-configurable (`balanced` | `quality_first` | `speed_first` | `custom`).
- ✅ Persisted audit log — three new tables (`memory_compression_queue`, `memory_compression_candidates`, `memory_compressed_variants`) with full history queryable via `GET /v1/memories/{id}/compression-manifests`.
- ✅ GPU circuit breaker — per-endpoint three-state breaker (CLOSED → OPEN → HALF_OPEN → CLOSED); `gpu_required` engines fast-fail during outages instead of piling requests onto a dead endpoint.
- ✅ Admin enqueue endpoints — `POST /admin/compression/enqueue` (specific memory IDs) and `POST /admin/compression/enqueue-all` (bulk with filters) for operators to drive the contest from the API layer.
- ✅ Optional too-short content gate — `MNEMOS_CONTEST_MIN_CONTENT_LENGTH` skips memories below a threshold before spending GPU time on content that can't be meaningfully compressed.
- ✅ v2 versioning trigger bytea fix — the `mnemos_version_snapshot()` trigger no longer crashes on memories containing backslash sequences (common in code, paths, regex, logs).
- ✅ CHARON federation schema preflight — peers exchange schema signatures before sync and return 409 on incompatible strict-mode pairings.
- ✅ Dev↔prod MPF restore drill — `docs/RESTORE-DRILL.md` is validated on the PYTHIA → PROTEUS path.
- ✅ Slice 1: audit quick wins (`a62a099`) — session history returns the most recent messages first with deterministic system-row pinning, and project URLs now point at `mnemos-os/mnemos`.
- ✅ Slice 2: memory-read tenancy + DAG integrity (`d42c475`) — shared memory read visibility, per-snapshot history visibility, same-memory DAG guards, race-safe branch creation, `MN001`-to-HTTP-409 reconciliation guidance, and a compose `postgres-upgrade` service for existing volumes.
- ✅ Webhook retry state machine + leases + outbox discipline — persisted leases, one-success-per-chain guards, repair worker separation, bulk-create parity, and terminal success trigger.
- ✅ MCP unified registry — stdio and HTTP/SSE expose the same 18 tools from `mnemos/mcp/tools/`, including CRUD, KG, DAG, bulk create, stats, and model recommendation.
- ✅ Faithful OpenAI-compatible gateway — propagated generation controls, OpenAI-format SSE, registry-honest model discovery, and explicit 400/404 responses when the selected provider cannot honor a requested feature.
- ✅ Namespace-uniform tenancy — state, journal, entities, sessions, consultations, webhooks, and memory read/history paths use the owner+namespace discipline.
- ✅ PostgreSQL streaming-replication doctrine — single-site HA uses Postgres primary/standby replication; MNEMOS federation is for remote or curated data flows.
- ✅ Compression cleanup — live compression is APOLLO + ARTEMIS through the contest worker; retired compatibility shims and vestigial session compression columns are gone.
v3.5.1 is a documentation-triage patch shipped on 2026-04-28. It bumps package/runtime version metadata to 3.5.1 and reconciles release-state docs with the v3.5.0 GA tag; it does not change product behavior from v3.5.0.
- ✅ Coherent package layout — production code now lives under `mnemos/` with `api/routes`, `core`, `db`, `domain`, `persistence`, `mcp`, `webhooks`, `workers`, `hooks`, `installer`, `tools`, and `cli` subpackages.
- ✅ Persistence abstraction — `PersistenceBackend` owns the contract; `PostgresBackend` uses asyncpg + pgvector + RLS + LISTEN/NOTIFY, and `SqliteBackend` uses aiosqlite + sqlite-vec + FTS5 + JSON1 + WAL.
- ✅ Deployment profiles — `server`, `edge`, and `dev` select safe defaults through `MNEMOS_PROFILE` or `mnemos serve --profile`.
- ✅ Multi-worker support — Redis-backed circuit breaker, rate limiter, and concurrency limiter coordinate API workers; an in-process fallback remains for single-worker dev and edge installs.
- ✅ Single-binary distribution — PyInstaller artifacts for linux-x86_64, linux-aarch64, and macos-aarch64 bundle sqlite-vec and the migration chain.
- ✅ Unified CLI — `mnemos serve / install / worker / export / import / consult / health / version` replaces the old top-level Python entry points.
- ✅ Architectural enforcement — seven import-linter contracts keep API, domain, db, core, persistence, MCP, and webhook boundaries honest in CI.
- ✅ GRAEAE mode validation — routing modes plus `single`, `debate`, and `majority` are modeled as a `Literal`; unknown modes 422 instead of falling through.
- `bulk_create_memories` now runs through the backend transaction and webhook outbox surface, so it works on SQLite-backed edge profiles as well as Postgres-backed server profiles.
- The SQLite-backed `edge` profile intentionally exposes a narrower HTTP API: sessions, entities, state, and MORPHEUS telemetry routes return 503 because those surfaces still depend on server-profile Postgres SQL.
- MORPHEUS run and cluster endpoints are operator-only telemetry. They require root credentials because responses can include namespaces, configs, errors, and memory IDs across tenants.
- v4.1 still does not ship the separate web frontend, mobile clients, hosted MNEMOS Cloud, or Rust hot-path rewrites; those remain roadmap items.
Forward-looking scope is maintained in ROADMAP.md, which lists shipped v3.x/v4.1 scope and items explicitly deferred with rationale.
Near-term not-yet-scoped candidates:
- Web UX in the separate `mnemos-web` frontend repo
- Mobile clients: Android Termux hardening first, iOS native later
- Rust rewrites for selected hot paths after the Python v4 architecture settles
- Hosted MNEMOS Cloud and foundation-tier OSS standardization work (MCP-MD via LF AI & Data) in the v5.0+ frame
```
Agents / MCP clients / OpenAI-compatible SDKs
                  │
                  │ REST, MCP stdio, MCP HTTP/SSE
                  ▼
mnemos.api.routes -> mnemos.domain -> mnemos.db
        │                │                │
        │                ▼                ▼
        │        GRAEAE / MOIRAI   PersistenceBackend
        │                          ├─ PostgresBackend
        │                          └─ SqliteBackend
        ▼
mnemos.core lifecycle/config/visibility
        │
        ├─ mnemos.webhooks
        ├─ mnemos.workers
        ├─ mnemos.mcp
        └─ mnemos.cli
```
```
curl -L https://github.com/mnemos-os/mnemos/releases/download/v4.1.1/mnemos-linux-x86_64 -o mnemos
chmod +x mnemos
./mnemos install --profile edge
./mnemos serve --profile edge
```

```
pip install mnemos-os==4.1.1
mnemos install --profile dev
mnemos serve --profile dev
```

```
git clone https://github.com/mnemos-os/mnemos.git
cd mnemos
python -m pip install -e ".[dev,sqlite]"
mnemos install --profile dev
mnemos serve --profile dev
```

For the server profile, create or point at a Postgres database, set
`MNEMOS_PROFILE=server`, and provide `MNEMOS_DATABASE_URL` / Postgres settings
plus a Redis-backed `RATE_LIMIT_STORAGE_URI`. For Docker installs with an
existing `postgres_data` volume, `docker-compose.yml` and
`docker-compose.staging.yml` still run a one-shot `postgres-upgrade` service
because `/docker-entrypoint-initdb.d` files only execute on fresh database
initialization.

```
mnemos serve --profile server
curl http://localhost:5002/health
```

MNEMOS runs one worker by default. Increase `MNEMOS_WORKERS` only with Redis
backing shared rate-limit, circuit-breaker, and concurrency state; see
`docs/SCALING.md`.
```
# Basic
curl -X POST http://localhost:5002/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"content": "...", "category": "decisions", "subcategory": "architecture"}'

# With provenance
curl -X POST http://localhost:5002/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{
    "content": "...",
    "category": "decisions",
    "namespace": "myagent/analyst",
    "source_model": "gemma4-consult",
    "source_agent": "background-enricher"
  }'

# Team/enterprise: include API key
curl -X POST http://localhost:5002/v1/memories \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your-api-key>' \
  -d '{"content": "...", "category": "decisions"}'
```

```
# Full-text search
curl -X POST http://localhost:5002/v1/memories/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "topic keywords", "limit": 10}'

# Filtered by category
curl -X POST http://localhost:5002/v1/memories/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "keywords", "category": "solutions", "limit": 5}'

# Semantic (vector) search
curl -X POST http://localhost:5002/v1/memories/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "keywords", "semantic": true, "limit": 10}'
```

```
# Create a user
curl -X POST http://localhost:5002/admin/users \
  -H 'Content-Type: application/json' \
  -d '{"id": "alice", "display_name": "Alice", "role": "user"}'

# Generate API key — raw_key shown once only
curl -X POST http://localhost:5002/admin/users/alice/apikeys \
  -H 'Content-Type: application/json' \
  -d '{"label": "cli-key"}'

# Revoke a key
curl -X DELETE http://localhost:5002/admin/apikeys/<key-id>
```

```
curl -X POST http://localhost:5002/v1/consultations \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Your question", "task_type": "architecture_design"}'

# Extract best result by score
curl -X POST http://localhost:5002/v1/consultations \
  -d '{"prompt": "...", "task_type": "reasoning"}' | \
  jq '.all_responses | to_entries | sort_by(-.[1].final_score)[0]'
```

| Category | Use for |
|---|---|
| `infrastructure` | Configs, endpoints, system state |
| `solutions` | Workarounds, resolved problems |
| `patterns` | Reusable approaches |
| `decisions` | Rationale and tradeoffs |
| `projects` | Per-project context |
| `standards` | Quality gates, conventions |
| Task type | Notes |
|---|---|
| `architecture_design` | Full consensus |
| `reasoning` | Full consensus |
| `code_generation` | Speed-optimized provider subset |
| `web_search` | Real-time capable providers |
```json
{
  "compression_id": "uuid",
  "quality_rating": 92,
  "what_was_removed": ["2 introductory sentences", "3 supporting examples"],
  "what_was_preserved": ["Complete reasoning chain", "All main conclusions"],
  "risk_factors": ["Missing examples may reduce convincingness"],
  "safe_for": ["Initial consultation", "Quick decision making"],
  "not_safe_for": ["Security-critical decisions", "Detailed technical review"]
}
```

Why is it called MNEMOS, and why are you using all these mythological names — who do you think you are, a fantasy novelist?
Fair question, and we get it more than you'd think. Short answer: the names aren't set dressing. Each one is a functional tag that happens to line up with a real Greek concept, because memory is one of the domains where Greek already had the vocabulary we needed.
- MNEMOS — short for Mnemosyne, the Titan goddess of memory and mother of the Muses. The system stores and retrieves memory. "MemoryService" felt like underselling it.
- GRAEAE — the three sisters in the Perseus myth who shared one eye and one tooth, passing them back and forth to see and speak. GRAEAE is the multi-LLM consensus layer: several providers sharing one prompt and converging on one consolidated answer. The metaphor was already sitting there.
- THE MOIRAI — the three Fates, who spin, measure, and cut the thread of life. The compression stack is collectively THE MOIRAI because each tier decides what part of a memory's thread survives. The current built-ins are ARTEMIS for CPU extractive compression and APOLLO for schema-aware dense encoding.
No, we are not fantasy novelists. The naming scheme is what happens when the domain you're working in is literally the thing a pre-Socratic culture wrote whole theogonies about, and you decide to use their vocabulary instead of inventing a worse one. Every name is aligned to what the component does, not chosen for atmosphere.
If you strongly prefer MemoryService / LLMRouter / CompressorTier1, the code does exactly the same thing regardless of the label. They're just tags. We like ours.
No. CPU-only installs run fine. ARTEMIS runs on CPU, and the API server itself never needs a GPU. APOLLO's optional LLM fallback and local inference backends only kick in when `GPU_PROVIDER_HOST` is configured. For most deployments, CPU plus one external LLM provider is enough.
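As a rough illustration of the split, a CPU-only environment simply omits `GPU_PROVIDER_HOST`. The sketch below is a hypothetical env-file fragment: `GPU_PROVIDER_HOST`, `MNEMOS_API_KEY`, and `MNEMOS_PORT` appear elsewhere in these docs, but the provider-key variable name is an assumption — check your install's configuration reference.

```shell
# CPU-only deployment: no GPU_PROVIDER_HOST, one external LLM provider.
export MNEMOS_API_KEY="change-me"
export MNEMOS_PORT=5002
# Provider key variable name is an assumption -- check your install's docs.
export TOGETHER_API_KEY="your-key-here"

# GPU-backed deployment: uncomment to enable APOLLO's local inference fallback.
# export GPU_PROVIDER_HOST="http://gpu-box:8000"
```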
Yes. GRAEAE routes across any configured provider. Together AI and Groq are the default free-tier providers (no paid account required to get started). OpenAI, Anthropic, and Perplexity are supported as fallback providers. Local Ollama is first-class — MNEMOS can run fully offline with Ollama plus nomic-embed-text for embeddings.
Not today. Self-hosted only.
See the MemPalace and MNEMOS: different problems, not competitors section above, plus the comparison table. Short version: those are in-process libraries or conversation-history stores designed for single-user / single-agent deployment. MNEMOS is a network service with multi-tenant isolation, a cryptographic audit chain, and a DAG-versioned memory model. Different form factor, different primary user.
No. There is no outbound telemetry of any kind. The only outbound traffic is the LLM provider calls you configure yourself, the webhook deliveries you register, and the federation syncs you set up. The code is all here; `grep httpx` if you want to confirm.
Yes — we have been since December 2025. v3.0 is the first public release line, not a greenfield experiment; the codebase has been operated continuously for roughly four months before being cut for open source. The honest caveat: it has been single-operator-tested, not yet battle-tested across many independent deployments. File issues against the live install and we'll track them.
Currently manual — write a one-shot script that hits `POST /v1/memories/bulk` with your source data. Direct-import adapters for major competitors are on the roadmap but not yet shipped.
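A one-shot import script can be very small. This is a minimal sketch, not a shipped tool: the `/v1/memories/bulk` endpoint and Bearer auth come from these docs, but the request-body field names (`memories`, `content`, `metadata`) are assumptions — check the schema in your install's OpenAPI docs before running it.

```python
import json
import urllib.request

MNEMOS_URL = "http://localhost:5002"   # default MNEMOS port
API_KEY = "change-me"                  # your MNEMOS_API_KEY

def build_bulk_payload(rows):
    """Map source rows to a bulk-import body.

    NOTE: the exact field names here are an assumption -- verify
    against the POST /v1/memories/bulk schema in your install.
    """
    return {
        "memories": [
            {"content": r["text"], "metadata": r.get("meta", {})}
            for r in rows
        ]
    }

def import_rows(rows):
    """POST the payload to the bulk endpoint and return the parsed response."""
    req = urllib.request.Request(
        f"{MNEMOS_URL}/v1/memories/bulk",
        data=json.dumps(build_bulk_payload(rows)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Point `rows` at whatever your previous store exports (a JSON dump, a CSV reader, etc.) and run it once.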
Historical. Earlier versions split MNEMOS (5000) and GRAEAE (5001) across two services; v3 unified them on 5002 to signal "this is the combined single service". Override with `MNEMOS_PORT` if 5002 is taken.
Yes. `Dockerfile` and `docker-compose.yml` ship in the repo; `docker compose up -d` gets you a working MNEMOS + PostgreSQL instance for local evaluation. For Kubernetes, the Docker image is the starting point — no Helm chart yet, but the service is stateless on its own (Postgres is the only state), so a standard Deployment + Service + ConfigMap pattern works.
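For the Deployment + Service pattern, a minimal sketch might look like the following. The image tag, ConfigMap name, and label values are placeholders, not shipped artifacts — build the image from the repo's `Dockerfile` and put your DB DSN and API key in the ConfigMap (or a Secret).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnemos
spec:
  replicas: 1
  selector:
    matchLabels: { app: mnemos }
  template:
    metadata:
      labels: { app: mnemos }
    spec:
      containers:
        - name: mnemos
          image: mnemos:4.1.1          # placeholder tag; build from the repo Dockerfile
          ports:
            - containerPort: 5002      # default MNEMOS port
          envFrom:
            - configMapRef:
                name: mnemos-config    # placeholder; holds DB DSN, API key, etc.
---
apiVersion: v1
kind: Service
metadata:
  name: mnemos
spec:
  selector: { app: mnemos }
  ports:
    - port: 5002
      targetPort: 5002
```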
- Set `MNEMOS_API_KEY` and require Bearer auth on all requests.
- Enable `RATE_LIMIT_ENABLED=true` (it's on by default).
- Set `MNEMOS_SESSION_SECRET` to a stable value so OAuth flows survive restarts.
- Set `OAUTH_TRUST_PROXY=true` + `RATE_LIMIT_TRUST_PROXY=true` only when you're behind a reverse proxy you control.
- Keep `WEBHOOK_ALLOW_PRIVATE_HOSTS=false` (the default). SSRF defense is on by default.
- Run behind a TLS-terminating reverse proxy. Don't expose the Uvicorn socket directly.
- Review `SECURITY.md` for the full checklist.
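For the TLS-terminating reverse proxy, a minimal nginx sketch might look like this. Hostnames and certificate paths are placeholders; it assumes the default MNEMOS port 5002 on localhost.

```nginx
server {
    listen 443 ssl;
    server_name mnemos.example.com;            # placeholder hostname
    ssl_certificate     /etc/ssl/mnemos.pem;   # placeholder cert paths
    ssl_certificate_key /etc/ssl/mnemos.key;

    location / {
        # Uvicorn stays bound to localhost; only nginx is exposed.
        proxy_pass http://127.0.0.1:5002;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

If you run with `OAUTH_TRUST_PROXY=true` / `RATE_LIMIT_TRUST_PROXY=true`, this proxy is the one that must set those forwarded headers.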
MNEMOS is licensed under the Apache License, Version 2.0. See LICENSE for the full text.
Contributions are accepted under the Developer Certificate of Origin (DCO) — see CONTRIBUTING.md.
