hermes-agent-memory

LanceDB-backed memory provider plugin for Hermes Agent.

Embeds a workspace-scoped LanceDB table at ~/.hermes/lancedb/memories.lance and exposes four tools to the agent: lancedb_recall, lancedb_remember, lancedb_read, lancedb_forget. Recall is hybrid (vector ANN + BM25, fused via RRF) with an optional cross-encoder reranker. Durable facts are extracted from sessions at pre-compress and session end. Everything runs in Hermes's Python process — no memory server.

Features

Hybrid recall: vector + BM25 fused with RRF; per-call switchable to pure vector or pure FTS.
Rerankers (optional): a cross-encoder reranker (cross-encoder/ettin-reranker-32m-v1 by default) can replace RRF ordering; the model and candidate-pool size are configurable.
Workspace isolation: every row carries an agent_workspace tag and recall pre-filters by it.
Fact-first retrieval: recall surfaces extracted facts; raw conversation turns are stored as provenance and used only as fallback.
Mid-session extraction: facts are pulled out via an auxiliary LLM on on_pre_compress and on_session_end, so insights survive context compression.
Transparent forget: preview candidates, then delete by exact ID.
Auto-compaction: periodic table.optimize(cleanup_older_than=...) runs in the background to bound fragment and version-file growth from single-row writes.

Requirements

Python 3.11+
uv
Hermes Agent installed locally
An LLM API key (OpenAI, OpenRouter, Anthropic, …)

Runtime dependencies installed into Hermes's venv: lancedb >= 0.13, openai >= 2.38, pyyaml.

Installation: users

Use this section if you want LanceDB memory in your own Hermes setup. If you plan to edit the plugin's source, jump to Installation: developers.

1. Install Hermes Agent

# macOS / Linux / WSL2
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Windows (PowerShell)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

The installer handles uv, Python 3.11, Node.js, ripgrep, ffmpeg, and (on Windows) MinGit. It clones Hermes into ~/.hermes/hermes-agent/ and symlinks the binary to ~/.local/bin/hermes. After it finishes:

hermes doctor --fix     # repairs symlinks, dirs, etc.
hermes setup            # interactive: .env, API key, model picker
hermes doctor           # final sanity check

Note

If you have AWS credentials in your shell environment, hermes doctor may log a Bedrock AccessDeniedException. This is Hermes's provider auto-detection and is ignorable if you're using OpenAI / Anthropic / OpenRouter.

2. Install the plugin

hermes plugins install lancedb/hermes-agent-memory

This shallow-clones https://github.com/lancedb/hermes-agent-memory.git into ~/.hermes/plugins/lancedb/ and renders after-install.md in a Rich panel telling you what's next. To pull updates later, re-run the same command.

3. Install runtime dependencies into Hermes's venv

Hermes loads plugins inside its own Python interpreter. Install lancedb, openai, and pyyaml there — not into a separate venv.

# If Hermes is at a source checkout in /path/to/your/hermes-agent
uv pip install --python /path/to/your/hermes-agent/venv/bin/python3 lancedb openai pyyaml

# If you used the one-line installer
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python3 lancedb openai pyyaml

This step is deliberately manual so the packages land in the same Python environment that Hermes uses to load memory plugins.

4. Activate the provider

hermes memory setup
# pick "lancedb"

This writes memory.provider: lancedb into ~/.hermes/config.yaml and writes the plugin defaults under plugins.lancedb. The default embedding model is OpenAI text-embedding-3-small; the plugin reads OPENAI_API_KEY from the process environment, repo .env, or ~/.hermes/.env.

# ✓ LanceDB memory configured (embedding dim: 1536)
#  Start a new session to activate.

5. Verify

hermes plugins list           # should list "lancedb"
hermes memory status
hermes chat -q "Hello"        # agent.log should contain `lancedb provider initialized`

Installation: developers

Use this section if you're working on the plugin's source.

1. Clone and create the dev venv

git clone https://github.com/lancedb/hermes-agent-memory /path/to/your/hermes-agent-memory
cd /path/to/your/hermes-agent-memory
uv sync --extra dev

pyproject.toml sets [tool.uv] package = false, so uv sync only manages a venv for tests, lint, and ad-hoc imports. The plugin itself is loaded by Hermes from its directory, not pip-installed.

2. Symlink into Hermes's plugins directory

ln -sf /path/to/your/hermes-agent-memory ~/.hermes/plugins/lancedb

Edits to source files are picked up on the next Hermes session: no reinstall.

3. Install runtime deps into Hermes's venv

The dev venv only runs pytest / ruff. For end-to-end testing inside Hermes itself you still need the runtime deps installed against Hermes's Python:

uv pip install --python /path/to/your/hermes-agent/venv/bin/python3 lancedb openai pyyaml

4. Tests and lint

uv run pytest -v
uv run ruff check .

Add dev-only dependencies via:

uv add --dev pytest-mock

Tools exposed to the agent

Tool	Purpose
`lancedb_recall`	Hybrid (default) / vector / FTS recall over workspace memory. Returns IDs, snippets, scores, provenance turn IDs.
`lancedb_remember`	Store a durable fact (`preference`, `entity`, `event`, `case`, `pattern`, `general`). Deduplicated by content hash.
`lancedb_read`	Fetch one memory by ID, optionally with the full provenance turns it was extracted from.
`lancedb_forget`	Two-step: `action: preview` to list candidates by description, then `action: delete` with the exact ID.

The provider's system-prompt block instructs the model when to use each tool: lancedb_remember only when the user explicitly asks to remember, lancedb_forget preview before any delete, etc.
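To illustrate the "deduplicated by content hash" behaviour of lancedb_remember, here is a minimal sketch using an in-memory dict in place of the LanceDB table. The FactStore class, the content_hash helper, and the whitespace/case normalization are all assumptions made for illustration; the plugin's actual hashing scheme is not documented here.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FactStore:
    """Hypothetical in-memory stand-in for the memories table."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def remember(self, content: str, category: str = "general") -> tuple:
        """Return (memory_id, created). A repeated fact returns the existing ID."""
        key = content_hash(content)
        if key in self._rows:
            return key, False  # duplicate: nothing is written
        self._rows[key] = {"content": content, "category": category}
        return key, True
```

Storing the same fact twice, even with different spacing or casing, yields the same ID and only one row in this sketch.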

How recall works

1. The tool call enters LanceDBMemoryProvider.recall() with mode, query, kind, optional category, and limit.
2. A WHERE filter is built on workspace + user + kind + category, quoted via quote_sql, and passed as a prefilter.
3. The base retriever depends on mode:
   - hybrid: vector ANN + BM25, fused by LanceDB's built-in RRF.
   - vector: ANN over the vector column (normalized OpenAI embeddings).
   - fts: BM25 over the content column.
4. If reranker.type is cross-encoder, the candidate pool is expanded to rerank_top_n, the cross-encoder reorders the pool, and the top top_k are sliced in Python. The reranker instance is cached on the provider and warmed at initialize() so the first query doesn't pay the model-load cost.
5. The per-mode score column (_distance, _score, or _relevance_score) is explicitly projected to silence LanceDB's auto-projection deprecation warning and to keep score metadata in tool responses.
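The prefilter step above can be sketched roughly as follows. This is an illustration, not the plugin's code: quote_sql_literal is a hypothetical stand-in for quote_sql, and only the column names that appear elsewhere in this README (agent_workspace, kind, category) are used. The user column is left out because its name is not given here.

```python
from typing import List, Optional


def quote_sql_literal(value: str) -> str:
    """Hypothetical stand-in for quote_sql: single-quote the value, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_prefilter(workspace: str, kinds: List[str], category: Optional[str] = None) -> str:
    """Build a WHERE clause restricting recall to one workspace and a set of row kinds."""
    clauses = [f"agent_workspace = {quote_sql_literal(workspace)}"]
    kind_list = ", ".join(quote_sql_literal(kind) for kind in kinds)
    clauses.append(f"kind IN ({kind_list})")
    if category is not None:
        clauses.append(f"category = {quote_sql_literal(category)}")
    return " AND ".join(clauses)
```

Quoting every value before it reaches the filter string keeps a workspace name such as o'brien from breaking the clause.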

If hybrid fails (e.g. the FTS index hasn't been built yet), recall() falls back to pure vector with reranking disabled.
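For readers unfamiliar with the RRF fusion that hybrid mode relies on, here is a small pure-Python sketch of Reciprocal Rank Fusion. The plugin delegates this to LanceDB's built-in implementation; rrf_fuse and the sample IDs below are illustrative only, and k = 60 is the constant commonly used for RRF rather than a value confirmed for this plugin.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(ranked_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) to an ID's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["m3", "m1", "m7"]  # hypothetical ANN ranking
bm25_hits = ["m1", "m9", "m3"]    # hypothetical BM25 ranking
fused_ids = [doc_id for doc_id, _ in rrf_fuse([vector_hits, bm25_hits])]
```

IDs that appear in both lists (m1 and m3 here) accumulate two contributions and rise above IDs that appear in only one, which is why hybrid recall tends to favour results that both retrievers agree on.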

Configuration reference

Storage is local by default; the default embedding provider (OpenAI) still needs OPENAI_API_KEY. Override any default under plugins.lancedb in ~/.hermes/config.yaml:

plugins:
  lancedb:
    retrieval:
      mode: hybrid              # hybrid | vector | fts
      top_k: 10
      search_kinds: [fact]      # which row kinds recall returns; "turn" rows are provenance/fallback
      reranker:
        type: rrf               # rrf | cross-encoder
                                #   rrf          : Reciprocal Rank Fusion. The built-in
                                #                   fusion strategy for hybrid mode.
                                #                   No-op for vector/fts (native distance
                                #                   / BM25 order applies).
                                #   cross-encoder: replace RRF / native ordering with a
                                #                   LanceDB reranker.
        model: cross-encoder/ettin-reranker-32m-v1
        rerank_top_n: 50        # cross-encoder only: pull this many candidates from the
                                # base retriever, rerank, then slice to top_k. Larger =
                                # better recall, slower latency.
    extraction:
      enabled: true             # set false to disable LLM extraction at session boundaries
      min_turns: 3              # skip extraction for very short sessions
    embedding:
      provider: openai
      model: text-embedding-3-small
      dimension: 1536
    maintenance:
      enabled: true             # background optimize() of the Lance table
      optimize_every_commits: 50
                                # trigger when table.version - last_optimized >= N
      cleanup_older_than_days: 7
                                # passed to table.optimize(cleanup_older_than=...): old
                                # version files are garbage-collected on each run
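One way to picture how a partial override under plugins.lancedb might combine with the defaults is a recursive merge, sketched below. Whether the plugin merges recursively like this is an assumption; DEFAULTS is a trimmed, illustrative subset of the keys above and merge_config is a hypothetical helper.

```python
import copy

# Illustrative subset of the defaults shown above.
DEFAULTS = {
    "retrieval": {
        "mode": "hybrid",
        "top_k": 10,
        "reranker": {"type": "rrf", "rerank_top_n": 50},
    },
    "maintenance": {"enabled": True, "optimize_every_commits": 50},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts are merged key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


user_overrides = {"retrieval": {"reranker": {"type": "cross-encoder"}}}
effective = merge_config(DEFAULTS, user_overrides)
```

Under this assumption, setting only retrieval.reranker.type leaves top_k, rerank_top_n, and the maintenance block at their defaults.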

Knob-by-knob

Section	Key	Default	Notes
`retrieval`	`mode`	`hybrid`	Per-call override available via the `mode` parameter on `lancedb_recall`.
	`top_k`	`10`	Hard cap inside the retrieval layer is 50.
	`search_kinds`	`[fact]`	Recall surfaces facts; turn rows are stored as provenance and used as fallback when no facts match.
`retrieval.reranker`	`type`	`rrf`	`rrf` is a no-op for `mode: vector` / `mode: fts`: there's only one ranked list to return.
	`model`	`cross-encoder/ettin-reranker-32m-v1`	Reranker model passed to LanceDB's cross-encoder reranker; loaded and cached at initialize() when `type` is `cross-encoder`.
	`rerank_top_n`	`50`	Enforced as `max(rerank_top_n, top_k)` so you never fetch fewer than you return.
`extraction`	`enabled`	`true`	Set `false` to skip the auxiliary LLM call.
	`min_turns`	`3`	Skip extraction when the user has spoken fewer than N turns.
`embedding`	`provider`	`openai`	Uses OpenAI-compatible embeddings.
	`model`	`text-embedding-3-small`	Embedding dim must match the existing table: recreate the table if you change models against an existing store.
	`dimension`	`1536`	Vector dimension used for the LanceDB schema.
`maintenance`	`enabled`	`true`	Set `false` to disable auto-compaction.
	`optimize_every_commits`	`50`	Each `add` / `delete` advances `table.version`; auto-compaction fires when delta ≥ this value.
	`cleanup_older_than_days`	`7`	Passed as `timedelta(days=...)` to `table.optimize()`. Set `0` or negative to skip cleanup (compaction only).
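The two bounds in the table (the hard cap of 50 on top_k and max(rerank_top_n, top_k) for the candidate pool) combine as in the sketch below. effective_limits is a hypothetical helper; the lower bound of 1 and the choice to apply the cap before sizing the pool are assumptions.

```python
from typing import Tuple

MAX_TOP_K = 50  # hard cap inside the retrieval layer, per the table above


def effective_limits(top_k: int, rerank_top_n: int) -> Tuple[int, int]:
    """Return (top_k, candidate_pool) after applying the documented bounds."""
    capped_top_k = max(1, min(top_k, MAX_TOP_K))  # lower bound of 1 is an assumption
    pool = max(rerank_top_n, capped_top_k)        # never fetch fewer than you return
    return capped_top_k, pool
```

For example, top_k = 30 with rerank_top_n = 5 gives a pool of 30, so the reranker always sees at least as many candidates as the caller asked for.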

Auxiliary LLM for extraction

Extraction uses Hermes's auxiliary client, so you can point it at a cheaper model, independent of your main chat model:

auxiliary:
  lancedb_extraction:
    provider: openrouter
    model: google/gemini-3-flash

Hermes handles provider routing, fallback, and credit exhaustion.

Storage layout

Path	Contents
`~/.hermes/lancedb/memories.lance/`	LanceDB dataset directory (fragments, manifest, indexes).
`~/.hermes/lancedb/.last_optimize_version`	Sentinel file: `table.version` at the most recent successful `optimize()`. Used to decide when the next auto-compaction fires.
`~/.cache/huggingface/`	Optional reranker model cache when cross-encoder reranking is enabled.

The dataset is a single table named memories containing both fact and turn rows; the kind column distinguishes them. To poke at it directly:

uv run --project ~/.hermes/hermes-agent python -c "
import lancedb
db = lancedb.connect('~/.hermes/lancedb')
df = db.open_table('memories').to_pandas()
print(df[['kind', 'category', 'content']].head())
"

Auto-compaction

Every add / delete on the table is a Lance commit. Without intervention, single-row writes (which dominate agent workloads) accumulate tiny fragments and version files indefinitely.

The plugin tracks table.version against the sentinel file at ~/.hermes/lancedb/.last_optimize_version and runs table.optimize(cleanup_older_than=timedelta(days=N)) in a daemon thread when the delta crosses optimize_every_commits. A non-blocking lock guarantees only one optimize runs at a time: re-triggers while one is in flight are skipped, and writers are never blocked.

If maintenance.enabled: false, none of this runs and the dataset will grow without bound.
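The trigger logic described above (version delta against a sentinel file, a daemon thread, and a non-blocking lock) could look roughly like this. It is a sketch under assumptions, not the plugin's implementation: CompactionTrigger and its method names are hypothetical, a missing sentinel is treated as version 0, and the "skip cleanup" case for non-positive cleanup_older_than_days is not modelled. The table argument only needs a version attribute and an optimize(cleanup_older_than=...) method, which a LanceDB table provides.

```python
import threading
from datetime import timedelta
from pathlib import Path
from typing import Optional


class CompactionTrigger:
    """Fire table.optimize() in the background once enough commits have accumulated."""

    def __init__(self, table, sentinel: Path, every_commits: int = 50, cleanup_days: int = 7):
        self.table = table
        self.sentinel = sentinel
        self.every_commits = every_commits
        self.cleanup_days = cleanup_days
        self._lock = threading.Lock()

    def _last_optimized(self) -> int:
        try:
            return int(self.sentinel.read_text().strip())
        except (FileNotFoundError, ValueError):
            return 0  # assumption: missing or unreadable sentinel counts as version 0

    def maybe_optimize(self) -> Optional[threading.Thread]:
        """Start an optimize if the delta is large enough; never block the caller."""
        if self.table.version - self._last_optimized() < self.every_commits:
            return None
        if not self._lock.acquire(blocking=False):
            return None  # an optimize is already in flight: skip this trigger
        thread = threading.Thread(target=self._run, daemon=True)
        try:
            thread.start()
        except RuntimeError:
            self._lock.release()
            raise
        return thread

    def _run(self) -> None:
        try:
            self.table.optimize(cleanup_older_than=timedelta(days=self.cleanup_days))
            # Record the version only after a successful optimize.
            self.sentinel.write_text(str(self.table.version))
        finally:
            self._lock.release()
```

Calling maybe_optimize() after every write is cheap in this design: most calls return immediately on the delta check, and a call that arrives while a compaction is running is skipped instead of queued.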

Troubleshooting

hermes plugins list doesn't show lancedb. Check the plugin directory: ls -l ~/.hermes/plugins/lancedb should show the plugin's files, or, for a developer install, a symlink that resolves to your checkout of this repo.

lancedb_* tools missing from the agent. Confirm memory.provider: lancedb in ~/.hermes/config.yaml and that agent.log contains lancedb provider initialized on session start.

First recall hangs for 1–2 seconds. With reranker.type: cross-encoder, the reranker is preloaded during initialize(), so the model-load cost should land at session start rather than on the first user query. Any remaining delay on the first recall likely comes from the OpenAI embedding call, which adds network latency.

Table fragments / .lance directory growing. Check maintenance.enabled: true and that ~/.hermes/lancedb/.last_optimize_version is advancing across sessions. agent.log will show lancedb optimize starting when a compaction fires.

Changed embedding.model and recall returns nothing. The new model's dim doesn't match the existing column. Delete ~/.hermes/lancedb/memories.lance/ to recreate the table on the next session.

License

Apache 2.0
