enhancement: evaluate larger embedding model to improve retrieval quality #26

@RutgerBos

Description

Problem

The system uses all-MiniLM-L6-v2 as its embedding model (hardcoded default in retrievers.py). This is a 6-layer, 22M-parameter model optimised for speed. It performs reasonably on general semantic similarity but struggles with:

  • Short, terse, factual text (typical of memory notes)
  • Domain-specific terminology
  • Nuanced queries that require deeper semantic understanding

Options worth evaluating

| Model | Params | Notes |
| --- | --- | --- |
| all-MiniLM-L6-v2 (current) | 22M | Fast, low quality ceiling |
| all-MiniLM-L12-v2 | 33M | Same family, 2× layers, meaningful quality bump for low cost |
| all-mpnet-base-v2 | 109M | Best general-purpose SentenceTransformer, strong on short texts |
| nomic-embed-text (via Ollama) | — | Keeps everything local and on-GPU, fits the project's local-only stance |
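
A small retrieval eval over the candidates could be as simple as a recall@1 harness with a pluggable `embed` function. A minimal sketch (the toy bag-of-characters embedder and all names below are illustrative stand-ins, not project code; for a real run, swap in e.g. `SentenceTransformer("all-mpnet-base-v2").encode`):

```python
import numpy as np

def recall_at_1(embed, queries, notes, relevant):
    """relevant[i] is the index of the note that should rank first for queries[i]."""
    q = np.asarray(embed(queries), dtype=float)
    d = np.asarray(embed(notes), dtype=float)
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    top = (q @ d.T).argmax(axis=1)  # cosine similarity, best-ranked note per query
    return float((top == np.asarray(relevant)).mean())

# Toy embedder so the sketch runs without model downloads; replace with a
# real model's encode() when benchmarking.
def toy_embed(texts):
    return [[t.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"] for t in texts]

score = recall_at_1(toy_embed,
                    queries=["password rotation"],
                    notes=["the password rotates quarterly", "dark mode preferred"],
                    relevant=[0])
```

The same query/note pairs can then be run against each model in the table to get comparable numbers.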

Suggested approach

  1. Fix the L2 → cosine metric bug (bug: ChromaDB collection uses L2 distance instead of cosine — degrades semantic search quality #24) first so benchmarks are meaningful
  2. Run a small retrieval eval against the existing memory store with each model
  3. Make the model name configurable via an AMEM_EMBEDDING_MODEL env var (it is already a constructor parameter and just needs wiring to the env)
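
The env-var wiring in step 3 could look like this (a sketch: `DEFAULT_EMBEDDING_MODEL` and the `ChromaRetriever` name in the comment are assumptions standing in for whatever retrievers.py actually defines):

```python
import os

# Current hardcoded default, kept as the fallback so behaviour is unchanged
# when the env var is unset.
DEFAULT_EMBEDDING_MODEL = "all-MiniLM-L6-v2"

def resolve_embedding_model() -> str:
    """Read the embedding model name from the environment, with a fallback."""
    return os.environ.get("AMEM_EMBEDDING_MODEL", DEFAULT_EMBEDDING_MODEL)

# Hypothetical call site in retrievers.py:
# retriever = ChromaRetriever(model_name=resolve_embedding_model())
```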

Note

Switching models on an existing persistent collection requires rebuilding the index (same migration caveat as #24). The MCP server's in-memory collection rebuilds fresh each session, so it's unaffected.
