restheart-ai — pluggable embedding and reranking providers via LangChain4j #591

@ujibang

Description

Brief overview

Extend the restheart-ai plugin (Phase 1) with pluggable embedding and reranking providers backed by LangChain4j, enabling vector search workflows on any MongoDB instance — without relying on MongoDB's autoEmbed or the Atlas Reranking API.

Rationale

Phase 1 of restheart-ai leverages MongoDB's native autoEmbed for embedding generation and the Atlas Reranking API for reranking. This works well but has limitations:

  • autoEmbed uses only Voyage AI models — no choice of embedding model or provider
  • The Atlas Reranking API is available on Atlas only
  • Some deployments need local/on-premise embedding (e.g., for data privacy or air-gapped environments)
  • Some use cases require specific embedding models (e.g., domain-specific fine-tuned models)

Phase 2 adds pluggable providers that let RESTHeart generate embeddings and perform reranking using any LangChain4j-supported backend, while remaining fully backward-compatible with Phase 1.

Detailed documentation

Phase 2: Pluggable Providers

What Phase 2 adds on top of Phase 1

| Feature | Phase 1 | Phase 2 |
| --- | --- | --- |
| Embedding generation | MongoDB autoEmbed | Pluggable: OpenAI, Voyage AI, AWS Bedrock, ONNX, Ollama, or custom |
| Query vectorization | MongoDB queryString | `$vectorize` operator (text → vector inline in pipelines) |
| Auto-embedding on write | MongoDB autoEmbed | AutoEmbeddingInterceptor (RESTHeart generates embeddings on POST/PUT/PATCH) |
| Document chunking | Text extraction + storage (MongoDB embeds via autoEmbed) | Text extraction + embedding generation + storage (RESTHeart embeds each chunk) |
| Reranking | Atlas Reranking API only | Pluggable: Cohere, Voyage AI (Atlas), ONNX, or custom |

Embedding Providers

Five out-of-the-box embedding providers, all backed by LangChain4j:

| Provider | Backend | Default Model | Type |
| --- | --- | --- | --- |
| openAIEmbeddingProvider | OpenAI API | text-embedding-3-small (1536d) | Cloud |
| voyageEmbeddingProvider | Voyage AI API | voyage-3.5 | Cloud |
| bedrockEmbeddingProvider | AWS Bedrock | amazon.titan-embed-text-v2:0 (1024d) | Cloud (AWS) |
| onnxEmbeddingProvider | ONNX | AllMiniLmL6V2 (384d) | Local, in-process |
| ollamaEmbeddingProvider | Ollama server | nomic-embed-text | Local server |

Each provider is a Provider<EmbeddingProvider> registered with @RegisterPlugin. Custom providers can be added by implementing the same interface.
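As a sketch of what implementing the extension point involves, the following is a minimal, self-contained illustration. The `EmbeddingProvider` interface shape shown here is an assumption for illustration only (the actual restheart-ai contract, the `@RegisterPlugin` wiring, and the LangChain4j-backed implementation are not reproduced):

```java
// Illustrative only: an assumed minimal shape of the embedding SPI.
// A real custom provider would delegate to a LangChain4j EmbeddingModel
// and be registered with @RegisterPlugin; this toy version is deterministic
// so it can run anywhere.
interface EmbeddingProvider {
    float[] embed(String text);   // text -> embedding vector
    int dimensions();             // size of the vectors this provider emits
}

class ToyEmbeddingProvider implements EmbeddingProvider {
    private static final int DIMS = 8;

    @Override
    public float[] embed(String text) {
        // Fold character codes into a fixed-size vector (stand-in for a model call)
        var v = new float[DIMS];
        for (int i = 0; i < text.length(); i++) {
            v[i % DIMS] += text.charAt(i);
        }
        return v;
    }

    @Override
    public int dimensions() {
        return DIMS;
    }
}
```

The key property a custom provider must guarantee is a fixed output dimensionality that matches the `numDimensions` of the vector search index it feeds.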

Reranking Providers

Three out-of-the-box reranking providers, backed by LangChain4j's ScoringModel interface:

| Provider | Backend | Default Model | Type |
| --- | --- | --- | --- |
| cohereRerankProvider | Cohere API | rerank-v3.5 | Cloud |
| voyageRerankProvider | Atlas Voyage AI API | rerank-2.5 | Cloud (Atlas) |
| onnxRerankProvider | ONNX cross-encoder | MiniLM-L6-v2-cross-encoder | Local, in-process |

LangChain4j also supports additional reranking backends (Jina, Google Vertex AI, Xinference, watsonx.ai) that can be added as custom providers.

Configuration

```yml
plugins-args:
  aiActivator:
    embedding-provider: openAIEmbeddingProvider     # plugin name of the chosen embedding provider
    rerank-provider: cohereRerankProvider           # optional, plugin name of the chosen reranking provider
    atlas-api-key: "..."                            # optional, kept from Phase 1 for Atlas reranking

  # --- Embedding providers (choose one) ---

  openAIEmbeddingProvider:
    api-key: "sk-proj-..."
    model: text-embedding-3-small                   # optional, this is the default

  # -- OR --
  voyageEmbeddingProvider:
    api-key: "pa-..."
    model: voyage-3.5                               # optional, this is the default

  # -- OR --
  bedrockEmbeddingProvider:
    region: us-east-1                               # optional, defaults to us-east-1
    model: amazon.titan-embed-text-v2:0             # optional, this is the default
    # uses the default AWS credential chain (env vars, ~/.aws/credentials, IAM role, etc.)

  # -- OR --
  onnxEmbeddingProvider: {}                         # no configuration needed

  # -- OR --
  ollamaEmbeddingProvider:
    base-url: "http://localhost:11434"              # optional, this is the default
    model: nomic-embed-text                         # optional, this is the default

  # --- Reranking providers (choose one, optional) ---

  cohereRerankProvider:
    api-key: "..."
    model: rerank-v3.5                              # optional, this is the default

  # -- OR --
  voyageRerankProvider:
    api-key: "..."                                  # Atlas Model API key
    model: rerank-2.5                               # optional, this is the default

  # -- OR --
  onnxRerankProvider: {}                            # no configuration needed
```

Feature 1: Custom Operator Registry in VarsInterpolator

A general-purpose extension mechanism in RESTHeart core: VarsInterpolator gains a static registry of custom operators (registerOperator(String name, Function<BsonValue, BsonValue>)). During the existing recursive BSON walk, after resolving $var, the interpolator checks for registered custom operators and delegates resolution to the registered function.

This is not AI-specific — it is a general extension point that any plugin can use to add custom BSON operators resolved during aggregation pipeline interpolation.
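A minimal sketch of such a registry follows. It uses `Object` in place of `BsonValue` so the example is self-contained; the class and method names mirror the description above but are illustrative, not the actual VarsInterpolator code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified stand-in for the custom operator registry described above.
// The real VarsInterpolator operates on BsonValue; Object is used here
// so the sketch has no driver dependency.
class OperatorRegistry {
    private static final Map<String, Function<Object, Object>> OPERATORS =
        new ConcurrentHashMap<>();

    // Plugins call this once at startup to contribute an operator.
    static void registerOperator(String name, Function<Object, Object> resolver) {
        OPERATORS.put(name, resolver);
    }

    // Called during the recursive BSON walk when a key matches a registered
    // operator: the operator's argument is replaced by the resolver's result.
    // Returns null when no operator with that name is registered.
    static Object resolve(String name, Object argument) {
        var resolver = OPERATORS.get(name);
        return resolver == null ? null : resolver.apply(argument);
    }
}
```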

Feature 2: $vectorize Operator

The $vectorize operator converts text to an embedding vector inline in aggregation pipelines. The restheart-ai plugin registers $vectorize at startup via the custom operator registry — the core has no knowledge of embeddings or LangChain4j.

Example:

```http
# Create a standard vector search index (not autoEmbed)
PUT /mydb/articles/_indexes/article_vectors
Content-Type: application/json

{ "type": "vectorSearch",
  "definition": { "fields": [
    { "type": "vector", "path": "embedding", "numDimensions": 1536, "similarity": "cosine" }
  ]}}

# Define aggregation using $vectorize
PUT /mydb/articles
Content-Type: application/json

{ "aggrs": [{ "uri": "search", "type": "pipeline", "stages": [
    { "$vectorSearch": {
        "index": "article_vectors",
        "path": "embedding",
        "queryVector": { "$vectorize": { "$var": "query" } },
        "numCandidates": 100,
        "limit": 10
    }},
    { "$project": { "embedding": 0 } }
]}]}

# Execute — client passes text, RESTHeart generates the vector
GET /mydb/articles/_aggrs/search?avars={"query":"machine learning for NLP"}
```

RESTHeart resolves $var first (producing the text string), then calls the registered $vectorize function (which invokes the configured embedding provider), and replaces { "$vectorize": ... } with a float array before sending the pipeline to MongoDB.

Feature 3: Auto-Embedding on Write

When a collection has vectorSearch metadata with textField and embeddingField, the AutoEmbeddingInterceptor automatically generates embeddings for documents on POST, PUT, and PATCH requests using the configured embedding provider.

Enable auto-embedding:

```http
PATCH /mydb/articles
Content-Type: application/json

{ "vectorSearch": { "textField": "description", "embeddingField": "embedding" } }
```

Write a document — the embedding is generated automatically:

```http
POST /mydb/articles
Content-Type: application/json

{ "title": "My Article", "description": "An introduction to vector search in MongoDB" }
```

The stored document will include an embedding field containing the float array generated by the configured provider.
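The interceptor's core step can be sketched as follows. This is a simplified model over plain maps: the real AutoEmbeddingInterceptor works on BSON requests and the configured provider, and the class and parameter names here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of the auto-embedding write step: when the collection's
// vectorSearch metadata names a textField and an embeddingField, embed the
// text value and store the resulting vector under embeddingField.
class AutoEmbedder {
    static Map<String, Object> process(Map<String, Object> doc,
                                       String textField,
                                       String embeddingField,
                                       Function<String, float[]> embedder) {
        var out = new HashMap<>(doc);
        var text = doc.get(textField);
        if (text instanceof String s) {          // skip docs without the text field
            out.put(embeddingField, embedder.apply(s));
        }
        return out;
    }
}
```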

Feature 4: Document Chunking with Embedding

Phase 1's document chunking stores text segments without embeddings (relying on autoEmbed indexes). Phase 2 extends the DocumentChunkingInterceptor to optionally generate embeddings for each chunk using the configured embedding provider.

When an embedding-provider is configured in aiActivator, the interceptor generates and stores embeddings alongside each text segment. When no embedding provider is configured, the behavior remains the same as Phase 1 (text-only segments, embeddings delegated to MongoDB autoEmbed).

Each text segment is stored as:

```json
{
  "_id": { "$oid": "..." },
  "text": "chunk text content...",
  "vector": [0.123, -0.456, 0.789],
  "metadata": {
    "index": 0,
    "fileId": { "$oid": "..." },
    "filename": "article.pdf",
    "tags": ["public", "tech"]
  }
}
```
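The Phase 2 behavior amounts to a loop over extracted chunks that attaches a vector only when a provider is configured. A minimal sketch, assuming plain maps in place of BSON documents (names are illustrative, not the actual DocumentChunkingInterceptor code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative: build one segment document per text chunk, adding the
// provider-generated vector only when an embedding function is configured.
// A null embedder models the Phase 1 path (text-only segments).
class ChunkEmbedder {
    static List<Map<String, Object>> toSegments(List<String> chunks,
                                                Function<String, float[]> embedder) {
        var segments = new ArrayList<Map<String, Object>>();
        for (int i = 0; i < chunks.size(); i++) {
            var seg = new HashMap<String, Object>();
            seg.put("text", chunks.get(i));
            seg.put("metadata", Map.of("index", i));
            if (embedder != null) {
                seg.put("vector", embedder.apply(chunks.get(i)));
            }
            segments.add(seg);
        }
        return segments;
    }
}
```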

Feature 5: Pluggable Reranking

Phase 1's reranking calls the Atlas Reranking API directly. Phase 2 makes reranking pluggable: when a rerank-provider is configured in aiActivator, the reranking interceptor uses the configured LangChain4j-backed provider instead of the Atlas API.

This enables reranking on any MongoDB deployment — not just Atlas — and allows choosing between different reranking backends (Cohere, ONNX local, etc.).
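Conceptually, a reranking provider scores each candidate against the query and re-orders by descending score. A minimal sketch, with a plain scoring function standing in for a LangChain4j ScoringModel-backed provider (all names here are illustrative):

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Illustrative: re-order candidate documents by a query-dependent relevance
// score, the operation a pluggable rerank provider performs.
class Reranker {
    static List<String> rerank(String query,
                               List<String> candidates,
                               BiFunction<String, String, Double> score) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble(
                (String doc) -> score.apply(query, doc)).reversed())
            .collect(Collectors.toList());
    }
}
```

Swapping backends (Cohere, Voyage AI, local ONNX cross-encoder) changes only how the score is computed, not this re-ordering step.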

Backward Compatibility with Phase 1

Phase 2 is fully backward-compatible:

  • Without embedding-provider configured: $vectorize is not registered, auto-embedding is disabled, chunking stores text-only segments → behaves exactly like Phase 1
  • Without rerank-provider configured: reranking uses the Atlas API via atlas-api-key → behaves exactly like Phase 1
  • autoEmbed indexes and queryString continue to work as before
