TrovaDB

A lightweight, pure-Python vector database built from scratch.

This project explores the mechanics of vector similarity search by implementing a custom indexer based on the Vamana graph algorithm (the index behind DiskANN). It is designed for educational purposes and lightweight use cases, including semantic search and Retrieval-Augmented Generation (RAG).

Key Features

Note: This project is a work in progress. APIs and features are subject to change.

  • Vamana Graph Indexing: Utilizes the algorithm behind DiskANN.
  • Index Auto-Tuning: Adaptively tunes the alpha parameter with a custom PI controller to stabilize the average graph degree, fitting the index to different dataset structures and improving recall without sacrificing latency.
  • Built-in Reranking: Supports MMR (Maximal Marginal Relevance) reranking out of the box, which promotes varied, less redundant context for RAG applications.
  • C-Level Speed: Numba JIT compilation brings indexing and search performance close to that of C while keeping the codebase readable, hackable Python.
  • Persistence: The full database is stored in a single SQLite file, which keeps it portable and crash-safe.
  • Data Science Ready SDK: A lightweight Python client with native NumPy support and a simple interface.
  • Familiar Stack: Powered by FastAPI, SQLAlchemy and Alembic.
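
To illustrate what MMR reranking does, here is a minimal NumPy sketch of the general technique. It is not TrovaDB's implementation: the function name `mmr_rerank` and the trade-off parameter `lam` are hypothetical, and cosine similarity is assumed for both relevance and redundancy. Each step picks the candidate that is most relevant to the query while being least similar to the candidates already chosen.

```python
import numpy as np


def mmr_rerank(query, candidates, k, lam=0.5):
    """Return the indices of k candidates chosen by Maximal Marginal Relevance.

    lam=1.0 ranks purely by relevance; lower values favor diversity.
    Hypothetical helper for illustration, not part of TrovaDB's API.
    """
    query = np.asarray(query, dtype=float)
    candidates = np.asarray(candidates, dtype=float)

    # Normalize so that dot products are cosine similarities.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)

    relevance = c @ q   # similarity of each candidate to the query
    pairwise = c @ c.T  # similarity between every pair of candidates

    selected = []
    remaining = list(range(len(c)))
    while remaining and len(selected) < k:
        if selected:
            # Highest similarity to anything that was already selected.
            redundancy = pairwise[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1.0 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
cands = [[1.0, 0.0], [1.0, 0.01], [0.6, 0.8]]
print(mmr_rerank([1.0, 0.3], cands, k=2, lam=0.5))  # diversity-aware pick
print(mmr_rerank([1.0, 0.3], cands, k=2, lam=1.0))  # relevance only
```

With `lam=0.5` the second pick skips the near-duplicate in favor of the more distinct vector, while `lam=1.0` reduces to plain nearest-neighbor ordering.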

Getting Started

1. Installation

You can install TrovaDB directly from GitHub using pip.

Option A: Client and Server (Recommended)

If you want to run the database server locally, install it with the [server] extra:

pip install "trovadb[server] @ git+https://github.com/AlexHaborets/trovadb.git"

Option B: Client Only

pip install git+https://github.com/AlexHaborets/trovadb.git

2. Starting the Server

Once installed with the [server] extra, you can easily start the database server:

trovadb-server

(Runs on localhost:8000 by default)

Alternative: Using Docker

If you prefer not to install dependencies locally, you can clone the repository and run the server with Docker Compose:

docker compose up --build

3. Client Usage

The client is designed to be as intuitive as possible.

from trovadb.client import Client

with Client() as client:
    # Create a collection
    collection = client.get_or_create_collection("demo", dimension=3, metric="cosine")
    
    # Upsert vectors (combines insert & update operations in one)
    collection.upsert(
        ids=["1", "2", "3", "4", "5"], 
        vectors=[
            [0.1, 0.2, 0.3], 
            [0.9, 0.8, 0.7],
            [0.2, 0.4, 0.4],
            [0.1, 0.8, 0.2],
            [0.5, 0.3, 0.6]
        ]
    )
    
    q = [0.1, 0.2, 0.3]
    
    # Search for three nearest neighbors of q
    results = collection.search(query=q, k=3)
    print(results)

    # Delete specified vectors
    collection.delete(ids=["1", "3"])

    # Delete entire collection
    client.delete_collection("demo")
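
As a sanity check for the search call above, the exact nearest neighbors of `q` can be computed by brute force with NumPy alone. This is a sketch that needs no running server; `brute_force_cosine` is a hypothetical helper, not part of the TrovaDB client, and the format of the client's `results` is not assumed here. A graph index such as Vamana is approximate, so on larger datasets its results may differ slightly from this exact ranking.

```python
import numpy as np

ids = ["1", "2", "3", "4", "5"]
vectors = np.array([
    [0.1, 0.2, 0.3],
    [0.9, 0.8, 0.7],
    [0.2, 0.4, 0.4],
    [0.1, 0.8, 0.2],
    [0.5, 0.3, 0.6],
])
q = np.array([0.1, 0.2, 0.3])


def brute_force_cosine(query, vectors, k):
    """Return (indices, similarities) of the k most cosine-similar rows."""
    unit_q = query / np.linalg.norm(query)
    unit_v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit_v @ unit_q
    order = np.argsort(-sims)[:k]  # sort by descending similarity
    return order, sims[order]


top, sims = brute_force_cosine(q, vectors, k=3)
for i, s in zip(top, sims):
    print(ids[i], round(float(s), 4))
```

For the demo data, the exact top three by cosine similarity are ids "1", "3", and "5", in that order.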

Examples

Check out the examples folder in the root of the repository for detailed usage.

Why TrovaDB?

The name is inspired by the Italian phrase "Cerca Trova" ("Seek and you shall find"), a cryptic clue left by Vasari that is believed to indicate that a lost Da Vinci work is hidden beneath his fresco in Florence.

Acknowledgements

License

This project is licensed under the MIT License. See the LICENSE file for details.
