| 1 | +# Semantic Search Example with sqlite-vector |
| 2 | + |
| 3 | +This Python example demonstrates how to build a semantic search engine using the [sqlite-vector](https://github.com/sqliteai/sqlite-vector) extension and a Sentence Transformer model. It lets you index and search documents by vector similarity, using an embedding model that runs locally. |
| 4 | + |
| 5 | +### How it works |
| 6 | + |
| 7 | +- **Embeddings**: Uses [sentence-transformers](https://huggingface.co/sentence-transformers) to generate dense vector representations (embeddings) for text. The default model is [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), a fast, lightweight model (384 dimensions) suitable for semantic search and retrieval tasks. |
| 8 | +- **Vector Store and Search**: Embeddings are stored in SQLite using the [`sqlite-vector`](https://github.com/sqliteai/sqlite-vector) extension, enabling fast similarity search (cosine distance) directly in the database. |
| 9 | +- **Sample Data**: The `samples/` directory contains example documents you can index and search immediately. |
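The store-and-search flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the code in `semsearch.py`: the table layout, the helper names, and the 3-dimensional toy vectors are assumptions made for illustration (real embeddings from `all-MiniLM-L6-v2` have 384 dimensions), and the cosine distance is computed in Python here, whereas `sqlite-vector` performs the search inside the database.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into a float32 BLOB for storage in SQLite."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack a float32 BLOB back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return list(values)


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(conn, query_vec, k=3):
    """Brute-force scan: score every stored embedding, return the k closest."""
    rows = conn.execute("SELECT path, embedding FROM documents").fetchall()
    scored = [(cosine_distance(query_vec, from_blob(blob)), path) for path, blob in rows]
    return sorted(scored)[:k]


# Hypothetical schema and toy embeddings, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (path TEXT, embedding BLOB)")
toy_embeddings = {
    "vision.md": [0.9, 0.1, 0.0],
    "policy.md": [0.0, 0.2, 0.9],
    "security.md": [0.1, 0.9, 0.1],
}
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(path, to_blob(vec)) for path, vec in toy_embeddings.items()],
)

results = search(conn, [1.0, 0.0, 0.0], k=2)
for distance, path in results:
    print(f"{path}: {distance:.3f}")
```

Storing embeddings as compact float32 BLOBs keeps the database small; the full scan in Python is only a stand-in for the in-database search that the extension provides.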
| 10 | + |
| 11 | +### Installation |
| 12 | + |
| 13 | +1. Download the `sqlite-vector` extension for your platform [here](https://github.com/sqliteai/sqlite-vector/releases). |
| 14 | + |
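| 15 | +2. Extract the `vector.so` file into the main directory of the project (the file extension depends on your platform, for example `.dylib` on macOS or `.dll` on Windows). |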
| 16 | + |
| 17 | +3. Install the dependencies: |
| 18 | + |
| 19 | + |
| 20 | +```bash |
| 21 | +$ python -m venv venv |
| 22 | + |
| 23 | +$ source venv/bin/activate |
| 24 | + |
| 25 | +$ pip install -r requirements.txt |
| 26 | +``` |
| 27 | + |
| 28 | +4. On first use, the required model will be downloaded automatically. |
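To check that the extension from step 2 is in the right place, a quick sketch using Python's built-in `sqlite3` module may help. The helper name `try_load_extension` and the path `./vector` are assumptions for illustration; the example scripts may load the extension differently. Note that some Python builds are compiled without extension-loading support, in which case the helper simply reports failure.

```python
import sqlite3


def try_load_extension(conn, path):
    """Return True if the SQLite extension at `path` loads, False otherwise."""
    # Some Python builds omit extension loading entirely.
    if not hasattr(conn, "enable_load_extension"):
        return False
    conn.enable_load_extension(True)
    try:
        # SQLite tries platform-specific suffixes (.so, .dylib, .dll)
        # when the path is given without one.
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        # Turn extension loading back off once we are done.
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
loaded = try_load_extension(conn, "./vector")  # assumed location from step 2
print("sqlite-vector loaded:", loaded)
```

If this prints `False`, check that the extracted file sits in the directory you run the script from and that it matches your platform.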
| 29 | + |
| 30 | +### Usage |
| 31 | + |
| 32 | +Use the interactive mode to keep the model in memory and run multiple queries efficiently: |
| 33 | + |
| 34 | +```bash |
| 35 | +python semsearch.py --repl |
| 36 | + |
| 37 | +# Index a directory of documents |
| 38 | +semsearch> index ./samples |
| 39 | + |
| 40 | +# Search for similar documents |
| 41 | +semsearch> search "neural network architectures for image recognition" |
| 42 | +``` |
| 43 | + |
| 44 | +### Example Queries |
| 45 | + |
| 46 | +Try these queries to test semantic similarity: |
| 47 | + |
| 48 | +- "neural network architectures for image recognition" |
| 49 | +- "reinforcement learning in autonomous systems" |
| 50 | +- "explainable artificial intelligence methods" |
| 51 | +- "AI governance and regulatory compliance" |
| 52 | +- "network intrusion detection systems" |
| 53 | + |
| 54 | +**Note:** |
| 55 | +- Supported file extensions are `.md`, `.txt`, `.py`, `.js`, `.html`, `.css`, `.sql`, `.json`, and `.xml`. |
| 56 | +- For more details, see the code in `semsearch.py` and `semantic_search.py`. |
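As a rough illustration of the extension filter mentioned in the note, the sketch below collects indexable files from a directory. The function name `iter_indexable_files`, the recursive traversal, and the case-insensitive match are assumptions; the actual behavior is defined in `semsearch.py` and `semantic_search.py`.

```python
import tempfile
from pathlib import Path

# Extensions listed in the note above.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".py", ".js", ".html", ".css", ".sql", ".json", ".xml"}


def iter_indexable_files(root):
    """Yield files under `root` whose extension is supported (hypothetical helper)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.md").write_text("hello")
    (Path(tmp) / "image.png").write_bytes(b"")
    found = [p.name for p in iter_indexable_files(tmp)]

print(found)
```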