A hybrid search engine built on SQLite with [SQLite AI](https://github.com/sqliteai/sqlite-ai) and [SQLite Vector](https://github.com/sqliteai/sqlite-vector) extensions. SQLite RAG combines vector similarity search with full-text search ([FTS5](https://www.sqlite.org/fts5.html) extension) using Reciprocal Rank Fusion (RRF) for enhanced document retrieval.
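RRF merges the two ranked result lists by scoring each document as the sum of `1 / (k + rank)` across lists. A minimal sketch in Python (illustrative only, not the library's internal implementation; `k = 60` is the constant commonly used for RRF):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids (best-first) with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document appearing in several lists accumulates score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Documents found by both searches rise to the top:
vector_hits = ["d3", "d1", "d7"]   # from vector similarity search
fts_hits = ["d1", "d5", "d3"]      # from full-text search
print(rrf_fuse([vector_hits, fts_hits]))  # → ['d1', 'd3', 'd5', 'd7']
```

Here `d1` wins because it ranks well in both lists, even though neither search ranked it first.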
## Features
- **Hybrid Search**: Combines vector embeddings with full-text search for optimal results
- **SQLite-based**: Built on SQLite with AI and Vector extensions for reliability and performance
- **Multi-format Text Support**: Process text file formats including PDF, DOCX, Markdown, and code files
- **Recursive Character Text Splitter**: Token-aware text chunking with configurable overlap
- **Interactive CLI**: Command-line interface with interactive REPL mode
- **Flexible Configuration**: Customizable embedding models, search weights, and chunking parameters
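The full-text half of the hybrid pipeline relies on SQLite's built-in FTS5 extension. A minimal standalone sketch of keyword scoring with plain `sqlite3` (no AI/Vector extensions required; the sample documents are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE VIRTUAL TABLE docs USING fts5(body);
INSERT INTO docs(body) VALUES
  ('sqlite is a small fast database'),
  ('vector search finds similar embeddings'),
  ('hybrid search fuses keyword and vector results');
""")

# bm25() is FTS5's built-in relevance function; lower (more negative) is better.
rows = con.execute(
    "SELECT rowid, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("vector search",),
).fetchall()
print(rows)  # only the two documents containing both terms match
```

In the real engine this keyword ranking is fused with the vector-similarity ranking via RRF.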
## Installation

```bash
pip install sqlite-rag
```
## Quick Start
Download the default embedding model, [Embedding Gemma](https://huggingface.co/unsloth/embeddinggemma-300m-GGUF), from Hugging Face.
## Evaluation

A simple evaluation script for SQLite RAG using the MS MARCO dataset. It compares performance against [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) benchmarks.
## MS MARCO Dataset
**MS MARCO**: Microsoft's question-answering dataset of real web queries and passages.
## Evaluation Metrics
- **Hit Rate (HR@k)**: Percentage of queries with at least one relevant result in the top k
- **MRR**: Mean Reciprocal Rank, a position-weighted relevance score
- **NDCG**: Normalized Discounted Cumulative Gain, a ranking-quality metric
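The three metrics above can be sketched as follows (a generic illustration assuming binary relevance labels, not the evaluation script's exact code):

```python
import math

def hit_rate_at_k(ranked_ids, relevant, k):
    # 1 if any relevant doc appears in the top k, else 0.
    return int(any(d in relevant for d in ranked_ids[:k]))

def reciprocal_rank(ranked_ids, relevant):
    # 1 / rank of the first relevant doc; 0 if none is retrieved.
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant, k):
    # DCG discounts each hit by log2(rank + 1), normalized by the ideal DCG.
    dcg = sum(1.0 / math.log2(r + 1)
              for r, d in enumerate(ranked_ids[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Example: the single relevant doc is retrieved at rank 2.
print(hit_rate_at_k(["a", "b", "c"], {"b"}, 3))   # → 1
print(reciprocal_rank(["a", "b", "c"], {"b"}))    # → 0.5
```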
## Usage
### 1. Setup Configuration
Create an example config file and then edit it with your model settings:
```bash
python ms_marco.py create-config
```
### 2. Process Dataset
```bash
python ms_marco.py process --config configs/my_config.json --limit-rows 100
```
Processes the MS MARCO passages into the SQLite RAG database for evaluation.