Commit 08a92c8

feat(workflow): build pypi package
Author: Daniele Briggi
1 parent 4d701fb commit 08a92c8

11 files changed, +275 -13 lines changed

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+name: Build and Publish Python Package
+
+on:
+  workflow_dispatch:
+    inputs:
+      version:
+        description: "Version to use for the Python package (e.g. 0.1.0)"
+        required: true
+        type: string
+  release:
+    types: [published]
+
+jobs:
+  build-and-publish:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write # mandatory for Pypi trusted publishing
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: false
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+
+      - name: Install build dependencies
+        run: |
+          python3 -m pip install --upgrade pip
+          python3 -m pip install .[dev]
+
+      - name: Get version
+        id: get_version
+        run: |
+          if [[ "${{ github.event_name }}" == "release" ]]; then
+            VERSION="${{ github.event.release.tag_name }}"
+          else
+            VERSION="${{ github.event.inputs.version }}"
+          fi
+          VERSION=${VERSION#v}
+          echo "version=$VERSION" >> $GITHUB_OUTPUT
+
+      - name: Build
+        env:
+          PACKAGE_VERSION: ${{ steps.get_version.outputs.version }}
+        run: |
+          # Update version in pyproject.toml
+          sed -i 's/^version = ".*"/version = "${{ steps.get_version.outputs.version }}"/' pyproject.toml
+          python setup.py bdist_wheel
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages-dir: dist
+          verbose: true
+          # Avoid workflow to fail if the version has already been published
+          skip-existing: true
+          # Upload to Test Pypi for testing
+          repository-url: https://test.pypi.org/legacy/
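
The `Get version` and `Build` steps in this workflow amount to two string operations: strip a leading `v` from the tag, then rewrite the `version = "..."` line in `pyproject.toml`. A minimal Python sketch of that logic follows, for illustration only. The function names `normalize_version` and `set_project_version` are hypothetical and are not part of this commit.

```python
import re


def normalize_version(tag: str) -> str:
    """Strip one leading "v", mirroring the shell expansion ${VERSION#v}."""
    return tag[1:] if tag.startswith("v") else tag


def set_project_version(pyproject_text: str, version: str) -> str:
    """Rewrite each line that starts with `version = "..."`, like the sed command."""
    pattern = re.compile(r'^version = ".*"', flags=re.MULTILINE)
    # A callable replacement keeps `version` literal (no backslash or group expansion).
    return pattern.sub(lambda _match: f'version = "{version}"', pyproject_text)


if __name__ == "__main__":
    sample = '[project]\nname = "sqlite-rag"\nversion = "0.0.0"\n'
    print(set_project_version(sample, normalize_version("v0.1.0")))
```

Like the `sed` expression, this pattern rewrites any line beginning with `version = "`, so a second such line in another table of `pyproject.toml` would also be changed.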

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions
@@ -11,6 +11,11 @@ repos:
     rev: 25.1.0
     hooks:
       - id: black
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.3.1
+    hooks:
+      - id: autoflake
+        args: ["--remove-all-unused-imports", "--remove-unused-variables", "--ignore-init-module-imports", "--in-place", "--recursive", "."]
   - repo: https://github.com/pycqa/isort
     rev: 6.0.1
     hooks:

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+include README.md

README.md

Lines changed: 160 additions & 2 deletions
@@ -1,7 +1,165 @@
-# sqlite-rag
+# SQLite RAG
+
+A hybrid search engine built on SQLite with AI and Vector extensions. SQLite-RAG combines vector similarity search with full-text search using Reciprocal Rank Fusion (RRF) for enhanced document retrieval.
+
+## Features
+
+- **Hybrid Search**: Combines vector embeddings with full-text search for optimal results
+- **SQLite-based**: Built on SQLite with AI and Vector extensions for reliability and performance
+- **Multi-format Support**: Process 25+ file formats including PDF, DOCX, Markdown, code files
+- **Intelligent Chunking**: Token-aware text chunking with configurable overlap
+- **Interactive CLI**: Command-line interface with interactive REPL mode
+- **Flexible Configuration**: Customizable embedding models, search weights, and chunking parameters

 ## Installation

 ```bash
-pip install .[dev]
+pip install sqlite-rag
+```
+
+## Quick Start
+
+```bash
+# Initialize and add documents
+sqlite-rag add /path/to/documents --recursive
+
+# Search your documents
+sqlite-rag search "your search query"
+
+# Interactive mode
+sqlite-rag
+> help
+> search "interactive search"
+> exit
+```
+
+## CLI Commands
+
+### Document Management
+
+**Add files or directories:**
+```bash
+sqlite-rag add <path> [--recursive] [--absolute-paths] [--metadata '{"key": "value"}']
+```
+
+**Add raw text:**
+```bash
+sqlite-rag add-text "your text content" [uri] [--metadata '{"key": "value"}']
+```
+
+**List all documents:**
+```bash
+sqlite-rag list
+```
+
+**Remove documents:**
+```bash
+sqlite-rag remove <path-or-uuid> [--yes]
+```
+
+### Search & Query
+
+**Hybrid search:**
+```bash
+sqlite-rag search "your query" [--limit 10] [--debug]
+```
+
+Use `--debug` to see detailed ranking information including vector ranks, FTS ranks, and combined scores.
+
+### Database Operations
+
+**Rebuild indexes and embeddings:**
+```bash
+sqlite-rag rebuild [--remove-missing]
 ```
+
+**Clear entire database:**
+```bash
+sqlite-rag reset [--yes]
+```
+
+### Configuration
+
+**View current settings:**
+```bash
+sqlite-rag settings
+```
+
+**Update configuration:**
+```bash
+sqlite-rag set [options]
+```
+
+Available settings:
+- `--model-path-or-name`: Embedding model (file path or HuggingFace model)
+- `--embedding-dim`: Vector dimensions
+- `--chunk-size`: Text chunk size (tokens)
+- `--chunk-overlap`: Token overlap between chunks
+- `--weight-fts`: Full-text search weight (0.0-1.0)
+- `--weight-vec`: Vector search weight (0.0-1.0)
+- `--quantize-scan`: Enable quantized vectors for faster search
+- `--quantize-preload`: Preload quantized vectors in memory
+
+## Python API
+
+```python
+from sqlite_rag import SQLiteRag
+
+# Create RAG instance
+rag = SQLiteRag.create("./database.sqlite")
+
+# Add documents
+rag.add("/path/to/documents", recursive=True)
+rag.add_text("Raw text content", uri="doc.txt")
+
+# Search
+results = rag.search("search query", top_k=5)
+for result in results:
+    print(f"Score: {result.score}")
+    print(f"Content: {result.content}")
+    print(f"URI: {result.uri}")
+
+# List documents
+documents = rag.list_documents()
+
+# Remove document
+rag.remove_document("document-id-or-path")
+
+# Database operations
+rag.rebuild(remove_missing=True)
+rag.reset()
+```
+
+## Supported File Formats
+
+SQLite-RAG supports 25+ file formats through the MarkItDown library:
+
+- **Text**: `.txt`, `.md`, `.csv`, `.json`, `.xml`
+- **Documents**: `.pdf`, `.docx`, `.pptx`, `.xlsx`
+- **Code**: `.py`, `.js`, `.html`, `.css`, `.sql`
+- **And many more**: `.rtf`, `.odt`, `.epub`, `.zip`, etc.
+
+## How It Works
+
+1. **Document Processing**: Files are processed and split into overlapping chunks
+2. **Embedding Generation**: Text chunks are converted to vector embeddings using AI models
+3. **Dual Indexing**: Content is indexed for both vector similarity and full-text search
+4. **Hybrid Search**: Queries are processed through both search methods
+5. **Result Fusion**: Results are combined using Reciprocal Rank Fusion for optimal relevance
+
+## Default Configuration
+
+- **Model**: Qwen3-Embedding-0.6B (Q8_0 quantized, 1024 dimensions)
+- **Chunking**: 12,000 tokens per chunk with 1,200 token overlap
+- **Vectors**: FLOAT16 storage with cosine similarity
+- **Search**: Equal weighting (1.0) for vector and full-text results
+- **Database**: `./sqliterag.sqlite`
+
+## Extensions Required
+
+SQLite-RAG requires these SQLite extensions:
+
+- **[sqlite-ai](https://github.com/sqliteai/sqlite-ai)**: LLM model loading and embedding generation
+- **[sqlite-vector](https://github.com/sqliteai/sqlite-vector)**: Vector storage and similarity search
+
+These are automatically installed as dependencies.
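
The README added here names Reciprocal Rank Fusion, but this diff does not show how sqlite-rag computes it. As a rough illustration, the sketch below uses the textbook weighted form, where each document's score is the sum over rankings of `weight / (k + rank)`. The function `rrf_fuse`, the constant `k = 60`, the sample document ids, and the way the weights are applied are all assumptions for this example, not the project's implementation.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    weights:  one multiplier per ranking (defaults to 1.0 each).
    k:        smoothing constant; 60 is a common choice and is assumed here.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector ranking
    fts_hits = ["doc_b", "doc_d", "doc_a"]     # hypothetical full-text ranking
    for doc_id, score in rrf_fuse([vector_hits, fts_hits]):
        print(f"{doc_id}: {score:.5f}")
```

In this toy input `doc_b` ranks first because it places well in both lists, which is the behaviour that makes rank fusion useful when the two scoring scales are not comparable.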

pyproject.toml

Lines changed: 15 additions & 4 deletions
@@ -1,13 +1,20 @@
 [build-system]
-requires = ["setuptools>=61.0", "wheel"]
+requires = ["setuptools>=61.0", "wheel", "toml"]
 build-backend = "setuptools.build_meta"

 [project]
 name = "sqlite-rag"
-version = "0.1.0"
+version = "0.0.0"
 description = "Hybird search with SQLite AI and SQLite Vector"
-authors = [{name = "User"}]
+authors = [{name = "SQLite AI Team"}]
 requires-python = ">=3.10"
+readme = "README.md"
+classifiers = [
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Operating System :: OS Independent",
+]
 dependencies = [
     "attrs",
     "typer",
@@ -17,7 +24,6 @@ dependencies = [
     "sqliteai-vector"
 ]

-# .. or [dependency-groups] ?
 [project.optional-dependencies]
 dev = [
     "pytest",
@@ -29,6 +35,11 @@ dev = [
     "pre-commit"
 ]

+[project.urls]
+Homepage = "https://sqlite.ai"
+Repository = "https://github.com/sqliteai/sqlite-rag"
+Issues = "https://github.com/sqliteai/sqlite-rag/issues"
+
 [project.scripts]
 sqlite-rag = "sqlite_rag.cli:cli"

setup.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+import setuptools
+import toml
+
+with open("pyproject.toml", "r") as f:
+    pyproject = toml.load(f)
+
+project = pyproject["project"]
+
+with open("README.md", "r", encoding="utf-8") as f:
+    long_description = f.read()
+
+setuptools.setup(
+    name=project["name"],
+    version=project["version"],
+    description=project.get("description", ""),
+    author=project["authors"][0]["name"] if project.get("authors") else "",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url=project["urls"]["Homepage"],
+    packages=setuptools.find_packages(where="src"),
+    package_dir={"": "src"},
+    include_package_data=True,
+    python_requires=project.get("requires-python", ">=3.10"),
+    classifiers=project.get("classifiers", []),
+)

src/sqlite_rag/sqliterag.py

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 import sqlite3
 from dataclasses import asdict
 from pathlib import Path
-from typing import Optional
+from typing import Any, Optional

 from sqlite_rag.logger import Logger
 from sqlite_rag.models.document_result import DocumentResult
@@ -43,7 +43,7 @@ def _ensure_initialized(self):

     @staticmethod
     def create(
-        db_path: str = "./sqliterag.sqlite", settings: Optional[Settings] = None
+        db_path: str = "./sqliterag.sqlite", settings: Optional[dict[str, Any]] = None
     ) -> "SQLiteRag":
         """Create a new SQLiteRag instance with the given settings.

tests/test_chunker.py

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@ def test_character_splitting_token_limit(self, chunker_tiny):
         # Each chunk should respect the chunk size limit
         for chunk in chunks:
             token_count = chunker_tiny._get_token_count(chunk.content)
-            assert token_count <= chunker_tiny.settings.chunk_size
+            assert token_count <= chunker_tiny._settings.chunk_size


 class TestOverlapFunctionality:

tests/test_database.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 class TestDatabase:
     def test_db_initialization(self):
-        conn = sqlite3.connect(":memory")
+        conn = sqlite3.connect(":memory:")
         Database.initialize(conn, Settings())

         # Check if the tables exist
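
The one-character fix in this test matters because SQLite treats only the exact name `:memory:` as an in-memory database. `:memory` without the trailing colon is an ordinary filename, so the old test would have created a file on disk. A small sketch using only the standard library illustrates the difference on a POSIX system; the helper name `database_file` is made up for this example.

```python
import os
import sqlite3
import tempfile


def database_file(conn: sqlite3.Connection) -> str:
    """Return the file backing the main database ("" when it is in memory)."""
    # PRAGMA database_list yields rows of (seq, name, file).
    for _seq, name, file in conn.execute("PRAGMA database_list"):
        if name == "main":
            return file or ""
    return ""


if __name__ == "__main__":
    in_memory = sqlite3.connect(":memory:")
    print(repr(database_file(in_memory)))  # no backing file
    in_memory.close()

    with tempfile.TemporaryDirectory() as tmp:
        # Same file name as the old test, placed in a temp dir to avoid litter.
        typo_path = os.path.join(tmp, ":memory")
        on_disk = sqlite3.connect(typo_path)
        on_disk.execute("CREATE TABLE t (x)")
        on_disk.commit()
        print(os.path.exists(typo_path))  # the misspelled name is a real file
        on_disk.close()
```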

tests/test_engine.py

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,5 @@
+import pytest
+
 from sqlite_rag.chunker import Chunker
 from sqlite_rag.engine import Engine
 from sqlite_rag.models.chunk import Chunk
@@ -20,6 +22,7 @@ def test_search_with_empty_database(self, engine):

         assert len(results) == 0

+    @pytest.mark.skip(reason="Waiting for sqlite-ai context to be fixed")
     def test_search_with_semantic_and_fts(self, db_conn):
         # Arrange
         conn, settings = db_conn
