Commit 0b2aaca

Create API.md
1 parent ef5c8ff commit 0b2aaca

1 file changed: +247 -0 lines changed


API.md

Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
# SQLite Vector Extension – API Reference

This extension enables efficient vector operations directly inside SQLite databases, making it well suited to on-device and edge AI applications. It supports Float32, Float16, BFloat16, Int8, and UInt8 vectors and SIMD-accelerated distance functions (L2, squared L2, cosine, dot product, and L1).

---

## `vector_version()`

**Returns:** `TEXT`

**Description:**
Returns the current version of the SQLite Vector Extension.

**Example:**

```sql
SELECT vector_version();
-- e.g., '1.0.0'
```

---

## `vector_backend()`

**Returns:** `TEXT`

**Description:**
Returns the active backend used for vector computation. This indicates the SIMD or hardware acceleration available on the current system.

**Possible Values:**

* `CPU` – Generic fallback
* `SSE2` – SIMD on Intel/AMD
* `AVX2` – Advanced SIMD on modern x86 CPUs
* `NEON` – SIMD on ARM (e.g., mobile)

**Example:**

```sql
SELECT vector_backend();
-- e.g., 'AVX2'
```

---

## `vector_init(table, column, options)`

**Returns:** `NULL`

**Description:**
Initializes the vector extension for a given table and column. This is **mandatory** before performing any vector search or quantization.

**Parameters:**

* `table` (TEXT): Name of the table containing vector data.
* `column` (TEXT): Name of the column containing the vector embeddings (stored as BLOBs).
* `options` (TEXT): Comma-separated key=value string.

**Options:**

* `dimension` (required): Integer specifying the length of each vector.
* `type`: Vector data type. Options:

  * `FLOAT32` (default)
  * `FLOAT16`
  * `FLOATB16`
  * `INT8`
  * `UINT8`
* `distance`: Distance function to use. Options:

  * `L2` (default)
  * `SQUARED_L2`
  * `COSINE`
  * `DOT`
  * `L1`

**Example:**

```sql
SELECT vector_init('documents', 'embedding', 'dimension=384,type=FLOAT32,distance=COSINE');
```

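As a minimal sketch of the setup this function expects, the statements below create a table with a BLOB column and then register it. The `documents` schema (an `id` key and a `content` column) is hypothetical and chosen only for illustration; the `embedding` column and the `vector_init` call follow the reference above. Because `type` and `distance` have documented defaults, passing only `dimension=384` should select `FLOAT32` with `L2`.

```sql
-- Hypothetical schema: only the BLOB column holding embeddings matters here.
CREATE TABLE IF NOT EXISTS documents (
    id        INTEGER PRIMARY KEY,
    content   TEXT,
    embedding BLOB
);

-- Register the column: 384-dimensional Float32 vectors compared by cosine distance.
SELECT vector_init('documents', 'embedding', 'dimension=384,type=FLOAT32,distance=COSINE');
```
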
---

## `vector_quantize(table, column, options)`

**Returns:** `NULL`

**Description:**
Performs quantization on the specified table and column. This precomputes internal data structures to support fast approximate nearest neighbor (ANN) search.

**Parameters:**

* `table` (TEXT): Name of the table.
* `column` (TEXT): Name of the column containing vector data.
* `options` (TEXT, optional): Comma-separated key=value string.

**Available options:**

* `max_memory`: Maximum memory to use during quantization (default: 30MB)

**Example:**

```sql
SELECT vector_quantize('documents', 'embedding', 'max_memory=50MB');
```

108+
---
109+
110+
## `vector_quantize_memory(table, column)`
111+
112+
**Returns:** `INTEGER`
113+
114+
**Description:**
115+
Returns the amount of memory (in bytes) required to preload quantized data for the specified table and column.
116+
117+
**Example:**
118+
119+
```sql
120+
SELECT vector_quantize_memory('documents', 'embedding');
121+
-- e.g., 28490112
122+
```
123+
---

## `vector_quantize_preload(table, column)`

**Returns:** `NULL`

**Description:**
Loads the quantized representation for the specified table and column into memory. Call this at startup for the best query performance.

**Example:**

```sql
SELECT vector_quantize_preload('documents', 'embedding');
```

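The two functions above pair naturally: check the footprint first, then preload. The sketch below assumes the `documents` table has already been initialized and quantized. The MiB conversion is plain SQLite arithmetic added for readability and is not part of the extension.

```sql
-- Report the preload footprint in MiB (the function itself returns bytes).
SELECT vector_quantize_memory('documents', 'embedding') / (1024.0 * 1024.0) AS preload_mib;

-- If that fits the memory budget, load the quantized data once at startup.
SELECT vector_quantize_preload('documents', 'embedding');
```
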
---

## `vector_cleanup(table, column)`

**Returns:** `NULL`

**Description:**
Cleans up internal structures related to a previously quantized table/column. Use this if data has changed or quantization is no longer needed.

**Example:**

```sql
SELECT vector_cleanup('documents', 'embedding');
```

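One plausible way to refresh quantization after the underlying rows change is sketched below. The sequence (clean up, quantize again, then preload) is an assumption pieced together from the function descriptions in this reference, not a documented procedure.

```sql
-- Drop the structures built from the old data.
SELECT vector_cleanup('documents', 'embedding');

-- Rebuild them from the current rows.
SELECT vector_quantize('documents', 'embedding', 'max_memory=50MB');

-- Bring the fresh quantized data back into memory.
SELECT vector_quantize_preload('documents', 'embedding');
```
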
---

## `vector_convert_f32(value)`

## `vector_convert_f16(value)`

## `vector_convert_bf16(value)`

## `vector_convert_i8(value)`

## `vector_convert_u8(value)`

**Returns:** `BLOB`

**Description:**
Encodes a vector into the required internal BLOB format. This ensures that vector values are inserted in the chosen format.

**Parameters:**

* `value` (TEXT or BLOB):

  * If `TEXT`, it must be a JSON array (e.g., `'[0.1, 0.2, 0.3]'`).
  * If `BLOB`, no check is performed; the user must ensure the format matches the specified type and dimension.

**Usage by format:**

```sql
-- Insert a Float32 vector using JSON
INSERT INTO documents(embedding) VALUES(vector_convert_f32('[0.1, 0.2, 0.3]'));

-- Insert a UInt8 vector using raw BLOB (ensure correct formatting!)
INSERT INTO compressed_vectors(embedding) VALUES(vector_convert_u8(X'010203'));
```

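The converter should match the `type` declared in `vector_init` for the column. The sketch below shows that pairing for UInt8. The `compressed_vectors` schema and the 3-element dimension are hypothetical, chosen only to keep the example short.

```sql
-- Hypothetical table holding 3-dimensional UInt8 vectors.
CREATE TABLE IF NOT EXISTS compressed_vectors (
    id        INTEGER PRIMARY KEY,
    embedding BLOB
);
SELECT vector_init('compressed_vectors', 'embedding', 'dimension=3,type=UINT8');

-- JSON input goes through the converter that matches the declared type.
INSERT INTO compressed_vectors(embedding) VALUES (vector_convert_u8('[1, 2, 3]'));

-- The converters are documented to return BLOBs; typeof() makes that visible.
SELECT typeof(embedding) FROM compressed_vectors;
```
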
---

## 🔍 `vector_full_scan(table, column, vector, k)`

**Returns:** `Virtual Table (rowid, distance)`

**Description:**
Performs a brute-force nearest neighbor search using the given vector. Despite its brute-force nature, this function is highly optimized and is useful for small datasets or for validating approximate results.

**Parameters:**

* `table` (TEXT): Name of the target table.
* `column` (TEXT): Column containing vectors.
* `vector` (BLOB or JSON): The query vector.
* `k` (INTEGER): Number of nearest neighbors to return.

**Example:**

```sql
SELECT rowid, distance
FROM vector_full_scan('documents', 'embedding', vector_convert_f32('[0.1, 0.2, 0.3]'), 5);
```

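Because the function returns only `rowid` and `distance`, a typical query joins the result back to the source table to fetch the matching rows. The sketch below assumes a hypothetical `content` column on `documents`. The 3-element query vector is shortened for readability; a real query vector must have the dimension declared in `vector_init`.

```sql
-- Join the k nearest rowids back to the table to retrieve the stored rows.
SELECT d.rowid, d.content, v.distance
FROM documents AS d
JOIN vector_full_scan('documents', 'embedding',
                      vector_convert_f32('[0.1, 0.2, 0.3]'), 5) AS v
  ON d.rowid = v.rowid
ORDER BY v.distance;
```
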
---

## `vector_quantize_scan(table, column, vector, k)`

**Returns:** `Virtual Table (rowid, distance)`

**Description:**
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets due to its excellent speed/recall/memory trade-off.

**Parameters:**

* `table` (TEXT): Name of the target table.
* `column` (TEXT): Column containing vectors.
* `vector` (BLOB or JSON): The query vector.
* `k` (INTEGER): Number of nearest neighbors to return.

**Performance Highlights:**

* Handles **1M vectors** of dimension 768 in a few milliseconds.
* Uses **<50MB** of RAM.
* Achieves **>0.95 recall**.

**Example:**

```sql
SELECT rowid, distance
FROM vector_quantize_scan('documents', 'embedding', vector_convert_f32('[0.1, 0.2, 0.3]'), 10);
```

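To check recall on your own data, one option is to compare the approximate results with the exact results from `vector_full_scan` for the same query vector. The query below is a sketch of that idea, not a benchmark from this reference: it measures recall for one query only, and the 3-element vector is again shortened for readability.

```sql
WITH exact AS (
    -- Ground truth: the 10 true nearest neighbors.
    SELECT rowid AS id
    FROM vector_full_scan('documents', 'embedding',
                          vector_convert_f32('[0.1, 0.2, 0.3]'), 10)
),
approx AS (
    -- Candidates returned by the quantized search.
    SELECT rowid AS id
    FROM vector_quantize_scan('documents', 'embedding',
                              vector_convert_f32('[0.1, 0.2, 0.3]'), 10)
)
-- Fraction of the true neighbors that the approximate search recovered.
SELECT COUNT(*) / 10.0 AS recall_at_10
FROM approx
WHERE id IN (SELECT id FROM exact);
```
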
---

## 📌 Notes

* All vectors in a column must share one fixed dimension, set during `vector_init`.
* Only tables explicitly initialized using `vector_init` are eligible for vector search.
* You **must run `vector_quantize()`** before using `vector_quantize_scan()`.
* You can preload quantized data when the database is opened using `vector_quantize_preload()`.
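
Taken together, the notes above imply an order of operations, sketched end to end below. The `demo_items` table, its `label` column, and the 3-element vectors are hypothetical. Whether rows may be inserted before or only after `vector_init` is not stated in this reference, so the placement of steps 2 and 3 is an assumption.

```sql
-- 1. Hypothetical table with a BLOB column for 3-dimensional embeddings.
CREATE TABLE IF NOT EXISTS demo_items (
    id        INTEGER PRIMARY KEY,
    label     TEXT,
    embedding BLOB
);

-- 2. Initialize the column; this fixes the dimension, type, and distance.
SELECT vector_init('demo_items', 'embedding', 'dimension=3,type=FLOAT32,distance=L2');

-- 3. Insert vectors through the converter that matches the declared type.
INSERT INTO demo_items(label, embedding) VALUES
    ('a', vector_convert_f32('[0.1, 0.2, 0.3]')),
    ('b', vector_convert_f32('[0.9, 0.8, 0.7]'));

-- 4. Quantize, then preload the quantized data into memory.
SELECT vector_quantize('demo_items', 'embedding', 'max_memory=50MB');
SELECT vector_quantize_preload('demo_items', 'embedding');

-- 5. Run the approximate search and join back to the table.
SELECT i.label, v.distance
FROM demo_items AS i
JOIN vector_quantize_scan('demo_items', 'embedding',
                          vector_convert_f32('[0.1, 0.2, 0.3]'), 2) AS v
  ON i.rowid = v.rowid
ORDER BY v.distance;
```
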
