# SQLite Vector Extension – API Reference

This extension enables efficient vector operations directly inside SQLite databases, making it well suited for on-device and edge AI applications. It supports several vector types and SIMD-accelerated distance functions.

---

## `vector_version()`

**Returns:** `TEXT`

**Description:**
Returns the current version of the SQLite Vector Extension.

**Example:**

```sql
SELECT vector_version();
-- e.g., '1.0.0'
```

---

## `vector_backend()`

**Returns:** `TEXT`

**Description:**
Returns the active backend used for vector computation. This indicates the SIMD or hardware acceleration available on the current system.

**Possible Values:**

* `CPU` – Generic fallback
* `SSE2` – SIMD on Intel/AMD
* `AVX2` – Advanced SIMD on modern x86 CPUs
* `NEON` – SIMD on ARM (e.g., mobile)

**Example:**

```sql
SELECT vector_backend();
-- e.g., 'AVX2'
```

---

## `vector_init(table, column, options)`

**Returns:** `NULL`

**Description:**
Initializes the vector extension for a given table and column. This is **mandatory** before performing any vector search or quantization.

**Parameters:**

* `table` (TEXT): Name of the table containing vector data.
* `column` (TEXT): Name of the column containing the vector embeddings (stored as BLOBs).
* `options` (TEXT): Comma-separated `key=value` string.

**Options:**

* `dimension` (required): Integer specifying the length of each vector.
* `type`: Vector data type, one of:

  * `FLOAT32` (default)
  * `FLOAT16`
  * `FLOATB16`
  * `INT8`
  * `UINT8`
* `distance`: Distance function to use, one of:

  * `L2` (default)
  * `SQUARED_L2`
  * `COSINE`
  * `DOT`
  * `L1`

**Example:**

```sql
SELECT vector_init('documents', 'embedding', 'dimension=384,type=FLOAT32,distance=COSINE');
```
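
If the options string is assembled in application code, a small validator can catch typos before they reach `vector_init()`. The sketch below is a hypothetical Python helper. `build_init_options` is not part of the extension. It checks values against the lists documented above, normalizes them to their upper-case spellings, and emits the comma-separated `key=value` format.

```python
# Hypothetical helper (not part of the extension): builds the options string for vector_init().
VALID_TYPES = {"FLOAT32", "FLOAT16", "FLOATB16", "INT8", "UINT8"}
VALID_DISTANCES = {"L2", "SQUARED_L2", "COSINE", "DOT", "L1"}


def build_init_options(dimension: int, vector_type: str = "FLOAT32", distance: str = "L2") -> str:
    """Validate the documented options and return 'dimension=...,type=...,distance=...'."""
    if not isinstance(dimension, int) or isinstance(dimension, bool) or dimension <= 0:
        raise ValueError(f"dimension must be a positive integer, got {dimension!r}")
    vtype, dist = vector_type.upper(), distance.upper()
    if vtype not in VALID_TYPES:
        raise ValueError(f"unknown type {vector_type!r}; expected one of {sorted(VALID_TYPES)}")
    if dist not in VALID_DISTANCES:
        raise ValueError(f"unknown distance {distance!r}; expected one of {sorted(VALID_DISTANCES)}")
    return f"dimension={dimension},type={vtype},distance={dist}"


print(build_init_options(384, distance="cosine"))  # dimension=384,type=FLOAT32,distance=COSINE
```

Once the extension is loaded, you can bind the result as an ordinary parameter with Python's `sqlite3` module: `conn.execute("SELECT vector_init(?, ?, ?)", ("documents", "embedding", opts))`.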

---

## `vector_quantize(table, column, options)`

**Returns:** `NULL`

**Description:**
Quantizes the vectors in the specified table and column. This precomputes the internal data structures that `vector_quantize_scan()` uses for fast approximate nearest neighbor (ANN) search.

**Parameters:**

* `table` (TEXT): Name of the table.
* `column` (TEXT): Name of the column containing vector data.
* `options` (TEXT, optional): Comma-separated `key=value` string.

**Available options:**

* `max_memory`: Maximum memory to use during quantization (default: 30MB).

**Example:**

```sql
SELECT vector_quantize('documents', 'embedding', 'max_memory=50MB');
```
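
This reference does not say which quantization scheme `vector_quantize()` uses internally. For intuition only, the sketch below shows one common generic technique: per-vector min/max scalar quantization of float components to 8-bit codes. This uses a quarter of the space of float32 at the cost of a small reconstruction error. It is an illustration of the general idea, not the extension's algorithm. The function names are hypothetical.

```python
# Generic min/max scalar quantization (illustrative; NOT the extension's documented scheme).
from typing import List, Tuple


def quantize_u8(vec: List[float]) -> Tuple[bytes, float, float]:
    """Map each component linearly onto 0..255; return the codes plus (lo, scale) for decoding."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # avoid division by zero for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def dequantize_u8(codes: bytes, lo: float, scale: float) -> List[float]:
    """Approximate reconstruction of the original components."""
    return [lo + c * scale for c in codes]


vec = [0.1, -0.4, 0.25, 0.9]
codes, lo, scale = quantize_u8(vec)
approx = dequantize_u8(codes, lo, scale)
# Each reconstructed component lies within half a quantization step of the original.
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(vec, approx))
print(len(codes), "bytes instead of", 4 * len(vec))  # 4 bytes instead of 16
```

Because distances can be computed on compact codes like these, an index of this kind fits in far less memory than the raw float32 data. That is the general motivation behind the memory figures quoted for `vector_quantize_scan()`.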

---

## `vector_quantize_memory(table, column)`

**Returns:** `INTEGER`

**Description:**
Returns the amount of memory, in bytes, required to preload the quantized data for the specified table and column with `vector_quantize_preload()`.

**Example:**

```sql
SELECT vector_quantize_memory('documents', 'embedding');
-- e.g., 28490112
```
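
`vector_quantize_memory()` reports the preload size in bytes, so an application can compare it with a memory budget before deciding to preload. The minimal sketch below uses Python's standard `sqlite3` module. It assumes the extension is already loaded into `conn` and that `vector_quantize()` has been run. The helper name `fits_in_budget` and the budget value are hypothetical.

```python
import sqlite3


def fits_in_budget(conn: sqlite3.Connection, table: str, column: str, budget_bytes: int) -> bool:
    """Return True if the quantized data for table.column fits within budget_bytes.

    Assumes the vector extension is loaded into `conn` and that
    vector_quantize() has already been run for this table/column.
    """
    (needed,) = conn.execute(
        "SELECT vector_quantize_memory(?, ?)", (table, column)
    ).fetchone()
    return needed is not None and needed <= budget_bytes
```

For example, `fits_in_budget(conn, "documents", "embedding", 64 * 1024 * 1024)` checks the column against a 64 MiB budget.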

---

## `vector_quantize_preload(table, column)`

**Returns:** `NULL`

**Description:**
Loads the quantized representation for the specified table and column into memory. Call it at startup, right after opening the database, to ensure optimal query performance.

**Example:**

```sql
SELECT vector_quantize_preload('documents', 'embedding');
```
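
Here is a minimal sketch of that startup step using Python's `sqlite3`. First open the database and load the extension, for example with `conn.enable_load_extension(True)` followed by `conn.load_extension(...)`. Both methods exist only in Python builds with extension loading enabled, and the library path depends on your build. Then preload each column you query. The helper `preload_all` is hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def preload_all(conn: sqlite3.Connection, columns: Iterable[Tuple[str, str]]) -> None:
    """Preload quantized data for each (table, column) pair.

    Intended to run once, right after the database is opened and the
    extension is loaded into `conn`.
    """
    for table, column in columns:
        # fetchone() makes sure the statement is actually stepped.
        conn.execute("SELECT vector_quantize_preload(?, ?)", (table, column)).fetchone()
```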

---

## `vector_cleanup(table, column)`

**Returns:** `NULL`

**Description:**
Cleans up the internal structures associated with a previously quantized table and column. Use it when the underlying data has changed or when quantization is no longer needed.

**Example:**

```sql
SELECT vector_cleanup('documents', 'embedding');
```
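
When the stored vectors change, one plausible refresh sequence is: clean up, quantize again, then preload again. The sketch below assumes that running `vector_quantize()` after `vector_cleanup()` rebuilds the structures without a new `vector_init()`. This reference does not spell out that ordering, so verify it first. It also assumes the extension is loaded into `conn`. `refresh_quantization` is a hypothetical helper.

```python
import sqlite3
from typing import Optional


def refresh_quantization(conn: sqlite3.Connection, table: str, column: str,
                         options: Optional[str] = None, preload: bool = True) -> None:
    """Drop stale quantization data and rebuild it (ASSUMED sequence, see lead-in)."""
    conn.execute("SELECT vector_cleanup(?, ?)", (table, column)).fetchone()
    if options is None:
        conn.execute("SELECT vector_quantize(?, ?)", (table, column)).fetchone()
    else:
        conn.execute("SELECT vector_quantize(?, ?, ?)", (table, column, options)).fetchone()
    if preload:
        conn.execute("SELECT vector_quantize_preload(?, ?)", (table, column)).fetchone()
```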

---

## `vector_convert_f32(value)`

## `vector_convert_f16(value)`

## `vector_convert_bf16(value)`

## `vector_convert_i8(value)`

## `vector_convert_u8(value)`

**Returns:** `BLOB`

**Description:**
Encodes a vector into the internal BLOB format used by the extension. Use the variant that matches the column's `type` (`FLOAT32`, `FLOAT16`, `FLOATB16`, `INT8`, or `UINT8`, respectively) so that values are stored in the chosen format.

**Parameters:**

* `value` (TEXT or BLOB):

  * If `TEXT`, it must be a JSON array (e.g., `'[0.1, 0.2, 0.3]'`).
  * If `BLOB`, no check is performed; you must ensure the bytes match the specified type and dimension.

**Usage by format:**

```sql
-- Insert a Float32 vector using JSON
INSERT INTO documents(embedding) VALUES(vector_convert_f32('[0.1, 0.2, 0.3]'));

-- Insert a UInt8 vector using a raw BLOB (ensure correct formatting!)
INSERT INTO compressed_vectors(embedding) VALUES(vector_convert_u8(X'010203'));
```
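
Building the BLOB in application code avoids JSON parsing. However, this reference only says the bytes must match the declared type and dimension. The sketch below **assumes** the layout is simply the components packed back-to-back in little-endian order with no header; check this against the extension before relying on it. `encode_vector` is a hypothetical helper. Python's `struct` module supports half precision (`e`) directly. Here, bfloat16 is produced by keeping the upper 16 bits of each float32, which is truncation rather than round-to-nearest.

```python
import struct
from typing import Sequence

# ASSUMED layout: components packed back-to-back, little-endian, no header.
_STRUCT_CODES = {"FLOAT32": "f", "FLOAT16": "e", "INT8": "b", "UINT8": "B"}


def encode_vector(values: Sequence[float], vector_type: str = "FLOAT32") -> bytes:
    """Hypothetical helper: pack `values` into a BLOB for the given vector type."""
    vector_type = vector_type.upper()
    n = len(values)
    if vector_type == "FLOATB16":
        # bfloat16 keeps the upper 16 bits of each float32 bit pattern (truncation).
        bits = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
        return struct.pack(f"<{n}H", *(b >> 16 for b in bits))
    if vector_type not in _STRUCT_CODES:
        raise ValueError(f"unsupported vector type {vector_type!r}")
    code = _STRUCT_CODES[vector_type]
    if code in ("b", "B"):
        values = [int(v) for v in values]  # struct.error is raised if a value is out of range
    return struct.pack(f"<{n}{code}", *values)
```

Under this assumption, `encode_vector([1, 2, 3], "UINT8")` yields the same bytes as `X'010203'` above. You can bind it as a parameter, e.g. `conn.execute("INSERT INTO compressed_vectors(embedding) VALUES(vector_convert_u8(?))", (blob,))`.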

---

## 🔍 `vector_full_scan(table, column, vector, k)`

**Returns:** `Virtual Table (rowid, distance)`

**Description:**
Performs a brute-force nearest neighbor search, comparing the given vector against every vector in the column. Although exhaustive, it is highly optimized. It is useful for small datasets or for validating the results of approximate search.

**Parameters:**

* `table` (TEXT): Name of the target table.
* `column` (TEXT): Column containing vectors.
* `vector` (BLOB or JSON): The query vector.
* `k` (INTEGER): Number of nearest neighbors to return.

**Example:**

```sql
SELECT rowid, distance
FROM vector_full_scan('documents', 'embedding', vector_convert_f32('[0.1, 0.2, 0.3]'), 5);
```
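
Conceptually, a full scan computes the distance from the query to every stored vector and keeps the `k` smallest. The pure-Python sketch below shows that idea for the distance options listed under `vector_init()`. This reference does not define the formulas, so the textbook ones are used, with two assumptions: `COSINE` is taken as 1 minus cosine similarity, and `DOT` as the negated dot product, so that smaller always means closer. `full_scan` is a hypothetical name.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Textbook definitions; smaller value = closer. COSINE and DOT conventions are ASSUMPTIONS.
DISTANCES = {
    "L2": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "SQUARED_L2": lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)),
    "L1": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "COSINE": lambda a, b: 1.0 - _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))),
    "DOT": lambda a, b: -_dot(a, b),
}


def full_scan(rows: Dict[int, Vector], query: Vector, k: int,
              distance: str = "L2") -> List[Tuple[int, float]]:
    """Return the k (rowid, distance) pairs closest to `query`, nearest first."""
    dist = DISTANCES[distance.upper()]
    scored = ((rowid, dist(vec, query)) for rowid, vec in rows.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

Every row is visited once per query, so the cost grows linearly with table size. For large datasets, `vector_quantize_scan()` is the recommended alternative.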

---

## ⚡ `vector_quantize_scan(table, column, vector, k)`

**Returns:** `Virtual Table (rowid, distance)`

**Description:**
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets because of its favorable speed/recall/memory trade-off.

**Parameters:**

* `table` (TEXT): Name of the target table.
* `column` (TEXT): Column containing vectors.
* `vector` (BLOB or JSON): The query vector.
* `k` (INTEGER): Number of nearest neighbors to return.

**Performance Highlights:**

* Handles **1M vectors** of dimension 768 in a few milliseconds.
* Uses **<50MB** of RAM.
* Achieves **>0.95 recall**.

**Example:**

```sql
SELECT rowid, distance
FROM vector_quantize_scan('documents', 'embedding', vector_convert_f32('[0.1, 0.2, 0.3]'), 10);
```
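
`vector_full_scan()` returns exact neighbors, so it can serve as ground truth for checking the recall of `vector_quantize_scan()` on your own data. The minimal sketch below makes three assumptions: the extension is loaded into `conn`, the column is `FLOAT32` (hence `vector_convert_f32`), and both table-valued functions accept the arguments shown in the examples above. `recall_at_k` and `measure_recall` are hypothetical helpers.

```python
import sqlite3
from typing import Iterable


def recall_at_k(exact_ids: Iterable[int], approx_ids: Iterable[int]) -> float:
    """Fraction of the exact top-k rowids that also appear in the approximate top-k."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


def measure_recall(conn: sqlite3.Connection, table: str, column: str,
                   query_json: str, k: int) -> float:
    """Compare vector_quantize_scan() against vector_full_scan() for one query vector."""
    sql = "SELECT rowid FROM {}(?, ?, vector_convert_f32(?), ?)"  # only fixed names are formatted in
    params = (table, column, query_json, k)
    exact = [r[0] for r in conn.execute(sql.format("vector_full_scan"), params)]
    approx = [r[0] for r in conn.execute(sql.format("vector_quantize_scan"), params)]
    return recall_at_k(exact, approx)
```

Averaging `measure_recall` over a sample of representative queries gives an estimate you can compare with the recall figure quoted above.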

---

## 📌 Notes

* All vectors in a column must have the same fixed dimension, set during `vector_init()`.
* Only tables explicitly initialized using `vector_init()` are eligible for vector search.
* You **must run `vector_quantize()`** before using `vector_quantize_scan()`.
* You can preload quantized data when the database is opened by calling `vector_quantize_preload()`.