API.md: 23 additions & 3 deletions
Original file line number
Diff line number
Diff line change
@@ -58,6 +58,10 @@ SELECT vector_backend();
58
58
Initializes the vector extension for a given table and column. This is **mandatory** before performing any vector search or quantization.
59
59
`vector_init` must be called in every database connection that needs to perform vector operations.
60
60
61
+
The target table must have a **`rowid`** (an integer primary key, either explicit or implicit).
62
+
If the table was created using `WITHOUT ROWID`, it must have **exactly one primary key column of type `INTEGER`**.
63
+
This ensures that each vector can be uniquely identified and efficiently referenced during search and quantization.
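As a rough illustration of the requirement above, the following definitions sketch which table shapes would and would not qualify. All table and column names here are hypothetical and are not part of the extension's API.

```sql
-- Ordinary rowid table: the implicit rowid identifies each vector.
CREATE TABLE documents (
    title     TEXT,
    embedding BLOB
);

-- Explicit INTEGER PRIMARY KEY: `id` is an alias for the rowid.
CREATE TABLE sentences (
    id        INTEGER PRIMARY KEY,
    body      TEXT,
    embedding BLOB
);

-- WITHOUT ROWID table with exactly one INTEGER primary key column.
CREATE TABLE chunks (
    id        INTEGER PRIMARY KEY,
    embedding BLOB
) WITHOUT ROWID;

-- Would NOT meet the requirement: WITHOUT ROWID with a composite key.
CREATE TABLE pairs (
    a         INTEGER,
    b         INTEGER,
    embedding BLOB,
    PRIMARY KEY (a, b)
) WITHOUT ROWID;
```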
64
+
61
65
**Parameters:**
62
66
63
67
* `table` (TEXT): Name of the table containing vector data.
@@ -214,7 +218,8 @@ INSERT INTO compressed_vectors(embedding) VALUES(vector_as_u8(X'010203'));
214
218
**Returns:** `Virtual Table (rowid, distance)`
215
219
216
220
**Description:**
217
-
Performs a brute-force nearest neighbor search using the given vector. Despite its brute-force nature, this function is highly optimized and useful for small datasets or validation.
221
+
Performs a brute-force nearest neighbor search using the given vector. Despite its brute-force nature, this function is highly optimized and useful for small datasets (fewer than 1,000,000 rows) or validation.
222
+
Since this interface returns only `rowid` and `distance`, you must join the result back to the original table on `rowid` to access additional columns.
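The join described above might look like the following minimal sketch. It assumes a hypothetical `documents` table with a `title` column and an `embedding` column that has already been initialized with `vector_init`; the value `10` is passed as the fourth parameter (`k`).

```sql
-- `title` is a hypothetical column used only for illustration.
SELECT
    d.rowid,
    d.title,
    v.distance
FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 10) AS v
JOIN documents AS d ON d.rowid = v.rowid
ORDER BY v.distance;
```

Joining on `rowid` keeps the lookup cheap, because each result row maps directly to one row of the source table.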
218
223
219
224
**Parameters:**
220
225
@@ -237,7 +242,7 @@ FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')
237
242
**Returns:** `Virtual Table (rowid, distance)`
238
243
239
244
**Description:**
240
-
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets due to its excellent speed/recall/memory trade-off.
245
+
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets due to its excellent speed/recall/memory trade-off. Since this interface returns only `rowid` and `distance`, you must join the result back to the original table on `rowid` to access additional columns.
241
246
242
247
You **must run `vector_quantize()`** before using `vector_quantize_scan()`, and run it again whenever the data in the initialized vector column changes.
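A minimal sketch of that ordering follows. It assumes that `vector_quantize` accepts the same table and column names that were passed to `vector_init`; that call form is an assumption here, not something this section confirms. The `title` column is hypothetical.

```sql
-- Assumption: vector_quantize() takes the table and column names.
-- Run it once before the first quantized scan, and again after the
-- vector data changes.
SELECT vector_quantize('documents', 'embedding');

-- Approximate search over the quantized data, joined back to the
-- source table to fetch an additional (hypothetical) column.
SELECT
    d.rowid,
    d.title,
    v.distance
FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 10) AS v
JOIN documents AS d ON d.rowid = v.rowid
ORDER BY v.distance;
```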
243
248
@@ -271,7 +276,8 @@ FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.
271
276
272
277
**Description:**
273
278
These streaming interfaces provide the same functionality as `vector_full_scan` and `vector_quantize_scan`, respectively, but are designed for incremental or filtered processing of results.
274
-
Unlike their non-streaming counterparts, these functions **omit the fourth parameter (`k`)** and allow you to use standard SQL clauses such as `WHERE` and `LIMIT` to control filtering and result count.
279
+
280
+
Unlike their non-streaming counterparts, these functions **omit the fourth parameter (`k`)** and allow you to use standard SQL clauses such as `WHERE` and `LIMIT` to control filtering and result count. Since these interfaces return only `rowid` and `distance`, you must join the result back to the original table on `rowid` to access additional columns.
275
281
276
282
This makes them ideal for combining vector search with additional query conditions or progressive result consumption in streaming applications.
277
283
@@ -306,6 +312,20 @@ WHERE score > 0.8
306
312
LIMIT 10;
307
313
```
308
314
315
+
**Accessing Additional Columns:**
316
+
317
+
```sql
318
+
-- Perform a filtered full scan with additional columns
319
+
SELECT
320
+
    v.rowid AS sentence_id,
321
+
    row_number() OVER (ORDER BY v.distance) AS rank_number,
322
+
    v.distance
323
+
FROM vector_full_scan_stream('sentences', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')) AS v
324
+
JOIN sentences ON sentences.rowid = v.rowid
325
+
WHERE sentences.chunk_id = 297
326
+
LIMIT 3;
327
+
```
328
+
309
329
**Usage Notes:**
310
330
311
331
* These interfaces return rows progressively and can efficiently combine vector similarity with SQL-level filters.