Commit fdb6a5f

Clarify how to access additional columns
1 parent e159d6d commit fdb6a5f

1 file changed

+23
-3
lines changed

API.md

Lines changed: 23 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -58,6 +58,10 @@ SELECT vector_backend();
5858
Initializes the vector extension for a given table and column. This is **mandatory** before performing any vector search or quantization.
5959
`vector_init` must be called in every database connection that needs to perform vector operations.
6060

61+
The target table must have a **`rowid`** (an integer primary key, either explicit or implicit).
62+
If the table was created using `WITHOUT ROWID`, it must have **exactly one primary key column of type `INTEGER`**.
63+
This ensures that each vector can be uniquely identified and efficiently referenced during search and quantization.
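
To illustrate the requirement above, here is a minimal sketch of table definitions that would and would not qualify under the stated rule. All table and column names (`documents_implicit`, `documents`, `documents_norowid`, `documents_bad`, `title`, `slug`) are hypothetical and chosen only for illustration.

```sql
-- Qualifies: ordinary rowid table (the rowid is implicit).
CREATE TABLE documents_implicit (
    title     TEXT,
    embedding BLOB
);

-- Qualifies: an explicit INTEGER PRIMARY KEY is an alias for the rowid.
CREATE TABLE documents (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    embedding BLOB
);

-- Qualifies: WITHOUT ROWID with exactly one INTEGER primary key column.
CREATE TABLE documents_norowid (
    id        INTEGER PRIMARY KEY,
    title     TEXT,
    embedding BLOB
) WITHOUT ROWID;

-- Does not qualify under the rule above: WITHOUT ROWID with a TEXT primary key.
CREATE TABLE documents_bad (
    slug      TEXT PRIMARY KEY,
    embedding BLOB
) WITHOUT ROWID;
```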
64+
6165
**Parameters:**
6266

6367
* `table` (TEXT): Name of the table containing vector data.
@@ -214,7 +218,8 @@ INSERT INTO compressed_vectors(embedding) VALUES(vector_as_u8(X'010203'));
214218
**Returns:** `Virtual Table (rowid, distance)`
215219

216220
**Description:**
217-
Performs a brute-force nearest neighbor search using the given vector. Despite its brute-force nature, this function is highly optimized and useful for small datasets or validation.
221+
Performs a brute-force nearest neighbor search using the given vector. Despite its brute-force nature, this function is highly optimized and useful for small datasets (fewer than 1,000,000 rows) or validation.
222+
Because this interface returns only `rowid` and `distance`, join its result back to the original table on `rowid` to access additional columns.
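
A minimal sketch of such a join, reusing the `documents` table and `embedding` column from the examples in this file. The `title` column and the `k` value of 5 are assumptions made for illustration.

```sql
-- Top-5 nearest neighbours, joined back to the source table for extra columns.
SELECT
    d.title,        -- hypothetical additional column on the source table
    v.distance
FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 5) AS v
JOIN documents AS d ON d.rowid = v.rowid
ORDER BY v.distance;
```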
218223

219224
**Parameters:**
220225

@@ -237,7 +242,7 @@ FROM vector_full_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')
237242
**Returns:** `Virtual Table (rowid, distance)`
238243

239244
**Description:**
240-
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets due to its excellent speed/recall/memory trade-off.
245+
Performs a fast approximate nearest neighbor search using the pre-quantized data. This is the **recommended query method** for large datasets due to its excellent speed/recall/memory trade-off. Because this interface returns only `rowid` and `distance`, join its result back to the original table on `rowid` to access additional columns.
241246

242247
You **must run `vector_quantize()`** before using `vector_quantize_scan()`, and run it again whenever the data in the initialized vector column changes.
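
The ordering described above could look roughly like the following sketch. The two-argument form of `vector_quantize`, the `title` column, and the `k` value of 10 are assumptions; check the `vector_quantize` entry in this file for its actual parameters.

```sql
-- Build (or rebuild) the quantized data first.
-- The (table, column) argument list is an assumption, not confirmed here.
SELECT vector_quantize('documents', 'embedding');

-- Then run the approximate search and join back for extra columns.
SELECT
    d.title,        -- hypothetical additional column on the source table
    v.distance
FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]'), 10) AS v
JOIN documents AS d ON d.rowid = v.rowid;
```

If rows are inserted, updated, or deleted afterwards, the quantization step would need to be repeated before the next `vector_quantize_scan()` call.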
243248

@@ -271,7 +276,8 @@ FROM vector_quantize_scan('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.
271276

272277
**Description:**
273278
These streaming interfaces provide the same functionality as `vector_full_scan` and `vector_quantize_scan`, respectively, but are designed for incremental or filtered processing of results.
274-
Unlike their non-streaming counterparts, these functions **omit the fourth parameter (`k`)** and allow you to use standard SQL clauses such as `WHERE` and `LIMIT` to control filtering and result count.
279+
280+
Unlike their non-streaming counterparts, these functions **omit the fourth parameter (`k`)** and allow you to use standard SQL clauses such as `WHERE` and `LIMIT` to control filtering and result count. Because these interfaces return only `rowid` and `distance`, join their results back to the original table on `rowid` to access additional columns.
275281

276282
This makes them ideal for combining vector search with additional query conditions or progressive result consumption in streaming applications.
277283

@@ -306,6 +312,20 @@ WHERE score > 0.8
306312
LIMIT 10;
307313
```
308314

315+
**Accessing Additional Columns:**
316+
317+
```sql
318+
-- Perform a filtered full scan with additional columns
319+
SELECT
320+
  v.rowid AS document_id,
321+
  row_number() OVER (ORDER BY v.distance) AS rank_number,
322+
  v.distance
323+
FROM vector_full_scan_stream('documents', 'embedding', vector_as_f32('[0.1, 0.2, 0.3]')) AS v
324+
JOIN documents ON documents.rowid = v.rowid
325+
WHERE documents.chunk_id = 297
326+
LIMIT 3;
327+
```
328+
309329
**Usage Notes:**
310330

311331
* These interfaces return rows progressively and can efficiently combine vector similarity with SQL-level filters.
