Commit 23869b8

docs(lancedb): add documentation for lancedb (#1039)
1 parent 477bdf8 commit 23869b8

1 file changed

+78
-0
lines changed

docs/docs/ops/targets.md

Lines changed: 78 additions & 0 deletions
@@ -98,6 +98,84 @@ The spec takes the following fields:

You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).

### LanceDB

Exports data to a [LanceDB](https://lancedb.github.io/lancedb/) table.

#### Data Mapping

Here's how CocoIndex data elements map to LanceDB elements during export:

| CocoIndex Element | LanceDB Element |
|-------------------|-----------------|
| an export target  | a unique table  |
| a collected row   | a row           |
| a field           | a column        |

::::info Installation and import

This target is provided via an optional dependency `[lancedb]`:

```sh
pip install "cocoindex[lancedb]"
```

To use it, you need to import the submodule `cocoindex.targets.lancedb`:

```python
import cocoindex.targets.lancedb as coco_lancedb
```

::::

#### Spec

The spec `coco_lancedb.LanceDB` takes the following fields:

* `db_uri` (`str`, required): The LanceDB database location (e.g. `./lancedb_data`).
* `table_name` (`str`, required): The name of the table to export the data to.
* `db_options` (`coco_lancedb.DatabaseOptions`, optional): Advanced database options.
* `storage_options` (`dict[str, Any]`, optional): Passed through to LanceDB when connecting.

Additional notes:

* Exactly one primary key field is required for LanceDB targets. CocoIndex creates a B-Tree index on this key column.
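
As a minimal sketch of how this spec might be wired into a flow, the helper below passes `coco_lancedb.LanceDB` to a collector's `export()` call. The helper name `export_to_lancedb`, the export name, the default table name, and the `id` key field are illustrative assumptions, and the `export(..., primary_key_fields=[...])` call shape is assumed to match how other CocoIndex targets are exported.

```python
def export_to_lancedb(collector, db_uri="./lancedb_data", table_name="TextEmbedding"):
    """Export a collector's rows to a LanceDB table (illustrative sketch)."""
    # Imported lazily so this module still loads when the optional
    # `cocoindex[lancedb]` dependency is not installed.
    import cocoindex.targets.lancedb as coco_lancedb

    collector.export(
        "text_embedding",  # hypothetical export target name
        coco_lancedb.LanceDB(db_uri=db_uri, table_name=table_name),
        # Exactly one primary key field is required; "id" is an assumed field name.
        primary_key_fields=["id"],
    )
```

The helper would be called from inside a flow definition, with `collector` being the collector whose rows should land in the table.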

:::info

LanceDB has a limitation that it cannot build a vector index on an empty table (see [LanceDB issue #4034](https://github.com/lancedb/lance/issues/4034)).
If you want to use vector indexes, you can run the flow once to populate the target table with data, and then create the vector indexes.

:::
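
One way to follow the populate-then-index order described above is to guard index creation behind a row-count check. The sketch below is a hypothetical helper, not part of CocoIndex: it assumes `table` behaves like a LanceDB async table with awaitable `count_rows()` and `create_index(column)` methods, and `InMemoryTable` is a stand-in used only to exercise the logic without a database.

```python
import asyncio


async def create_vector_index_if_populated(table, column: str) -> bool:
    """Request an index on `column` only when the table already has rows.

    Returns True if index creation was requested, False if the table was empty.
    """
    if await table.count_rows() == 0:
        # Nothing to build the index from yet; run the flow first.
        return False
    await table.create_index(column)
    return True


class InMemoryTable:
    """Stand-in for a real table (assumption: same two awaitable methods)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.indexed = []  # columns for which an index was requested

    async def count_rows(self) -> int:
        return len(self.rows)

    async def create_index(self, column: str) -> None:
        self.indexed.append(column)
```

With a real table, the helper would be awaited after the first flow run has written data; the column name passed in (for example `"embedding"`) depends on your own schema.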

You can find an end-to-end example here: [examples/text_embedding_lancedb](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_lancedb).

#### `connect_async()` helper

We provide a helper to obtain a shared `AsyncConnection` that is reused across your process and shared with CocoIndex's writer for strong read-after-write consistency:

```python
from cocoindex.targets import lancedb as coco_lancedb

db = await coco_lancedb.connect_async("./lancedb_data")
table = await db.open_table("TextEmbedding")
```

Signature:

```python
async def connect_async(
    db_uri: str,
    *,
    db_options: coco_lancedb.DatabaseOptions | None = None,
    read_consistency_interval: datetime.timedelta | None = None
) -> lancedb.AsyncConnection
```

When `db_uri` matches a connection that is already open, the helper reuses that connection instance instead of establishing a new one.
This gives strong consistency between your indexing and querying logic, provided they run in the same process.
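
The reuse behavior can be pictured as a per-URI cache. The sketch below illustrates that general pattern only and is not CocoIndex's actual implementation: the `ConnectionCache` class and its `connect` callback are hypothetical names introduced for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ConnectionCache:
    """Keep one connection per database URI (illustrative sketch)."""

    def __init__(self, connect: Callable[[str], Awaitable[Any]]) -> None:
        self._connect = connect  # coroutine function that opens a connection
        self._connections: dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get(self, db_uri: str) -> Any:
        # The lock keeps concurrent callers from opening duplicate connections.
        async with self._lock:
            if db_uri not in self._connections:
                self._connections[db_uri] = await self._connect(db_uri)
            return self._connections[db_uri]
```

Because every caller that passes the same `db_uri` receives the same object, a reader in the same process sees whatever the writer has already committed through that connection.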

## Property Graph Targets

Property graph is a widely-adopted model for knowledge graphs, where both nodes and relationships can have properties.
