Commit 2280801

organize targets
1 parent 661f8d7 commit 2280801

9 files changed

+682
-1
lines changed

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
---
2+
title: LanceDB
3+
description: CocoIndex LanceDB Target
4+
toc_max_heading_level: 4
5+
---
6+
7+
import { ExampleButton } from '../../../src/components/GitHubButton';
8+
9+
# LanceDB
10+
11+
Exports data to a [LanceDB](https://lancedb.github.io/lancedb/) table.
12+
13+
## Data Mapping
14+
15+
Here's how CocoIndex data elements map to LanceDB elements during export:
16+
17+
| CocoIndex Element | LanceDB Element |
18+
|-------------------|-----------------|
19+
| an export target | a unique table |
20+
| a collected row | a row |
21+
| a field | a column |
22+
23+
24+
::::info Installation and import
25+
26+
This target is provided via an optional dependency `[lancedb]`:
27+
28+
```sh
29+
pip install "cocoindex[lancedb]"
30+
```
31+
32+
To use it, you need to import the submodule `cocoindex.targets.lancedb`:
33+
34+
```python
35+
import cocoindex.targets.lancedb as coco_lancedb
36+
```
37+
38+
::::
39+
40+
## Spec
41+
42+
The spec `coco_lancedb.LanceDB` takes the following fields:
43+
44+
* `db_uri` (`str`, required): The LanceDB database location (e.g. `./lancedb_data`).
45+
* `table_name` (`str`, required): The name of the table to export the data to.
46+
* `db_options` (`coco_lancedb.DatabaseOptions`, optional): Advanced database options.
47+
* `storage_options` (`dict[str, Any]`, optional): Passed through to LanceDB when connecting.
48+
49+
Additional notes:
50+
51+
* Exactly one primary key field is required for LanceDB targets. CocoIndex creates a B-Tree index on this key column.
52+
53+
:::info
54+
55+
LanceDB has a limitation that it cannot build a vector index on an empty table (see [LanceDB issue #4034](https://github.com/lancedb/lance/issues/4034)).
56+
If you want to use vector indexes, you can run the flow once to populate the target table with data, and then create the vector indexes.
57+
58+
:::
59+
60+
You can find an end-to-end example here: [examples/text_embedding_lancedb](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_lancedb).
61+
62+
## `connect_async()` helper
63+
64+
We provide a helper to obtain a shared `AsyncConnection` that is reused across your process and shared with CocoIndex's writer for strong read-after-write consistency:
65+
66+
```python
67+
from cocoindex.targets import lancedb as coco_lancedb
68+
69+
db = await coco_lancedb.connect_async("./lancedb_data")
70+
table = await db.open_table("TextEmbedding")
71+
```
72+
73+
Signature:
74+
75+
```python
76+
def connect_async(
77+
db_uri: str,
78+
*,
79+
db_options: coco_lancedb.DatabaseOptions | None = None,
80+
read_consistency_interval: datetime.timedelta | None = None
81+
) -> lancedb.AsyncConnection
82+
```
83+
84+
When `db_uri` matches an existing connection, the helper reuses that connection instance instead of establishing a new one.
85+
This gives strong consistency between your indexing and querying logic when both run in the same process.
86+
87+
## Example
88+
<ExampleButton
89+
href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_lancedb"
90+
text="Text Embedding LanceDB Example"
91+
margin="16px 0 24px 0"
92+
/>
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
---
2+
title: Postgres
3+
description: CocoIndex Postgres Target
4+
toc_max_heading_level: 4
5+
---
6+
7+
import { ExampleButton } from '../../../src/components/GitHubButton';
8+
9+
# Postgres
10+
11+
Exports data to a Postgres database (with the pgvector extension).
12+
13+
## Data Mapping
14+
15+
Here's how CocoIndex data elements map to Postgres elements during export:
16+
17+
| CocoIndex Element | Postgres Element |
18+
|-------------------|------------------|
19+
| an export target | a unique table |
20+
| a collected row | a row |
21+
| a field | a column |
22+
23+
For example, if you have a data collector that collects rows with fields `id`, `title`, and `embedding`, it will be exported to a Postgres table with corresponding columns.
24+
It should be a unique table, meaning that no other export target should export to the same table.
25+
26+
:::warning vector type mapping to Postgres
27+
28+
Since vectors in pgvector must have a fixed dimension, we only map vectors of number types with a fixed dimension (i.e. *Vector[cocoindex.Float32, N]*, *Vector[cocoindex.Float64, N]*, and *Vector[cocoindex.Int64, N]*) to `vector(N)` columns.
29+
For all other vector types, we map them to `jsonb` columns.
30+
31+
:::
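
The mapping rule above can be restated as a small decision function. This is a sketch for illustration only: `postgres_column_type` and the string type names are hypothetical, not part of the CocoIndex API.

```python
from typing import Optional

# Element types the text lists as eligible for `vector(N)` columns.
NUMBER_ELEMENT_TYPES = {"Float32", "Float64", "Int64"}


def postgres_column_type(element_type: str, dimension: Optional[int]) -> str:
    """Pick the column type for a vector field, following the rule above."""
    if element_type in NUMBER_ELEMENT_TYPES and dimension is not None:
        return f"vector({dimension})"
    # Any other vector type (no fixed dimension, or non-number elements).
    return "jsonb"


fixed = postgres_column_type("Float32", 384)
unsized = postgres_column_type("Float32", None)
```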
32+
33+
:::info U+0000 (NUL) characters in strings
34+
35+
U+0000 (NUL) is a valid character in Unicode, but Postgres does not allow it in strings (including `text`-like types and strings in `jsonb`).
36+
CocoIndex automatically strips U+0000 (NUL) characters from strings before exporting to Postgres. For example, if you have a string `"Hello\0World"`, it will be exported as `"HelloWorld"`.
37+
38+
:::
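
The effect of the stripping can be reproduced in plain Python. This is a sketch of the observable behavior only, not CocoIndex's implementation; `strip_nul` is a hypothetical helper, and handling nested lists and dicts is an assumption that mirrors the mention of strings in `jsonb`.

```python
from typing import Any


def strip_nul(value: Any) -> Any:
    """Remove U+0000 from strings, recursing into lists and dicts."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    # Numbers, booleans, None, etc. pass through unchanged.
    return value


print(strip_nul("Hello\0World"))  # prints: HelloWorld
```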
39+
40+
## Spec
41+
42+
The spec takes the following fields:
43+
44+
* `database` ([auth reference](../core/flow_def#auth-registry) to `DatabaseConnectionSpec`, optional): The connection to the Postgres database.
45+
See [DatabaseConnectionSpec](../core/settings#databaseconnectionspec) for its specific fields.
46+
If not provided, the target uses the same database as the [internal storage](/docs/core/basics#internal-storage).
47+
48+
* `table_name` (`str`, optional): The name of the table to export the data to. If unspecified, the table name `[${AppNamespace}__]${FlowName}__${TargetName}` is used, e.g. `DemoFlow__doc_embeddings` or `Staging__DemoFlow__doc_embeddings`.
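
The default table name pattern can be spelled out as a short function. This is a sketch that follows the pattern and examples given above; `default_table_name` is a hypothetical helper, and any further normalization CocoIndex may apply to names is not modeled here.

```python
from typing import Optional


def default_table_name(
    flow_name: str, target_name: str, app_namespace: Optional[str] = None
) -> str:
    """Build `[AppNamespace__]FlowName__TargetName` as described above."""
    parts = [flow_name, target_name]
    if app_namespace:
        # The namespace prefix is only present when a namespace is set.
        parts.insert(0, app_namespace)
    return "__".join(parts)


plain = default_table_name("DemoFlow", "doc_embeddings")
namespaced = default_table_name("DemoFlow", "doc_embeddings", "Staging")
```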
49+
50+
## Example
51+
<ExampleButton
52+
href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding"
53+
text="Text Embedding Example with Postgres"
54+
margin="16px 0 24px 0"
55+
/>
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
---
2+
title: Qdrant
3+
description: CocoIndex Qdrant Target
4+
toc_max_heading_level: 4
5+
---
6+
7+
import { ExampleButton } from '../../../src/components/GitHubButton';
8+
9+
# Qdrant
10+
11+
Exports data to a [Qdrant](https://qdrant.tech/) collection.
12+
13+
## Data Mapping
14+
15+
Here's how CocoIndex data elements map to Qdrant elements during export:
16+
17+
| CocoIndex Element | Qdrant Element |
18+
|-------------------|------------------|
19+
| an export target | a unique collection |
20+
| a collected row | a point |
21+
| a field           | a named vector, if it fits into a Qdrant vector; otherwise a field within the payload |
22+
23+
The following vector types fit into a Qdrant vector:
24+
* One-dimensional vectors with fixed dimension, e.g. *Vector[Float32, N]*, *Vector[Float64, N]* and *Vector[Int64, N]*.
25+
We map them to [dense vectors](https://qdrant.tech/documentation/concepts/vectors/#dense-vectors).
26+
* Two-dimensional vectors whose inner layer is a one-dimensional vector with fixed dimension, e.g. *Vector[Vector[Float32, N]]*, *Vector[Vector[Int64, N]]*, *Vector[Vector[Float64, N]]*. The outer layer may or may not have a fixed dimension.
27+
We map them to [multivectors](https://qdrant.tech/documentation/concepts/vectors/#multivectors).
28+
29+
30+
:::warning vector type mapping to Qdrant
31+
32+
Since vectors in Qdrant must have fixed dimension, we only map vectors of number types with fixed dimension to Qdrant vectors.
33+
For all other vector types, we map to Qdrant payload as JSON arrays.
34+
35+
:::
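
The three outcomes described in this section (dense vector, multivector, payload) can be sketched as a classifier. This is an illustration under stated assumptions: `VectorType` and `qdrant_mapping` are hypothetical names that model the rules in the text, not CocoIndex or Qdrant APIs.

```python
from dataclasses import dataclass
from typing import Optional, Union

NUMBER_ELEMENT_TYPES = {"Float32", "Float64", "Int64"}


@dataclass(frozen=True)
class VectorType:
    """Hypothetical description of a vector type: element plus optional size."""

    element: Union[str, "VectorType"]
    dimension: Optional[int] = None


def _is_fixed_number_vector(t: object) -> bool:
    # One-dimensional vector of a number type with a fixed dimension.
    return (
        isinstance(t, VectorType)
        and isinstance(t.element, str)
        and t.element in NUMBER_ELEMENT_TYPES
        and t.dimension is not None
    )


def qdrant_mapping(t: VectorType) -> str:
    """Classify a vector field according to the rules described above."""
    if _is_fixed_number_vector(t):
        return "dense vector"
    # The outer layer may or may not have a fixed dimension.
    if _is_fixed_number_vector(t.element):
        return "multivector"
    return "payload"


dense = qdrant_mapping(VectorType("Float32", 384))
multi = qdrant_mapping(VectorType(VectorType("Float32", 128)))
fallback = qdrant_mapping(VectorType("Float32"))
```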
36+
37+
## Spec
38+
39+
The spec takes the following fields:
40+
41+
* `connection` ([auth reference](../core/flow_def#auth-registry) to `QdrantConnection`, optional): The connection to the Qdrant instance. `QdrantConnection` has the following fields:
42+
* `grpc_url` (`str`): The [gRPC URL](https://qdrant.tech/documentation/interfaces/#grpc-interface) of the Qdrant instance, e.g. `http://localhost:6334/`.
43+
* `api_key` (`str`, optional): The API key to authenticate requests with.
44+
45+
If `connection` is not provided, the target uses a local Qdrant instance at `http://localhost:6334/` by default.
46+
47+
* `collection_name` (`str`, required): The name of the collection to export the data to.
48+
49+
You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).
50+
51+
## Example
52+
<ExampleButton
53+
href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant"
54+
text="Text Embedding Qdrant Example"
55+
margin="16px 0 24px 0"
56+
/>

docs/docs/targets/index.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
---
2+
title: Targets
3+
description: CocoIndex Built-in Targets
4+
toc_max_heading_level: 4
5+
---
6+
7+
# CocoIndex Built-in Targets
8+
9+
For each target, data are exported from a data collector, containing data of multiple entries, each with multiple fields.
10+
The way to map data from a data collector to a target depends on the data model of the target.
11+
12+
## Entry-Oriented Targets
13+
14+
An entry-oriented target organizes data into independent entries, such as rows, key-value pairs, or documents.
15+
Each entry is self-contained and does not explicitly link to others.
16+
There is usually a straightforward mapping from data collector rows to entries.
17+
18+
| Target | Link |
19+
|----------|------|
20+
| Postgres | [Postgres](./targets/entry-oriented/postgres) |
21+
| Qdrant | [Qdrant](./targets/entry-oriented/qdrant) |
22+
| LanceDB | [LanceDB](./targets/entry-oriented/lancedb) |
23+
24+
25+
## Property Graph Targets
26+
27+
The property graph is a widely adopted model for knowledge graphs, where both nodes and relationships can have properties.
28+
[Graph database concepts](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/) has a good introduction to basic concepts of property graphs.
29+
30+
31+
| Target | Link |
32+
|----------|------|
33+
| Neo4j | [Neo4j](./targets/property-graph/neo4j) |
34+
| Kuzu | [Kuzu](./targets/property-graph/kuzu) |
35+
