docs/docs/core/flow_def.mdx
+76 -5 (76 additions & 5 deletions)
@@ -259,14 +259,11 @@ Export must happen at the top level of a flow, i.e. not within any child scopes
259
259
260
260
* `name`: the name to identify the export target.
261
261
* `target_spec`: the storage spec as the export target.
262
-
* `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
263
-
* `vector_indexes` (`Sequence[VectorIndexDef]`, optional): the fields to create vector index. `VectorIndexDef` has the following fields:
264
-
    * `field_name`: the field to create vector index.
265
-
    * `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
266
262
* `setup_by_user` (optional):
267
263
whether the export target is set up by the user.
268
264
By default, CocoIndex manages the target setup (surfaced by the `cocoindex setup` CLI subcommand), e.g. it creates the related tables/collections/etc. with a compatible schema and updates them upon change.
269
265
If `True`, the export target will be managed by users, and users are responsible for creating the target and updating it upon change.
266
+
* Fields to configure [storage indexes](#storage-indexes). `primary_key_fields` is required, and all others are optional.
The target storage is managed by CocoIndex, i.e. it'll be created by [CocoIndex CLI](/docs/core/cli) when you run `cocoindex setup`, and the data will be automatically updated (including stale data removal) when updating the index.
290
287
The `name` for the same storage should remain stable across different runs.
291
288
If it changes, CocoIndex will treat the change as the old storage being removed and a new one being created, and will perform setup changes and reindexing accordingly.
289
+
290
+
#### Storage Indexes
291
+
292
+
Many storages support indexes to make data retrieval more efficient.
293
+
CocoIndex provides a common way to configure indexes for various storages.
294
+
295
+
* *Primary key*. `primary_key_fields` (`Sequence[str]`): the fields to use as the primary key. The field types must be supported as key fields. See [Key Types](data_types#key-types) for more details.
296
+
* *Vector index*. `vector_indexes` (`Sequence[VectorIndexDef]`): the fields on which to create vector indexes. `VectorIndexDef` has the following fields:
297
+
    * `field_name`: the field on which to create the vector index.
298
+
    * `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
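
To make the shape of these index options concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex classes: `VectorIndexSketch`, `IndexOptionsSketch`, and `check_index_options` are hypothetical stand-ins that only mirror the field names described above, and the metric string is a placeholder (see the Vector Type page for the values that are actually supported).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VectorIndexSketch:
    """Hypothetical stand-in mirroring the two `VectorIndexDef` fields."""
    field_name: str
    metric: str  # placeholder string, not a real CocoIndex metric value


@dataclass
class IndexOptionsSketch:
    """Hypothetical container: the primary key is required, vector indexes are optional."""
    primary_key_fields: Sequence[str]
    vector_indexes: Sequence[VectorIndexSketch] = ()


def check_index_options(options: IndexOptionsSketch, row_fields: set[str]) -> list[str]:
    """Return a list of problems found when matching index options against row fields."""
    problems = []
    if not options.primary_key_fields:
        problems.append("primary_key_fields is required")
    for name in options.primary_key_fields:
        if name not in row_fields:
            problems.append(f"primary key field not found: {name}")
    for index in options.vector_indexes:
        if index.field_name not in row_fields:
            problems.append(f"vector index field not found: {index.field_name}")
    return problems


options = IndexOptionsSketch(
    primary_key_fields=["id"],
    vector_indexes=[VectorIndexSketch(field_name="embedding", metric="cosine")],
)
print(check_index_options(options, {"id", "text", "embedding"}))
```

The check itself is an illustration of why the two options are grouped together; how CocoIndex validates them internally is not described here.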
299
+
300
+
301
+
## Miscellaneous
302
+
303
+
### Auth Registry
304
+
305
+
CocoIndex manages an auth registry. It's an in-memory key-value store, mainly to store authentication information for a backend.
306
+
307
+
An operation spec is the default way to configure a backend, but it has the following limitations:
308
+
309
+
* The spec isn't supposed to contain secret information, and it's frequently shown in various places, e.g. `cocoindex show`.
310
+
* Once an operation is removed by a change to the flow definition code, its spec is gone as well.
311
+
But we still need to be able to drop the backend (e.g. a table) with `cocoindex setup` or `cocoindex drop`.
312
+
313
+
314
+
The auth registry is introduced to solve the problems above. It works as follows:
315
+
316
+
* You can create a new **auth entry** with a key and a value.
317
+
* You can reference the entry by its key and pass the reference as part of the spec for certain operations, e.g. `Neo4jRelationship` takes a `connection` field in the form of an auth entry reference.
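
The idea can be sketched in plain Python as a small in-memory store. This is only an illustration under stated assumptions: `AuthRegistrySketch` and `EntryRef` are hypothetical names, not the CocoIndex implementation, and the `rel_type` value is made up. The point it shows is that a spec carries only a reference, so the secret stays out of anything that displays the spec.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EntryRef:
    """Hypothetical reference to an auth entry; it carries only the key."""
    key: str


class AuthRegistrySketch:
    """Hypothetical in-memory key-value store for authentication values."""

    def __init__(self) -> None:
        self._entries: dict[str, Any] = {}

    def add(self, key: str, value: Any) -> EntryRef:
        # Store the value under the key and hand back a reference to it.
        self._entries[key] = value
        return EntryRef(key)

    def resolve(self, ref: EntryRef) -> Any:
        # Look up the stored value when a backend actually needs it.
        return self._entries[ref.key]


registry = AuthRegistrySketch()
conn_ref = registry.add(
    "my_graph_conn",
    {"uri": "bolt://localhost:7687", "user": "neo4j", "password": "cocoindex"},
)

# The spec holds only the reference, so printing it does not reveal the password.
spec = {"connection": conn_ref, "rel_type": "MENTIONS"}  # "MENTIONS" is a made-up type
print(spec)
```

Because the value lives in the registry rather than in the spec, it can still be looked up by key after the operation that used it has been removed from the flow definition.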
318
+
319
+
<Tabs>
320
+
<TabItem value="python" label="Python" default>
321
+
322
+
You can add an auth entry with the `cocoindex.add_auth_entry()` function, which returns a `cocoindex.AuthEntryReference`:
323
+
324
+
```python
import cocoindex

# Register the Neo4j connection under a key; the returned reference can be passed to specs.
my_graph_conn = cocoindex.add_auth_entry(
    "my_graph_conn",
    cocoindex.storages.Neo4jConnectionSpec(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="cocoindex",
    ),
)
```
333
+
334
+
Then reference it when building a spec that takes an auth entry:
335
+
336
+
* You can either reference by the `AuthEntryReference` object directly:
docs/docs/ops/storages.md
+36 (36 additions & 0 deletions)
@@ -45,3 +45,39 @@ doc_embeddings.export(
45
45
```
46
46
47
47
You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).
48
+
49
+
## Neo4j
50
+
51
+
### Setup
52
+
53
+
If you don't have a Neo4j database, you can start one for CocoIndex using our docker compose config:
54
+
55
+
```bash
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/neo4j.yaml) up -d
```
58
+
59
+
### Neo4jRelationship
60
+
61
+
The `Neo4jRelationship` storage exports each row as a relationship to a Neo4j knowledge graph.
62
+
When you collect rows for `Neo4jRelationship`, fields will be mapped to a relationship and source/target nodes for the relationship:
63
+
64
+
* You can explicitly specify fields mapped to source/target nodes.
65
+
* All remaining fields will be mapped to relationship properties by default.
66
+
67
+
68
+
The spec takes the following fields:
69
+
70
+
* `connection` (type: [auth reference](../core/flow_def#auth-registry) to `Neo4jConnectionSpec`): The connection to the Neo4j database. `Neo4jConnectionSpec` has the following fields:
71
+
    * `uri` (type: `str`): The URI of the Neo4j database to use as the internal storage, e.g. `bolt://localhost:7687`.
72
+
    * `user` (type: `str`): Username for the Neo4j database.
73
+
    * `password` (type: `str`): Password for the Neo4j database.
74
+
    * `db` (type: `str`, optional): The name of the Neo4j database to use as the internal storage, e.g. `neo4j`.
75
+
* `rel_type` (type: `str`): The type of the relationship.
76
+
* `source`/`target` (type: `Neo4jRelationshipEndSpec`): The source/target node of the relationship, with the following fields:
77
+
    * `label` (type: `str`): The label of the node.
78
+
    * `fields` (type: `list[Neo4jFieldMapping]`): Map fields from the collector to nodes in Neo4j, with the following fields:
79
+
        * `field_name` (type: `str`): The name of the field in the collected row.
80
+
        * `node_field_name` (type: `str`, optional): The name of the field to use as the node field. If unspecified, the same name as `field_name` is used.
81
+
* `nodes` (type: `dict[str, Neo4jRelationshipNodeSpec]`): Configures indexes for different node labels. The key is the node label. The value, a `Neo4jRelationshipNodeSpec`, has the following fields to configure [storage indexes](../core/flow_def#storage-indexes) for the node:
82
+
    * `primary_key_fields` is required.
83
+
    * `vector_indexes` is also supported and optional.
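
The field mapping described above can be illustrated with a small plain-Python sketch. It is not the CocoIndex implementation: `FieldMappingSketch` and `split_row` are hypothetical names, and the sample row is made up. It assumes only what the text states, namely that explicitly mapped fields go to the source/target nodes, `node_field_name` falls back to `field_name`, and all remaining fields become relationship properties.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldMappingSketch:
    """Hypothetical stand-in mirroring the two `Neo4jFieldMapping` fields."""
    field_name: str
    node_field_name: Optional[str] = None

    def target_name(self) -> str:
        # If no node field name is given, reuse the collected field's name.
        return self.node_field_name or self.field_name


def split_row(
    row: dict[str, Any],
    source_fields: list[FieldMappingSketch],
    target_fields: list[FieldMappingSketch],
) -> tuple[dict[str, Any], dict[str, Any], dict[str, Any]]:
    """Split one collected row into source node, target node, and relationship properties."""
    source = {m.target_name(): row[m.field_name] for m in source_fields}
    target = {m.target_name(): row[m.field_name] for m in target_fields}
    used = {m.field_name for m in [*source_fields, *target_fields]}
    # Everything not mapped to a node stays on the relationship.
    rel_props = {k: v for k, v in row.items() if k not in used}
    return source, target, rel_props


row = {"subject": "CocoIndex", "object": "Neo4j", "predicate": "exports to"}
source, target, rel_props = split_row(
    row,
    source_fields=[FieldMappingSketch("subject", "value")],
    target_fields=[FieldMappingSketch("object", "value")],
)
print(source, target, rel_props)
```

Here both nodes store their value under a `value` field, while `predicate` stays on the relationship because no mapping claims it.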