Commit 3427ad8

Add documents for Neo4j
1 parent 2f7b147 commit 3427ad8

2 files changed

+112
-5
lines changed

docs/docs/core/flow_def.mdx

Lines changed: 76 additions & 5 deletions
@@ -259,14 +259,11 @@ Export must happen at the top level of a flow, i.e. not within any child scopes
259259

260260
* `name`: the name to identify the export target.
261261
* `target_spec`: the storage spec as the export target.
262-
* `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
263-
* `vector_indexes` (`Sequence[VectorIndexDef]`, optional): the fields to create vector index. `VectorIndexDef` has the following fields:
264-
* `field_name`: the field to create vector index.
265-
* `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
266262
* `setup_by_user` (optional):
267263
whether the export target is set up by the user.
268264
By default, CocoIndex manages the target setup (surfaced by the `cocoindex setup` CLI subcommand), e.g. it creates related tables/collections/etc. with a compatible schema and updates them upon change.
269265
If `True`, the export target will be managed by users, and users are responsible for creating the target and updating it upon change.
266+
* Fields to configure [storage indexes](#storage-indexes). `primary_key_fields` is required, and all others are optional.
270267

271268
<Tabs>
272269
<TabItem value="python" label="Python" default>
@@ -280,7 +277,7 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
280277
demo_collector.export(
281278
"demo_storage", DemoStorageSpec(...),
282279
primary_key_fields=["field1"],
283-
vector_index=[("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
280+
vector_indexes=[cocoindex.VectorIndexDef("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
284281
```
285282

286283
</TabItem>
@@ -289,3 +286,77 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
289286
The target storage is managed by CocoIndex, i.e. it'll be created by [CocoIndex CLI](/docs/core/cli) when you run `cocoindex setup`, and the data will be automatically updated (including stale data removal) when updating the index.
290287
The `name` for the same storage should remain stable across different runs.
291288
If it changes, CocoIndex will treat it as an old storage removed and a new one created, and perform setup changes and reindexing accordingly.
289+
290+
#### Storage Indexes
291+
292+
Many storages support indexes to make data retrieval more efficient.
293+
CocoIndex provides a common way to configure indexes for various storages.
294+
295+
* *Primary key*. `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
296+
* *Vector index*. `vector_indexes` (`Sequence[VectorIndexDef]`): the fields to create vector indexes on. `VectorIndexDef` has the following fields:
297+
* `field_name`: the field to create the vector index on.
298+
* `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
299+
300+
301+
## Miscellaneous
302+
303+
### Auth Registry
304+
305+
CocoIndex manages an auth registry. It's an in-memory key-value store, mainly to store authentication information for a backend.
306+
307+
An operation spec is the default way to configure a backend, but it has the following limitations:
308+
309+
* The spec isn't supposed to contain secret information, and it's frequently shown in various places, e.g. `cocoindex show`.
310+
* Once an operation is removed by a change to the flow definition code, its spec is gone as well.
311+
But we still need to be able to drop the backend (e.g. a table) by `cocoindex setup` or `cocoindex drop`.
312+
313+
314+
The auth registry is introduced to solve the problems above. It works as follows:
315+
316+
* You can create a new **auth entry** from a key and a value.
317+
* You can reference the entry by its key and pass it as part of the spec for certain operations, e.g. `Neo4jRelationship` takes a `connection` field in the form of an auth entry reference.
318+
319+
<Tabs>
320+
<TabItem value="python" label="Python" default>
321+
322+
You can add an auth entry with the `cocoindex.add_auth_entry()` function, which returns a `cocoindex.AuthEntryReference`:
323+
324+
```python
325+
my_graph_conn = cocoindex.add_auth_entry(
326+
"my_graph_conn",
327+
cocoindex.storages.Neo4jConnectionSpec(
328+
uri="bolt://localhost:7687",
329+
user="neo4j",
330+
password="cocoindex",
331+
))
332+
```
333+
334+
Then reference it when building a spec that takes an auth entry:
335+
336+
* You can reference it by the `AuthEntryReference` object directly:
337+
338+
```python
339+
demo_collector.export(
340+
"MyGraph",
341+
cocoindex.storages.Neo4jRelationship(connection=my_graph_conn, ...)
342+
)
343+
```
344+
345+
* You can also reference it by the key string, using the `cocoindex.ref_auth_entry()` function:
346+
347+
```python
348+
demo_collector.export(
349+
"MyGraph",
350+
cocoindex.storages.Neo4jRelationship(connection=cocoindex.ref_auth_entry("my_graph_conn"), ...))
351+
```
352+
353+
</TabItem>
354+
</Tabs>
355+
356+
Note that CocoIndex backends use the key of an auth entry to identify the backend.
357+
358+
* Keep the key stable.
359+
If the key doesn't change, it's considered to be the same backend (even if the underlying way to connect/authenticate changes).
360+
361+
* If a key is no longer referenced in any operation spec, keep it until the next `cocoindex setup` or `cocoindex drop`,
362+
so that CocoIndex is able to perform cleanups.
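
The key-based indirection described above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not CocoIndex's implementation: `_AUTH_REGISTRY`, `AuthEntryRef`, `add_entry`, `ref_entry` and `resolve` are hypothetical names made up for this example, and the `"MENTIONS"` relationship type is an arbitrary placeholder. It shows why a spec that carries only a reference can be displayed safely while the secret stays in the in-memory store.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for the in-memory auth registry (key -> value).
_AUTH_REGISTRY: dict[str, Any] = {}


@dataclass(frozen=True)
class AuthEntryRef:
    """Holds only the key, so it is safe to show as part of a spec."""
    key: str


def add_entry(key: str, value: Any) -> AuthEntryRef:
    """Store the value under the key and return a reference to it."""
    _AUTH_REGISTRY[key] = value
    return AuthEntryRef(key)


def ref_entry(key: str) -> AuthEntryRef:
    """Build a reference from the key string alone."""
    return AuthEntryRef(key)


def resolve(ref: AuthEntryRef) -> Any:
    """Look up the stored value when a connection is actually needed."""
    return _AUTH_REGISTRY[ref.key]


conn_ref = add_entry(
    "my_graph_conn",
    {"uri": "bolt://localhost:7687", "user": "neo4j", "password": "cocoindex"},
)

# The spec carries only the reference; the password never appears in it.
spec = {"connection": ref_entry("my_graph_conn"), "rel_type": "MENTIONS"}
print(spec)
```

Because both kinds of reference compare equal when they carry the same key, the key alone identifies the backend in this model, which is why keeping it stable matters.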

docs/docs/ops/storages.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,3 +45,39 @@ doc_embeddings.export(
4545
```
4646

4747
You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).
48+
49+
## Neo4j
50+
51+
### Setup
52+
53+
If you don't have a Neo4j database, you can start one for CocoIndex using our docker compose config:
54+
55+
```bash
56+
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/neo4j.yaml) up -d
57+
```
58+
59+
### Neo4jRelationship
60+
61+
The `Neo4jRelationship` storage exports each row as a relationship to a Neo4j knowledge graph.
62+
When you collect rows for `Neo4jRelationship`, fields will be mapped to a relationship and source/target nodes for the relationship:
63+
64+
* You can explicitly specify fields mapped to source/target nodes.
65+
* All remaining fields will be mapped to relationship properties by default.
66+
67+
68+
The spec takes the following fields:
69+
70+
* `connection` (type: [auth reference](../core/flow_def#auth-registry) to `Neo4jConnectionSpec`): The connection to the Neo4j database. `Neo4jConnectionSpec` has the following fields:
71+
* `uri` (type: `str`): The URI of the Neo4j database to use as the internal storage, e.g. `bolt://localhost:7687`.
72+
* `user` (type: `str`): Username for the Neo4j database.
73+
* `password` (type: `str`): Password for the Neo4j database.
74+
* `db` (type: `str`, optional): The name of the Neo4j database to use as the internal storage, e.g. `neo4j`.
75+
* `rel_type` (type: `str`): The type of the relationship.
76+
* `source`/`target` (type: `Neo4jRelationshipEndSpec`): The source/target node of the relationship, with the following fields:
77+
* `label` (type: `str`): The label of the node.
78+
* `fields` (type: `list[Neo4jFieldMapping]`): Map fields from the collector to nodes in Neo4j, with the following fields:
79+
* `field_name` (type: `str`): The name of the field in the collected row.
80+
* `node_field_name` (type: `str`, optional): The name of the field to use as the node field. If unspecified, the same name as `field_name` is used.
81+
* `nodes` (type: `dict[str, Neo4jRelationshipNodeSpec]`): This configures indexes for different node labels. Key is the node label. The value `Neo4jRelationshipNodeSpec` has the following fields to configure [storage indexes](../core/flow_def#storage-indexes) for the node.
82+
* `primary_key_fields` is required.
83+
* `vector_indexes` is also supported and optional.
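
The field mapping described above (explicitly mapped fields go to the source/target nodes, everything else becomes relationship properties) can be sketched in plain Python. This is a toy model for illustration only, not CocoIndex code: `FieldMapping`, `EndSpec` and `split_row` are hypothetical names that merely mirror the documented `field_name`, `node_field_name`, `label` and `fields` options, and the sample row is made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldMapping:
    field_name: str                        # field in the collected row
    node_field_name: Optional[str] = None  # defaults to field_name

    def target_name(self) -> str:
        return self.node_field_name or self.field_name


@dataclass
class EndSpec:
    label: str
    fields: list[FieldMapping]


def split_row(row: dict[str, Any], source: EndSpec, target: EndSpec):
    """Split one collected row into source props, target props, relationship props."""
    def node_props(end: EndSpec) -> dict[str, Any]:
        return {m.target_name(): row[m.field_name] for m in end.fields}

    # Fields claimed by either end are excluded from the relationship properties.
    used = {m.field_name for end in (source, target) for m in end.fields}
    rel_props = {k: v for k, v in row.items() if k not in used}
    return node_props(source), node_props(target), rel_props


row = {"subject": "CocoIndex", "object": "Neo4j", "predicate": "exports to"}
src, tgt, rel = split_row(
    row,
    EndSpec("Entity", [FieldMapping("subject", "value")]),
    EndSpec("Entity", [FieldMapping("object", "value")]),
)
print(src, tgt, rel)
```

Here `subject` and `object` are renamed to `value` on their respective nodes, and the unmapped `predicate` field ends up as a relationship property.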
