81 changes: 76 additions & 5 deletions docs/docs/core/flow_def.mdx
@@ -259,14 +259,11 @@ Export must happen at the top level of a flow, i.e. not within any child scopes

* `name`: the name to identify the export target.
* `target_spec`: the storage spec as the export target.
* `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
* `vector_indexes` (`Sequence[VectorIndexDef]`, optional): the fields to create vector index. `VectorIndexDef` has the following fields:
* `field_name`: the field to create vector index.
* `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
* `setup_by_user` (optional):
whether the export target is set up by the user.
By default, CocoIndex manages the target setup (surfaced by the `cocoindex setup` CLI subcommand), e.g. creating related tables/collections/etc. with a compatible schema, and updating them upon change.
If `True`, the export target is managed by users, and users are responsible for creating the target and updating it upon change.
* Fields to configure [storage indexes](#storage-indexes). `primary_key_fields` is required, and all others are optional.

<Tabs>
<TabItem value="python" label="Python" default>
@@ -280,7 +277,7 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
    demo_collector.export(
        "demo_storage", DemoStorageSpec(...),
        primary_key_fields=["field1"],
        vector_index=[("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
        vector_indexes=[cocoindex.VectorIndexDef("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
```

</TabItem>
@@ -289,3 +286,77 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
The target storage is managed by CocoIndex, i.e. it'll be created by [CocoIndex CLI](/docs/core/cli) when you run `cocoindex setup`, and the data will be automatically updated (including stale data removal) when updating the index.
The `name` for the same storage should remain stable across different runs.
If it changes, CocoIndex will treat it as an old storage removed and a new one created, and perform setup changes and reindexing accordingly.

#### Storage Indexes

Many storages support indexes to boost efficiency in retrieving data.
CocoIndex provides a common way to configure indexes for various storages, as shown in the sketch after the list below.

* *Primary key*. `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
* *Vector index*. `vector_indexes` (`Sequence[VectorIndexDef]`): the fields to create vector index. `VectorIndexDef` has the following fields:
* `field_name`: the field to create vector index.
* `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
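
For illustration, a minimal sketch of passing both index configurations to `export()`; the target spec `DemoStorageSpec`, the collector, and the field names are hypothetical placeholders:

```python
demo_collector.export(
    "demo_storage", DemoStorageSpec(...),
    # Primary key: field types must be supported as key fields.
    primary_key_fields=["field1"],
    # Vector index on the embedding field, using cosine similarity.
    vector_indexes=[
        cocoindex.VectorIndexDef(
            field_name="field2",
            metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
        )
    ],
)
```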


## Miscellaneous

### Auth Registry

CocoIndex manages an auth registry. It's an in-memory key-value store, mainly to store authentication information for a backend.

An operation spec is the default way to configure a backend, but it has the following limitations:

* The spec isn't supposed to contain secret information, and it's frequently shown in various places, e.g. `cocoindex show`.
* Once an operation is removed after a flow definition code change, the spec is also gone.
But we still need to be able to drop the backend (e.g. a table) via `cocoindex setup` or `cocoindex drop`.


The auth registry is introduced to solve the problems above. It works as follows:

* You can create a new **auth entry** with a key and a value.
* You can reference the entry by the key, and pass it as part of the spec for certain operations, e.g. `Neo4jRelationship` takes a `connection` field in the form of an auth entry reference.

<Tabs>
<TabItem value="python" label="Python" default>

You can add an auth entry with the `cocoindex.add_auth_entry()` function, which returns a `cocoindex.AuthEntryReference`:

```python
my_graph_conn = cocoindex.add_auth_entry(
    "my_graph_conn",
    cocoindex.storages.Neo4jConnectionSpec(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="cocoindex",
    ))
```

Then reference it when building a spec that takes an auth entry:

* You can either reference by the `AuthEntryReference` object directly:

```python
demo_collector.export(
    "MyGraph",
    cocoindex.storages.Neo4jRelationship(connection=my_graph_conn, ...)
)
```

* You can also reference it by the key string, using the `cocoindex.ref_auth_entry()` function:

```python
demo_collector.export(
    "MyGraph",
    cocoindex.storages.Neo4jRelationship(connection=cocoindex.ref_auth_entry("my_graph_conn"), ...))
```

</TabItem>
</Tabs>
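
Since the point of the auth registry is to keep secrets out of operation specs, you may also want to avoid hard-coding them in flow code. A minimal sketch, assuming the same `Neo4jConnectionSpec` fields as above and a hypothetical `NEO4J_PASSWORD` environment variable:

```python
import os

my_graph_conn = cocoindex.add_auth_entry(
    "my_graph_conn",
    cocoindex.storages.Neo4jConnectionSpec(
        uri="bolt://localhost:7687",
        user="neo4j",
        # Read the secret from the environment instead of hard-coding it.
        password=os.environ["NEO4J_PASSWORD"],
    ))
```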

Note that CocoIndex backends use the key of an auth entry to identify the backend.

* Keep the key stable.
If the key doesn't change, it's considered to be the same backend (even if the underlying way to connect/authenticate changes).

* If a key is no longer referenced in any operation spec, keep it until the next `cocoindex setup` or `cocoindex drop`,
so that CocoIndex is still able to perform the necessary cleanups.
36 changes: 36 additions & 0 deletions docs/docs/ops/storages.md
@@ -45,3 +45,39 @@ doc_embeddings.export(
```

You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).

## Neo4j

### Setup

If you don't have a Neo4j database, you can start one for CocoIndex using our Docker Compose config:

```bash
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/neo4j.yaml) up -d
```

### Neo4jRelationship

The `Neo4jRelationship` storage exports each row as a relationship to a Neo4j knowledge graph.
When you collect rows for `Neo4jRelationship`, fields are mapped to a relationship and to the relationship's source/target nodes:

* You can explicitly specify fields mapped to source/target nodes.
* All remaining fields will be mapped to relationship properties by default.


The spec takes the following fields (see the sketch after this list for how they fit together):

* `connection` (type: [auth reference](../core/flow_def#auth-registry) to `Neo4jConnectionSpec`): The connection to the Neo4j database. `Neo4jConnectionSpec` has the following fields:
* `uri` (type: `str`): The URI of the Neo4j database to use as the internal storage, e.g. `bolt://localhost:7687`.
* `user` (type: `str`): Username for the Neo4j database.
* `password` (type: `str`): Password for the Neo4j database.
* `db` (type: `str`, optional): The name of the Neo4j database to use as the internal storage, e.g. `neo4j`.
* `rel_type` (type: `str`): The type of the relationship.
* `source`/`target` (type: `Neo4jRelationshipEndSpec`): The source/target node of the relationship, with the following fields:
* `label` (type: `str`): The label of the node.
* `fields` (type: `list[Neo4jFieldMapping]`): Map fields from the collector to nodes in Neo4j, with the following fields:
* `field_name` (type: `str`): The name of the field in the collected row.
* `node_field_name` (type: `str`, optional): The name of the field on the Neo4j node. If unspecified, defaults to `field_name`.
* `nodes` (type: `dict[str, Neo4jRelationshipNodeSpec]`): This configures indexes for different node labels. The key is the node label. The value `Neo4jRelationshipNodeSpec` has the following fields to configure [storage indexes](../core/flow_def#storage-indexes) for the node:
* `primary_key_fields` is required.
* `vector_indexes` is also supported and optional.
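
For illustration only, here's a sketch of how these fields might fit together. It assumes a collector whose rows carry `subject` and `object` fields (any remaining fields become relationship properties), and that the spec classes above are exposed under `cocoindex.storages`; the collector, export name, labels, and field names are hypothetical:

```python
demo_collector.export(
    "entity_relationships",
    cocoindex.storages.Neo4jRelationship(
        # Auth entry reference created earlier via cocoindex.add_auth_entry(...).
        connection=cocoindex.ref_auth_entry("my_graph_conn"),
        rel_type="RELATES_TO",
        # Map each row's `subject` field to the source node's `name` property.
        source=cocoindex.storages.Neo4jRelationshipEndSpec(
            label="Entity",
            fields=[cocoindex.storages.Neo4jFieldMapping(
                field_name="subject", node_field_name="name")],
        ),
        # Map each row's `object` field to the target node's `name` property.
        target=cocoindex.storages.Neo4jRelationshipEndSpec(
            label="Entity",
            fields=[cocoindex.storages.Neo4jFieldMapping(
                field_name="object", node_field_name="name")],
        ),
        # Storage indexes for nodes with the `Entity` label.
        nodes={
            "Entity": cocoindex.storages.Neo4jRelationshipNodeSpec(
                primary_key_fields=["name"],
            ),
        },
    ),
)
```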