Commit 44f6449

update routes
1 parent ffdaadc commit 44f6449

8 files changed

+354
-359
lines changed

docs/docs/targets/index.md

Lines changed: 320 additions & 11 deletions
@@ -3,33 +3,342 @@ title: Targets
33
description: CocoIndex Built-in Targets
44
toc_max_heading_level: 4
55
---
6+
import { ExampleButton } from '../../src/components/GitHubButton';
67

78
# CocoIndex Built-in Targets
89

910
For each target, data is exported from a data collector, which holds multiple entries, each with multiple fields.
1011
The way to map data from a data collector to a target depends on the data model of the target.
1112

13+
14+
| Target | Documentation | Type |
15+
|----------|---------------|-------------------------|
16+
| Postgres | [Postgres](./targets/entry-oriented/postgres) | Entry-oriented |
17+
| Qdrant | [Qdrant](./targets/entry-oriented/qdrant) | Entry-oriented |
18+
| LanceDB | [LanceDB](./targets/entry-oriented/lancedb) | Entry-oriented |
19+
| Neo4j | [Neo4j](./targets/property-graph/neo4j) | Property graph |
20+
| Kuzu | [Kuzu](./targets/property-graph/kuzu) | Property graph |
21+
22+
1223
## Entry-Oriented Targets
1324

1425
An entry-oriented target organizes data into independent entries, such as rows, key-value pairs, or documents.
1526
Each entry is self-contained and does not explicitly link to others.
1627
There is usually a straightforward mapping from data collector rows to entries.
1728

18-
| Target | Link |
19-
|----------|------|
20-
| Postgres | [Postgres](./targets/entry-oriented/postgres) |
21-
| Qdrant | [Qdrant](./targets/entry-oriented/qdrant) |
22-
| LanceDB | [LanceDB](./targets/entry-oriented/lancedb) |
23-
2429

2530
## Property Graph Targets
26-
2731
The property graph is a widely adopted model for knowledge graphs, in which both nodes and relationships can have properties.
32+
2833
[Graph database concepts](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/) has a good introduction to basic concepts of property graphs.
2934

35+
The sections below use the following concepts:
36+
* [Node](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/#graphdb-node)
37+
* [Node label](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/#graphdb-labels), which represents a type of node.
38+
* [Relationship](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/#graphdb-relationship), which describes a connection between two nodes.
39+
* [Relationship type](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/#graphdb-relationship-type)
40+
* [Properties](https://neo4j.com/docs/getting-started/appendix/graphdb-concepts/#graphdb-properties), which are key-value pairs associated with nodes and relationships.
41+
42+
### Data Mapping
43+
44+
Data from collectors is mapped to graph elements in one of two ways:
45+
46+
1. Rows from collectors → Nodes in the graph
47+
2. Rows from collectors → Relationships in the graph (including source and target nodes of the relationship)
48+
49+
This is what you need to provide to define these mappings:
50+
51+
* Specify [nodes to export](#nodes-to-export).
52+
* [Declare extra node labels](#declare-extra-node-labels), for labels that appear as source/target nodes of relationships but are not exported as nodes.
53+
* Specify [relationships to export](#relationships-to-export).
54+
55+
In addition, the same node may appear multiple times: among the exported nodes and as the source or target of various relationships.
56+
They should appear as the same node in the target graph database.
57+
CocoIndex automatically [matches and deduplicates nodes](#nodes-matching-and-deduplicating) based on their primary key values.
58+
59+
### Nodes to Export
60+
61+
Here's how CocoIndex data elements map to nodes in the graph:
62+
63+
| CocoIndex Element | Graph Element |
64+
|-------------------|------------------|
65+
| an export target | nodes with a unique label |
66+
| a collected row | a node |
67+
| a field | a property of the node |
68+
69+
Note that the labels used in different `Nodes` mappings should be unique.
70+
71+
`cocoindex.targets.Nodes` describes the mapping to nodes. It has the following fields:
72+
73+
* `label` (`str`): The label of the node.
74+
75+
For example, suppose we have collected the following rows:
76+
77+
<small>
78+
79+
| filename | summary |
80+
|----------|---------|
81+
| chapter1.md | At the beginning, ... |
82+
| chapter2.md | In the second day, ... |
83+
84+
</small>
85+
86+
We can export them to nodes under label `Document` like this:
87+
88+
```python
document_collector.export(
    ...
    cocoindex.targets.Neo4j(
        ...
        mapping=cocoindex.targets.Nodes(label="Document"),
    ),
    primary_key_fields=["filename"],
)
```
98+
99+
The collected rows will be mapped to nodes in the graph database like this:
100+
101+
```mermaid
102+
graph TD
103+
Doc_Chapter1@{
104+
shape: rounded
105+
label: "**[Document]**
106+
**filename\\*: chapter1.md**
107+
summary: At the beginning, ..."
108+
classDef: node
109+
}
110+
111+
Doc_Chapter2@{
112+
shape: rounded
113+
label: "**[Document]**
114+
**filename\\*: chapter2.md**
115+
summary: In the second day, ..."
116+
classDef: node
117+
}
118+
119+
classDef node font-size:8pt,text-align:left,stroke-width:2;
120+
```
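
To make the row-to-node mapping concrete, here is a minimal plain-Python sketch of the idea. It does not use CocoIndex at all: the `Node` class and the `rows_to_nodes` helper are hypothetical names introduced only for illustration, and the real export logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Illustrative stand-in for a graph node (not a CocoIndex type)."""
    label: str
    key: tuple        # primary key values, in the order of primary_key_fields
    properties: dict  # every collected field becomes a node property


def rows_to_nodes(rows, label, primary_key_fields):
    """Map each collected row to one node under the given label."""
    nodes = []
    for row in rows:
        key = tuple(row[field] for field in primary_key_fields)
        nodes.append(Node(label=label, key=key, properties=dict(row)))
    return nodes


rows = [
    {"filename": "chapter1.md", "summary": "At the beginning, ..."},
    {"filename": "chapter2.md", "summary": "In the second day, ..."},
]
nodes = rows_to_nodes(rows, label="Document", primary_key_fields=["filename"])
```

Each row yields exactly one node, and the primary key tuple is what later identifies the node when it also appears in relationships.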
121+
122+
### Declare Extra Node Labels
123+
124+
If a node label needs to appear as the source or target of a relationship but is not exported as a node, you need to [declare](../core/flow_def#target-declarations) the label with the necessary configuration.
125+
126+
The dataclass to describe the declaration is specific to each target (e.g. `cocoindex.targets.Neo4jDeclarations`),
127+
while they share the following common fields:
128+
129+
* `nodes_label` (required): The label of the node.
* Options for [storage indexes](../core/flow_def#storage-indexes).
  * `primary_key_fields` (required)
  * `vector_indexes` (optional)
133+
134+
Continuing the example above, suppose we want to extract relationships from `Document` to `Place` later (i.e. a document mentions a place), but the `Place` label isn't exported as a node. In that case, we need to declare it:
136+
137+
```python
flow_builder.declare(
    cocoindex.targets.Neo4jDeclarations(
        connection=...,
        nodes_label="Place",
        primary_key_fields=["name"],
    ),
)
```
146+
147+
### Relationships to Export
148+
149+
Here's how CocoIndex data elements map to relationships in the graph:
150+
151+
| CocoIndex Element | Graph Element |
152+
|-------------------|------------------|
153+
| an export target | relationships with a unique type |
154+
| a collected row | a relationship |
155+
| a field | a property of the relationship, or a property of the source/target node, depending on configuration |
156+
157+
Note that the types used in different `Relationships` mappings should be unique.
158+
159+
`cocoindex.targets.Relationships` describes the mapping to relationships. It has the following fields:
160+
161+
* `rel_type` (`str`): The type of the relationship.
* `source`/`target` (`cocoindex.targets.NodeFromFields`): Specify how to extract source/target node information from specific fields in the collected row. It has the following fields:
  * `label` (`str`): The label of the node.
  * `fields` (`Sequence[cocoindex.targets.TargetFieldMapping]`): Specify field mappings from the collected rows to node properties, with the following fields:
    * `source` (`str`): The name of the field in the collected row.
    * `target` (`str`, optional): The name of the node property to write to. If unspecified, the same name as `source` is used.
167+
168+
:::note Map necessary fields for nodes of relationships
169+
170+
You need to map the following fields for nodes of each relationship:
171+
172+
* Make sure all primary key fields for the label are mapped.
173+
* Optionally, you can also map non-key fields. If you do so, please make sure all value fields are mapped.
174+
175+
:::
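
The two rules in the note above can be expressed as a small check. This is only a sketch under assumptions: `check_node_field_mapping` is a hypothetical helper, not part of CocoIndex, and it assumes you already know the primary key fields and value fields defined for the label.

```python
def check_node_field_mapping(mapped_properties, primary_key_fields, value_fields):
    """Return a list of problems with a source/target node mapping.

    mapped_properties: node property names produced by the field mappings.
    primary_key_fields / value_fields: fields defined for the node label.
    """
    mapped = set(mapped_properties)
    problems = []

    # Rule 1: every primary key field of the label must be mapped.
    missing_keys = [f for f in primary_key_fields if f not in mapped]
    if missing_keys:
        problems.append(f"unmapped primary key fields: {missing_keys}")

    # Rule 2: value fields are optional, but mapping only some of them is an error.
    mapped_values = [f for f in value_fields if f in mapped]
    missing_values = [f for f in value_fields if f not in mapped]
    if mapped_values and missing_values:
        problems.append(f"partially mapped value fields, missing: {missing_values}")

    return problems


# Key only: valid. Key plus all value fields: valid.
key_only = check_node_field_mapping(["name"], ["name"], ["embedding"])
key_and_values = check_node_field_mapping(["name", "embedding"], ["name"], ["embedding"])
# Value field without the key: invalid.
no_key = check_node_field_mapping(["embedding"], ["name"], ["embedding"])
```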
176+
177+
All fields in the collector that are not used in mappings for source or target node fields will be mapped to relationship properties.
178+
179+
For example, suppose we have collected the following rows, which describe the places mentioned in each file, along with embeddings of the places:
180+
181+
<small>
182+
183+
| doc_filename | place_name | place_embedding | location |
184+
|----------|-------|-----------------|-----------------|
185+
| chapter1.md | Crystal Palace | [0.1, 0.5, ...] | 12 |
186+
| chapter2.md | Magic Forest | [0.4, 0.2, ...] | 23 |
187+
| chapter2.md | Crystal Palace | [0.1, 0.5, ...] | 56 |
188+
189+
</small>
190+
191+
We can export them to relationships under type `MENTION` like this:
192+
193+
```python
doc_place_collector.export(
    ...
    cocoindex.targets.Neo4j(
        ...
        mapping=cocoindex.targets.Relationships(
            rel_type="MENTION",
            source=cocoindex.targets.NodeFromFields(
                label="Document",
                fields=[cocoindex.targets.TargetFieldMapping(source="doc_filename", target="filename")],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Place",
                fields=[
                    cocoindex.targets.TargetFieldMapping(source="place_name", target="name"),
                    cocoindex.targets.TargetFieldMapping(source="place_embedding", target="embedding"),
                ],
            ),
        ),
    ),
    ...
)
```
216+
217+
The `doc_filename` field is mapped to the `Document.filename` property of the source node, while `place_name` and `place_embedding` are mapped to the `Place.name` and `Place.embedding` properties of the target node.
218+
The remaining field `location` becomes a property of the relationship.
219+
For the data above, we get the following relationships:
220+
221+
```mermaid
222+
graph TD
223+
Doc_Chapter1@{
224+
shape: rounded
225+
label: "**[Document]**
226+
**filename\\*: chapter1.md**"
227+
classDef: nodeRef
228+
}
229+
230+
Doc_Chapter2_a@{
231+
shape: rounded
232+
label: "**[Document]**
233+
**filename\\*: chapter2.md**"
234+
classDef: nodeRef
235+
}
236+
237+
Doc_Chapter2_b@{
238+
shape: rounded
239+
label: "**[Document]**
240+
**filename\\*: chapter2.md**"
241+
classDef: nodeRef
242+
}
243+
244+
Place_CrystalPalace_a@{
245+
shape: rounded
246+
label: "**[Place]**
247+
**name\\*: Crystal Palace**
248+
embedding: [0.1, 0.5, ...]"
249+
classDef: node
250+
}
251+
252+
Place_MagicForest@{
253+
shape: rounded
254+
label: "**[Place]**
255+
**name\\*: Magic Forest**
256+
embedding: [0.4, 0.2, ...]"
257+
classDef: node
258+
}
259+
260+
Place_CrystalPalace_b@{
261+
shape: rounded
262+
label: "**[Place]**
263+
**name\\*: Crystal Palace**
264+
embedding: [0.1, 0.5, ...]"
265+
classDef: node
266+
}
267+
268+
269+
Doc_Chapter1:::nodeRef -- **:MENTION** (location:12) --> Place_CrystalPalace_a:::node
270+
Doc_Chapter2_a:::nodeRef -- **:MENTION** (location:23) --> Place_MagicForest:::node
271+
Doc_Chapter2_b:::nodeRef -- **:MENTION** (location:56) --> Place_CrystalPalace_b:::node
272+
273+
classDef nodeRef font-size:8pt,text-align:left,fill:transparent,stroke-width:1,stroke-dasharray:5 5;
274+
classDef node font-size:8pt,text-align:left,stroke-width:2;
275+
276+
```
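
The way one collected row is split into a source node, a target node and relationship properties can be sketched in plain Python. The helper `split_relationship_row` is hypothetical and only illustrates the rule described above; the field mappings are modeled as simple dicts from collector field name to node property name, and the embedding is shortened to two values for the example.

```python
def split_relationship_row(row, source_fields, target_fields):
    """Split one collected row into (source props, target props, relationship props).

    source_fields / target_fields: dict of collector field name -> node property name.
    """
    source_props = {prop: row[field] for field, prop in source_fields.items()}
    target_props = {prop: row[field] for field, prop in target_fields.items()}
    # Every field not consumed by a node mapping becomes a relationship property.
    used = set(source_fields) | set(target_fields)
    rel_props = {name: value for name, value in row.items() if name not in used}
    return source_props, target_props, rel_props


row = {
    "doc_filename": "chapter1.md",
    "place_name": "Crystal Palace",
    "place_embedding": [0.1, 0.5],  # truncated, illustrative values
    "location": 12,
}
src, tgt, rel = split_relationship_row(
    row,
    source_fields={"doc_filename": "filename"},
    target_fields={"place_name": "name", "place_embedding": "embedding"},
)
```

Here `src` carries only the `Document` key, `tgt` carries the `Place` key and its embedding, and `location` is left over as the single relationship property.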
277+
278+
### Nodes Matching and Deduplicating
279+
280+
The nodes and relationships we got above are discrete elements.
281+
To fit them into a connected property graph, CocoIndex will match and deduplicate nodes automatically:
282+
283+
* Match nodes based on their primary key values. Nodes with the same primary key values are considered the same node.
284+
* For non-primary-key fields (a.k.a. value fields): if multiple nodes (before deduplication) with the same primary key provide value fields, CocoIndex picks the values from an arbitrary one of them.
286+
287+
:::note
288+
289+
The best practice is to make the value fields consistent across different appearances of the same node, to avoid non-determinism in the exported graph.
290+
291+
:::
292+
293+
After matching and deduplication, we get the final graph:
294+
295+
```mermaid
296+
graph TD
297+
Doc_Chapter1@{
298+
shape: rounded
299+
label: "**[Document]**
300+
**filename\\*: chapter1.md**
301+
summary: At the beginning, ..."
302+
classDef: node
303+
}
304+
305+
Doc_Chapter2@{
306+
shape: rounded
307+
label: "**[Document]**
308+
**filename\\*: chapter2.md**
309+
summary: In the second day, ..."
310+
classDef: node
311+
}
312+
313+
Place_CrystalPalace@{
314+
shape: rounded
315+
label: "**[Place]**
316+
**name\\*: Crystal Palace**
317+
embedding: [0.1, 0.5, ...]"
318+
classDef: node
319+
}
320+
321+
Place_MagicForest@{
322+
shape: rounded
323+
label: "**[Place]**
324+
**name\\*: Magic Forest**
325+
embedding: [0.4, 0.2, ...]"
326+
classDef: node
327+
}
328+
329+
Doc_Chapter1:::node -- **:MENTION** (location:12) --> Place_CrystalPalace:::node
330+
Doc_Chapter2:::node -- **:MENTION** (location:23) --> Place_MagicForest:::node
331+
Doc_Chapter2:::node -- **:MENTION** (location:56) --> Place_CrystalPalace:::node
332+
333+
classDef node font-size:8pt,text-align:left,stroke-width:2;
334+
```
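
The matching and deduplication step can be sketched as follows. This is an illustrative model, not CocoIndex's implementation: `deduplicate_nodes` is a hypothetical helper, node appearances are modeled as `(label, key, value_fields)` tuples, and "arbitrary" is realized here as "the first appearance that provides value fields".

```python
def deduplicate_nodes(appearances):
    """Merge node appearances that share the same label and primary key.

    appearances: iterable of (label, key, value_fields) tuples, where key is a
    tuple of primary key values and value_fields is a (possibly empty) dict.
    """
    merged = {}
    for label, key, value_fields in appearances:
        identity = (label, key)
        existing = merged.get(identity)
        # Keep the first appearance that carries value fields; a key-only
        # reference (empty dict) never overwrites existing values.
        if existing is None or (not existing and value_fields):
            merged[identity] = dict(value_fields)
    return merged


appearances = [
    # Exported Document nodes.
    ("Document", ("chapter1.md",), {"summary": "At the beginning, ..."}),
    ("Document", ("chapter2.md",), {"summary": "In the second day, ..."}),
    # Source/target nodes coming from the three MENTION relationships.
    ("Document", ("chapter1.md",), {}),
    ("Place", ("Crystal Palace",), {"embedding": [0.1, 0.5]}),
    ("Document", ("chapter2.md",), {}),
    ("Place", ("Magic Forest",), {"embedding": [0.4, 0.2]}),
    ("Document", ("chapter2.md",), {}),
    ("Place", ("Crystal Palace",), {"embedding": [0.1, 0.5]}),
]
graph_nodes = deduplicate_nodes(appearances)
```

The eight appearances collapse into four nodes, matching the final graph above: two `Document` nodes and two `Place` nodes.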
335+
336+
### Examples
337+
338+
You can find end-to-end examples that work with any of the supported property graph targets in the following directories:
339+
* <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph" text="Docs to Knowledge Graph" margin="0 0 16px 0" />
340+
341+
* <ExampleButton href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/product_recommendation" text="Product Recommendation" margin="0 0 16px 0" />
342+
30343

31-
| Target | Link |
32-
|----------|------|
33-
| Neo4j | [Neo4j](./targets/property-graph/neo4j) |
34-
| Kuzu | [Kuzu](./targets/property-graph/kuzu) |
35344

docs/docs/targets/property-graph/kuzu.md renamed to docs/docs/targets/kuzu.md

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,20 +3,23 @@ title: Kuzu
33
description: CocoIndex Kuzu Target
44
toc_max_heading_level: 4
55
---
6+
import { ExampleButton } from '../../src/components/GitHubButton';
7+
8+
# Kuzu
69

710
Exports data to a [Kuzu](https://kuzu.com/) graph database.
811

912
## Get Started
1013

11-
Read [Property Graph Targets](../property-graph/index.md) for more information to get started on how it works.
14+
Read [Property Graph Targets](./targets/index.md#property-graph-targets) to learn how property graph targets work in CocoIndex.
1215

1316
## Spec
1417

1518
CocoIndex supports talking to Kuzu through its [API server](https://github.com/kuzudb/api-server).
1619

1720
The `Kuzu` target spec takes the following fields:
1821

19-
* `connection` ([auth reference](../../core/flow_def#auth-registry) to `KuzuConnectionSpec`): The connection to the Kuzu database. `KuzuConnectionSpec` has the following fields:
22+
* `connection` ([auth reference](../core/flow_def#auth-registry) to `KuzuConnectionSpec`): The connection to the Kuzu database. `KuzuConnectionSpec` has the following fields:
  * `api_server_url` (`str`): The URL of the Kuzu API server, e.g. `http://localhost:8123`.
* `mapping` (`Nodes | Relationships`): The mapping from collected rows to nodes or relationships of the graph. See [nodes to export](#nodes-to-export) and [relationships to export](#relationships-to-export).
2225

@@ -48,3 +51,10 @@ docker run -d --name kuzu-explorer -p ${KUZU_EXPLORER_PORT}:8000 -v ${KUZU_DB_D
4851
```
4952

5053
You can then access the explorer at [http://localhost:8124](http://localhost:8124).
54+
55+
## Example
56+
<ExampleButton
57+
href="https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph"
58+
text="Docs to Knowledge Graph"
59+
margin="16px 0 24px 0"
60+
/>
