
Commit 301c77e

docs(kg): add example with diagrams for knowledge graph mapping (#385)
* docs(kg): add example with diagrams for knowledge graph mapping
* fix: add missing period.
* fix: add missing declarations
1 parent e91d832 commit 301c77e

4 files changed

+1130
-10
lines changed

docs/docs/ops/storages.md

Lines changed: 190 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -118,7 +118,52 @@ Note that the label used in different `NodeMapping`s should be unique.
118118

119119
* `label` (type: `str`): The label of the node.
120120

121-
For example, if you have a data collector that collects rows with fields `id`, `name` and `gender`, it can be exported to a node with label `Person` and properties `id` `name` and `gender`.
121+
For example, suppose we have collected the following rows:
122+
123+
<small>
124+
125+
| filename | summary |
126+
|----------|---------|
127+
| chapter1.md | At the beginning, ... |
128+
| chapter2.md | In the second day, ... |
129+
130+
</small>
131+
132+
We can export them to nodes under label `Document` like this:
133+
134+
```python
135+
document_collector.export(
136+
...
137+
cocoindex.storages.Neo4j(
138+
...
139+
mapping=cocoindex.storages.NodeMapping(label="Document"),
140+
),
141+
primary_key_fields=["filename"],
142+
)
143+
```
144+
145+
The collected rows will be mapped to nodes in the knowledge graph like this:
146+
147+
```mermaid
148+
graph TD
149+
Doc_Chapter1@{
150+
shape: rounded
151+
label: "**[Document]**
152+
**filename\\*: chapter1.md**
153+
summary: At the beginning, ..."
154+
classDef: node
155+
}
156+
157+
Doc_Chapter2@{
158+
shape: rounded
159+
label: "**[Document]**
160+
**filename\\*: chapter2.md**
161+
summary: In the second day, ..."
162+
classDef: node
163+
}
164+
165+
classDef node font-size:8pt,text-align:left,stroke-width:2;
166+
```
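
The node mapping above can be pictured with a small plain-Python sketch. It is only an illustration of the idea that each row becomes one node identified by its label and primary key; the `Node` class and `rows_to_nodes` helper are hypothetical and are not part of the CocoIndex API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a graph node."""
    label: str
    key: tuple  # values of the primary key fields, in order
    properties: dict = field(default_factory=dict)


def rows_to_nodes(rows, label, primary_key_fields):
    """Upsert each row into a dict keyed by (label, primary key values)."""
    nodes = {}
    for row in rows:
        key = tuple(row[name] for name in primary_key_fields)
        # A later row with the same key overwrites the earlier one (upsert).
        nodes[(label, key)] = Node(label, key, dict(row))
    return nodes


rows = [
    {"filename": "chapter1.md", "summary": "At the beginning, ..."},
    {"filename": "chapter2.md", "summary": "In the second day, ..."},
]
nodes = rows_to_nodes(rows, "Document", ["filename"])
```

Here every collected field, including the primary key, ends up as a property of the `Document` node, which matches the diagram above.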
122167

123168
#### Relationships
124169

@@ -152,9 +197,92 @@ Note that the type used in different `RelationshipMapping`s should be unique.
152197

153198
All fields in the collector that are not used in mappings for source or target node fields will be mapped to relationship properties.
154199

200+
For example, suppose we have collected the following rows, which describe the places mentioned in each file along with the embeddings of those places:
201+
202+
<small>
203+
204+
| doc_filename | place_name | place_embedding | location |
205+
|----------|-------|-----------------|-----------------|
206+
| chapter1.md | Crystal Palace | [0.1, 0.5, ...] | 12 |
207+
| chapter2.md | Magic Forest | [0.4, 0.2, ...] | 23 |
208+
| chapter2.md | Crystal Palace | [0.1, 0.5, ...] | 56 |
209+
210+
</small>
211+
212+
We can export them to relationships under type `MENTION` like this:
213+
214+
```python
215+
doc_place_collector.export(
216+
...
217+
cocoindex.storages.Neo4j(
218+
...
219+
mapping=cocoindex.storages.RelationshipMapping(
220+
rel_type="MENTION",
221+
source=cocoindex.storages.NodeReferenceMapping(
222+
label="Document",
223+
fields=[cocoindex.storages.TargetFieldMapping(source="doc_filename", target="filename")],
224+
),
225+
target=cocoindex.storages.NodeReferenceMapping(
226+
label="Place",
227+
fields=[
228+
cocoindex.storages.TargetFieldMapping(source="place_name", target="name"),
229+
cocoindex.storages.TargetFieldMapping(source="place_embedding", target="embedding"),
230+
],
231+
),
232+
),
233+
),
234+
...
235+
)
236+
```
237+
238+
The `doc_filename` field is mapped to the `Document.filename` property of the source node, while `place_name` and `place_embedding` are mapped to the `Place.name` and `Place.embedding` properties of the target node.
239+
The remaining field `location` becomes a property of the relationship.
240+
For the data above, we get the following relationships:
241+
242+
```mermaid
243+
graph TD
244+
Doc_Chapter1@{
245+
shape: rounded
246+
label: "**[Document]**
247+
**filename\\*: chapter1.md**"
248+
classDef: nodeRef
249+
}
250+
251+
Doc_Chapter2@{
252+
shape: rounded
253+
label: "**[Document]**
254+
**filename\\*: chapter2.md**"
255+
classDef: nodeRef
256+
}
257+
258+
Place_CrystalPalace@{
259+
shape: rounded
260+
label: "**[Place]**
261+
**name\\*: Crystal Palace**
262+
embedding: [0.1, 0.5, ...]"
263+
classDef: nodeRef
264+
}
265+
266+
Place_MagicForest@{
267+
shape: rounded
268+
label: "**[Place]**
269+
**name\\*: Magic Forest**
270+
embedding: [0.4, 0.2, ...]"
271+
classDef: nodeRef
272+
}
273+
274+
Doc_Chapter1:::nodeRef -- **[MENTION]**{location:12} --> Place_CrystalPalace:::nodeRef
275+
Doc_Chapter2:::nodeRef -- **[MENTION]**{location:23} --> Place_MagicForest:::nodeRef
276+
Doc_Chapter2:::nodeRef -- **[MENTION]**{location:56} --> Place_CrystalPalace:::nodeRef
277+
278+
classDef nodeRef font-size:8pt,text-align:left,fill:transparent,stroke-width:1,stroke-dasharray:5 5;
279+
280+
```
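
As a rough sketch of the field routing described above, the following plain-Python helper splits one collected row into source node properties, target node properties, and relationship properties. The `split_row` function is hypothetical and does not reflect how CocoIndex implements the mapping; the two-element embedding is a shortened placeholder for the elided vector in the table.

```python
def split_row(row, source_fields, target_fields):
    """Split a collected row into (source, target, relationship) properties.

    source_fields and target_fields map a collector field name to the
    property name on the referenced node.
    """
    source = {prop: row[name] for name, prop in source_fields.items()}
    target = {prop: row[name] for name, prop in target_fields.items()}
    used = set(source_fields) | set(target_fields)
    # Any field not consumed by a node reference stays on the relationship.
    rel_props = {name: value for name, value in row.items() if name not in used}
    return source, target, rel_props


row = {
    "doc_filename": "chapter1.md",
    "place_name": "Crystal Palace",
    "place_embedding": [0.1, 0.5],  # placeholder for the elided embedding
    "location": 12,
}
source, target, rel_props = split_row(
    row,
    source_fields={"doc_filename": "filename"},
    target_fields={"place_name": "name", "place_embedding": "embedding"},
)
```

With this row, only `location` is left over, so it is the single property carried by the `MENTION` relationship.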
281+
282+
155283
#### Nodes only referenced by relationships
156284

157-
If a node appears as source or target of a relationship, but not exported using `NodeMapping`, CocoIndex will automatically create and keep these nodes until it's no longer referenced by any relationships.
285+
If a node appears as source or target of a relationship, but not exported using `NodeMapping`, CocoIndex will automatically create and keep these nodes until they're no longer referenced by any relationships.
158286

159287
:::note Merge of node values
160288

@@ -170,6 +298,65 @@ The following options are supported:
170298
* `primary_key_fields` (required)
171299
* `vector_indexes` (optional)
172300

301+
Let's continue with the same example.
302+
After combining the exported nodes and relationships, we get a knowledge graph with all the information:
303+
304+
```mermaid
305+
graph TD
306+
Doc_Chapter1@{
307+
shape: rounded
308+
label: "**[Document]**
309+
**filename\\*: chapter1.md**
310+
summary: At the beginning, ..."
311+
classDef: node
312+
}
313+
314+
Doc_Chapter2@{
315+
shape: rounded
316+
label: "**[Document]**
317+
**filename\\*: chapter2.md**
318+
summary: In the second day, ..."
319+
classDef: node
320+
}
321+
322+
Place_CrystalPalace@{
323+
shape: rounded
324+
label: "**[Place]**
325+
**name\\*: Crystal Palace**
326+
embedding: [0.1, 0.5, ...]"
327+
classDef: nodeRef
328+
}
329+
330+
Place_MagicForest@{
331+
shape: rounded
332+
label: "**[Place]**
333+
**name\\*: Magic Forest**
334+
embedding: [0.4, 0.2, ...]"
335+
classDef: nodeRef
336+
}
337+
338+
Doc_Chapter1:::node -- **[MENTION]**{location:12} --> Place_CrystalPalace:::nodeRef
339+
Doc_Chapter2:::node -- **[MENTION]**{location:23} --> Place_MagicForest:::nodeRef
340+
Doc_Chapter2:::node -- **[MENTION]**{location:56} --> Place_CrystalPalace:::nodeRef
341+
342+
classDef node font-size:8pt,text-align:left,stroke-width:2;
343+
classDef nodeRef font-size:8pt,text-align:left,fill:transparent,stroke-width:1,stroke-dasharray:5 5;
344+
345+
```
346+
347+
Nodes with the `Place` label in this example aren't exported explicitly using `NodeMapping`, so CocoIndex automatically creates them and keeps them as long as they're still referenced by any relationship.
348+
You need to declare a `ReferencedNode`:
349+
350+
```python
351+
flow_builder.declare(
352+
cocoindex.storages.Neo4jDeclarations(
353+
...
354+
referenced_nodes=[
355+
cocoindex.storages.ReferencedNode(label="Place", primary_key_fields=["name"]),
356+
],
357+
),
358+
)
359+
```
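
The lifetime rule for nodes that are only referenced by relationships can be modeled with a short plain-Python sketch. This is an assumption-laden illustration of the behavior described in this section, not CocoIndex's implementation; the `implicit_nodes` helper and the `(label, key)` tuples are hypothetical.

```python
def implicit_nodes(relationships, exported_keys):
    """Return nodes that exist only because some relationship references them.

    relationships is a list of (source_key, target_key) pairs and
    exported_keys is the set of nodes exported explicitly.
    """
    referenced = set()
    for source_key, target_key in relationships:
        referenced.add(source_key)
        referenced.add(target_key)
    return referenced - exported_keys


exported = {("Document", "chapter1.md"), ("Document", "chapter2.md")}
relationships = [
    (("Document", "chapter1.md"), ("Place", "Crystal Palace")),
    (("Document", "chapter2.md"), ("Place", "Magic Forest")),
    (("Document", "chapter2.md"), ("Place", "Crystal Palace")),
]
before = implicit_nodes(relationships, exported)

# Simulate the relationships from chapter2.md disappearing from the source data.
remaining = [r for r in relationships if r[0] != ("Document", "chapter2.md")]
after = implicit_nodes(remaining, exported)
```

In this model, `Magic Forest` disappears once no relationship points to it, while `Crystal Palace` survives because `chapter1.md` still mentions it.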
173360

174361
### Neo4j
175362

@@ -201,4 +388,4 @@ Neo4j also provides a declaration spec `Neo4jDeclaration`, to configure indexing
201388
* `connection` (type: auth reference to `Neo4jConnectionSpec`)
202389
* `referenced_nodes` (type: `Sequence[ReferencedNode]`)
203390

204-
You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph)
391+
You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/docs_to_knowledge_graph).

docs/docusaurus.config.ts

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,10 @@ const config: Config = {
3333
locales: ['en'],
3434
},
3535

36+
markdown: {
37+
mermaid: true,
38+
},
39+
3640
plugins: [
3741
() => ({
3842
name: 'load-env-vars',
@@ -66,6 +70,7 @@ const config: Config = {
6670
],
6771
],
6872

73+
themes: ['@docusaurus/theme-mermaid'],
6974
themeConfig: {
7075
// Replace with your project's social card
7176
image: 'img/social-card.jpg',

docs/package.json

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@
1717
"dependencies": {
1818
"@docusaurus/core": "3.7.0",
1919
"@docusaurus/preset-classic": "3.7.0",
20+
"@docusaurus/theme-mermaid": "^3.7.0",
2021
"@mdx-js/react": "^3.0.0",
2122
"clsx": "^2.0.0",
2223
"mixpanel-browser": "^2.59.0",
