## docs/docs/contributing/guide.md
[CocoIndex](https://github.com/cocoindex-io/cocoindex) is an open source project. We are respectful, open and friendly. This guide explains how to get involved and contribute to [CocoIndex](https://github.com/cocoindex-io/cocoindex).

Our [Discord server](https://discord.com/invite/zpA9S2DR7s) is constantly open.
If you are unsure about anything, it is a good place to discuss! We'd love to collaborate and will always be friendly.
- Include files with the extensions `.py`, `.rs`, `.toml`, `.md`, `.mdx`.
- Exclude files and directories starting with `.`, `target` in the root, and `node_modules` under any directory.

`flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
See [documentation](https://cocoindex.io/docs/ops/sources) for more details.
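The include/exclude rules above can be sketched in plain Python. This is illustrative only; CocoIndex applies such filtering internally through the source's pattern configuration:

```python
from pathlib import PurePosixPath

# Which extensions to include, per the rules above.
INCLUDED_EXTENSIONS = {".py", ".rs", ".toml", ".md", ".mdx"}

def is_included(path: str) -> bool:
    """Illustrative re-implementation of the include/exclude rules above
    (not CocoIndex's actual filtering code)."""
    p = PurePosixPath(path)
    # Exclude files and directories starting with ".".
    if any(part.startswith(".") for part in p.parts):
        return False
    # Exclude "target" in the root.
    if p.parts and p.parts[0] == "target":
        return False
    # Exclude "node_modules" under any directory.
    if "node_modules" in p.parts:
        return False
    return p.suffix in INCLUDED_EXTENSIONS
```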
Here we extract the extension of the filename and store it in the `extension` field.
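One plain-Python way to do that extraction (the example wraps equivalent logic in a CocoIndex function):

```python
import os

def extract_extension(filename: str) -> str:
    # splitext returns (root, ext); ext keeps the leading dot, e.g. ".py".
    return os.path.splitext(filename)[1]
```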
### Split the file into chunks

We will chunk the code with Tree-sitter.
We use the `SplitRecursively` function to split the file into chunks.
It is integrated with Tree-sitter, so you can pass in the language to the `language` parameter.
To see all supported language names and extensions, see the documentation [here](https://cocoindex.io/docs/ops/functions#splitrecursively). All the major languages are supported, e.g., Python, Rust, JavaScript, TypeScript, Java, C++, etc. If it's unspecified or the specified language is not supported, it will be treated as plain text.
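The general idea behind `SplitRecursively` (try coarse separators first, then recurse into pieces that are still too large) can be sketched in plain Python. This toy version ignores the Tree-sitter syntax awareness and chunk overlap of the real function:

```python
def recursive_split(text: str, max_len: int, seps=("\n\n", "\n", " ")) -> list[str]:
    """Toy recursive splitter: coarse separators first, recurse into oversized pieces."""
    if len(text) <= max_len:
        return [text] if text else []
    if not seps:
        # No separators left: hard-cut the text.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    for piece in text.split(seps[0]):
        chunks.extend(recursive_split(piece, max_len, seps[1:]))
    return chunks
```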
Then, for each chunk, we will embed it using the `code_to_embedding` function and collect the embeddings into the `code_embeddings` collector.

`@cocoindex.transform_flow()` is needed to share the transformation across indexing and query. We build a vector index and query against it, so
the embedding computation needs to be consistent between indexing and querying. See [documentation](https://cocoindex.io/docs/query#transform-flow) for more details.
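Why the same function must run on both sides is easy to see with a toy stand-in for the embedding model (the hash-based "embedding" below is purely illustrative, not a real model):

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    # Deterministic stand-in for a real embedding model, normalized to unit length.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Indexing time: embed each chunk once.
index = {chunk: toy_embed(chunk) for chunk in ["fn main() {}", "def main(): pass"]}

def search(query: str) -> str:
    # Query time: the SAME function must be used, or the dot products are meaningless.
    q = toy_embed(query)
    return max(index, key=lambda chunk: sum(a * b for a, b in zip(index[chunk], q)))
```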
`apply_setup_change()` applies setup changes to the backend. The previous and current specs are passed as arguments,
and the method is expected to update the backend setup to match the current state.

A `None` spec indicates non-existence, so when `previous` is `None`, we need to create it,
and when `current` is `None`, we need to delete it.
```python
def apply_setup_change(
    self,
    previous: LocalFileTargetSpec | None,
    current: LocalFileTargetSpec | None,
) -> None:
    # Reconstructed sketch of the example's logic (spec type name assumed):
    # create the directory for a new target, remove it for a deleted one.
    if previous is None and current is not None:
        os.makedirs(current.directory, exist_ok=True)
    elif previous is not None and current is None:
        os.rmdir(previous.directory)
```

The `mutate()` method is called by CocoIndex to apply data changes to the target,
batching mutations to potentially multiple targets of the same type.
This allows the target connector flexibility in implementation (e.g., atomic commits, or processing items with dependencies in a specific order).
Each element in the batch corresponds to a specific target and is represented by a tuple containing:
```python
class LocalFileTargetValues:
    html: str
```

The value type of the `dict` is `LocalFileTargetValues | None`,
where a non-`None` value means an upsert and a `None` value means a delete. Similar to `apply_setup_change()`,
idempotency is expected here.
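A hypothetical sketch of what a `mutate()` implementation for the local-file example could look like (class name, signature, and batch shape assumed from the description above, not copied from CocoIndex):

```python
import os

class LocalFileTargetConnector:
    @staticmethod
    def mutate(*batches):
        # Each element: (target spec, dict of key -> LocalFileTargetValues | None).
        for spec, mutations in batches:
            for key, value in mutations.items():
                path = os.path.join(spec.directory, f"{key}.html")
                if value is None:
                    # None means delete; idempotent if the file is already gone.
                    if os.path.exists(path):
                        os.remove(path)
                else:
                    # Non-None means upsert: (over)write the rendered HTML.
                    with open(path, "w", encoding="utf-8") as f:
                        f.write(value.html)
```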

This keeps your knowledge graph continuously synchronized with your document sources.
Sometimes there may be an internal/homegrown tool or API (e.g. within a company) that's not publicly available.
These can only be connected through custom targets.

### Faster adoption of new export logic

When a new tool, database, or API joins your stack, simply define a Target Spec and Target Connector — start exporting right away, with no pipeline refactoring required.
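For illustration, the shape of such a pair might look like this (all names here are hypothetical, not CocoIndex's actual base classes or hooks):

```python
import dataclasses

@dataclasses.dataclass
class InternalApiTarget:
    """Target Spec: a declarative description of where to export."""
    endpoint: str
    table: str

class InternalApiTargetConnector:
    """Target Connector: logic for setup changes and data mutations."""

    @staticmethod
    def persistent_key(spec: InternalApiTarget) -> str:
        # Stable identity for tracking setup state across runs.
        return f"{spec.endpoint}/{spec.table}"
```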
## docs/docs/examples/examples/docs_to_knowledge_graph.md
We will generate two kinds of relationships:
2. Mentions of entities in a document. E.g., "core/basics.mdx" mentions `CocoIndex` and `Incremental Processing`.
## Setup

* [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
* [Install Neo4j](https://cocoindex.io/docs/ops/storages#Neo4j), a graph database.
* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, you can switch to Ollama, which runs LLM models locally - [guide](https://cocoindex.io/docs/ai/llm#ollama).

You can read the official CocoIndex documentation for Property Graph Targets.
### Add documents as source

We will process CocoIndex documentation markdown files (`.md`, `.mdx`) from the `docs/core` directory ([markdown files](https://github.com/cocoindex-io/cocoindex/tree/main/docs/docs/core), [deployed docs](https://cocoindex.io/docs/core/basics)).
```python
@cocoindex.flow_def(name="DocsToKG")
def docs_to_kg_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # (flow body elided in this excerpt)
    ...
```
Next, we will use `cocoindex.functions.ExtractByLlm` to extract the relationships:

```python
doc["relationships"] = doc["content"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OPENAI,
            model="gpt-4o"
        ),
        output_type=list[Relationship],
    )
)
```
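`output_type=list[Relationship]` refers to a dataclass that tells the LLM what shape to extract. A minimal sketch (field names assumed; check the example source for the exact definition):

```python
import dataclasses

@dataclasses.dataclass
class Relationship:
    """One extracted (subject, predicate, object) triple."""
    subject: str
    predicate: str
    object: str
```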
We then iterate over each extracted relationship:

```python
with doc["relationships"].row() as relationship:
    ...
```
### Build knowledge graph
#### Basic concepts
All nodes for Neo4j need two things:
1. Label: The type of the node. E.g., `Document`, `Entity`.
This exports Neo4j nodes with label `Document` from the `document_node` collector.
#### Export `RELATIONSHIP` and `Entity` nodes to Neo4j

We don't have an explicit collector for `Entity` nodes.
They are part of the `entity_relationship` collector, and their fields are collected during the relationship extraction.

To export them as Neo4j nodes, we need to first declare `Entity` nodes.
```python
flow_builder.declare(
    ...
)
```
In a relationship, there's:
1. A source node and a target node.
2. A relationship connecting the source and target.
Note that different relationships may share the same source and target nodes.

`NodeFromFields` takes the fields from the `entity_relationship` collector and creates `Entity` nodes.

#### Export the `entity_mention` to Neo4j
It creates relationships by:
```sh
cocoindex update --setup main.py
```
You'll see the index update states in the terminal. For example, you'll see the following output:
```
documents: 7 added, 0 removed, 0 updated
```

3. (Optional) Use CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline.
It is in free beta now; you can give it a try. Run the following command to start CocoInsight:
```sh
cocoindex server -ci main.py
```

You can then explore the knowledge graph, for example by running this Cypher query in the Neo4j browser:

```cypher
MATCH p=()-->() RETURN p
```
## Support us

We are constantly improving, and more features and examples are coming soon.
If you love this article, please give us a star ⭐ at [GitHub repo](https://github.com/cocoindex-io/cocoindex) to help us grow.