Commit c5f6a9d

Merge remote-tracking branch 'upstream/HEAD' into qdrant
Signed-off-by: Anush008 <[email protected]>
2 parents 67f1e71 + fca700a commit c5f6a9d

42 files changed, with 2705 additions and 312 deletions.

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -101,3 +101,5 @@ qdrant-client = "1.13.0"
 uuid = { version = "1.16.0", features = ["serde", "v4", "v8"] }
 tokio-stream = "0.1.17"
 async-stream = "0.3.6"
+neo4rs = "0.8.0"
+bytes = "1.10.1"

README.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,6 @@
 [![Python](https://img.shields.io/badge/python-3.11%20to%203.13-5B5BD6?logo=python&logoColor=white)](https://www.python.org/)
 [![CI](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml)
 [![release](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml)
-[![docs](https://github.com/cocoindex-io/cocoindex/actions/workflows/docs.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/docs.yml)
 [![Discord](https://img.shields.io/discord/1314801574169673738?logo=discord&color=5B5BD6&logoColor=white)](https://discord.com/invite/zpA9S2DR7s)
 [![LinkedIn](https://img.shields.io/badge/LinkedIn-CocoIndex-5B5BD6?logo=linkedin&logoColor=white)](https://www.linkedin.com/company/cocoindex)
 [![X (Twitter)](https://img.shields.io/twitter/follow/cocoindex_io)](https://twitter.com/intent/follow?screen_name=cocoindex_io)
@@ -97,11 +96,12 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
 | [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
 | [Manuals LLM Extraction](examples/manuals_llm_extraction) | Extract structured information from a manual using LLM |
 | [Google Drive Text Embedding](examples/gdrive_text_embedding) | Index text documents from Google Drive |
+| [Docs to Knowledge Graph](examples/docs_to_kg) | Extract relationships from Markdown documents and build a knowledge graph |

 More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.

 ## 📖 Documentation
-For detailed documentation, visit [Cocoindex Documentation](https://cocoindex.io/docs), including a [Quickstart guide](https://cocoindex.io/docs/getting_started/quickstart).
+For detailed documentation, visit [CocoIndex Documentation](https://cocoindex.io/docs), including a [Quickstart guide](https://cocoindex.io/docs/getting_started/quickstart).

 ## 🤝 Contributing
 We love contributions from our community ❤️. For details on contributing or running the project for development, check out our [contributing guide](https://cocoindex.io/docs/about/contributing).

dev/neo4j.yaml

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,4 @@
+name: cocoindex-neo4j
 services:
   neo4j:
     image: neo4j:latest
@@ -8,6 +9,7 @@ services:
       - /$HOME/neo4j/plugins:/plugins
     environment:
       - NEO4J_AUTH=neo4j/cocoindex
+      - NEO4J_PLUGINS='["graph-data-science"]'
     ports:
       - "7474:7474"
       - "7687:7687"

docs/README.md

Lines changed: 16 additions & 2 deletions
@@ -1,6 +1,20 @@
-# Website
+<p align="center">
+<img src="https://cocoindex.io/images/github.svg" alt="CocoIndex">
+</p>

-This website is built using [Docusaurus](https://docusaurus.io/), a modern static website generator.
+<h2 align="center">📖 Documentation https://cocoindex.io/docs </h2>
+
+<div align="center">
+
+[![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
+[![License](https://img.shields.io/badge/license-Apache%202.0-5B5BD6?logo=opensourceinitiative&logoColor=white)](https://opensource.org/licenses/Apache-2.0)
+[![docs](https://github.com/cocoindex-io/cocoindex/actions/workflows/docs.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/docs.yml)
+
+</div>
+
+
+
+This directory is the source code for the CocoIndex documentation website, built using [Docusaurus](https://docusaurus.io/).

 ### Installation


examples/code_embedding/README.md

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,9 @@ In this example, we will build an embedding index for a codebase using CocoIndex

 Please give [Cocoindex on Github](https://github.com/cocoindex-io/cocoindex) a star to support us if you like our work. Thank you so much with a warm coconut hug 🥥🤗. [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)

-You can find a detailed blog post with step by step tutorial and explanations [here](https://cocoindex.io/blogs/index-code-base-for-rag).
+## Tutorials
+- Blog with step by step tutorial [here](https://cocoindex.io/blogs/index-code-base-for-rag).
+- Video walkthrough [here](https://youtu.be/G3WstvhHO24?si=Bnxu67Ax5Lv8b-J2)


 ## Prerequisite

examples/code_embedding/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ name = "code-embedding"
 version = "0.1.0"
 description = "Simple example for cocoindex: build embedding index based on source code."
 requires-python = ">=3.10"
-dependencies = ["cocoindex>=0.1.11", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex>=0.1.19", "python-dotenv>=1.0.1"]

examples/docs_to_kg/.env

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Postgres database address for cocoindex
+COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
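
Reviewer note: the URL in this `.env` file packs the user, password, host, and database name into a single string. The sketch below shows how such a URL decomposes, using only Python's standard library. `describe_database_url` is a hypothetical helper written for illustration; it is not part of CocoIndex, and how CocoIndex itself consumes the variable is not shown in this commit.

```python
from urllib.parse import urlsplit

# Value copied from the .env file added in this commit.
DATABASE_URL = "postgres://cocoindex:cocoindex@localhost/cocoindex"


def describe_database_url(url: str) -> dict:
    """Split a Postgres-style URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    for key, value in describe_database_url(DATABASE_URL).items():
        print(f"{key}: {value}")
```

Since the URL omits a port, a client would presumably fall back to its own default, so it may be worth stating the port explicitly if the local Postgres instance does not listen on the usual one.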

examples/docs_to_kg/README.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Build Knowledge Graph from Markdown Documents, with OpenAI, Neo4j and CocoIndex
+
+In this example, we
+
+* Extract relationships from Markdown documents.
+* Build a knowledge graph from the relationships.
+
+Please give [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) a star to support us if you like our work. Thank you so much with a warm coconut hug 🥥🤗. [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
+
+## Prerequisite
+
+Before running the example, you need to:
+
+* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+* [Install Neo4j](https://cocoindex.io/docs/getting_started/installation#-install-neo4j) if you don't have one.
+* Install / configure LLM API. In this example we use OpenAI. You need to [configure OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example. Alternatively, you can follow the comments in the source code to switch to Ollama, which runs LLM models locally, and get it ready following [this guide](https://cocoindex.io/docs/ai/llm#ollama).
+
+## Run
+
+### Build the index
+
+Install dependencies:
+
+```bash
+pip install -e .
+```
+
+Setup:
+
+```bash
+python main.py cocoindex setup
+```
+
+Update index:
+
+```bash
+python main.py cocoindex update
+```
+
+### Browse the knowledge graph
+
+After the knowledge graph is built, you can explore it in Neo4j Browser.
+You can open it at [http://localhost:7474](http://localhost:7474), and run the following Cypher query to get all relationships:
+
+```cypher
+MATCH p=()-->() RETURN p
+```
+
+## CocoInsight
+CocoInsight is a tool to help you understand your data pipeline and data index. CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).
+
+Run CocoInsight to understand your RAG data pipeline:
+
+```
+python main.py cocoindex server -c https://cocoindex.io
+```
+
+Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight). It connects to your local CocoIndex server with zero data retention.
+
+You can view the pipeline flow and the data preview in the CocoInsight UI:
+![CocoInsight UI](https://cocoindex.io/blogs/assets/images/cocoinsight-edd71690dcc35b6c5cf1cb31b51b6f6f.png)

examples/docs_to_kg/main.py

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+"""
+This example shows how to extract relationships from Markdown documents and build a knowledge graph.
+"""
+import dataclasses
+from dotenv import load_dotenv
+import cocoindex
+
+
+@dataclasses.dataclass
+class Relationship:
+    """Describe a relationship between two nodes."""
+    subject: str
+    predicate: str
+    object: str
+
+@dataclasses.dataclass
+class Relationships:
+    """Describe all relationships extracted from a document."""
+    relationships: list[Relationship]
+
+@cocoindex.flow_def(name="DocsToKG")
+def docs_to_kg_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
+    """
+    Define an example flow that extracts triples from files and builds a knowledge graph.
+    """
+
+    conn_spec = cocoindex.add_auth_entry(
+        "Neo4jConnection",
+        cocoindex.storages.Neo4jConnectionSpec(
+            uri="bolt://localhost:7687",
+            user="neo4j",
+            password="cocoindex",
+    ))
+
+    data_scope["documents"] = flow_builder.add_source(
+        cocoindex.sources.LocalFile(path="../../docs/docs/core",
+                                    included_patterns=["*.md", "*.mdx"]))
+
+    relationships = data_scope.add_collector()
+
+    with data_scope["documents"].row() as doc:
+        doc["chunks"] = doc["content"].transform(
+            cocoindex.functions.SplitRecursively(),
+            language="markdown", chunk_size=10000)
+
+        with doc["chunks"].row() as chunk:
+            chunk["relationships"] = chunk["text"].transform(
+                cocoindex.functions.ExtractByLlm(
+                    llm_spec=cocoindex.LlmSpec(
+                        api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"),
+                    output_type=Relationships,
+                    instruction=(
+                        "Please extract relationships from CocoIndex documents. "
+                        "Focus on concepts and ignore specific examples. "
+                        "Each relationship should be a tuple of (subject, predicate, object).")))
+
+            with chunk["relationships"]["relationships"].row() as relationship:
+                relationships.collect(
+                    id=cocoindex.GeneratedField.UUID,
+                    subject=relationship["subject"],
+                    predicate=relationship["predicate"],
+                    object=relationship["object"],
+                )
+
+    relationships.export(
+        "relationships",
+        cocoindex.storages.Neo4jRelationship(
+            connection=conn_spec,
+            rel_type="RELATIONSHIP",
+            source=cocoindex.storages.Neo4jRelationshipEndSpec(
+                label="Entity",
+                fields=[cocoindex.storages.Neo4jFieldMapping(field_name="subject", node_field_name="value")]
+            ),
+            target=cocoindex.storages.Neo4jRelationshipEndSpec(
+                label="Entity",
+                fields=[cocoindex.storages.Neo4jFieldMapping(field_name="object", node_field_name="value")]
+            ),
+            nodes={
+                "Entity": cocoindex.storages.Neo4jRelationshipNodeSpec(key_field_name="value"),
+            },
+        ),
+        primary_key_fields=["id"],
+    )
+
+@cocoindex.main_fn()
+def _run():
+    pass
+
+if __name__ == "__main__":
+    load_dotenv(override=True)
+    _run()
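
Reviewer note: the export spec in `main.py` maps each collected `(subject, predicate, object)` row onto two `Entity` nodes and one `RELATIONSHIP` edge. The plain-Python sketch below illustrates that shape without calling CocoIndex or Neo4j. It rests on two assumptions that this commit does not confirm: that `key_field_name="value"` means nodes with the same `value` are treated as a single node, and that every collected row yields exactly one edge with its own UUID. `Triple` and `build_graph` are hypothetical names introduced only for this illustration.

```python
import dataclasses
import uuid


@dataclasses.dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) row, mirroring the collector fields in main.py."""
    subject: str
    predicate: str
    object: str


def build_graph(triples):
    """Return (nodes, edges) for a list of triples.

    Nodes are deduplicated by their string value (assumed behavior);
    each triple becomes one edge carrying a freshly generated UUID.
    """
    nodes = set()
    edges = []
    for triple in triples:
        nodes.add(triple.subject)
        nodes.add(triple.object)
        edges.append({
            "id": str(uuid.uuid4()),
            "source": triple.subject,
            "predicate": triple.predicate,
            "target": triple.object,
        })
    return nodes, edges


if __name__ == "__main__":
    sample = [
        Triple("CocoIndex", "exports to", "Neo4j"),
        Triple("CocoIndex", "exports to", "Postgres"),
    ]
    nodes, edges = build_graph(sample)
    print(sorted(nodes))
    for edge in edges:
        print(edge["source"], "-[", edge["predicate"], "]->", edge["target"])
```

In this sketch, "CocoIndex" appears in both triples but ends up as a single node, which is the deduplication one would expect to see when browsing the graph if the assumption about the key field holds.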
