Commit 7400ea6

Ingest v2: Neo4j destination connector (#415)
1 parent c3ce895 commit 7400ea6

File tree

10 files changed

+256
-0
lines changed

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
---
2+
title: Neo4j
3+
---
4+
5+
import NewDocument from '/snippets/general-shared-text/new-document.mdx';
6+
7+
<NewDocument />
8+
9+
import SharedContentNeo4j from '/snippets/dc-shared-text/neo4j-cli-api.mdx';
10+
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
11+
12+
<SharedContentNeo4j/>
13+
<SharedAPIKeyURL/>
14+
15+
Now call the Unstructured CLI or Python SDK. The source connector can be any of the ones supported. This example uses the local source connector:
16+
17+
import Neo4jAPISh from '/snippets/destination_connectors/neo4j.sh.mdx';
18+
import Neo4jAPIPyV2 from '/snippets/destination_connectors/neo4j.v2.py.mdx';
19+
20+
<CodeGroup>
21+
<Neo4jAPISh />
22+
<Neo4jAPIPyV2 />
23+
</CodeGroup>
24+
25+
## Graph Output
26+
27+
import Neo4jGraphFormat from '/snippets/general-shared-text/neo4j-graph.mdx';
28+
29+
<Neo4jGraphFormat />
30+

api-reference/ingest/ingest-dependencies.mdx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,7 @@ To add support for additional connectors, run the following:
7171
| `pip install "unstructured-ingest[kafka]"` | Apache Kafka |
7272
| `pip install "unstructured-ingest[milvus]"` | Milvus |
7373
| `pip install "unstructured-ingest[mongodb]"` | MongoDB |
74+
| `pip install "unstructured-ingest[neo4j]"` | Neo4j |
7475
| `pip install "unstructured-ingest[notion]"` | Notion |
7576
| `pip install "unstructured-ingest[onedrive]"` | OneDrive |
7677
| `pip install "unstructured-ingest[opensearch]"` | OpenSearch |

mint.json

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -223,6 +223,7 @@
223223
"open-source/ingest/destination-connectors/milvus",
224224
"open-source/ingest/destination-connectors/mongodb",
225225
"open-source/ingest/destination-connectors/motherduck",
226+
"open-source/ingest/destination-connectors/neo4j",
226227
"open-source/ingest/destination-connectors/onedrive",
227228
"open-source/ingest/destination-connectors/opensearch",
228229
"open-source/ingest/destination-connectors/pinecone",
@@ -385,6 +386,7 @@
385386
"api-reference/ingest/destination-connector/milvus",
386387
"api-reference/ingest/destination-connector/mongodb",
387388
"api-reference/ingest/destination-connector/motherduck",
389+
"api-reference/ingest/destination-connector/neo4j",
388390
"api-reference/ingest/destination-connector/onedrive",
389391
"api-reference/ingest/destination-connector/opensearch",
390392
"api-reference/ingest/destination-connector/pinecone",
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
---
2+
title: Neo4j
3+
---
4+
5+
import NewDocument from '/snippets/general-shared-text/new-document.mdx';
6+
7+
<NewDocument />
8+
9+
import SharedNeo4j from '/snippets/dc-shared-text/neo4j-cli-api.mdx';
10+
11+
<SharedNeo4j />
12+
13+
Now call the Unstructured CLI or Python. The source connector can be any of the ones supported. This example uses the local source connector.
14+
15+
This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.
16+
17+
import Neo4jAPISh from '/snippets/destination_connectors/neo4j.sh.mdx';
18+
import Neo4jAPIPyV2 from '/snippets/destination_connectors/neo4j.v2.py.mdx';
19+
20+
<CodeGroup>
21+
<Neo4jAPISh />
22+
<Neo4jAPIPyV2 />
23+
</CodeGroup>
24+
25+
import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
26+
27+
<SharedPartitionByAPIOSS/>
28+
29+
## Graph Output
30+
31+
import Neo4jGraphFormat from '/snippets/general-shared-text/neo4j-graph.mdx';
32+
33+
<Neo4jGraphFormat />
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
Batch process all your records to store structured outputs in a Neo4j account.
2+
3+
The requirements are as follows.
4+
5+
import SharedNeo4j from '/snippets/general-shared-text/neo4j.mdx';
6+
import SharedNeo4jCLIAPI from '/snippets/general-shared-text/neo4j-cli-api.mdx';
7+
8+
<SharedNeo4j />
9+
<SharedNeo4jCLIAPI />
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
```bash CLI
#!/usr/bin/env bash

# Chunking and embedding are optional.
# The --uri value takes the form <scheme>://<host>:<port>.
# (A comment cannot follow a trailing backslash, so it is kept up here.)

unstructured-ingest \
  local \
    --input-path $LOCAL_FILE_INPUT_DIR \
    --chunking-strategy by_title \
    --embedding-provider huggingface \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --strategy hi_res \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
  neo4j \
    --username $NEO4J_USERNAME \
    --password $NEO4J_PASSWORD \
    --uri $NEO4J_URI \
    --database $NEO4J_DATABASE \
    --batch-size 100
```
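The `--additional-partition-args` value in the CLI example above is a JSON object passed as a single shell argument, which is why its inner quotes are escaped by hand. As a minimal sketch that is not part of this commit, the same string could be generated with Python's standard library. The helper name `build_partition_args` is hypothetical; the keys and values are copied from the CLI example.

```python
import json
import shlex


def build_partition_args(concurrency_level: int = 15) -> str:
    """Return a JSON string suitable for --additional-partition-args.

    Hypothetical helper for illustration only. The keys mirror the CLI
    example, which passes "true" as a string rather than a JSON boolean.
    """
    args = {
        "split_pdf_page": "true",
        "split_pdf_allow_failed": "true",
        "split_pdf_concurrency_level": concurrency_level,
    }
    return json.dumps(args)


if __name__ == "__main__":
    payload = build_partition_args()
    # shlex.quote wraps the JSON in single quotes so that a POSIX shell
    # passes it through as one argument without backslash escaping.
    print(f"--additional-partition-args={shlex.quote(payload)}")
```

Generating the string this way avoids mismatched escapes when the set of partition arguments changes.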
Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
```python Python Ingest v2
import os

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig

from unstructured_ingest.v2.processes.connectors.neo4j import (
    Neo4jAccessConfig,
    Neo4jConnectionConfig,
    Neo4jUploadStagerConfig,
    Neo4jUploaderConfig
)
from unstructured_ingest.v2.processes.connectors.local import (
    LocalIndexerConfig,
    LocalConnectionConfig,
    LocalDownloaderConfig
)
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.embedder import EmbedderConfig

# Chunking and embedding are optional.

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        embedder_config=EmbedderConfig(embedding_provider="huggingface"),
        destination_connection_config=Neo4jConnectionConfig(
            access_config=Neo4jAccessConfig(password=os.getenv("NEO4J_PASSWORD")),
            username=os.getenv("NEO4J_USERNAME"),
            uri=os.getenv("NEO4J_URI"),
            database=os.getenv("NEO4J_DATABASE"),
        ),
        stager_config=Neo4jUploadStagerConfig(),
        uploader_config=Neo4jUploaderConfig(batch_size=100)
    ).run()
```
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
The Neo4j connector dependencies:
2+
3+
```bash CLI, Python
4+
pip install "unstructured-ingest[neo4j]"
5+
```
6+
7+
import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
8+
9+
<AdditionalIngestDependencies />
10+
11+
The following environment variables:
12+
13+
- `NEO4J_USERNAME` - The name of the target user with access to the target Neo4j deployment, represented by `--username` (CLI) or `username` (Python).
14+
- `NEO4J_PASSWORD` - The user's password, represented by `--password` (CLI) or `password` (Python).
15+
- `NEO4J_URI` - The connection URI for the deployment, represented by `--uri` (CLI) or `uri` (Python).
16+
- `NEO4J_DATABASE` - The name of the database in the deployment, represented by `--database` (CLI) or `database` (Python).
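Because both the CLI and Python examples read these values from the environment, it can help to confirm they are set before starting a run. The following is a minimal sketch, not part of the connector; the function name `missing_neo4j_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_NEO4J_VARS = (
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_URI",
    "NEO4J_DATABASE",
)


def missing_neo4j_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: pass a dict to check something other than the
    real process environment (for example, in tests).
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_NEO4J_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neo4j_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Neo4j environment variables are set.")
```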
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
The graph output of the Neo4j destination connector is represented in the following diagram:
2+
```mermaid
graph BT
    subgraph dn [Document Node]
        D[Document]
    end
    style dn stroke-dasharray: 5

    subgraph en [Element Nodes]
        UE1[UnstructuredElement]
        UE2[UnstructuredElement]
        UE3[UnstructuredElement]
        UE4[UnstructuredElement]
        UE5[UnstructuredElement]
        UE6[UnstructuredElement]
    end
    style en stroke-dasharray: 5

    UE1 -->|PART_OF_DOCUMENT| D
    UE2 -->|PART_OF_DOCUMENT| D
    UE3 -->|PART_OF_DOCUMENT| D
    UE4 -->|PART_OF_DOCUMENT| D
    UE5 -->|PART_OF_DOCUMENT| D
    UE6 -->|PART_OF_DOCUMENT| D

    subgraph cn [Chunk Nodes]
        C1[Chunk]
        C2[Chunk]
        C3[Chunk]
        C4[Chunk]
    end
    style cn stroke-dasharray: 5

    C1 -->|NEXT_CHUNK| C2
    C2 -->|NEXT_CHUNK| C3
    C3 -->|NEXT_CHUNK| C4

    C1 -->|PART_OF_DOCUMENT| D
    C2 -->|PART_OF_DOCUMENT| D
    C3 -->|PART_OF_DOCUMENT| D
    C4 -->|PART_OF_DOCUMENT| D

    UE1 -.->|PART_OF_CHUNK| C1
    UE2 -.->|PART_OF_CHUNK| C1
    UE3 -.->|PART_OF_CHUNK| C2
    UE4 -.->|PART_OF_CHUNK| C3
    UE5 -.->|PART_OF_CHUNK| C4
    UE6 -.->|PART_OF_CHUNK| C4
```
51+
52+
[View the preceding diagram in full-screen mode](https://mermaid.live/view#pako:eNqFlN9vgjAQx_-Vps-6REEfeFiyFZYli7hskCyTxXS0ihFaU9oHo_7vq_IjgIzyxN330157d70TjDmh0IFbgQ8JeA4iBvSXq9_CQRhYuTxWGWUS-Br9KQC39pYOyki5VB5Tel2XS8H3dExwnmAh8NEBs4LohKA6hJfSOkJe7hh6k1XI9C4qlkpQUjK1Oh1UrUHVHlRng-p8QO1kgRqzoC8JxuPH8_vTR7BevqzdJQoXnh-cgVvf0wRYJsA2ATMTMP8f6FQz1tVEiWL7Vi3RpHBW5rRtWm3TbpmdnMbGnKIipb73FazRa-i_nXXAKvC9ZFWHuJfs6nrIUCVkKBIy1AjZpgTfGuWhwVRnnDT6ZFC3-vVpo0v6dKvRJH263eiRXh2OYEZFhndEj5nTlY6gTPSriaCjfwndYJXKCEbsolGsJP88shg6-onRERRcbRPobHCaa0sdCJbU3WHdbFmFHDD75jyrIUp2kotFMddu4-3yB3k-fcg).
53+
54+
In the preceding diagram:
55+
56+
- The `Document` node represents the source file.
57+
- The `UnstructuredElement` nodes represent the source file's Unstructured `Element` objects, before chunking.
58+
- The `Chunk` nodes represent the source file's Unstructured `Element` objects, after chunking.
59+
- Each `UnstructuredElement` node has a `PART_OF_DOCUMENT` relationship with the `Document` node.
60+
- Each `Chunk` node also has a `PART_OF_DOCUMENT` relationship with the `Document` node.
61+
- Each `UnstructuredElement` node has a `PART_OF_CHUNK` relationship with a `Chunk` node.
62+
- Each `Chunk` node, except for the "last" `Chunk` node, has a `NEXT_CHUNK` relationship with its "next" `Chunk` node.
63+
64+
Learn more about [document elements](/platform/document-elements) and [chunking](/platform/chunking).
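The relationship rules listed above can be restated as a small in-memory model. The following sketch is an illustration only and does not reflect how the connector is implemented: `Relationship` and `build_relationships` are hypothetical names, and the node IDs (`D`, `C1`, `UE1`, and so on) are borrowed from the diagram.

```python
from typing import Dict, List, NamedTuple


class Relationship(NamedTuple):
    source: str
    rel_type: str
    target: str


def build_relationships(document_id: str, chunks: Dict[str, List[str]]) -> List[Relationship]:
    """Derive the relationships described in the list above.

    `chunks` maps each chunk ID to the IDs of the elements it contains,
    in document order (dicts preserve insertion order).
    """
    rels: List[Relationship] = []
    for chunk_id, element_ids in chunks.items():
        rels.append(Relationship(chunk_id, "PART_OF_DOCUMENT", document_id))
        for element_id in element_ids:
            rels.append(Relationship(element_id, "PART_OF_DOCUMENT", document_id))
            rels.append(Relationship(element_id, "PART_OF_CHUNK", chunk_id))
    # Every chunk except the last one points at its successor.
    chunk_ids = list(chunks)
    for current, following in zip(chunk_ids, chunk_ids[1:]):
        rels.append(Relationship(current, "NEXT_CHUNK", following))
    return rels


if __name__ == "__main__":
    # Same shape as the diagram: six elements grouped into four chunks.
    example = {"C1": ["UE1", "UE2"], "C2": ["UE3"], "C3": ["UE4"], "C4": ["UE5", "UE6"]}
    for rel in build_relationships("D", example):
        print(f"({rel.source})-[:{rel.rel_type}]->({rel.target})")
```

For the diagram's shape this yields ten `PART_OF_DOCUMENT`, six `PART_OF_CHUNK`, and three `NEXT_CHUNK` relationships.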
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
- A [Neo4j deployment](https://neo4j.com/deployment-center/).
2+
- The username and password for the user who has access to the Neo4j deployment. The default user is typically `neo4j`.
3+
4+
- For a Neo4j AuraDB instance, the default user is typically set when the instance is created.
5+
- For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, the default user is typically set during the deployment process.
6+
- For a local Neo4j deployment, you can [set the default user's initial password](https://neo4j.com/docs/operations-manual/current/configuration/set-initial-password/) or [recover an admin user and its password](https://neo4j.com/docs/operations-manual/current/authentication-authorization/password-and-user-recovery/).
7+
8+
- The connection URI for the Neo4j deployment, which starts with `neo4j://`, `neo4j+s://`, `bolt://`, or `bolt+s://`; followed by `localhost` or the host name; and sometimes ending with a colon and the port number (such as `:7687`). For example:
9+
10+
- For a Neo4j AuraDB deployment, browse to the target Neo4j instance in the Neo4j Aura account and click **Connect > Drivers** to get the connection URI, which follows the format `neo4j+s://<host-name>`. A port number is not used or needed.
11+
- For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, see
12+
[Neo4j on AWS](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-aws/),
13+
[Neo4j on Azure](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-azure/), or
14+
[Neo4j on GCP](https://neo4j.com/docs/operations-manual/current/cloud-deployments/neo4j-gcp/)
15+
for details about how to get the connection URI.
16+
- For a local Neo4j deployment, the URI is typically `bolt://localhost:7687`.
17+
- For other Neo4j deployment types, see the deployment provider's documentation.
18+
19+
[Learn more](https://neo4j.com/docs/browser-manual/current/operations/dbms-connection).
20+
21+
- The name of the target database in the Neo4j deployment. A default Neo4j deployment typically contains two standard databases: one named `neo4j` for user data and another
22+
named `system` for system data and metadata. Some Neo4j deployment types support more than these two databases per deployment;
23+
Neo4j AuraDB instances do not.
24+
25+
- [Create additional databases](https://neo4j.com/docs/operations-manual/current/database-administration/standard-databases/create-databases/)
26+
for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.
27+
- [Get a list of additional available databases](https://neo4j.com/docs/operations-manual/current/database-administration/standard-databases/listing-databases/)
28+
for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.
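The connection URI format described in the requirements above (one of four schemes, a host name, and an optional port) can be checked before it is handed to the connector. This is a minimal sketch using only Python's standard library; `check_neo4j_uri` is a hypothetical helper, and the AuraDB-style host name in the usage lines is a made-up placeholder.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Schemes listed in the requirements above.
ALLOWED_SCHEMES = {"neo4j", "neo4j+s", "bolt", "bolt+s"}


def check_neo4j_uri(uri: str) -> Tuple[str, str, Optional[int]]:
    """Split a connection URI into (scheme, host, port).

    Raises ValueError if the scheme is not one of those listed above or
    if the host name is missing. The port is None when the URI omits it.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unexpected scheme in {uri!r}")
    if not parts.hostname:
        raise ValueError(f"Missing host name in {uri!r}")
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    print(check_neo4j_uri("bolt://localhost:7687"))
    # Placeholder host name, for illustration only.
    print(check_neo4j_uri("neo4j+s://example.databases.neo4j.io"))
```

This check only covers the shape of the URI; it does not confirm that the deployment is reachable or that the credentials are valid.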
