Commit 1295f29

API: Qdrant v2 destination connector (#344)
1 parent 5b1c21f commit 1295f29

9 files changed

+255
-44
lines changed

api-reference/ingest/destination-connector/qdrant.mdx

Lines changed: 24 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,28 @@
22
title: Qdrant
33
---
44

5-
import SharedQdrant from '/snippets/dc-shared-text/qdrant.mdx';
5+
import NewDocument from '/snippets/general-shared-text/new-document.mdx';
6+
7+
<NewDocument />
8+
9+
import SharedContentQdrant from '/snippets/dc-shared-text/qdrant-cli-api.mdx';
10+
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';
11+
12+
<SharedContentQdrant/>
13+
<SharedAPIKeyURL/>
14+
15+
Now call the Unstructured CLI or Python SDK. You can use any supported source connector.
16+
17+
This example uses the local source connector:
18+
19+
import QdrantAPISh from '/snippets/destination_connectors/qdrant.sh.mdx';
20+
import QdrantAPIPyV2 from '/snippets/destination_connectors/qdrant.v2.py.mdx';
21+
import QdrantAPIPyV1 from '/snippets/destination_connectors/qdrant.v1.py.mdx';
22+
23+
<CodeGroup>
24+
<QdrantAPISh />
25+
<QdrantAPIPyV2 />
26+
<QdrantAPIPyV1 />
27+
</CodeGroup>
28+
629

7-
<SharedQdrant />

open-source/ingest/destination-connectors/qdrant.mdx

Lines changed: 24 additions & 2 deletions
@@ -2,6 +2,28 @@
22
title: Qdrant
33
---
44

5-
import SharedQdrant from '/snippets/dc-shared-text/qdrant.mdx';
5+
<NewDocument />
66

7-
<SharedQdrant />
7+
import SharedContentQdrant from '/snippets/dc-shared-text/qdrant-cli-api.mdx';
8+
9+
<SharedContentQdrant/>
10+
11+
Now call the Unstructured CLI or Python SDK. You can use any supported source connector.
12+
13+
This example uses the local source connector.
14+
15+
This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.
16+
17+
import QdrantAPISh from '/snippets/destination_connectors/qdrant.sh.mdx';
18+
import QdrantAPIPyV2 from '/snippets/destination_connectors/qdrant.v2.py.mdx';
19+
import QdrantAPIPyV1 from '/snippets/destination_connectors/qdrant.v1.py.mdx';
20+
21+
<CodeGroup>
22+
<QdrantAPISh />
23+
<QdrantAPIPyV2 />
24+
<QdrantAPIPyV1 />
25+
</CodeGroup>
26+
27+
import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';
28+
29+
<SharedPartitionByAPIOSS/>
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
Batch process all your records to store structured outputs in Qdrant.
2+
3+
You will need:
4+
5+
import SharedQdrant from '/snippets/general-shared-text/qdrant.mdx';
6+
import SharedQdrantCLIAPI from '/snippets/general-shared-text/qdrant-cli-api.mdx';
7+
8+
<SharedQdrant />
9+
<SharedQdrantCLIAPI />

snippets/dc-shared-text/qdrant.mdx

Lines changed: 0 additions & 27 deletions
This file was deleted.
Lines changed: 45 additions & 10 deletions
@@ -1,19 +1,54 @@
1-
```bash Shell
1+
```bash CLI
22
#!/usr/bin/env bash
33

44
# Chunking and embedding are optional.
55

6+
# For Qdrant local:
67
unstructured-ingest \
78
local \
89
--input-path $LOCAL_FILE_INPUT_DIR \
9-
--output-dir $LOCAL_FILE_OUTPUT_DIR \
10-
--strategy hi_res \
11-
--chunk-elements \
10+
--chunking-strategy by_title \
1211
--embedding-provider huggingface \
13-
--num-processes 2 \
14-
--verbose \
15-
qdrant \
16-
--collection-name $QDRANT_COLLECTION_NAME \
17-
--location http://localhost:6333 \
18-
--batch-size 80
12+
--partition-by-api \
13+
--api-key $UNSTRUCTURED_API_KEY \
14+
--partition-endpoint $UNSTRUCTURED_API_URL \
15+
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
16+
qdrant-local \
17+
--path $QDRANT_PATH \
18+
--collection-name $QDRANT_COLLECTION \
19+
--batch-size 50 \
20+
--num-processes 1
21+
22+
# For Qdrant client-server:
23+
unstructured-ingest \
24+
local \
25+
--input-path $LOCAL_FILE_INPUT_DIR \
26+
--chunking-strategy by_title \
27+
--embedding-provider huggingface \
28+
--partition-by-api \
29+
--api-key $UNSTRUCTURED_API_KEY \
30+
--partition-endpoint $UNSTRUCTURED_API_URL \
31+
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
32+
qdrant-server \
33+
--url $QDRANT_URL \
34+
--collection-name $QDRANT_COLLECTION \
35+
--batch-size 50 \
36+
--num-processes 1
37+
38+
# For Qdrant Cloud:
39+
unstructured-ingest \
40+
local \
41+
--input-path $LOCAL_FILE_INPUT_DIR \
42+
--chunking-strategy by_title \
43+
--embedding-provider huggingface \
44+
--partition-by-api \
45+
--api-key $UNSTRUCTURED_API_KEY \
46+
--partition-endpoint $UNSTRUCTURED_API_URL \
47+
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
48+
qdrant-cloud \
49+
--url $QDRANT_URL \
50+
--api-key $QDRANT_API_KEY \
51+
--collection-name $QDRANT_COLLECTION \
52+
--batch-size 50 \
53+
--num-processes 1
1954
```
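
The `--additional-partition-args` value in the commands above is a JSON object escaped by hand. As a minimal sketch, assuming you drive the CLI from a Python script, the same flag could be generated with the standard library instead. The helper name `partition_args_flag` is hypothetical and is not part of `unstructured-ingest`; the option values mirror the ones shown above, including the string form of `true`.

```python
import json
import shlex


def partition_args_flag(args: dict) -> str:
    """Return a shell-safe --additional-partition-args flag (hypothetical helper)."""
    # Serialize the options as compact JSON, then quote the result for the shell
    # so the braces and double quotes need no manual escaping.
    payload = json.dumps(args, separators=(",", ":"))
    return "--additional-partition-args=" + shlex.quote(payload)


if __name__ == "__main__":
    flag = partition_args_flag(
        {
            "split_pdf_page": "true",
            "split_pdf_allow_failed": "true",
            "split_pdf_concurrency_level": 15,
        }
    )
    print(flag)
```

The printed flag can be pasted into any of the three `unstructured-ingest` commands in place of the hand-escaped version.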

snippets/destination_connectors/qdrant.py.mdx renamed to snippets/destination_connectors/qdrant.v1.py.mdx

Lines changed: 16 additions & 3 deletions
@@ -1,8 +1,11 @@
1-
```python Python
1+
```python Python Ingest v1
2+
import os
3+
24
from unstructured_ingest.connector.local import SimpleLocalConfig
35
from unstructured_ingest.connector.qdrant import (
46
QdrantWriteConfig,
57
SimpleQdrantConfig,
8+
QdrantAccessConfig,
69
)
710
from unstructured_ingest.interfaces import (
811
ChunkingConfig,
@@ -15,12 +18,14 @@ from unstructured_ingest.runner import LocalRunner
1518
from unstructured_ingest.runner.writers.base_writer import Writer
1619
from unstructured_ingest.runner.writers.qdrant import QdrantWriter
1720

21+
# This example uses Qdrant Cloud.
1822

1923
def get_writer() -> Writer:
2024
return QdrantWriter(
2125
connector_config=SimpleQdrantConfig(
22-
location="http://localhost:6333",
23-
collection_name="test",
26+
url=os.getenv("QDRANT_URL"),
27+
access_config=QdrantAccessConfig(api_key=os.getenv("QDRANT_API_KEY")),
28+
collection_name=os.getenv("QDRANT_COLLECTION"),
2429
),
2530
write_config=QdrantWriteConfig(batch_size=80),
2631
)
@@ -40,7 +45,15 @@ if __name__ == "__main__":
4045
),
4146
read_config=ReadConfig(),
4247
partition_config=PartitionConfig(
48+
partition_by_api=True,
49+
api_key=os.getenv("UNSTRUCTURED_API_KEY"),
50+
partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
4351
strategy="hi_res",
52+
additional_partition_args={
53+
"split_pdf_page": True,
54+
"split_pdf_allow_failed": True,
55+
"split_pdf_concurrency_level": 15
56+
}
4457
),
4558
chunking_config=ChunkingConfig(chunk_elements=True),
4659
embedding_config=EmbeddingConfig(
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
1+
```python Python Ingest v2
2+
import os
3+
4+
from unstructured_ingest.v2.pipeline.pipeline import Pipeline
5+
from unstructured_ingest.v2.interfaces import ProcessorConfig
6+
7+
from unstructured_ingest.v2.processes.connectors.local import (
8+
LocalIndexerConfig,
9+
LocalDownloaderConfig,
10+
LocalConnectionConfig
11+
)
12+
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
13+
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
14+
from unstructured_ingest.v2.processes.embedder import EmbedderConfig
15+
16+
# For Qdrant local:
17+
# from unstructured_ingest.v2.processes.connectors.qdrant.local import (
18+
# LocalQdrantConnectionConfig,
19+
# LocalQdrantAccessConfig,
20+
# LocalQdrantUploadStagerConfig,
21+
# LocalQdrantUploaderConfig
22+
# )
23+
24+
# For Qdrant client-server:
25+
# from unstructured_ingest.v2.processes.connectors.qdrant.server import (
26+
# ServerQdrantConnectionConfig,
27+
# ServerQdrantAccessConfig,
28+
# ServerQdrantUploadStagerConfig,
29+
# ServerQdrantUploaderConfig
30+
# )
31+
32+
# For Qdrant Cloud:
33+
from unstructured_ingest.v2.processes.connectors.qdrant.cloud import (
34+
CloudQdrantConnectionConfig,
35+
CloudQdrantAccessConfig,
36+
CloudQdrantUploadStagerConfig,
37+
CloudQdrantUploaderConfig
38+
)
39+
40+
# Chunking and embedding are optional.
41+
42+
if __name__ == "__main__":
43+
Pipeline.from_configs(
44+
context=ProcessorConfig(),
45+
indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
46+
downloader_config=LocalDownloaderConfig(),
47+
source_connection_config=LocalConnectionConfig(),
48+
partitioner_config=PartitionerConfig(
49+
partition_by_api=True,
50+
api_key=os.getenv("UNSTRUCTURED_API_KEY"),
51+
partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
52+
additional_partition_args={
53+
"split_pdf_page": True,
54+
"split_pdf_allow_failed": True,
55+
"split_pdf_concurrency_level": 15
56+
}
57+
),
58+
chunker_config=ChunkerConfig(chunking_strategy="by_title"),
59+
embedder_config=EmbedderConfig(embedding_provider="huggingface"),
60+
61+
# For Qdrant local:
62+
# destination_connection_config=LocalQdrantConnectionConfig(
63+
# access_config=LocalQdrantAccessConfig(),
64+
# path=os.getenv("QDRANT_PATH")
65+
# ),
66+
# stager_config=LocalQdrantUploadStagerConfig(),
67+
# uploader_config=LocalQdrantUploaderConfig(
68+
# collection_name=os.getenv("QDRANT_COLLECTION"),
69+
# batch_size=50,
70+
# num_processes=1
71+
# )
72+
73+
# For Qdrant client-server:
74+
# destination_connection_config=ServerQdrantConnectionConfig(
75+
# access_config=ServerQdrantAccessConfig(),
76+
# url=os.getenv("QDRANT_URL")
77+
# ),
78+
# stager_config=ServerQdrantUploadStagerConfig(),
79+
# uploader_config=ServerQdrantUploaderConfig(
80+
# collection_name=os.getenv("QDRANT_COLLECTION"),
81+
# batch_size=50,
82+
# num_processes=1
83+
# )
84+
85+
# For Qdrant Cloud:
86+
destination_connection_config=CloudQdrantConnectionConfig(
87+
access_config=CloudQdrantAccessConfig(
88+
api_key=os.getenv("QDRANT_API_KEY")
89+
),
90+
url=os.getenv("QDRANT_URL")
91+
),
92+
stager_config=CloudQdrantUploadStagerConfig(),
93+
uploader_config=CloudQdrantUploaderConfig(
94+
collection_name=os.getenv("QDRANT_COLLECTION"),
95+
batch_size=50,
96+
num_processes=1
97+
)
98+
).run()
99+
```
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
The Qdrant connector dependencies:
2+
3+
```bash
4+
pip install "unstructured-ingest[qdrant]"
5+
```
6+
7+
import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';
8+
9+
<AdditionalIngestDependencies />
10+
11+
The following environment variables:
12+
13+
- `QDRANT_COLLECTION` - The name of the target collection on the Qdrant local installation,
14+
Qdrant server, or Qdrant Cloud cluster, represented by `--collection-name` (CLI) or `collection_name` (Python).
15+
- For Qdrant local, `QDRANT_PATH` - The path to the local Qdrant installation, represented by `--path` (CLI) or `path` (Python).
16+
- For Qdrant client-server, `QDRANT_URL` - The Qdrant server's URL, represented by `--url` (CLI) or `url` (Python).
17+
- For Qdrant Cloud:
18+
19+
- `QDRANT_URL` - The Qdrant cluster's URL, represented by `--url` (CLI) or `url` (Python).
20+
- `QDRANT_API_KEY` - The Qdrant API key, represented by `--api-key` (CLI) or `api_key` (Python).
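
The per-deployment variables listed above can be checked before a run. The following is a minimal sketch, not part of `unstructured-ingest`: the variable names come from the list above, while the mode labels (`local`, `server`, `cloud`) and the function name `missing_vars` are assumptions made for illustration.

```python
import os

# Variable names are taken from the list above; the mode keys are
# labels chosen for this sketch only.
REQUIRED_VARS = {
    "local": ["QDRANT_COLLECTION", "QDRANT_PATH"],
    "server": ["QDRANT_COLLECTION", "QDRANT_URL"],
    "cloud": ["QDRANT_COLLECTION", "QDRANT_URL", "QDRANT_API_KEY"],
}


def missing_vars(mode: str, env=None) -> list:
    """Return the required variables that are unset or empty for the given mode."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED_VARS:
        print(mode, "missing:", missing_vars(mode))
```

Running it prints, for each deployment type, which variables still need to be set in the current shell.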
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
The Qdrant prerequisites are as follows.
2+
3+
- The name of the target [collection](https://qdrant.tech/documentation/concepts/collections) on the Qdrant local installation,
4+
Qdrant server, or Qdrant Cloud cluster.
5+
- For [Qdrant local](https://github.com/qdrant/qdrant), the path to the local Qdrant installation, for example: `/qdrant/local`
6+
- For [Qdrant client-server](https://qdrant.tech/documentation/quickstart/), the Qdrant server URL, for example: `http://localhost:6333`
7+
- For [Qdrant Cloud](https://qdrant.tech/documentation/cloud-intro/):
8+
9+
- A [Qdrant account](https://cloud.qdrant.io/login).
10+
- A [Qdrant cluster](https://qdrant.tech/documentation/cloud/create-cluster/).
11+
- The cluster's URL. To get this URL, do the following:
12+
13+
1. Sign in to your Qdrant Cloud account.
14+
2. On the sidebar, under **Dashboard**, click **Clusters**.
15+
3. Click the cluster's name.
16+
4. Note the value of the **Endpoint** field, for example: `https://<random-guid>.<region-id>.<cloud-provider>.cloud.qdrant.io`.
17+
18+
- A [Qdrant API key](https://qdrant.tech/documentation/cloud/authentication/#create-api-keys).
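
A copied **Endpoint** value can be sanity-checked before it is used as `QDRANT_URL`. This is a rough sketch that relies only on the example format shown above (an `https` URL whose host ends in `.cloud.qdrant.io`); real endpoints may differ, and the function name `looks_like_cloud_endpoint` is hypothetical.

```python
from urllib.parse import urlparse


def looks_like_cloud_endpoint(url: str) -> bool:
    """Heuristic check against the example endpoint format; an assumption, not a Qdrant rule."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # The example above uses https and a host under cloud.qdrant.io.
    return parsed.scheme == "https" and host.endswith(".cloud.qdrant.io")


if __name__ == "__main__":
    # A local client-server URL, as in the prerequisites above, does not match.
    print(looks_like_cloud_endpoint("http://localhost:6333"))
```

A `False` result for a Qdrant Cloud setup suggests the value was copied from the wrong field, or that the endpoint format has changed since this was written.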
