Commit 115c954

Ingest v2: Redis destination connector (#423)
1 parent b1443fe commit 115c954

9 files changed, with 189 additions and 0 deletions

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
title: Redis
---

import NewDocument from '/snippets/general-shared-text/new-document.mdx';

<NewDocument />

import SharedContentRedis from '/snippets/dc-shared-text/redis-cli-api.mdx';
import SharedAPIKeyURL from '/snippets/general-shared-text/api-key-url.mdx';

<SharedContentRedis/>
<SharedAPIKeyURL/>

Now call the Unstructured CLI or Python SDK. You can use any supported source connector.

This example uses the local source connector:

import RedisAPISh from '/snippets/destination_connectors/redis.sh.mdx';
import RedisAPIPyV2 from '/snippets/destination_connectors/redis.v2.py.mdx';

<CodeGroup>
  <RedisAPISh />
  <RedisAPIPyV2 />
</CodeGroup>

api-reference/ingest/ingest-dependencies.mdx

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ To add support for additional connectors, run the following:
 | `pip install "unstructured-ingest[postgres]"` | PostgreSQL, SQLite |
 | `pip install "unstructured-ingest[qdrant]"` | Qdrant |
 | `pip install "unstructured-ingest[reddit]"` | Reddit |
+| `pip install "unstructured-ingest[redis]"` | Redis |
 | `pip install "unstructured-ingest[s3]"` | Amazon S3 |
 | `pip install "unstructured-ingest[sharepoint]"` | SharePoint |
 | `pip install "unstructured-ingest[salesforce]"` | Salesforce |

mint.json

Lines changed: 2 additions & 0 deletions
@@ -228,6 +228,7 @@
 "open-source/ingest/destination-connectors/pinecone",
 "open-source/ingest/destination-connectors/postgresql",
 "open-source/ingest/destination-connectors/qdrant",
+"open-source/ingest/destination-connectors/redis",
 "open-source/ingest/destination-connectors/s3",
 "open-source/ingest/destination-connectors/sftp",
 "open-source/ingest/destination-connectors/singlestore",
@@ -389,6 +390,7 @@
 "api-reference/ingest/destination-connector/pinecone",
 "api-reference/ingest/destination-connector/postgresql",
 "api-reference/ingest/destination-connector/qdrant",
+"api-reference/ingest/destination-connector/redis",
 "api-reference/ingest/destination-connector/s3",
 "api-reference/ingest/destination-connector/sftp",
 "api-reference/ingest/destination-connector/singlestore",
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
title: Redis
---

<NewDocument />

import SharedContentRedis from '/snippets/dc-shared-text/redis-cli-api.mdx';

<SharedContentRedis/>

Now call the Unstructured CLI or Python SDK. You can use any supported source connector.

This example uses the local source connector.

By default, this example sends files to Unstructured API services for processing. To process files locally instead, see the instructions at the end of this page.

import RedisAPISh from '/snippets/destination_connectors/redis.sh.mdx';
import RedisAPIPyV2 from '/snippets/destination_connectors/redis.v2.py.mdx';

<CodeGroup>
  <RedisAPISh />
  <RedisAPIPyV2 />
</CodeGroup>

import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

<SharedPartitionByAPIOSS/>
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
Batch process all your records to store structured outputs in Redis.

The requirements are as follows.

import SharedRedis from '/snippets/general-shared-text/redis.mdx';
import SharedRedisCLIAPI from '/snippets/general-shared-text/redis-cli-api.mdx';

<SharedRedis />
<SharedRedisCLIAPI />
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
```bash CLI
# Chunking and embedding are optional.

# Use a Redis connection string:
unstructured-ingest \
  local \
    --input-path $LOCAL_FILE_INPUT_DIR \
    --chunking-strategy by_title \
    --embedding-provider huggingface \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
  redis \
    --uri $REDIS_URI

# Use Redis connection properties:
unstructured-ingest \
  local \
    --input-path $LOCAL_FILE_INPUT_DIR \
    --chunking-strategy by_title \
    --embedding-provider huggingface \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
  redis \
    --host $REDIS_HOST \
    --port 14453 \
    --database 0 \
    --username $REDIS_USERNAME \
    --password $REDIS_PASSWORD \
    --no-ssl
```
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
```python Python Ingest v2
import os

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig

from unstructured_ingest.v2.processes.connectors.redisdb import (
    RedisAccessConfig,
    RedisConnectionConfig,
    RedisUploaderConfig
)
from unstructured_ingest.v2.processes.connectors.local import (
    LocalIndexerConfig,
    LocalConnectionConfig,
    LocalDownloaderConfig
)
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.embedder import EmbedderConfig

# Chunking and embedding are optional.

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        embedder_config=EmbedderConfig(embedding_provider="huggingface"),

        # Use a Redis connection string.
        # destination_connection_config=RedisConnectionConfig(
        #     access_config=RedisAccessConfig(
        #         uri=os.getenv("REDIS_URI")
        #     )
        # ),

        # Use Redis connection properties.
        destination_connection_config=RedisConnectionConfig(
            access_config=RedisAccessConfig(
                password=os.getenv("REDIS_PASSWORD")
            ),
            host=os.getenv("REDIS_HOST"),
            database=int(os.getenv("REDIS_DATABASE")),
            port=int(os.getenv("REDIS_PORT")),
            username=os.getenv("REDIS_USERNAME"),
            ssl=False
        ),
        uploader_config=RedisUploaderConfig(batch_size=100)
    ).run()
```
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
The Redis connector dependencies:

```bash
pip install "unstructured-ingest[redis]"
```

import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';

<AdditionalIngestDependencies />

The following environment variables:

- For connecting with a Redis connection string, `REDIS_URI`, represented by `--uri` (CLI) or `uri` (Python). Redis connection strings use the following format:

  ```
  <protocol>://<username>:<password>@<hostname>:<port>?ssl=<true|false>&db=<db_number>
  ```

- For connecting with Redis connection properties:

  - `REDIS_HOST` - The hostname of the target Redis database, represented by `--host` (CLI) or `host` (Python).
  - `REDIS_PORT` - The database's port number, represented by `--port` (CLI) or `port` (Python).
  - `REDIS_DATABASE` - The database number of the target database, represented by `--database` (CLI) or `database` (Python).
  - `REDIS_USERNAME` - The username for the database, represented by `--username` (CLI) or `username` (Python).
  - `REDIS_PASSWORD` - The user's password, represented by `--password` (CLI) or `password` (Python).
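
As an illustration of the connection string format listed above, the following minimal sketch assembles a `REDIS_URI` value from the individual connection properties. The helper name `build_redis_uri` is hypothetical and not part of `unstructured-ingest`. The fallback values (`localhost`, `6379`, `default`) are assumptions for local testing, and percent-encoding the credentials assumes that the consumer of the URI decodes them.

```python
import os
from urllib.parse import quote


def build_redis_uri(host, port, username="", password="", database=0,
                    ssl=False, protocol="redis"):
    """Hypothetical helper: assemble a string in the documented format
    <protocol>://<username>:<password>@<hostname>:<port>?ssl=<true|false>&db=<db_number>
    """
    # Percent-encode credentials so characters such as '@' or ':' cannot
    # be mistaken for URI delimiters (assumes the consumer decodes them).
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    ssl_flag = "true" if ssl else "false"
    return f"{protocol}://{credentials}@{host}:{port}?ssl={ssl_flag}&db={database}"


if __name__ == "__main__":
    # Fallback values are assumptions for a local test instance.
    uri = build_redis_uri(
        host=os.getenv("REDIS_HOST", "localhost"),
        port=int(os.getenv("REDIS_PORT", "6379")),
        username=os.getenv("REDIS_USERNAME", "default"),
        password=os.getenv("REDIS_PASSWORD", ""),
        database=int(os.getenv("REDIS_DATABASE", "0")),
    )
    # Avoid printing the URI itself, because it contains the password.
    print("Assembled a connection string of length", len(uri))
```

The resulting string could then be exported as `REDIS_URI` and passed through `--uri` (CLI) or `uri` (Python).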
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
- A [Redis](https://redis.io) database, for example in [Redis Cloud](https://redis.io/cloud/).
- The target database's hostname and port number. [Create a database in Redis Cloud](https://redis.io/docs/latest/operate/rc/rc-quickstart/#create-an-account).
- The username and password for the target database. [Get the username and password in Redis Cloud](https://redis.io/docs/latest/operate/rc/rc-quickstart/#connect-to-a-database).
- The database number for the target database. Redis databases are typically numbered from 0 to 15, with the default database number typically being 0.
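
Before starting an ingest run, it can help to confirm that all of the requirements listed above are present. The following is a minimal sketch, assuming the `REDIS_*` environment variable names used by this connector's examples. The `RedisSettings` class and the `load_redis_settings` function are hypothetical names introduced for illustration and are not part of `unstructured-ingest`.

```python
import os
from dataclasses import dataclass


@dataclass
class RedisSettings:
    """Hypothetical container for the connection requirements."""
    host: str
    port: int
    database: int
    username: str
    password: str


def load_redis_settings(env=os.environ):
    """Read the connection properties and fail early with a readable error."""
    names = ["REDIS_HOST", "REDIS_PORT", "REDIS_DATABASE",
             "REDIS_USERNAME", "REDIS_PASSWORD"]
    # Unset and empty values are both treated as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    port = int(env["REDIS_PORT"])
    database = int(env["REDIS_DATABASE"])
    if not 1 <= port <= 65535:
        raise ValueError(f"REDIS_PORT is out of range: {port}")
    # Databases are typically numbered 0 to 15, but the upper bound is
    # configurable on the server, so only negative numbers are rejected.
    if database < 0:
        raise ValueError(f"REDIS_DATABASE must not be negative: {database}")

    return RedisSettings(
        host=env["REDIS_HOST"],
        port=port,
        database=database,
        username=env["REDIS_USERNAME"],
        password=env["REDIS_PASSWORD"],
    )
```

Calling `load_redis_settings()` with no arguments reads from the process environment; passing a plain dictionary makes the check easy to exercise in tests.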
