
Commit 897bfc9

Astra DB destination connector for Platform (#293)
1 parent 27b6472 commit 897bfc9


5 files changed, 50 additions and 93 deletions


mint.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -445,6 +445,7 @@
 "group": "Destinations",
 "pages": [
 "platform/destinations/overview",
+"platform/destinations/astradb",
 "platform/destinations/azure-cognitive-search",
 "platform/destinations/milvus",
 "platform/destinations/mongodb",
```
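The `mint.json` change registers the new page in the **Destinations** navigation group, directly after the overview page and in alphabetical order with the other connector pages. The following is a minimal Python sketch of making the same edit programmatically. It assumes a Mintlify-style `mint.json` in which a group is an object with `group` and `pages` keys and groups may be nested inside another group's `pages` list; the helper names `find_group` and `add_destination_page`, and the parent `Platform` group in the usage example, are hypothetical and not part of any Unstructured or Mintlify tooling.

```python
import json
from typing import Any, Optional


def find_group(node: Any, name: str) -> Optional[dict]:
    """Depth-first search for a navigation group object whose "group" equals name."""
    if isinstance(node, dict):
        if node.get("group") == name and isinstance(node.get("pages"), list):
            return node
        node = list(node.values())  # keep searching inside any other object
    if isinstance(node, list):
        for child in node:
            found = find_group(child, name)
            if found is not None:
                return found
    return None


def add_destination_page(config: dict, page: str, group_name: str = "Destinations") -> dict:
    """Add page to the group once, keep the overview first, sort the other page paths."""
    group = find_group(config, group_name)
    if group is None:
        raise KeyError(f"navigation group {group_name!r} not found")
    pages = group["pages"]
    if page not in pages:
        pages.append(page)
    overview = [p for p in pages if isinstance(p, str) and p.endswith("/overview")]
    others = sorted(p for p in pages if isinstance(p, str) and not p.endswith("/overview"))
    nested = [p for p in pages if not isinstance(p, str)]  # nested groups keep their order
    group["pages"] = overview + others + nested
    return config


if __name__ == "__main__":
    # Hypothetical structure: "Destinations" nested inside a "Platform" group.
    config = {
        "navigation": [
            {
                "group": "Platform",
                "pages": [
                    {
                        "group": "Destinations",
                        "pages": [
                            "platform/destinations/overview",
                            "platform/destinations/azure-cognitive-search",
                            "platform/destinations/milvus",
                            "platform/destinations/mongodb",
                        ],
                    }
                ],
            }
        ]
    }
    add_destination_page(config, "platform/destinations/astradb")
    print(json.dumps(find_group(config, "Destinations"), indent=2))
```

Plain string sorting reproduces the order in the diff, because `astradb` sorts before `azure-cognitive-search`.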

platform/connectors.mdx

Lines changed: 1 addition & 0 deletions
```diff
@@ -21,6 +21,7 @@ If your source is not listed here, you might still be able to connect Unstructur
 
 ## Destinations
 
+- [Astra DB](/platform/destinations/astradb)
 - [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
 - [Milvus](/platform/destinations/milvus)
 - [MongoDB](/platform/destinations/mongodb)
```
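A new connector page has to be listed both in `mint.json` and in the **Destinations** list of `platform/connectors.mdx`, so a small check can catch a page that was added to one but not the other. This is a hypothetical Python sketch, assuming list entries use the exact `- [Label](/platform/destinations/<slug>)` form shown in this diff and navigation entries use the `platform/destinations/<slug>` form shown in `mint.json`; the function names are illustrative only.

```python
import re
from typing import Iterable, List, Set

# Matches list items such as "- [Astra DB](/platform/destinations/astradb)".
LINK_RE = re.compile(
    r"^- \[(?P<label>[^\]]+)\]\(/platform/destinations/(?P<slug>[\w-]+)\)\s*$",
    re.MULTILINE,
)


def listed_slugs(markdown: str) -> Set[str]:
    """Return the destination slugs linked from a Markdown bullet list."""
    return {m.group("slug") for m in LINK_RE.finditer(markdown)}


def missing_from_list(markdown: str, nav_pages: Iterable[str]) -> List[str]:
    """Return navigation slugs (excluding the overview) with no list entry, sorted."""
    prefix = "platform/destinations/"
    nav_slugs = {
        page[len(prefix):]
        for page in nav_pages
        if page.startswith(prefix) and page != prefix + "overview"
    }
    return sorted(nav_slugs - listed_slugs(markdown))


if __name__ == "__main__":
    connectors_md = """## Destinations

- [Astra DB](/platform/destinations/astradb)
- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
- [Milvus](/platform/destinations/milvus)
- [MongoDB](/platform/destinations/mongodb)
"""
    nav = [
        "platform/destinations/overview",
        "platform/destinations/astradb",
        "platform/destinations/azure-cognitive-search",
        "platform/destinations/milvus",
        "platform/destinations/mongodb",
    ]
    print(missing_from_list(connectors_md, nav))  # prints []
```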

platform/destinations/astradb.mdx

Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+---
+title: Astra DB
+---
+
+Send processed data from Unstructured to Astra DB.
+
+You'll need:
+
+import AstraDBPrerequisites from '/snippets/general-shared-text/astradb.mdx';
+
+<AstraDBPrerequisites />
+
+To create the destination connector:
+
+1. On the sidebar, click **Destinations**.
+2. Click **New Destination**.
+3. In the **Type** drop-down list, select **Astra DB**.
+4. Fill in the fields as described later on this page.
+5. Click **Save and Test**.
+6. Click **Close**.
+
+import AstraDBFields from '/snippets/general-shared-text/astradb-platform.mdx';
+
+<AstraDBFields />
```

platform/overview.mdx

Lines changed: 17 additions & 93 deletions
```diff
@@ -1,101 +1,25 @@
 ---
-title: Unstructured Platform
-sidebarTitle: Overview
+title: Overview
+description: Destination connectors in the Unstructured Platform are designed to specify the endpoint for data processed within the platform. These connectors ensure that the transformed and analyzed data is securely and efficiently transferred to a storage system for future use, often to a vector database for tasks that involve high-speed retrieval and advanced data analytics operations.
 ---
 
-<Note>To start using the Unstructured Platform right away, skip ahead to the [quickstart](/platform/quickstart).</Note>
+![Destinations in the sidebar](/img/platform/Destinations-Sidebar.png)
 
-## What is the Unstructured Platform?
+To see your existing destination connectors, on the sidebar, click **Destinations**.
 
-<iframe
-width="560"
-height="315"
-src="https://www.youtube.com/embed/_mxLMykpFJ0"
-title="YouTube video player"
-frameborder="0"
-allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
-allowfullscreen
-></iframe>
-
-The Unstructured Platform is a no-code user interface, pay-as-you-go platform for transforming your unstructured data into data that is ready for Retrieval Augmented Generation (RAG).
-
-## How does it work?
-
-To get your data RAG-ready, the Unstructured Platform moves it through the following process:
-
-```mermaid
-flowchart LR
-Connect[1. Connect]-->Route[2. Route]-->Transform[3. Transform]-->Chunk[4. Chunk]-->Enrich[5. Enrich]-->Embed[6. Embed]-->Persist[7. Persist]
-```
-<Steps>
-<Step title="Connect">
-The Unstructured Platform offers multiple [source connectors](/platform/sources/overview) to connect to your data in its existing location.
-</Step>
-<Step title="Route">
-Routing determines which strategy Unstructured Platform uses to transforming your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides these [partitioning](/platform/partitioning) strategies for document transformation:
-
-- **Fast** is great for when there is extractable text available, like in HTML files or in the Microsoft Office Document format.
-- **Hi Res** is best for PDFs and tables and where accurate classification of document elements is critical.
-- If you're unsure which strategy to use, choose **Auto**, and the Unstructured Platform will handle the decision for you.
-
-</Step>
-<Step title="Transform">
-Your source document is transformed into Unstructured's canonical JSON schema. Regardless of the input document, this JSON schema we gives you a [standardized output](/platform/document-elements). It contains more than 20 elements, such as `Header`, `Footer`, `Title`, `NarrativeText`, `Table`, `Image`, and many more. Each document is wrapped in extensive metadata so you can understand languages, file types, sources, hierarchies, and much more.
-</Step>
-<Step title="Chunk">
-The Unstructured Platform provides these [chunking](/platform/chunking) strategies:
-
-- **Basic** combines sequential elements up to specified size limits. Oversized elements are split, while tables are isolated and divided if necessary. Overlap between chunks is optional.
-- **By Title** uses semantic chunking, understands the layout of the document, and makes intelligent splits.
-- **By Page** attempts to preserve page boundaries when determining the chunks' contents.
-- **By Similarity** uses an embedding model to identify topically similar sequential elements and combines them into chunks.
-
-</Step>
-<Step title="Enrich">
-Images and tables can be optionally summarized. This generates enriched content around the images or tables that were parsed during the transformation process.
-</Step>
-<Step title="Embed">
-The Unstructured Platform uses optional third-party [embedding](/platform/embedding) providers such as OpenAI.
-</Step>
-<Step title="Persist">
-The Unstructured Platform offers multiple [destination connectors](/platform/destinations/overview), including all major vector databases.
-</Step>
-</Steps>
-
-To simplify this process and provide it as a no-code solution, the Unstructured Platform brings together four key concepts:
-
-```mermaid
-flowchart LR
-subgraph Workflow[3. Workflow]
-direction LR
-Source[1. Source Connector] --> Destination[2. Destination Connector]
-end
-Jobs
-Workflow[3. Workflow] --> Jobs[4. Jobs]
-```
-
-<Steps>
-<Step title="Source Connector">
-[Source connectors](/platform/sources/overview) to ingest your data into the Unstructured Platform for transformation.
-</Step>
-<Step title="Destination Connector">
-[Destination connectors](/platform/destinations/overview) tell the Unstructured Platform where to write your transformed data to.
-</Step>
-<Step title="Workflow">
-[Workflows](/platform/workflows) connect sources to destinations and provide chunking, embedding, and scheduling options.
-</Step>
-<Step title="Jobs">
-[Jobs](/platform/jobs) enable you to monitor data transformation progress.
-</Step>
-</Steps>
-
-## What support is there for compliance?
-
-The platform is designed for global reach with SOC2 Type 1, SOC2 Type 2, and HIPAA compliance. It has support for over 50 languages.
-
-## How do I get started?
-
-Skip ahead to the [quickstart](/platform/quickstart).
+To create a destination connector:
 
+1. On the sidebar, click **Destinations**.
+2. Click **New Destination**.
+3. In the **Type** drop-down list, select the connector type that matches your destination.
+4. Fill in the fields according to your connector type. To learn how, click your connector type in the following list:
 
+- [Astra DB](/platform/destinations/astradb)
+- [Azure Cognitive Search](/platform/destinations/azure-cognitive-search)
+- [Milvus](/platform/destinations/milvus)
+- [MongoDB](/platform/destinations/mongodb)
+- [Pinecone](/platform/destinations/pinecone)
+- [S3](/platform/destinations/s3)
 
+5. Click **Save and Test**.
+6. Click **Close**.
```
Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+Fill in the following fields:
+
+- **Name** (_required_): A unique name for this connector.
+- **Token** (_required_): The application token for the database.
+- **API Endpoint** (_required_): The database's associated API endpoint.
+- **Collection Name** (_required_): The name of the collection in the namespace.
+- **Embedding Dimension** (_required_): The number of dimensions in the collection.
```
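The snippet lists the five fields that the Astra DB destination form requires. As a rough local pre-check before clicking **Save and Test**, the following sketch mirrors those fields in a hypothetical Python dataclass, flags empty required values, and compares an embedding vector's length with **Embedding Dimension**. It assumes the target collection is vector-enabled with a fixed dimension that must equal the output size of the embedding model used in the workflow, and that the API endpoint is an `https://` URL. These checks are illustrative and are not Unstructured's own validation logic; the class, function names, and example values are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence
from urllib.parse import urlparse


@dataclass
class AstraDBDestinationFields:
    """Hypothetical local mirror of the Astra DB destination form fields."""

    name: str
    token: str
    api_endpoint: str
    collection_name: str
    embedding_dimension: int

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means every check passed."""
        errors = []
        required_text = (
            ("Name", self.name),
            ("Token", self.token),
            ("API Endpoint", self.api_endpoint),
            ("Collection Name", self.collection_name),
        )
        for label, value in required_text:
            if not value or not value.strip():
                errors.append(f"{label} is required")
        # Assumption: the database's API endpoint is an HTTPS URL.
        endpoint = self.api_endpoint
        if endpoint and endpoint.strip() and urlparse(endpoint).scheme != "https":
            errors.append("API Endpoint should be an https:// URL")
        # bool is a subclass of int, so reject it explicitly.
        dim = self.embedding_dimension
        if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
            errors.append("Embedding Dimension must be a positive integer")
        return errors


def check_vector_length(vector: Sequence[float], fields: AstraDBDestinationFields) -> None:
    """Raise ValueError if an embedding does not match the collection's dimension."""
    if len(vector) != fields.embedding_dimension:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, "
            f"but the collection expects {fields.embedding_dimension}"
        )


if __name__ == "__main__":
    fields = AstraDBDestinationFields(
        name="astra-destination",
        token="<application-token>",
        api_endpoint="https://<your-database-endpoint>",
        collection_name="elements",
        embedding_dimension=1536,  # example value; use your embedding model's size
    )
    print(fields.validate())
    check_vector_length([0.0] * 1536, fields)
```

A dimension mismatch is the kind of error that a purely local check can catch early, because it depends only on the configured number and the vectors the workflow produces.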
