1 | 1 | This quickstart uses the Unstructured Python SDK to call the Unstructured Workflow Endpoint to get your data RAG-ready. The Python code for this |
2 | 2 | quickstart is in a remote hosted Google Colab notebook. Data is processed on Unstructured-hosted compute resources. |
3 | 3 |
4 | | -The requirements are as follows: |
5 | | - |
6 | | -- A compatible source (input) location that contains your data for Unstructured to process. [See the list of supported source types](/ui/connectors#sources). |
7 | | - This quickstart uses an Amazon S3 bucket as the source location. If you use a different source type, you will need to modify the quickstart notebook accordingly. |
8 | | -- For document-based source locations, compatible files in that location. [See the list of supported file types](/ui/supported-file-types). If you do not have any files available, you can download some from the [example-docs](https://github.com/Unstructured-IO/unstructured-ingest/tree/main/example-docs) folder in the `Unstructured-IO/unstructured-ingest` repository in GitHub. |
9 | | -- A compatible destination (output) location for Unstructured to put the processed data. [See the list of supported destination types](/ui/connectors#destinations). |
10 | | - For this quickstart's destination location, a different folder in the same Amazon S3 bucket as the source location is used. If you use a different destination S3 bucket or a different destination type, you will need to modify the quickstart notebook accordingly. |
11 | | - |
12 | | -import GetStartedSimpleAPIOnly from '/snippets/general-shared-text/get-started-simple-api-only.mdx'; |
13 | | - |
14 | | -<Steps> |
15 | | - <Step title="Sign up, sign in, and get your API key"> |
16 | | - <GetStartedSimpleAPIOnly /> |
17 | | - </Step> |
18 | | - <Step title="Create and set up the S3 bucket"> |
19 | | - This quickstart uses an Amazon S3 bucket as both the source location and the destination location. |
20 | | - (You can use other source and destination types that are supported by Unstructured. |
21 | | - If you use a different source or destination type, or if you use a different S3 bucket for the destination location, |
22 | | - you will need to modify the quickstart notebook accordingly.) |
23 | | - |
24 | | - Inside of the S3 bucket, a folder named `input` represents the |
25 | | - source location. This is where your files to be processed will be stored. |
26 | | - The S3 URI to the source location will be `s3://<your-bucket-name>/input`. |
27 | | - |
28 | | - Inside of the same S3 bucket, a folder inside named `output` represents the destination location. This |
29 | | - is where Unstructured will put the processed data. |
30 | | - The S3 URI to the destination location will be `s3://<your-bucket-name>/output`. |
31 | | - |
32 | | - Learn how to [create an S3 bucket and set it up for Unstructured](/api-reference/workflow/sources/s3). (Do not run the Python SDK code or REST commands at the end of those setup instructions.) |
33 | | - </Step> |
34 | | - <Step title="Run the quickstart notebook"> |
35 | | - After your S3 bucket is created and set up, follow the instructions in this [quickstart notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Platform_Workflow_Endpoint_Quickstart.ipynb). |
36 | | - </Step> |
37 | | - <Step title="View the processed data"> |
38 | | - After you run the quickstart notebook, go to your destination location to view the processed data. |
39 | | - </Step> |
40 | | -</Steps> |
| 4 | +To run this quickstart, open the [notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Dropbox_To_Pinecone_Connector_Quickstart.ipynb) and begin following the notebook's on-screen instructions. |