-Open the related [notebook](https://colab.research.google.com/drive/1rJOZYZfsTQ_JV2hXaY4kgYvbA7xEWBZn?usp=sharing) that is shown in the preceding video.
+Open the related [notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_API_Partition_endpoint.ipynb) that is shown in the preceding video.

 To make POST requests to the Unstructured Partition Endpoint, you will need:
-Open a related [notebook](https://colab.research.google.com/drive/13f5C9WtUvIPjwJzxyOR3pNJ9K9vnF4ww?usp=sharing) that covers many of
+Open a related [notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Platform_Workflow_Endpoint_Quickstart.ipynb) that covers many of
 the concepts that are shown in the preceding videos.

 The [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client), beginning with version 0.30.6,
examplecode/notebooks.mdx (+26 -26: 26 additions & 26 deletions)
@@ -6,85 +6,85 @@ description: "Notebooks contain complete working sample code for end-to-end solu
 ---

 <CardGroup cols={2}>
-<Card title="Getting Started with Unstructured API and IBM watsonx.data" href="https://colab.research.google.com/drive/1RB5ICOXNGo8xbxxcUkPHkSV8rlMKOLAn?usp=sharing">
+<Card title="Getting Started with Unstructured API and IBM watsonx.data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Azure_to_IBM_WatsonX.ipynb">
 <br/>
 Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your IBM watsonx.data instance.
-<Card title="Using Unstructured with Snowflake Cortex Search for RAG" href="https://colab.research.google.com/drive/17sWi10DoTNVEkYyEzc-ztEtIxBZHPczp?usp=sharing">
+<Card title="Using Unstructured with Snowflake Cortex Search for RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Use_Unstructured_with_Snowflake_Cortex_for_RAG_Search.ipynb">
 <br/>
 Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.
-<Card title="Agentic RAG with LangGraph and Together AI" href="https://colab.research.google.com/drive/16JYOV3JwP2PJpQx-PGFshC64Cf91G7D1?usp=sharing">
+<Card title="Agentic RAG with LangGraph and Together AI" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/AgenticRAG_with_LangGraph,TogetherAI.ipynb">
 <br/>
 Build Agentic RAG with `LangGraph` and `Together AI` and compare the results with Vanilla RAG in pure Python
-<Card title="Getting Started with Unstructured API and Snowflake" href="https://colab.research.google.com/drive/1Q7TClKZP7U3d2zLucqJgE-Es7sGD6-bX?usp=sharing">
+<Card title="Getting Started with Unstructured API and Snowflake" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Snowflake.ipynb">
 <br/>
 Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data from your Azure Blob Storage into your Snowflake Table.
-<Card title="Getting Started with Unstructured API and Delta Tables in Databricks" href="https://colab.research.google.com/drive/1ujLZoLx0ai0GAjvr9Xmze4UReZjKa8cU?usp=sharing">
+<Card title="Getting Started with Unstructured API and Delta Tables in Databricks" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Getting_Started_with_Unstructured_API_and_Delta_Tables_in_Databricks.ipynb">
 <br/>
 Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta Table.
-<Card title="RAG for Online Documentation" href="https://colab.research.google.com/drive/1F1LkM_HwuwruQb5rwsK5Fj_up15hHeLi?usp=sharing">
+<Card title="RAG for Online Documentation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_for_documentation.ipynb">
 <br/>
 Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.
 Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
 <br/>
 ``Unstructured API`` ``Workflows`` ``S3``
 <br/>
 </Card>
-<Card title="RAG with Databricks Vector Search with Context from Multiple Sources" href="https://colab.research.google.com/drive/1ZsrqYVhBAqsr6L98xlVjTZltMzNi_P3o?usp=sharing">
+<Card title="RAG with Databricks Vector Search with Context from Multiple Sources" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Delta_Tables_Databricks_Multiple_Sources.ipynb">
 <br/>
 Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.
 <br/>
 ``Databricks`` ``Introductory notebook``
 <br/>
 </Card>

-<Card title="Agentic RAG with Hugging Face smolagents vs Vanilla RAG" href="https://colab.research.google.com/drive/1hG3dPgd8wjrO9wSD0K0Feo7EY1iXqrEN?usp=sharing">
+<Card title="Agentic RAG with Hugging Face smolagents vs Vanilla RAG" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Agentic_RAG_with_HuggingFace_smolagents.ipynb">
 <br/>
 Build Agentic RAG with `smolagents` library and compare the results with Vanilla RAG in pure Python
-<Card title="Multimodal RAG: Enhancing RAG outputs with image results" href="https://colab.research.google.com/drive/1gBI67HKyepmpAzf0T5yWwBwIqVXK1Ea_?usp=sharing">
+<Card title="Multimodal RAG: Enhancing RAG outputs with image results" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Multimodal_RAG_with_image_results.ipynb">
 <br/>
 Process a file in S3 with Unstructured and return images in your RAG output
 <br/>
@@ -98,93 +98,93 @@ description: "Notebooks contain complete working sample code for end-to-end solu
 ``Unstructured API`` ``Hex`` ``Advanced notebook``
 <br/>
 </Card>
-<Card title="PII removal with GLiNER in unstructured data ETL" href="https://colab.research.google.com/drive/1HwOMnGjrNbcHZ1vlhaAG0MSDBcwQfexF?usp=sharing">
+<Card title="PII removal with GLiNER in unstructured data ETL" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PII_removal_in_unstructured_data_ETL.ipynb">
 <br/>
 Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
-<Card title="Selecting an embedding model for custom data" href="https://colab.research.google.com/drive/132oXSGSOyzZ7GO9pJhRKlvwY4F-i9Pm6?usp=sharing">
+<Card title="Selecting an embedding model for custom data" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Selecting_an_embedding_model_for_custom_data.ipynb">
 <br/>
 End-to-end data processing pipeline using Unstructured Serverless API.
-<Card title="RAG with PDFs, LangChain and Llama 3" href="https://colab.research.google.com/drive/1BJYYyrPVe0_9EGyXqeNyzmVZDrCRZwsg?usp=sharing">
+<Card title="RAG with PDFs, LangChain and Llama 3" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_Llama3_Unstructured_LangChain.ipynb">
 <br/>
 A RAG system with the Llama 3 model from Hugging Face.
-<Card title="Unstructured data ETL from S3 to SingleStore DB" href="https://colab.research.google.com/drive/1Krvn5XlYNERQe7DNIXKEz3AmESJdABLF?usp=sharing">
+<Card title="Unstructured data ETL from S3 to SingleStore DB" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_data_ETL_from_S3_to_SingleStore.ipynb">
 <br/>
 Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
-<Card title="Preprocess PDFs in AWS S3, load into Elasticsearch" href="https://colab.research.google.com/drive/1axADo7T_dMkeOWnZ5C4dKve16-wrtQuV?usp=sharing">
+<Card title="Preprocess PDFs in AWS S3, load into Elasticsearch" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/S3_to_Elasticsearch_with_Unstructured.ipynb">
 <br/>
 Ingest PDF documents from an S3 bucket, transform them into a normalized JSON with Unstructured Serverless API, chunk, embed and load into Elasticsearch.
-<Card title="Preprocess documents in Google Drive, load into Databricks Volume" href="https://colab.research.google.com/drive/1gVd03geFUD_OTROMuhjVAHYQvgVbViq7?usp=sharing">
+<Card title="Preprocess documents in Google Drive, load into Databricks Volume" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/GoogleDrive_to_Databricks_Connector.ipynb">
 <br/>
 Preprocess documents from a Google Drive Unstructured Serverless API and load them into Databricks Volume.
-<Card title="Source references in RAG responses" href="https://colab.research.google.com/drive/1Lc8eq8P87JjzUhbYb33_c7h7njsWb-hn?usp=sharing">
+<Card title="Source references in RAG responses" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_on_arXiv_papers_with_source_references.ipynb">
 <br/>
 Add document source references to RAG responses based on documents metadata.
-<Card title="Query processed PDF with HuggingChat" href="https://colab.research.google.com/drive/1rNVVX5qo7vyBwR7wTa-zS6lDMkKpTei0?usp=sharing">
+<Card title="Query processed PDF with HuggingChat" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/PDF_with_Unstructured_and_HuggingChat.ipynb">
 <br/>
 Send a PDF to Unstructured for processing, and send a subset of the returned PDF's processed text to [HuggingChat](https://huggingface.co/chat/) for chatbot-style querying.
-<Card title="Llama 3 Local RAG with emails" href="https://colab.research.google.com/drive/1ieDJ4LoxARrHFqxXWif8Lv8e8aZTgmtH?usp=sharing">
+<Card title="Llama 3 Local RAG with emails" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Local_RAG_with_emails.ipynb">
 <br/>
 Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
-<Card title="Building RAG With PowerPoint presentations" href="https://colab.research.google.com/drive/1l_e7CyqfBUxBBDc6E6XIKtKV-8YWvmRX?usp=sharing">
+<Card title="Building RAG With PowerPoint presentations" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Building_RAG_with_Powerpoint_presentations.ipynb">
-<Card title="Synthetic test dataset generation" href="https://colab.research.google.com/drive/1VvOauC46xXeZrhh8nlTyv77yvoroMQjr?usp=sharing">
+<Card title="Synthetic test dataset generation" href="https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/RAG_synthetic_test_data_with_Unstructured_GPT_4o_and_Ragas.ipynb">
 <br/>
 Build a Synthetic Test Dataset for your RAG system in 5 easy steps
open-source/introduction/quick-start.mdx (+1 -1: 1 addition & 1 deletion)
@@ -211,7 +211,7 @@ import SharedOSSSingleFile from '/snippets/general-shared-text/multi-file-oss-us
 - Learn about [available cleaning functions](/open-source/core-functionality/cleaning) for cleaning up your document elements' data as needed.
 - Learn about [available extraction functions](/open-source/core-functionality/extracting) for getting precise information out of your document elements as needed.
 - Learn about how to [generate vector embeddings](/open-source/core-functionality/embedding) for the text in your document elements for use in RAG applications, AI agents, model fine-tuning tasks, and more.
-- For an additional code example, see the [Unstructured Quick Tour](https://colab.research.google.com/drive/1U8VCjY2-x8c6y5TYMbSFtQGlQVFHCVIW) Google Colab notebook.
+- For an additional code example, see the [Unstructured Quick Tour](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Quick_Tour.ipynb) Google Colab notebook.
 - The Unstructured open source library is also available as a [Docker container](/open-source/installation/docker-installation).
 - The [Unstructured Ingest CLI and Unstructured Ingest Python library](/ingestion/overview) build upon the Unstructured open source library by providing additional functionality such as batch file processing,
 ingesting files from remote source locations and sending the processed files' data to remote destination locations, creating programmatic ETL pipelines, optionally processing files on Unstructured-hosted compute resource instead of locally for improved performance and quality on a pay-as-you-go basis, and more.
-go directly to the [quickstart notebook](https://colab.research.google.com/drive/13f5C9WtUvIPjwJzxyOR3pNJ9K9vnF4ww),
+go directly to the [quickstart notebook](https://colab.research.google.com/github/Unstructured-IO/notebooks/blob/main/notebooks/Unstructured_Platform_Workflow_Endpoint_Quickstart.ipynb),
 or watch the two 4-minute video tutorials for the [Unstructured Python SDK](/api-reference/workflow/overview#unstructured-python-sdk).

 You can also create destination connectors with the Unstructured user interface (UI).
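
The replacement links in this change all appear to follow one pattern: the Colab prefix `https://colab.research.google.com/github/`, then the GitHub organization, repository, `blob`, branch, and file path. A minimal sketch of that mapping, assuming the `Unstructured-IO/notebooks` repository layout seen in the links above; the helper name `colab_url` and its parameters are hypothetical and not part of any Unstructured tooling:

```python
# Hypothetical helper: builds a Colab link for a notebook hosted on GitHub.
# The defaults mirror the repository layout visible in the new links of this diff.
COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(
    notebook: str,
    org: str = "Unstructured-IO",
    repo: str = "notebooks",
    branch: str = "main",
    folder: str = "notebooks",
) -> str:
    """Return the Colab URL that opens `notebook` from a GitHub repository.

    Assumes the file name contains only URL-safe characters, which holds
    for the notebook names that appear in this change.
    """
    path = "/".join([org, repo, "blob", branch, folder, notebook])
    return f"{COLAB_GITHUB_PREFIX}/{path}"


if __name__ == "__main__":
    print(colab_url("Unstructured_Quick_Tour.ipynb"))
```

Pointing the links at the GitHub-hosted files rather than at individual Drive copies means a link can be derived from the notebook's file name alone, which makes checking the docs against the repository straightforward.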