
Commit 041712d

RAG and agentic AI (#688)
1 parent b255835 commit 041712d

File tree: 11 files changed, 12 additions and 12 deletions


api-reference/partition/chunking.mdx (1 addition, 1 deletion)

@@ -3,7 +3,7 @@ title: Chunking strategies
 ---
 
 Chunking functions use metadata and document elements detected with partition functions to split a document into
-appropriately-sized chunks for uses cases such as Retrieval Augmented Generation (RAG).
+appropriately-sized chunks for uses cases such as retrieval-augmented generation (RAG).
 
 If you are familiar with chunking methods that split long text documents into smaller chunks, you'll notice that
 Unstructured methods slightly differ, since the partitioning step already divides an entire document into its structural elements.
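The element-based chunking this file describes can be sketched in plain Python. This is an illustrative toy, not the `unstructured` API: the function name, element strings, and character budget are all invented. The point it shows is that already-partitioned elements are packed into appropriately-sized chunks, and no element is split across two chunks.

```python
# Toy sketch of element-based chunking (not the unstructured API):
# pack pre-partitioned elements into chunks under a character budget,
# never splitting an element across two chunks.
def chunk_elements(elements, max_characters=500):
    chunks, current, length = [], [], 0
    for text in elements:
        # Start a new chunk if this element would push past the budget.
        if current and length + len(text) + 1 > max_characters:
            chunks.append("\n".join(current))
            current, length = [], 0
        current.append(text)
        length += len(text) + 1  # +1 for the joining newline
    if current:
        chunks.append("\n".join(current))
    return chunks

elements = ["Title: Chunking", "Paragraph one. " * 18, "Paragraph two. " * 18]
chunks = chunk_elements(elements, max_characters=300)
```

A real chunker, such as the strategies this page documents, also handles elements larger than the budget and can keep titles with their following text; the sketch only shows the packing idea.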

api-reference/workflow/overview.mdx (1 addition, 1 deletion)

@@ -3,7 +3,7 @@ title: Overview
 ---
 
 The [Unstructured UI](/ui/overview) features a no-code user interface for transforming your unstructured data into data that is ready
-for Retrieval Augmented Generation (RAG).
+for retrieval-augmented generation (RAG).
 
 The Unstructured Workflow Endpoint, part of the [Unstructured API](/api-reference/overview), enables a full range of partitioning, chunking, embedding, and
 enrichment options for your files and data. It is designed to batch-process files and data in remote locations; send processed results to

open-source/core-functionality/chunking.mdx (1 addition, 1 deletion)

@@ -1,6 +1,6 @@
 ---
 title: Chunking
-description: Chunking functions in `unstructured` use metadata and document elements detected with `partition` functions to post-process elements into more useful "chunks" for uses cases such as Retrieval Augmented Generation (RAG).
+description: Chunking functions in `unstructured` use metadata and document elements detected with `partition` functions to post-process elements into more useful "chunks" for uses cases such as retrieval-augmented generation (RAG).
 ---
 
 ## Chunking Basics

open-source/core-functionality/overview.mdx (1 addition, 1 deletion)

@@ -14,4 +14,4 @@ After reading this section, you should understand the following:
 
 * How to prepare data for downstream use cases using staging functions
 
-* How to chunk partitioned documents for use cases such as Retrieval Augmented Generation (RAG).
+* How to chunk partitioned documents for use cases such as retrieval-augmented generation (RAG).

open-source/how-to/embedding.mdx (1 addition, 1 deletion)

@@ -21,7 +21,7 @@ These vectors are stored or _embedded_ next to the data itself.
 
 These vector embeddings allow _vector databases_ to more quickly and efficiently analyze and process these inherent
 properties and relationships between data. For example, you can save the extracted text along with its embeddings in a _vector store_.
-When a user queries a retrieval augmented generation (RAG) application, the application can use a vector database to perform a similarity search in that vector store
+When a user queries a retrieval-augmented generation (RAG) application, the application can use a vector database to perform a similarity search in that vector store
 and then return the documents whose embeddings are the closest to that user's query.
 
 Learn more about [chunking](https://unstructured.io/blog/chunking-for-rag-best-practices) and
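The similarity search this hunk describes can be illustrated with a small pure-Python sketch. Everything here is a toy assumption: the three-dimensional vectors and document names are invented, whereas a real application would get embeddings from an embedding model and query a vector database.

```python
import math

# Toy vector store: document name -> embedding. Real embeddings have
# hundreds or thousands of dimensions and come from an embedding model.
store = {
    "chunking_guide": [0.9, 0.1, 0.0],
    "embedding_guide": [0.1, 0.9, 0.2],
    "billing_faq": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: close to 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [0.85, 0.15, 0.05]  # pretend embedding of the user's question
# Rank documents by similarity to the query; the top hit is returned first.
ranked = sorted(store, key=lambda name: cosine(query, store[name]), reverse=True)
```

A vector database performs the same ranking, but over millions of vectors with approximate nearest-neighbor indexes rather than an exhaustive sort.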

open-source/introduction/overview.mdx (1 addition, 1 deletion)

@@ -35,7 +35,7 @@ and use cases.
 
 * Pretraining models
 * Fine-tuning models
-* Retrieval Augmented Generation (RAG)
+* Retrieval-augmented generation (RAG)
 * Traditional ETL
 
 <Note>GPU usage is not supported for the Unstructured open source library.</Note>

snippets/concepts/glossary.mdx (2 additions, 2 deletions)

@@ -36,11 +36,11 @@ High-level overview of available strategies and models in `Unstructured` library
 
 LLMs, like GPT, are trained on vast amounts of data and can comprehend and generate human-like text. They have achieved state-of-the-art results across many NLP tasks and can be fine-tuned to cater to specific domains or requirements.
 
-## Retrieval augmented generation (RAG)
+## Retrieval-augmented generation (RAG)
 
 Large language models (LLMs) like OpenAI’s ChatGPT and Anthropic’s Claude have revolutionized the AI landscape with their prowess. However, they inherently suffer from significant drawbacks. One major issue is their static nature, which means they’re “frozen in time.” Despite this, LLMs might often respond to newer queries with unwarranted confidence, a phenomenon known as “hallucination.” Such errors can be highly detrimental, mainly when these models serve critical real-world applications.
 
-Retrieval augmented generation (RAG) is a groundbreaking technique designed to counteract the limitations of foundational LLMs. By pairing an LLM with an RAG pipeline, we can enable users to access the underlying data sources that the model uses. This transparent approach ensures that an LLM’s claims can be verified for accuracy and builds a trust factor among users.
+Retrieval-augmented generation (RAG) is a groundbreaking technique designed to counteract the limitations of foundational LLMs. By pairing an LLM with an RAG pipeline, we can enable users to access the underlying data sources that the model uses. This transparent approach ensures that an LLM’s claims can be verified for accuracy and builds a trust factor among users.
 
 Moreover, RAG offers a cost-effective solution. Instead of bearing the extensive computational and financial burdens of training custom models or fine-tuning existing ones, RAG can, in many situations, serve as a sufficient alternative. This reduction in resource consumption is particularly beneficial for organizations that need more means to develop and deploy foundational models from scratch.
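The retrieve-then-generate loop the glossary describes can be sketched end to end. Everything in this sketch is hypothetical: the keyword-overlap retriever stands in for vector-similarity search, and `generate` is a placeholder for any LLM call, echoing its prompt so the grounding context stays visible.

```python
import re

def tokens(text):
    # Lowercase word tokens; ignores punctuation.
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, corpus):
    # Toy keyword-overlap retrieval; real RAG pipelines rank by
    # embedding similarity instead.
    q = tokens(question)
    return max(corpus, key=lambda doc: len(q & tokens(doc)))

def generate(prompt):
    # Placeholder for an LLM call; returns the prompt so the underlying
    # source the answer is grounded in remains verifiable.
    return "Answer based on:\n" + prompt

corpus = [
    "Chunking splits partitioned documents into pieces sized for retrieval.",
    "Embeddings are vectors that encode the meaning of text.",
]
question = "What does chunking do to documents?"
context = retrieve(question, corpus)
answer = generate(f"Context: {context}\nQuestion: {question}")
```

Because the retrieved context travels with the prompt, the model's claim can be traced back to its source, which is the transparency argument the glossary makes for RAG.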
snippets/quickstarts/single-file-ui.mdx (1 addition, 1 deletion)

@@ -132,7 +132,7 @@ import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enric
 allowfullscreen
 ></iframe>
 
-- Add a **Chunker** node after the **Partitioner** node, to chunk the partitioned data into smaller pieces for your retrieval augmented generation (RAG) applications.
+- Add a **Chunker** node after the **Partitioner** node, to chunk the partitioned data into smaller pieces for your retrieval-augmented generation (RAG) applications.
 To do this, click the add (**+**) button to the right of the **Partitioner** node, and then click **Enrich > Chunker**. Click the new **Chunker** node and
 specify its settings. For help, click the **FAQ** button in the **Chunker** node's pane. [Learn more about chunking and chunker settings](/ui/chunking).
 - Add an **Enrichment** node after the **Chunker** node, to apply enrichments to the chunked data such as image summaries, table summaries, table-to-HTML transforms, and

ui/embedding.mdx (1 addition, 1 deletion)

@@ -9,7 +9,7 @@ These vectors are stored or _embedded_ next to the text itself. These vector emb
 an _embedding provider_.
 
 You typically save these embeddings in a _vector store_.
-When a user queries a retrieval augmented generation (RAG) application, the application can use a vector database to perform
+When a user queries a retrieval-augmented generation (RAG) application, the application can use a vector database to perform
 a [similarity search](https://www.pinecone.io/learn/what-is-similarity-search/) in that vector store
 and then return the items whose embeddings are the closest to that user's query.
 
ui/overview.mdx (1 addition, 1 deletion)

@@ -2,7 +2,7 @@
 title: Overview
 ---
 
-The Unstructured user interface (UI) is a no-code user interface, pay-as-you-go platform for transforming your unstructured data into data that is ready for Retrieval Augmented Generation (RAG).
+The Unstructured user interface (UI) is a no-code user interface, pay-as-you-go platform for transforming your unstructured data into data that is ready for retrieval-augmented generation (RAG).
 
 <Tip>To start using the Unstructured UI right away, skip ahead to the [quickstart](/ui/quickstart).</Tip>
 