Commit c644e9e

Merge pull request #261319 from laujan/lu-pr-rag-concept-260570
Lu pr rag concept 260570
2 parents 380caef + 6acad72 commit c644e9e

7 files changed

+164
-1
lines changed

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
1+
---
2+
title: Retrieval-Augmented Generation (RAG) with Azure AI Document Intelligence (formerly Form Recognizer)
3+
titleSuffix: Azure AI services
4+
description: Introduction to how semantic chunking can help with Retrieval-Augmented Generation (RAG) implementation using Azure AI Document Intelligence Layout model.
5+
author: laujan
6+
manager: nitinme
7+
ms.service: azure-ai-document-intelligence
8+
ms.topic: conceptual
9+
ms.date: 12/15/2023
10+
ms.author: luzhan
11+
monikerRange: '>=doc-intel-3.0.0'
12+
---
13+
14+
# Retrieval-Augmented Generation with Azure AI Document Intelligence
15+
16+
<!-- markdownlint-disable MD036 -->
17+
18+
**This content applies to:** ![checkmark](media/yes-icon.png) **v4.0 (preview)**
19+
20+
## Introduction
21+
22+
Retrieval-Augmented Generation (RAG) is a design pattern that combines a pretrained Large Language Model (LLM) like ChatGPT with an external data retrieval system to generate an enhanced response incorporating new data outside of the original training data. Adding an information retrieval system to your applications enables you to chat with your documents, generate captivating content, and access the power of Azure OpenAI models for your data. You also have more control over the data used by the LLM as it formulates a response.
23+
24+
The Document Intelligence [Layout model](concept-layout.md) is an advanced machine-learning based document analysis API. It provides content extraction and document structure analysis, which makes it a foundation for semantic chunking. With the Layout model, you can extract text and structural elements to divide large bodies of text into smaller, meaningful chunks based on semantic content rather than arbitrary splits. The extracted information can be output in Markdown format, enabling you to define your semantic chunking strategy based on the provided building blocks.
25+
26+
:::image type="content" source="media/rag/azure-rag-processing.png" alt-text="Screenshot depicting semantic chunking with RAG using Azure AI Document Intelligence.":::
27+
28+
## Semantic chunking
29+
30+
Long sentences are challenging for natural language processing (NLP) applications, especially when they're composed of multiple clauses, complex noun or verb phrases, relative clauses, and parenthetical groupings. Like a human reader, an NLP system needs to keep track of all the presented dependencies. The goal of semantic chunking is to find semantically coherent fragments of a sentence representation. These fragments can then be processed independently and recombined as semantic representations without loss of information, interpretation, or semantic relevance. The inherent meaning of the text is used as a guide for the chunking process.
31+
32+
Text data chunking strategies play a key role in optimizing the RAG response and performance. Fixed-sized and semantic are two distinct chunking methods:
33+
34+
* **Fixed-sized chunking**. Most chunking strategies used in RAG today are based on fixed-sized text segments known as chunks. Fixed-sized chunking is quick, easy, and effective with text that doesn't have a strong semantic structure, such as logs and data. However, it isn't recommended for text that requires semantic understanding and precise context. The fixed-size window can sever words, sentences, or paragraphs, which impedes comprehension and disrupts the flow of information.
35+
36+
* **Semantic chunking**. This method divides the text into chunks based on semantic understanding. Division boundaries are focused on sentence subject, and the method uses significant computational resources because the algorithms are complex. However, it has the distinct advantage of maintaining semantic consistency within each chunk. It's useful for text summarization, sentiment analysis, and document classification tasks.
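
The difference between the two methods can be illustrated with a minimal sketch. It isn't part of the Layout model or any SDK: `fixed_size_chunks` and `sentence_chunks` are hypothetical helper names, and grouping whole sentences is only a rough stand-in for true semantic chunking, which relies on document structure and meaning.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut text into windows of `size` characters, ignoring word and sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Group whole sentences into chunks, starting a new chunk when `max_chars` would be exceeded."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so close the current one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Invoices are due in 30 days. Late fees apply after that. Contact billing with questions."
fixed = fixed_size_chunks(text, 40)
by_sentence = sentence_chunks(text, 60)
print(fixed)        # windows may end in the middle of a word
print(by_sentence)  # every chunk ends on a sentence boundary
```

With this sample, the first fixed-size window ends partway through the word "apply", while the sentence-based chunks always end on a full stop.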
37+
38+
## Semantic chunking with Document Intelligence Layout model
39+
40+
Markdown is a structured and formatted markup language and a popular input for enabling semantic chunking in RAG. You can use the Markdown content from the [Layout model](concept-layout.md) to split documents based on paragraph boundaries, create specific chunks for tables, and fine-tune your chunking strategy to improve the quality of the generated responses.
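
As a rough illustration of splitting on paragraph boundaries while keeping tables whole, the following sketch labels each blank-line-separated block of a Markdown string. It assumes that tables appear in pipe (`|`) syntax, which may not hold for every API version, and `split_markdown_blocks` is a hypothetical helper rather than an SDK function.

```python
import re


def split_markdown_blocks(markdown: str) -> list[dict]:
    """Split Markdown on blank lines and label each block as heading, table, or paragraph."""
    blocks = []
    for raw in re.split(r"\n\s*\n", markdown.strip()):
        lines = [line for line in raw.splitlines() if line.strip()]
        if not lines:
            continue
        if all(line.lstrip().startswith("|") for line in lines):
            kind = "table"  # every line is a pipe-delimited row, so keep the table in one chunk
        elif lines[0].lstrip().startswith("#"):
            kind = "heading"
        else:
            kind = "paragraph"
        blocks.append({"type": kind, "text": "\n".join(lines)})
    return blocks


sample = """# Quarterly report

Revenue grew in every region.

| Region | Revenue |
| - | - |
| East | 120 |
| West | 95 |

Costs stayed flat."""

blocks = split_markdown_blocks(sample)
for block in blocks:
    print(block["type"], "->", block["text"].splitlines()[0])
```

Keeping a table in a single chunk means its header row stays with its data rows, so a retrieved chunk is still interpretable on its own.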
41+
42+
### Benefits of using the Layout model
43+
44+
* **Simplified processing**. You can parse different document types, such as digital and scanned PDFs, images, office files (docx, xlsx, pptx), and HTML, with just a single API call.
45+
46+
* **Scalability and AI quality**. The Layout model is highly scalable in Optical Character Recognition (OCR), table extraction, and [document structure analysis](concept-layout.md#document-layout-analysis). It supports [309 printed and 12 handwritten languages](language-support-ocr.md#model-id-prebuilt-layout), further ensuring high-quality results driven by AI capabilities.
47+
48+
* **Large language model (LLM) compatibility**. The Layout model's Markdown-formatted output is LLM friendly and facilitates seamless integration into your workflows. You can turn any table in a document into Markdown format and avoid extensive effort parsing the documents for greater LLM understanding.
49+
50+
**Text image processed with Document Intelligence Studio using Layout model**
51+
52+
:::image type="content" source="media/rag/markdown-text-output.png" alt-text="Screenshot of newspaper article processed by Layout model and outputted to Markdown.":::
53+
54+
**Table image processed with Document Intelligence Studio using Layout model**
55+
56+
:::image type="content" source="media/rag/markdown-table-output.png" alt-text="Screenshot of table processed by Layout model and outputted to Markdown.":::
57+
58+
## Get started
59+
60+
The Document Intelligence Layout model **2023-10-31-preview** supports the following development options:
61+
62+
* [Document Intelligence Studio](https://documentintelligence.ai.azure.com/studio)
63+
64+
* [REST API](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-10-31-preview&branch=main&tabs=HTTP&preserve-view=true)
65+
66+
* [.NET &bull; Java &bull; JavaScript &bull; Python programming language SDKs.](sdk-overview-v4-0.md#supported-programming-languages)
67+
68+
**Ready to begin?**
69+
70+
### Document Intelligence Studio
71+
72+
You can follow the [Document Intelligence Studio quickstart](quickstarts/try-document-intelligence-studio.md) to get started. Next, you can integrate Document Intelligence features with your own application using the sample code provided.
73+
74+
* Start with the [Layout model](https://documentintelligence.ai.azure.com/studio/layout). You need to select the following **Analyze options** to use RAG in the studio:
75+
76+
**Required**
77+
78+
* Run analysis range → **Current document**
79+
* Page range → **All pages**
80+
* Output format style → **Markdown**
81+
82+
**Optional**
83+
84+
* You can also select relevant optional detection parameters.
85+
86+
* Select **Save**.
87+
88+
:::image type="content" source="media/rag/rag-analyze-options.png" alt-text="Screenshot of Analyze options dialog window with RAG required options in the Document Intelligence studio.":::
89+
90+
* Select the **Run analysis** button to view the output.
91+
92+
:::image type="content" source="media/rag/run-analysis.png" alt-text="Screenshot of the Run Analysis button in the Document Intelligence Studio.":::
93+
94+
### SDK or REST API
95+
96+
* Follow the [Document Intelligence quickstart](quickstarts/get-started-sdks-rest-api.md) for your preferred programming language SDK or REST API. Use the Layout model to extract content and structure from your documents.
97+
98+
* You can also check out GitHub repos for code samples and tips for analyzing a document in Markdown output format.
99+
100+
* [Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_analyze_documents_output_in_markdown.py)
101+
102+
* [JavaScript](https://github.com/Azure/azure-sdk-for-js/blob/bb8a2bd8c6dc1883ee7308903b8220eab4b37596/sdk/documentintelligence/ai-document-intelligence-rest/README.md?plain=1#L154)
103+
104+
* [Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/src/samples/java/com/azure/ai/documentintelligence/AnalyzeLayoutMarkdownOutput.java)
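
For the REST route, the following sketch builds (but doesn't send) an analyze request that asks for Markdown output. It assumes the `2023-10-31-preview` route `documentintelligence/documentModels/prebuilt-layout:analyze` with the `outputContentFormat=markdown` query parameter and key-based authentication; check the REST reference for your API version. `build_layout_request`, the endpoint, and the document URL are placeholders introduced for illustration.

```python
import json
import urllib.request

API_VERSION = "2023-10-31-preview"


def build_layout_request(endpoint: str, api_key: str, document_url: str) -> urllib.request.Request:
    """Build a POST request that asks the Layout model for Markdown output (assumed route)."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/prebuilt-layout:analyze"
        f"?api-version={API_VERSION}&outputContentFormat=markdown"
    )
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values; replace them with your own resource endpoint, key, and document.
request = build_layout_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<api_key>",
    "https://example.com/sample.pdf",
)
print(request.get_method(), request.full_url)
```

Analysis runs asynchronously, so after sending the request you would typically poll the operation URL returned by the service until the result is ready.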
105+
106+
## Build document chat with semantic chunking
107+
108+
* [Azure OpenAI on your data](../openai/concepts/use-your-data.md) enables you to run supported chat on your documents. Azure OpenAI on your data applies the Document Intelligence Layout model to extract and parse document data by chunking long text based on tables and paragraphs. You can also customize your chunking strategy using [Azure OpenAI sample scripts](https://github.com/microsoft/sample-app-aoai-chatGPT/tree/main/scripts) located in our GitHub repo.
109+
110+
* Azure AI Document Intelligence is now integrated with [LangChain](https://python.langchain.com/docs/integrations/document_loaders/azure_document_intelligence) as one of its document loaders. You can use it to easily load the data and output to Markdown format. This [notebook](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Quickstart%20-%20Document%20Question%20and%20Answering%20with%20PDFs/) shows a simple demo of the RAG pattern with Azure AI Document Intelligence as the document loader and Azure AI Search as the retriever in LangChain.
111+
112+
* The chat with your data solution accelerator [code sample](https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator) demonstrates an end-to-end baseline RAG pattern sample. It uses Azure AI Search as a retriever and Azure AI Document Intelligence for document loading and semantic chunking.
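
To make the retriever's role in these solutions concrete, here's a toy sketch of the retrieval and prompt-assembly steps. It isn't how Azure AI Search or the solution accelerator works: ranking by shared words is a deliberately simple stand-in for a real search index, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by the number of words they share with the question (a toy retriever)."""
    question_words = tokenize(question)
    ranked = sorted(chunks, key=lambda chunk: len(question_words & tokenize(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into a single prompt for an LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Chunks as they might look after splitting Markdown output on section headers.
chunks = [
    "## Refunds\nRefunds are issued within 14 days of a return.",
    "## Shipping\nOrders ship within 2 business days.",
    "## Warranty\nHardware is covered for one year.",
]
question = "How long do refunds take?"
top_chunks = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Because each chunk is a complete section, the retrieved context carries its own heading, which gives the LLM a clearer signal than an arbitrary window of text would.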
113+
114+
## Use case
115+
116+
If you're looking for a specific section in a document, you can use semantic chunking to divide the document into smaller chunks based on the section headers. This approach helps you find the section you're looking for quickly and easily:
117+
118+
```python
# Using SDK targeting 2023-10-31-preview
# pip install azure-ai-documentintelligence==1.0.0b1
# pip install langchain langchain-community azure-ai-documentintelligence

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter

doc_intelligence_endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
doc_intelligence_key = "<api_key>"

# Optional: a client for calling the service directly.
# The loader below doesn't use this client; it takes the endpoint and key directly.
document_intelligence_client = DocumentIntelligenceClient(
    doc_intelligence_endpoint, AzureKeyCredential(doc_intelligence_key)
)

# Initiate Azure AI Document Intelligence to load the document. You can either specify file_path or url_path to load the document.
loader = AzureAIDocumentIntelligenceLoader(
    file_path="<path to your file>",
    api_key=doc_intelligence_key,
    api_endpoint=doc_intelligence_endpoint,
    api_model="prebuilt-layout",
)
docs = loader.load()

# Split the document into chunks based on Markdown headers.
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
text_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

docs_string = docs[0].page_content
splits = text_splitter.split_text(docs_string)
# Inspect the resulting chunks (displays the value when run in a notebook).
splits
```
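
To show what header-based splitting makes possible without calling the service, the following sketch splits a Markdown string by headers and then looks up a section by its title. It's a simplified stand-in, not the implementation of `MarkdownHeaderTextSplitter`: `split_by_headers` and `find_section` are hypothetical helpers, the metadata layout (header level mapped to title) is an assumption, and header-like lines inside code fences aren't handled.

```python
import re


def split_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into sections and record the header path (level -> title) of each one."""
    sections, path, body = [], {}, []

    def flush() -> None:
        # Store the text collected so far under the current header path.
        text = "\n".join(body).strip()
        if text:
            sections.append({"metadata": dict(path), "content": text})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,3})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for key in [k for k in path if k >= level]:
                del path[key]
            path[level] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return sections


def find_section(sections: list[dict], title: str) -> list[dict]:
    """Return the sections whose header path contains the given title."""
    return [section for section in sections if title in section["metadata"].values()]


sample = """# Handbook
## Leave policy
Employees accrue 1.5 days per month.
## Expenses
Submit receipts within 30 days."""

sections = split_by_headers(sample)
matches = find_section(sections, "Expenses")
print(matches[0]["content"])
```

Storing the header path with each chunk lets you filter by section title first and only then pass the matching text to the LLM.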
153+
154+
## Next steps
155+
156+
* Learn more about [Azure AI Document Intelligence](overview.md).
157+
158+
* [Learn how to process your own forms and documents](quickstarts/try-document-intelligence-studio.md) with the [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio).
159+
160+
* Complete a [Document Intelligence quickstart](quickstarts/get-started-sdks-rest-api.md?view=doc-intel-3.1.0&preserve-view=true) and get started creating a document processing app in the development language of your choice.

articles/ai-services/document-intelligence/toc.yml

Lines changed: 4 additions & 1 deletion
@@ -162,7 +162,10 @@ items:
162162
href: concept-layout.md
163163
- name: Add-on capabilities
164164
displayName: extract, formula, font, styles, fontStyle, ocr.highResolution, ocr.formula, high resolution, background color, inline, display
165-
href: concept-add-on-capabilities.md
165+
href: concept-add-on-capabilities.md
166+
- name: Retrieval-Augmented Generation (RAG)
167+
displayName: RAG, LLM, semantic, chunk, LangChain, language model
168+
href: concept-retrieval-augumented-generation.md
166169
- name: Contract model
167170
displayName: contracts, agreements, legal, terms, conditions, clauses, parties, dates, signatures
168171
href: concept-contract.md
