You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md
+35-26Lines changed: 35 additions & 26 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -16,14 +16,14 @@ Retrieval-augmented Generation (**RAG**) is a method that enhances the capabilit
16
16
17
17
Azure AI Content Understanding addresses these challenges by offering advanced content extraction capabilities across diverse modalities. The service seamlessly integrates advanced natural language processing, computer vision, and speech recognition into a unified framework. This integration eliminates the complexities of managing separate extraction pipelines and workflows. A unified approach ensures superior data handling for documents, images, audio, and video, thus enhancing both precision and depth in information retrieval. Such innovation proves especially beneficial for **RAG** applications, where the accuracy and contextual relevance of responses depend on a deep understanding of interconnections, interrelationships, and context.
18
18
19
-
:::image type="content" source="../media/concepts/rag-architecture-1.png" alt-text="screenshot of Azure Content Understanding service architecture.":::
19
+
:::image type="content" source="../media/concepts/rag-architecture-1.png" alt-text="screenshot of Azure Content Understanding service architecture." lightbox="../media/concepts/rag-architecture-1.png" :::
20
20
21
21
## Multimodal data and RAG
22
22
23
23
In traditional content processing, simple text extraction sufficed for many content processing use cases. Modern enterprise environments encompass a vast array of complex information across diverse formats:
24
24
25
25
***Documents** featuring intricate layouts.
26
-
***Images** rich with visual details and insights.
26
+
***Images** rich with visual details and insights.
***Videos** that seamlessly integrate and unify multiple data types.
29
29
@@ -52,22 +52,23 @@ Azure AI Content Understanding addresses the core challenges of multimodal **RAG
52
52
53
53
***Optimized query performance:** Content Understanding mitigates modality bias and context fragmentation by providing structured, enriched data that supports advanced relevance ranking across modalities. This approach ensures that user queries yield the most relevant information, enhancing the coherence and precision of generated responses.
54
54
55
-
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::
55
+
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI." lightbox="../media/concepts/rag-architecture-2.png" :::
56
56
57
57
Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
58
58
-**Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
59
-
-**Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
59
+
-**Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
60
60
-**Video:** Divides video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.
61
61
62
62
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
63
63
64
64
1.[Extract content](#content-extraction). Convert unstructured multimodal data into a structured representation.
65
65
66
-
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
67
-
-**Document:** Extract key topics/fields to provide concise overviews of lengthy materials.
68
-
-**Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
69
-
-**Audio:** Extract key topics or sentiment analysis from conversations and to provide added context for queries.
70
-
-**Video:** Generate scene-level summaries, identify key topics, or analyze brand presence and product associations within video footage.
66
+
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
67
+
68
+
***Document:** Extract key topics/fields to provide concise overviews of lengthy materials.
69
+
***Image:** Convert visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
70
+
***Audio:** Extract key topics or sentiment analysis from conversations and to provide added context for queries.
71
+
***Video:** Generate scene-level summaries, identify key topics, or analyze brand presence and product associations within video footage.
71
72
72
73
1.[Create a unified search index](#create-a-unified-search-index). Store the embedded vectors in a database or search index for efficient retrieval.
73
74
@@ -82,7 +83,7 @@ The RAG implementation process starts with data extraction using Azure AI Conten
82
83
***Audio:** Generates speaker-aware transcriptions that accurately capture spoken content across multiple languages through automatic detection and processing.
83
84
***Video:** Segments video content into meaningful units using scene detection and key frame extraction. It creates descriptive summaries, transcribes spoken dialogue, identifies key topics, and analyzes sentiment indicators throughout the footage. Scene descriptions are provided while addressing context limitations inherent to generative AI models.
84
85
85
-
#### Field Extraction
86
+
#### Field extraction
86
87
87
88
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address specialized domain-specific requirements or deliver deeper contextual insights. Field extraction is a valuable complement to content extraction by producing targeted metadata that enriches the knowledge base and improves retrieval accuracy:
88
89
@@ -94,9 +95,9 @@ While content extraction provides a strong foundation for indexing and retrieval
94
95
95
96
Integrating content extraction with field extraction enables organizations to develop a knowledge base that is context-rich and optimized for indexing, retrieval, and RAG scenarios. This approach enables more precise and relevant responses to user inquiries. To learn more, *see*[content extraction](./capabilities.md#content-extraction) and [field extraction](./capabilities.md#field-extraction) capabilities.
96
97
97
-
#### Code Sample: Analyzer and Schema Configuration
98
+
#### Code sample: analyzer and schema configuration
98
99
99
-
The following code samples show an analyzer and schema creation for various modalities in a multimodal RAG scenario.
100
+
The following code samples show an analyzer and schema creation for various modalities in a multimodal RAG scenario.
100
101
101
102
---
102
103
@@ -219,7 +220,7 @@ The following code samples show an analyzer and schema creation for various moda
219
220
220
221
---
221
222
222
-
#### Code Sample: Extraction Response
223
+
#### Code sample: extraction response
223
224
224
225
The following code sample showcases the results of content and field extraction using Azure AI Content Understanding. These results demonstrate how multimodal data is transformed into structured, enriched formats, ready for indexing and retrieval in RAG workflows.
225
226
@@ -276,12 +277,12 @@ The following code sample showcases the results of content and field extraction
276
277
"words": [
277
278
{
278
279
....
279
-
},
280
+
},
280
281
],
281
282
"lines": [
282
283
{
283
284
...
284
-
},
285
+
},
285
286
]
286
287
}
287
288
],
@@ -433,7 +434,7 @@ The following code sample showcases the results of content and field extraction
433
434
"height": 960,
434
435
"markdown": "# Shot 0:0.0 => 0:1.800\n\n## Transcript\n\n```\n\nWEBVTT\n\n0:0.80 --> 0:10.560\n<v Speaker>When I was planning my trip...",
435
436
"fields": {
436
-
437
+
437
438
"description": {
438
439
"type": "string",
439
440
"valueString": "The video begins with a view from a glass floor, showing a person's feet in white sneakers standing on it. The scene captures a downward view of a structure, possibly a tower, with a grid pattern on the floor and a clear view of the ground below. The lighting is bright, suggesting a sunny day, and the colors are dominated by the orange of the structure and the gray of the floor."
@@ -482,17 +483,17 @@ The following JSON code sample shows a minimal consolidated index that support v
Once your content is extracted and indexed, integrate [Azure OpenAI's embedding and chat models](../../openai/concepts/models.md) to create an interactive question-answering system:
517
518
@@ -520,13 +521,21 @@ This approach grounds the response with your actual content, enabling the model
520
521
The combination of Content Understanding's extraction capabilities, Azure AI Search's retrieval functions, and Azure OpenAI's generation abilities creates a powerful end-to-end RAG solution that can seamlessly work with all your content types.
521
522
522
523
## Get started
524
+
523
525
Content Understanding supports the following development options:
* Learn more about [document](../document/overview.md), [image](../image/overview.md), [audio](../audio/overview.md), [video](../video/overview.md) capabilities.
531
-
* Learn more about Content Understanding [**best practices**](../concepts/best-practices.md) and [**capabilities**](../concepts/capabilities.md).
536
+
537
+
* Learn more about [document](../document/overview.md), [image](../image/overview.md), [audio](../audio/overview.md), and [video](../video/overview.md) capabilities
538
+
539
+
* Learn more about Content Understanding [**best practices**](../concepts/best-practices.md) and [**capabilities**](../concepts/capabilities.md)
Copy file name to clipboardExpand all lines: articles/ai-services/content-understanding/tutorial/build-rag-solution.md
+15-12Lines changed: 15 additions & 12 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -16,12 +16,12 @@ This tutorial explains how to create a retrieval-augmented generation (RAG) solu
16
16
17
17
## Exercises included in this tutorial
18
18
19
-
1.**[Create an analyzer](#creating-an-analyzer)**. Learn how to create reusable analyzers to extract structured content from multimodal data using content extraction.
20
-
1.**[Generate targeted metadata with field extraction](#content-and-field-extraction)**. Discover how to use AI to generate further metadata, such as summaries or key topics, to enrich extracted content.
21
-
1.**[Preprocess extracted content](#preprocessing-output-from-content-understanding)**. Explore ways to transform extracted content into vector embeddings for semantic search and retrieval.
22
-
1.**[Design a unified index](#embed-and-index-extracted-content)**. Develop a unified Azure AI Search index that integrates and organizes multimodal data for efficient retrieval.
23
-
1.**[Semantic chunk retrieval](#semantic-chunk-retrieval)**. Extract contextually relevant information to deliver more precise and meaningful answers to user queries.
24
-
1.**[Interact with data using chat models](#use-openai-to-interact-with-data)** Use Azure OpenAI chat models to engage with your indexed data, enabling conversational search, querying, and answering.
19
+
***[Create an analyzer](#creating-an-analyzer)**. Learn how to create reusable analyzers to extract structured content from multimodal data using content extraction.
20
+
***[Generate targeted metadata with field extraction](#content-and-field-extraction)**. Discover how to use AI to generate further metadata, such as summaries or key topics, to enrich extracted content.
21
+
***[Preprocess extracted content](#preprocessing-output-from-content-understanding)**. Explore ways to transform extracted content into vector embeddings for semantic search and retrieval.
22
+
***[Design a unified index](#embed-and-index-extracted-content)**. Develop a unified Azure AI Search index that integrates and organizes multimodal data for efficient retrieval.
23
+
***[Semantic chunk retrieval](#semantic-chunk-retrieval)**. Extract contextually relevant information to deliver more precise and meaningful answers to user queries.
24
+
***[Interact with data using chat models](#use-openai-to-interact-with-data)** Use Azure OpenAI chat models to engage with your indexed data, enabling conversational search, querying, and answering.
25
25
26
26
## Prerequisites
27
27
@@ -46,7 +46,7 @@ To get started, you need **An active Azure subscription**. If you don't have an
46
46
47
47
## Extract data
48
48
49
-
Retrieval-augmented generation (*RAG**) is a method that enhances the functionality of Large Language Models (*LLM**) by integrating data from external knowledge sources. Building a robust multimodal RAG solution begins with extracting and structuring data from diverse content types. Azure AI Content Understanding provides three key components to facilitate this process: **content extraction**, **field extraction**, and **analyzers**. Together, these components form the foundation for creating a unified, reusable, and enhanced data pipeline for RAG workflows.
49
+
Retrieval-augmented generation (*RAG**) is a method that enhances the functionality of Large Language Models (**LLM**) by integrating data from external knowledge sources. Building a robust multimodal RAG solution begins with extracting and structuring data from diverse content types. Azure AI Content Understanding provides three key components to facilitate this process: **content extraction**, **field extraction**, and **analyzers**. Together, these components form the foundation for creating a unified, reusable, and enhanced data pipeline for RAG workflows.
50
50
51
51
## Implementation steps
52
52
@@ -58,7 +58,7 @@ To implement data extraction in Content Understanding, follow these steps:
58
58
59
59
1.**(Optional) Enhance with Field Extraction:** Optionally, specify AI-generated fields to enrich the extracted content with added metadata.
60
60
61
-
## Creating an analyzer
61
+
## Create analyzers
62
62
63
63
Analyzers are reusable components in Content Understanding that streamline the data extraction process. Once an analyzer is created, it can be used repeatedly to process files and extract content or fields based on predefined schemas. An analyzer acts as a blueprint for how data should be processed, ensuring consistency and efficiency across multiple files and content types.
0 commit comments