|
1 | 1 | ---
|
2 |
| -title: Azure AI Content Understanding Retrieval Augmented Generation Concept |
| 2 | +title: Azure AI Content Understanding retrieval-augmented generation Concept |
3 | 3 | titleSuffix: Azure AI services
|
4 |
| -description: Learn about Retrieval Augmented Generation |
| 4 | +description: Learn about retrieval-augmented generation |
5 | 5 | author: laujan
|
6 | 6 | ms.author: tonyeiyalla
|
7 | 7 | manager: nitinme
|
8 | 8 | ms.service: azure-ai-content-understanding
|
9 | 9 | ms.topic: overview
|
10 |
| -ms.date: 03/16/2025 |
11 |
| -ms.custom: 2025-understanding-release |
| 10 | +ms.date: 04/23/2025 |
12 | 11 | ---
|
13 |
| -# Creating a Multimodal Retrieval Augmented Generation Solution with Content Understanding |
14 | 12 |
|
15 |
| -# Introduction |
| 13 | +# Retrieval-augmented generation with Content Understanding |
16 | 14 |
|
17 |
| -Retrieval Augmented Generation (RAG) enhances Generative AI models by grounding their responses in external knowledge sources, significantly improving accuracy, relevance, and reliability. A key challenge in RAG is effectively extracting and preparing multimodal content – documents, images, audio, and video – so that it can be accurately retrieved and used to inform the LLM's responses. |
| 15 | +retrieval-augmented generation (RAG) enhances Generative AI models by grounding their responses in external knowledge sources, significantly improving accuracy, relevance, and reliability. A key challenge in RAG is effectively extracting and preparing multimodal content – documents, images, audio, and video – so that it can be accurately retrieved and used to inform the LLM's responses. |
18 | 16 |
|
19 | 17 | Azure AI Content Understanding addresses these challenges by providing sophisticated extraction capabilities across all content modalities, preserving semantic integrity and contextual relationships that traditional extraction methods often lose. This unified approach eliminates the need to manage separate workflows and models for different content types, streamlining implementation while ensuring optimal representation for retrieval and generation.
|
20 | 18 |
|
21 | 19 | ## Why Does Multimodal Data Matter for RAG?
|
22 | 20 |
|
23 |
| -In traditional content processing, simple text extraction was sufficient for many use cases. However, modern enterprise environments contain rich, diverse information spread across multiple formats—documents with complex layouts, images conveying visual insights, audio recordings of crucial conversations, and videos that combine all these elements. For truly comprehensive Retrieval Augmented Generation (RAG) systems, all of this content must be accurately processed and made available to generative AI models. This ensures that when users pose questions, the underlying RAG system can retrieve relevant information regardless of its original format—whether it's a complex table in a financial report, a technical diagram in a manual, insights from a recorded conference call, or explanations from a training video. |
| 21 | +In traditional content processing, simple text extraction was sufficient for many use cases. However, modern enterprise environments contain rich, diverse information spread across multiple formats—documents with complex layouts, images conveying visual insights, audio recordings of crucial conversations, and videos that combine all these elements. For truly comprehensive retrieval-augmented generation (RAG) systems, all of this content must be accurately processed and made available to generative AI models. This ensures that when users pose questions, the underlying RAG system can retrieve relevant information regardless of its original format—whether it's a complex table in a financial report, a technical diagram in a manual, insights from a recorded conference call, or explanations from a training video. |
24 | 22 |
|
25 | 23 | ## Capabilities of Content Understanding for Multimodal RAG
|
26 | 24 |
|
@@ -50,7 +48,7 @@ A high level summary of RAG implementation pattern looks like this:
|
50 | 48 | 3. Store embedded vectors in database or search index.
|
51 | 49 | 4. Use Generative AI chat models to query and generate responses from retrieval systems.
|
52 | 50 |
|
53 |
| -Here’s an overview of the implementation process, beginning with data extraction using Azure AI Content Understanding as the foundation for transforming raw multimodal data into structured, searchable formats optimized for RAG workflows: |
| 51 | +Here's an overview of the implementation process, beginning with data extraction using Azure AI Content Understanding as the foundation for transforming raw multimodal data into structured, searchable formats optimized for RAG workflows: |
54 | 52 |
|
55 | 53 | ### 1. Content Extraction: The Foundation for RAG with Content Understanding
|
56 | 54 |
|
@@ -337,16 +335,16 @@ Below is an example showcasing the results of content and field extraction using
|
337 | 335 | "valueString": "Maria Smith contacted Contoso to inquire about her current point balance. Agent John Doe confirmed her identity and informed her that she has 599 points. Maria did not require any further information and the call ended on a positive note."
|
338 | 336 | },
|
339 | 337 | "TrainingTopics": {
|
340 |
| - "type": "array", |
341 |
| - "valueArray": [ |
342 |
| - { |
343 |
| - "type": "string", |
344 |
| - "valueString": "Compliance" |
345 |
| - }, |
346 |
| - { |
347 |
| - "type": "string", |
348 |
| - "valueString": "Risk mitigation" |
349 |
| - },] |
| 338 | + "type": "array", |
| 339 | + "valueArray": [ |
| 340 | + { |
| 341 | + "type": "string", |
| 342 | + "valueString": "Compliance" |
| 343 | + }, |
| 344 | + { |
| 345 | + "type": "string", |
| 346 | + "valueString": "Risk mitigation" |
| 347 | + },] |
350 | 348 | },
|
351 | 349 | "People": {
|
352 | 350 | "type": "array",
|
@@ -416,16 +414,16 @@ Below is an example showcasing the results of content and field extraction using
|
416 | 414 | "valueString": "The video begins with a view from a glass floor, showing a person's feet in white sneakers standing on it. The scene captures a downward view of a structure, possibly a tower, with a grid pattern on the floor and a clear view of the ground below. The lighting is bright, suggesting a sunny day, and the colors are dominated by the orange of the structure and the gray of the floor."
|
417 | 415 | },
|
418 | 416 | "KeyTopics": {
|
419 |
| - "type": "array", |
420 |
| - "valueArray": [ |
421 |
| - { |
422 |
| - "type": "string", |
423 |
| - "valueString": "Flight delay" |
424 |
| - }, |
425 |
| - { |
426 |
| - "type": "string", |
427 |
| - "valueString": "Customer service" |
428 |
| - }, |
| 417 | + "type": "array", |
| 418 | + "valueArray": [ |
| 419 | + { |
| 420 | + "type": "string", |
| 421 | + "valueString": "Flight delay" |
| 422 | + }, |
| 423 | + { |
| 424 | + "type": "string", |
| 425 | + "valueString": "Customer service" |
| 426 | + }, |
429 | 427 | ]
|
430 | 428 | }
|
431 | 429 | },
|
|
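The `TrainingTopics` and `KeyTopics` fields in the hunks above share one shape: an object with `"type": "array"` whose `valueArray` holds objects of `"type": "string"` with a `valueString`. A minimal sketch of flattening such a field into plain strings before chunking or embedding might look like the following. It assumes only that shape as seen in the diff; the helper name `field_to_strings` is hypothetical and is not part of any Content Understanding SDK.

```python
from typing import Any


def field_to_strings(field: dict[str, Any]) -> list[str]:
    """Flatten a string field, or an array of string fields, into a list of str.

    Hypothetical helper: it relies only on the "type", "valueString", and
    "valueArray" keys visible in the sample output, and returns an empty
    list for any other field type.
    """
    kind = field.get("type")
    if kind == "string":
        value = field.get("valueString")
        return [value] if value is not None else []
    if kind == "array":
        flattened: list[str] = []
        for item in field.get("valueArray", []):
            # Recurse so nested arrays of strings are handled the same way.
            flattened.extend(field_to_strings(item))
        return flattened
    return []


# Sample mirroring the TrainingTopics field from the diff, written as a
# Python literal (without the trailing comma that appears in the sample).
fields = {
    "TrainingTopics": {
        "type": "array",
        "valueArray": [
            {"type": "string", "valueString": "Compliance"},
            {"type": "string", "valueString": "Risk mitigation"},
        ],
    },
}

topics = field_to_strings(fields["TrainingTopics"])
print(topics)
```

The `TrainingTopics` sample in the first JSON hunk closes its array with `},]`. A strict parser such as Python's `json.loads` rejects that trailing comma, so the sample would need it removed before being parsed as JSON.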