# Multimodal retrieval-augmented generation with Content Understanding
Retrieval-augmented generation (**RAG**) is a method that enhances the capabilities of large language models (**LLMs**) by integrating data from external knowledge sources. Incorporating diverse and current information refines the precision and contextual relevance of the outputs an **LLM** generates. A key challenge for **RAG** is the efficient extraction and processing of multimodal content, such as documents, images, audio, and video, to ensure accurate retrieval and effective grounding of **LLM** responses.
Azure AI Content Understanding addresses these challenges by providing sophisticated extraction capabilities across all content modalities. The service seamlessly integrates advanced natural language processing, computer vision, and speech recognition into a unified framework. This integration preserves semantic integrity and contextual relationships that traditional extraction methods often lose. A unified approach eliminates the need to manage separate workflows and models for different content types, streamlining implementation while ensuring optimal representation for retrieval and generation.
## Build a multimodal RAG solution with Content Understanding
Imagine a corporate training program with a collection of documents, images, audio recordings, and videos covering topics such as compliance, safety, and technical skills. The goal is to create a system that retrieves relevant information from these multimodal sources based on user queries, enabling employees to access precise and contextually rich answers.
### Implementation

A high-level summary of the **RAG** implementation pattern looks like this:

1. Transform unstructured multimodal data into structured representations using Content Understanding.
1. Embed the structured output using embedding models.
1. Store the embedded vectors in a database or search index.
1. Use generative AI chat models to query and generate responses from retrieval systems.

:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding **RAG** architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::
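The four numbered steps can be sketched as a self-contained data flow. Everything in this sketch is a stand-in: `analyze`, `embed`, and the in-memory index are hypothetical placeholders for calls to Content Understanding, an embedding model (for example, one available through Azure OpenAI), and Azure AI Search, respectively.

```python
import math

def analyze(raw: str) -> dict:
    # Stand-in for Content Understanding: real calls return markdown
    # content plus extracted fields for the input asset.
    return {"content": raw.strip(), "fields": {"summary": raw.strip()[:40]}}

def embed(text: str) -> list[float]:
    # Stand-in for an embedding model: a toy character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1. Transform unstructured data into structured representations.
docs = ["Fire safety training: evacuate via the nearest exit.",
        "Compliance policy: report incidents within 24 hours."]
structured = [analyze(d) for d in docs]

# 2-3. Embed the structured output and store the vectors in an index.
index = [{"content": s["content"], "vector": embed(s["content"])}
         for s in structured]

# 4. Retrieve the best match for a query; a chat model would then
# ground its answer in the retrieved content.
query = "How do I report a compliance incident?"
qvec = embed(query)
best = max(index, key=lambda e: cosine(e["vector"], qvec))
print(best["content"])
```

Swapping the stand-ins for real service calls changes only the three helper functions; the overall flow of transform, embed, store, and retrieve stays the same.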
Here's an overview of the implementation process, beginning with data extraction using Azure AI Content Understanding as the foundation for transforming raw multimodal data into structured, searchable formats optimized for **RAG** workflows:

Content extraction forms the foundation of effective **RAG** systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:

* **Document:** Extract hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
* **Image:** Transform visual data into searchable text by verbalizing diagrams and charts, extracting embedded text, and converting graphical data into structured formats. Technical illustrations are analyzed to identify components and relationships.
* **Audio:** Generate speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
* **Video:** Segment video into meaningful units, transcribe spoken content, and provide scene descriptions while addressing context window limitations in generative AI models.

While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](capabilities.md).
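Before embedding, the markdown that content extraction produces is typically split into retrieval-sized chunks. A minimal sketch of heading-aligned chunking follows; the function name and size threshold are illustrative choices, not part of the service.

```python
import re

def chunk_markdown(markdown: str, max_chars: int = 800) -> list[str]:
    """Split extracted markdown into heading-aligned chunks for embedding.

    Splitting on headings keeps each chunk aligned with the logical
    structure that content extraction preserved.
    """
    # Split before every markdown heading line (zero-width lookahead).
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for section in filter(None, (s.strip() for s in sections)):
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Further split oversized sections on paragraph boundaries.
        buf = ""
        for para in section.split("\n\n"):
            if buf and len(buf) + len(para) + 2 > max_chars:
                chunks.append(buf)
                buf = para
            else:
                buf = f"{buf}\n\n{para}" if buf else para
        if buf:
            chunks.append(buf)
    return chunks

doc = "# Safety\n\nWear protective gear.\n\n## Evacuation\n\nUse the nearest exit."
print(chunk_markdown(doc))
```

Each resulting chunk carries its own heading, so a retrieved passage keeps enough context to be meaningful on its own.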

Field extraction complements content extraction by generating targeted metadata for each content type:

* **Document:** Extract key topics and fields to provide concise overviews of lengthy materials.
* **Image:** Convert visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
* **Audio:** Extract key topics or sentiment analysis from conversations to provide added context for queries.
* **Video:** Generate scene-level summaries, identify key topics, or analyze brand presence and product associations within video footage.
Combining content extraction with field extraction enables organizations to create a contextually rich knowledge base optimized for indexing, retrieval, and **RAG** scenarios, ensuring more accurate and meaningful responses to user queries.
Learn more about [field extraction](capabilities.md#field-extraction).
#### Analyzer and schema configuration
The following code sample shows analyzer and schema creation for various modalities in a multimodal **RAG** scenario.
---
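The shape of an analyzer-creation request can be sketched as follows. This is a minimal sketch only: the endpoint path, API version, `baseAnalyzerId` value, and header name are assumptions about the preview REST API, so check the service reference before using them.

```python
# Build a creation payload for a video analyzer with a custom field schema.
# NOTE: the endpoint path, API version, and "baseAnalyzerId" below are
# assumed preview values; verify them against the service reference.
ENDPOINT = "https://<your-resource>.services.ai.azure.com"  # hypothetical
API_VERSION = "2024-12-01-preview"                          # assumed version
ANALYZER_ID = "training-video-analyzer"                     # example name

payload = {
    "description": "Extract RAG-oriented fields from training videos",
    "baseAnalyzerId": "prebuilt-videoAnalyzer",  # assumed prebuilt base
    "fieldSchema": {
        "fields": {
            "description": {
                "type": "string",
                "description": "Scene-level summary of the segment.",
            },
            "keyTopics": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Key topics covered in the segment.",
            },
        }
    },
}

# The analyzer would then be created with a PUT request, for example:
# requests.put(f"{ENDPOINT}/contentunderstanding/analyzers/{ANALYZER_ID}"
#              f"?api-version={API_VERSION}",
#              headers={"Ocp-Apim-Subscription-Key": "<key>"}, json=payload)
print(sorted(payload["fieldSchema"]["fields"]))
```

The same payload shape, with a different base analyzer and field set, applies to document, image, and audio analyzers.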
The following code sample showcases the results of content and field extraction. This excerpt is from a document analysis result:

```json
"words": [
    {
        ....
    },
],
"lines": [
    {
        ...
    },
]
}
],
```
An excerpt from a video analysis result:

````json
"height": 960,
"markdown": "# Shot 0:0.0 => 0:1.800\n\n## Transcript\n\n```\n\nWEBVTT\n\n0:0.80 --> 0:10.560\n<v Speaker>When I was planning my trip...",
"fields": {
    "description": {
        "type": "string",
        "valueString": "The video begins with a view from a glass floor, showing a person's feet in white sneakers standing on it. The scene captures a downward view of a structure, possibly a tower, with a grid pattern on the floor and a clear view of the ground below. The lighting is bright, suggesting a sunny day, and the colors are dominated by the orange of the structure and the gray of the floor."
````
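Results shaped like the excerpts above can be flattened into plain index documents before embedding. A minimal sketch follows, assuming a result with `markdown` and typed `fields` entries as in the sample; the helper name and the trimmed input are illustrative.

```python
import json

# A trimmed result shaped like the video excerpt above.
result = json.loads("""
{
  "markdown": "# Shot 0:0.0 => 0:1.800",
  "fields": {
    "description": {
      "type": "string",
      "valueString": "A view from a glass floor."
    }
  }
}
""")

def flatten_fields(fields: dict) -> dict:
    # Collapse typed field objects into plain values for indexing.
    out = {}
    for name, field in fields.items():
        # Typed values live under keys like "valueString", "valueArray", ...
        value_key = next((k for k in field if k.startswith("value")), None)
        out[name] = field[value_key] if value_key else None
    return out

index_doc = {"content": result["markdown"], **flatten_fields(result["fields"])}
print(index_doc)
```

The flattened document pairs the extracted markdown with the field values, which is the form most easily embedded and pushed to a search index.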
The following is a sample consolidated index that supports vector and hybrid search across multimodal content: