articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md
# Retrieval-augmented generation with Content Understanding
Retrieval-augmented generation (**RAG**) is a method that enhances the capabilities of large language models (**LLMs**) by integrating data from external knowledge sources. Integrating diverse and current information refines the precision and contextual relevance of the outputs an **LLM** generates. A key challenge for **RAG** is the efficient extraction and processing of multimodal content, such as documents, images, audio, and video, to ensure accurate retrieval and effective grounding of **LLM** responses.
Azure AI Content Understanding addresses these challenges by offering advanced content extraction capabilities across diverse modalities. The service integrates natural language processing, computer vision, and speech recognition into a unified framework, eliminating the complexity of managing separate extraction pipelines and workflows. A unified approach ensures superior data handling for documents, images, audio, and video, enhancing both precision and depth in information retrieval. This capability proves especially beneficial for **RAG** applications, where the accuracy and contextual relevance of responses depend on a deep understanding of content relationships and context.
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::
Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
- **Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
- **Video:** Divides video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
1. [Extract content](#content-extraction). Convert unstructured multimodal data into a structured representation.
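The structured representation is typically markdown whose headings preserve the source's organization, which you can split into header-scoped chunks before embedding. The following Python sketch is illustrative only: the sample markdown stands in for real extraction output, and the helper isn't part of any SDK.

```python
# Split extracted markdown into header-scoped chunks for embedding.
# The markdown below is a stand-in for extraction output; the helper
# itself is an illustrative sketch, not part of any service SDK.

def chunk_by_headers(markdown: str) -> list[dict]:
    chunks, current = [], {"header": "", "text": []}
    for line in markdown.splitlines():
        if line.startswith("#"):
            if current["text"]:
                chunks.append(current)
            # Strip the leading '#' markers to keep only the heading text.
            current = {"header": line.lstrip("# "), "text": []}
        elif line.strip():
            current["text"].append(line.strip())
    if current["text"]:
        chunks.append(current)
    return [{"header": c["header"], "content": " ".join(c["text"])} for c in chunks]

extracted = "# Safety\nWear gloves.\n\n## Storage\nKeep dry.\nAvoid heat."
for chunk in chunk_by_headers(extracted):
    print(chunk["header"], "->", chunk["content"])
```

Each chunk keeps its nearest heading as context, which helps retrieval later because the heading travels with the text into the embedding.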
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
- **Document:** Extracts key topics and fields to provide concise overviews of lengthy materials.
- **Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
- **Audio:** Extracts key topics or sentiment from conversations to provide added context for queries.
- **Video:** Generates scene-level summaries, identifies key topics, or analyzes brand presence and product associations within video footage.
1. [Create a unified search index](#create-a-unified-search-index). Store the embedded vectors in a database or search index for efficient retrieval.
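Conceptually, retrieval over the stored vectors is a nearest-neighbor search by cosine similarity. The following self-contained sketch uses toy three-dimensional vectors in place of real embeddings and an in-memory scan in place of Azure AI Search:

```python
import math

# Toy in-memory vector retrieval: rank stored chunks by cosine
# similarity to a query embedding. The tiny vectors stand in for
# real embedding-model output; a production system would query
# Azure AI Search instead of scanning a dict.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

index = {
    "doc-chunk-1": [0.9, 0.1, 0.0],
    "video-scene-3": [0.1, 0.9, 0.1],
    "audio-seg-7": [0.2, 0.2, 0.9],
}

query = [0.85, 0.15, 0.05]
ranked = sorted(index, key=lambda k: cosine(index[k], query), reverse=True)
print(ranked[0])  # doc-chunk-1 is the closest match
```

Because every modality's content is embedded into the same vector space, a single query can surface a document chunk, a video scene, or an audio segment interchangeably.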
Learn more about [field extraction](./capabilities.md#field-extraction) capabilities.
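Together, the two extraction passes can populate one index document per content chunk. The following sketch merges extracted text with field metadata; field names such as `keyTopics` and `summary` are hypothetical, not a fixed service schema.

```python
# Merge content-extraction text with field-extraction metadata into a
# single document ready for indexing. All field names are illustrative,
# not a fixed schema defined by the service.

def build_index_doc(doc_id: str, content: str, fields: dict) -> dict:
    return {
        "id": doc_id,
        "content": content,
        # Fold extracted fields in as filterable metadata.
        "keyTopics": fields.get("keyTopics", []),
        "summary": fields.get("summary", ""),
    }

doc = build_index_doc(
    "training-video-01-scene-2",
    "The presenter demonstrates how to calibrate the sensor.",
    {"keyTopics": ["calibration", "sensors"], "summary": "Sensor calibration walkthrough."},
)
print(doc["keyTopics"])  # ['calibration', 'sensors']
```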
### Content extraction
After data is extracted using Azure AI Content Understanding, the next steps involve integrating it with Azure AI Search and Azure OpenAI. This integration demonstrates the seamless synergy between data extraction, retrieval, and generative AI, creating a comprehensive and efficient solution for RAG scenarios.
> [!div class="nextstepaction"]
> [View full code sample for Multimodal RAG on GitHub.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python/blob/main/notebooks/search_with_multimodal_RAG.ipynb)
### Create a unified search index
After Azure AI Content Understanding processes multimodal content, the next essential step is to develop a powerful search framework that effectively uses the enriched structured data. You can use [Azure OpenAI's embedding models](../../openai/how-to/embeddings.md) to embed markdown and JSON outputs. By indexing these embeddings with [Azure AI Search](https://docs.azure.cn/en-us/search/tutorial-rag-build-solution-index-schema), you can create an integrated knowledge repository. This repository effortlessly bridges various content modalities.
Azure AI Search provides advanced search strategies to maximize the value of multimodal content.
In this implementation, [hybrid search](../../../search/hybrid-search-overview.md) combines vector and full-text indexing to blend keyword precision with semantic understanding—ideal for complex queries requiring both exact matching and contextual relevance. By carefully selecting and configuring these search techniques based on your specific use case requirements, you can ensure that your RAG system retrieves the most relevant content across all modalities, significantly enhancing the quality and accuracy of generated responses.
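The fusion step behind hybrid search is Reciprocal Rank Fusion (RRF), which Azure AI Search applies internally, so you never implement it yourself. The following simplified sketch, with illustrative document IDs, shows how two ranked lists merge:

```python
# Simplified Reciprocal Rank Fusion: each ranked list contributes
# 1 / (k + rank) per document, and documents that score well in
# both lists rise to the top. k=60 is the conventional constant.
# This is a didactic sketch; Azure AI Search performs the fusion
# server-side as part of hybrid queries.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["manual-p12", "faq-3", "video-scene-8"]
vector_hits = ["video-scene-8", "manual-p12", "audio-seg-2"]
print(rrf([keyword_hits, vector_hits]))
```

Note how `manual-p12` and `video-scene-8`, which appear in both lists, outrank documents found by only one retrieval method.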
The following JSON code sample shows a minimal consolidated index that supports vector and hybrid search and enables cross-modal search capabilities, allowing users to discover relevant information regardless of the original content format:
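One possible minimal shape is sketched here; all names are illustrative, and the 1536-value vector dimension assumes an Azure OpenAI text-embedding model, so adjust both to match your deployment.

```json
{
  "name": "multimodal-rag-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "filterable": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "contentType", "type": "Edm.String", "filterable": true, "facetable": true },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-config", "kind": "hnsw" } ],
    "profiles": [ { "name": "vector-profile", "algorithm": "hnsw-config" } ]
  }
}
```

A `contentType` field of this kind lets you filter or facet by modality (document, image, audio, video) while keeping every chunk searchable through the same `content` and `contentVector` fields.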
Once your content is extracted and indexed, integrate [Azure OpenAI's embedding and chat models](../../openai/concepts/models.md) to create an interactive question-answering system:
1. **Retrieve relevant content** from your unified index when a user submits a query.
2. **Create an effective prompt** that combines the user's question with the retrieved context.
3. **Generate responses** using Azure OpenAI models that reference specific content from various modalities.
This approach grounds the response with your actual content, enabling the model to answer questions by referencing specific document sections, describing relevant images, quoting from video transcripts, or citing speaker statements from audio recordings.
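As a sketch of the prompt-assembly step, the following example pairs a question with hypothetical retrieved chunks; the source labels and grounding instruction are illustrative choices, not a prescribed template.

```python
# Assemble a grounded prompt: pair the user's question with retrieved
# chunks, each labeled by source so the model can cite it. The chunk
# contents, labels, and instruction wording are all illustrative.

def build_prompt(question: str, retrieved: list[dict]) -> str:
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the sources below, citing them by label.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

retrieved = [
    {"source": "manual-p12", "text": "Calibrate the sensor before first use."},
    {"source": "video-scene-8", "text": "The presenter shows the calibration menu."},
]
prompt = build_prompt("How do I calibrate the sensor?", retrieved)
print(prompt.splitlines()[0])
```

The assembled string would then be sent as the user message in a chat-completions call, with the "only the sources below" instruction keeping the model anchored to your indexed content.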
The combination of Content Understanding's extraction capabilities, Azure AI Search's retrieval functions, and Azure OpenAI's generation abilities creates a powerful end-to-end RAG solution that can seamlessly work with all your content types.