articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md
# Retrieval-augmented generation with Content Understanding
Retrieval-augmented generation (**RAG**) is a method that enhances the capabilities of large language models (**LLMs**) by integrating data from external knowledge sources. Integrating diverse and current information refines the precision and contextual relevance of the outputs an **LLM** generates. A key challenge for **RAG** is the efficient extraction and processing of multimodal content, such as documents, images, audio, and video, to ensure accurate retrieval and effective grounding of **LLM** responses.
Azure AI Content Understanding addresses these challenges by offering advanced content extraction capabilities across diverse modalities. The service integrates natural language processing, computer vision, and speech recognition into a unified framework, eliminating the complexity of managing separate extraction pipelines and workflows. A unified approach ensures superior data handling for documents, images, audio, and video, enhancing both precision and depth in information retrieval. This capability proves especially beneficial for **RAG** applications, where the accuracy and contextual relevance of responses depend on a deep understanding of content relationships and context.
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::
Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
- **Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
- **Video:** Divides video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
1. [Extract content](#content-extraction). Convert unstructured multimodal data into a structured representation.
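The structured representation is typically markdown whose headings preserve the source's organization, which you can split into header-scoped chunks before embedding. The following Python sketch is illustrative only: the sample markdown stands in for real extraction output, and the helper isn't part of any SDK.

```python
# Split extracted markdown into header-scoped chunks for embedding.
# The markdown below is a stand-in for extraction output; the helper
# itself is an illustrative sketch, not part of any service SDK.

def chunk_by_headers(markdown: str) -> list[dict]:
    chunks, current = [], {"header": "", "text": []}
    for line in markdown.splitlines():
        if line.startswith("#"):
            if current["text"]:
                chunks.append(current)
            # Strip the leading '#' markers to keep only the heading text.
            current = {"header": line.lstrip("# "), "text": []}
        elif line.strip():
            current["text"].append(line.strip())
    if current["text"]:
        chunks.append(current)
    return [{"header": c["header"], "content": " ".join(c["text"])} for c in chunks]

extracted = "# Safety\nWear gloves.\n\n## Storage\nKeep dry.\nAvoid heat."
for chunk in chunk_by_headers(extracted):
    print(chunk["header"], "->", chunk["content"])
```

Each chunk keeps its nearest heading as context, which helps retrieval later because the heading travels with the text into the embedding.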
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
- **Document:** Extracts key topics and fields to provide concise overviews of lengthy materials.
- **Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
- **Audio:** Extracts key topics or sentiment from conversations to provide added context for queries.
- **Video:** Generates scene-level summaries, identifies key topics, or analyzes brand presence and product associations within video footage.
1. [Create a unified search index](#create-a-unified-search-index). Store the embedded vectors in a database or search index for efficient retrieval.
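Conceptually, retrieval over the stored vectors is a nearest-neighbor search by cosine similarity. The following self-contained sketch uses toy three-dimensional vectors in place of real embeddings and an in-memory scan in place of Azure AI Search:

```python
import math

# Toy in-memory vector retrieval: rank stored chunks by cosine
# similarity to a query embedding. The tiny vectors stand in for
# real embedding-model output; a production system would query
# Azure AI Search instead of scanning a dict.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

index = {
    "doc-chunk-1": [0.9, 0.1, 0.0],
    "video-scene-3": [0.1, 0.9, 0.1],
    "audio-seg-7": [0.2, 0.2, 0.9],
}

query = [0.85, 0.15, 0.05]
ranked = sorted(index, key=lambda k: cosine(index[k], query), reverse=True)
print(ranked[0])  # doc-chunk-1 is the closest match
```

Because every modality's content is embedded into the same vector space, a single query can surface a document chunk, a video scene, or an audio segment interchangeably.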
Learn more about [field extraction](./capabilities.md#field-extraction) capabilities.
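Together, the two extraction passes can populate one index document per content chunk. The following sketch merges extracted text with field metadata; field names such as `keyTopics` and `summary` are hypothetical, not a fixed service schema.

```python
# Merge content-extraction text with field-extraction metadata into a
# single document ready for indexing. All field names are illustrative,
# not a fixed schema defined by the service.

def build_index_doc(doc_id: str, content: str, fields: dict) -> dict:
    return {
        "id": doc_id,
        "content": content,
        # Fold extracted fields in as filterable metadata.
        "keyTopics": fields.get("keyTopics", []),
        "summary": fields.get("summary", ""),
    }

doc = build_index_doc(
    "training-video-01-scene-2",
    "The presenter demonstrates how to calibrate the sensor.",
    {"keyTopics": ["calibration", "sensors"], "summary": "Sensor calibration walkthrough."},
)
print(doc["keyTopics"])  # ['calibration', 'sensors']
```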
### Content extraction
After data is extracted using Azure AI Content Understanding, the next steps involve integrating it with Azure AI Search and Azure OpenAI. This integration demonstrates the seamless synergy between data extraction, retrieval, and generative AI, creating a comprehensive and efficient solution for RAG scenarios.
> [!div class="nextstepaction"]
> [View full code sample for Multimodal RAG on GitHub.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python/blob/main/notebooks/search_with_multimodal_RAG.ipynb)
### Create a unified search index
After Azure AI Content Understanding processes multimodal content, the next essential step is to develop a powerful search framework that effectively uses the enriched structured data. You can use [Azure OpenAI's embedding models](../../openai/how-to/embeddings.md) to embed markdown and JSON outputs. By indexing these embeddings with [Azure AI Search](https://docs.azure.cn/en-us/search/tutorial-rag-build-solution-index-schema), you can create an integrated knowledge repository. This repository effortlessly bridges various content modalities.
Azure AI Search provides advanced search strategies to maximize the value of multimodal content.
In this implementation, [hybrid search](../../../search/hybrid-search-overview.md) combines vector and full-text indexing to blend keyword precision with semantic understanding—ideal for complex queries requiring both exact matching and contextual relevance. By carefully selecting and configuring these search techniques based on your specific use case requirements, you can ensure that your RAG system retrieves the most relevant content across all modalities, significantly enhancing the quality and accuracy of generated responses.
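The fusion step behind hybrid search is Reciprocal Rank Fusion (RRF), which Azure AI Search applies internally, so you never implement it yourself. The following simplified sketch, with illustrative document IDs, shows how two ranked lists merge:

```python
# Simplified Reciprocal Rank Fusion: each ranked list contributes
# 1 / (k + rank) per document, and documents that score well in
# both lists rise to the top. k=60 is the conventional constant.
# This is a didactic sketch; Azure AI Search performs the fusion
# server-side as part of hybrid queries.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["manual-p12", "faq-3", "video-scene-8"]
vector_hits = ["video-scene-8", "manual-p12", "audio-seg-2"]
print(rrf([keyword_hits, vector_hits]))
```

Note how `manual-p12` and `video-scene-8`, which appear in both lists, outrank documents found by only one retrieval method.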
The following JSON code sample shows a minimal consolidated index that supports vector and hybrid search and enables cross-modal search capabilities, allowing users to discover relevant information regardless of the original content format:
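One possible minimal shape is sketched here; all names are illustrative, and the 1536-value vector dimension assumes an Azure OpenAI text-embedding model, so adjust both to match your deployment.

```json
{
  "name": "multimodal-rag-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "filterable": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "contentType", "type": "Edm.String", "filterable": true, "facetable": true },
    {
      "name": "contentVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "vector-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-config", "kind": "hnsw" } ],
    "profiles": [ { "name": "vector-profile", "algorithm": "hnsw-config" } ]
  }
}
```

A `contentType` field of this kind lets you filter or facet by modality (document, image, audio, video) while keeping every chunk searchable through the same `content` and `contentVector` fields.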
Once your content is extracted and indexed, integrate [Azure OpenAI's embedding and chat models](../../openai/concepts/models.md) to create an interactive question-answering system:
1. **Retrieve relevant content** from your unified index when a user submits a query.
2. **Create an effective prompt** that combines the user's question with the retrieved context.
3. **Generate responses** using Azure OpenAI models that reference specific content from various modalities.
This approach grounds the response with your actual content, enabling the model to answer questions by referencing specific document sections, describing relevant images, quoting from video transcripts, or citing speaker statements from audio recordings.
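As a sketch of the prompt-assembly step, the following example pairs a question with hypothetical retrieved chunks; the source labels and grounding instruction are illustrative choices, not a prescribed template.

```python
# Assemble a grounded prompt: pair the user's question with retrieved
# chunks, each labeled by source so the model can cite it. The chunk
# contents, labels, and instruction wording are all illustrative.

def build_prompt(question: str, retrieved: list[dict]) -> str:
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the sources below, citing them by label.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

retrieved = [
    {"source": "manual-p12", "text": "Calibrate the sensor before first use."},
    {"source": "video-scene-8", "text": "The presenter shows the calibration menu."},
]
prompt = build_prompt("How do I calibrate the sensor?", retrieved)
print(prompt.splitlines()[0])
```

The assembled string would then be sent as the user message in a chat-completions call, with the "only the sources below" instruction keeping the model anchored to your indexed content.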
The combination of Content Understanding's extraction capabilities, Azure AI Search's retrieval functions, and Azure OpenAI's generation abilities creates a powerful end-to-end RAG solution that can seamlessly work with all your content types.