Commit 74f17ea: add incoming changes
2 parents: dd39498 + 964fb00
2 files changed: 15 additions, 19 deletions
articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md

Lines changed: 15 additions & 19 deletions
@@ -12,7 +12,7 @@ ms.date: 04/25/2025

# Retrieval-augmented generation with Content Understanding

-Retrieval-augmented Generation (**RAG**) is a method that enhances the capabilities of Large Language Models (*LLM**) by integrating data from external knowledge sources. Integrating diverse and current information refines the precision and contextual relevance of the outputs generated by *LLM**s. A key challenge for **RAG** is the efficient extraction and processing of multimodal content—such as documents, images, audio, and video—to ensure accurate retrieval and effective use to bolster the *LLM** responses.
+Retrieval-augmented Generation (**RAG**) is a method that enhances the capabilities of Large Language Models (**LLM**) by integrating data from external knowledge sources. Integrating diverse and current information refines the precision and contextual relevance of the outputs generated by an **LLM**. A key challenge for **RAG** is the efficient extraction and processing of multimodal content—such as documents, images, audio, and video—to ensure accurate retrieval and effective use to bolster the **LLM** responses.

Azure AI Content Understanding addresses these challenges by offering advanced content extraction capabilities across diverse modalities. The service seamlessly integrates advanced natural language processing, computer vision, and speech recognition into a unified framework. This integration eliminates the complexities of managing separate extraction pipelines and workflows. A unified approach ensures superior data handling for documents, images, audio, and video, thus enhancing both precision and depth in information retrieval. Such innovation proves especially beneficial for **RAG** applications, where the accuracy and contextual relevance of responses depend on a deep understanding of interconnections, interrelationships, and context.

@@ -54,17 +54,24 @@ Azure AI Content Understanding addresses the core challenges of multimodal **RAG

:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::

-## RAG implementation pattern
+Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
+- **Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
+- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
+- **Video:** Divides video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.

-An overview of the **RAG** implementation pattern is as follows:
+While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
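The following minimal sketch shows one way to drive that content extraction step from Python by calling a prebuilt Content Understanding analyzer over the preview REST API. The route shape, API version, analyzer ID, and response fields are assumptions based on the preview service and the linked Python sample; verify them against the current reference before use.

```python
# Minimal sketch: submit a document to a prebuilt Content Understanding
# analyzer over REST and poll for the extracted Markdown. Route, API version,
# analyzer ID, and response fields are assumptions, not a confirmed contract.
import os
import time

import requests

endpoint = os.environ["AZURE_AI_ENDPOINT"].rstrip("/")
headers = {"Ocp-Apim-Subscription-Key": os.environ["AZURE_AI_KEY"]}

# Start the long-running analysis (assumed route and API version).
resp = requests.post(
    f"{endpoint}/contentunderstanding/analyzers/prebuilt-documentAnalyzer:analyze",
    params={"api-version": "2024-12-01-preview"},
    headers=headers,
    json={"url": "https://example.com/sample-report.pdf"},  # any reachable file URL
)
resp.raise_for_status()
operation_url = resp.headers["Operation-Location"]

# Poll until the operation completes.
while True:
    result = requests.get(operation_url, headers=headers).json()
    if result.get("status", "").lower() in ("succeeded", "failed"):
        break
    time.sleep(2)

# The structured output (Markdown plus layout metadata) is what gets chunked,
# embedded, and indexed in the steps that follow.
print(result["result"]["contents"][0]["markdown"])
```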

1. [Extract content](#content-extraction). Convert unstructured multimodal data into a structured representation.

-1. [Generate embeddings](../../openai/how-to/embeddings.md). Apply embedding models to represent the structured data as vectors.
+Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
+- **Document:** Extracts key topics and fields to provide concise overviews of lengthy materials.
+- **Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
+- **Audio:** Extracts key topics or sentiment from conversations to provide added context for queries.
+- **Video:** Generates scene-level summaries, identifies key topics, or analyzes brand presence and product associations within video footage.

1. [Create a unified search index](#create-a-unified-search-index). Store the embedded vectors in a database or search index for efficient retrieval.

-1. [Utilize Azure OpenAI models](#utilize-azure-openai-models) Use generative AI chat models to query the retrieval systems and generate responses.
+Learn more about [field extraction](./capabilities.md#field-extraction) capabilities.
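For illustration only, a hypothetical analyzer definition with a small field schema for video might look like the following Python dictionary. Every property name here is an assumption sketched for readability, not the service's exact contract; see the field extraction capabilities article for the real schema.

```python
# Hypothetical field schema for a custom video analyzer, sketched for
# illustration. Property names and structure are assumptions, not the
# service's exact contract.
import json

video_analyzer = {
    "description": "Metadata fields that enrich video segments for retrieval",
    "baseAnalyzerId": "prebuilt-videoAnalyzer",  # assumed prebuilt base analyzer
    "fieldSchema": {
        "fields": {
            "sceneSummary": {
                "type": "string",
                "description": "One-sentence summary of the segment.",
            },
            "keyTopics": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Main topics discussed or shown in the segment.",
            },
            "brandsPresent": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Brands or products visible or mentioned.",
            },
        }
    },
}

# The definition would typically be registered once (for example, with a PUT
# request to the analyzers endpoint) and then reused for every video analyzed.
print(json.dumps(video_analyzer, indent=2))
```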

### Content extraction

@@ -456,22 +463,15 @@ The following code sample showcases the results of content and field extraction
After data is extracted using Azure AI Content Understanding, the next steps involve integrating it with Azure AI Search and Azure OpenAI. This integration demonstrates the seamless synergy between data extraction, retrieval, and generative AI, creating a comprehensive and efficient solution for RAG scenarios.

> [!div class="nextstepaction"]
-> [View full code sample for RAG on GitHub.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
+> [View full code sample for Multimodal RAG on GitHub.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python/blob/main/notebooks/search_with_multimodal_RAG.ipynb)

### Create a unified search index

After Azure AI Content Understanding processes multimodal content, the next essential step is to develop a powerful search framework that effectively uses the enriched structured data. You can use [Azure OpenAI's embedding models](../../openai/how-to/embeddings.md) to embed markdown and JSON outputs. By indexing these embeddings with [Azure AI Search](https://docs.azure.cn/en-us/search/tutorial-rag-build-solution-index-schema), you can create an integrated knowledge repository. This repository effortlessly bridges various content modalities.
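As a minimal sketch of that embedding step, assuming an Azure OpenAI resource with an embedding deployment named `text-embedding-3-large` (the deployment name is an assumption), the extracted Markdown chunks can be vectorized with the `openai` Python SDK before indexing:

```python
# Minimal sketch: embed extracted Markdown chunks with Azure OpenAI so they can
# be stored as vectors in the search index. Endpoint, key, and deployment name
# are assumptions for illustration.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

chunks = [
    "## Quarterly results\nRevenue grew 12% year over year...",  # from a document
    "Speaker 1: Let's review the onboarding process...",         # from an audio transcript
]

response = client.embeddings.create(
    model="text-embedding-3-large",  # your embedding deployment name
    input=chunks,
)
vectors = [item.embedding for item in response.data]
print(len(vectors), len(vectors[0]))  # number of chunks, embedding dimensions
```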

-Azure AI Search provides advanced search strategies to maximize the value of multimodal content:
+Azure AI Search provides advanced search strategies to maximize the value of multimodal content.

-- **Hybrid Search:** Combines semantic understanding and keyword matching to retrieve information based on both conceptual similarity and explicit terminology, ideal for multimodal content with varied expressions.
-- **Vector Search:** Uses embeddings to uncover subtle semantic connections across modalities, even when terminology differs.
-- **Semantic Ranking:** Prioritizes results based on deeper contextual understanding rather than keyword frequency, surfacing the most relevant information regardless of format.
-
-By carefully selecting and configuring these search techniques based on your specific use case requirements, you can ensure that your RAG system retrieves the most relevant content across all modalities, significantly enhancing the quality and accuracy of generated responses.
-
-> [!NOTE]
-> For comprehensive guidance on implementing different search techniques, visit the [Azure AI Search documentation](../../../search/hybrid-search-overview.md).
+In this implementation, [hybrid search](../../../search/hybrid-search-overview.md) combines vector and full-text indexing to blend keyword precision with semantic understanding—ideal for complex queries requiring both exact matching and contextual relevance. By carefully selecting and configuring these search techniques based on your specific use case requirements, you can ensure that your RAG system retrieves the most relevant content across all modalities, significantly enhancing the quality and accuracy of generated responses.
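A rough sketch of such a hybrid query with the `azure-search-documents` Python SDK follows. The index name, field names (`content`, `contentVector`, `sourceType`), and deployment names are assumptions chosen to line up with the minimal index described below.

```python
# Minimal hybrid-search sketch: a single query that combines full-text and
# vector retrieval. Index name, field names, and deployment names are
# assumptions made for illustration.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="multimodal-rag-index",  # assumed index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

query = "What did the presenter say about onboarding?"
query_vector = aoai.embeddings.create(
    model="text-embedding-3-large", input=query
).data[0].embedding

results = search_client.search(
    search_text=query,  # keyword (full-text) side of the hybrid query
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="contentVector")
    ],                  # vector side of the hybrid query
    select=["id", "content", "sourceType"],
    top=5,
)
for doc in results:
    print(doc["sourceType"], doc["content"][:80])
```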

The following JSON code sample shows a minimal consolidated index that supports vector and hybrid search and enables cross-modal search capabilities, allowing users to discover relevant information regardless of the original content format:

@@ -515,10 +515,6 @@ The following JSON code sample shows a minimal consolidated index that support v

Once your content is extracted and indexed, integrate [Azure OpenAI's embedding and chat models](../../openai/concepts/models.md) to create an interactive question-answering system:

-1. **Retrieve relevant content** from your unified index when a user submits a query
-2. **Create an effective prompt** that combines the user's question with the retrieved context
-3. **Generate responses** using Azure OpenAI models that reference specific content from various modalities
-
This approach grounds the response with your actual content, enabling the model to answer questions by referencing specific document sections, describing relevant images, quoting from video transcripts, or citing speaker statements from audio recordings.

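A compact sketch of that retrieve-then-generate loop is shown below. It reuses the `aoai` client, `search_client`, and `VectorizedQuery` import from the hybrid-search sketch above, and the `gpt-4o` chat deployment name is an assumption.

```python
# Minimal grounded-generation sketch: retrieve indexed chunks, place them in
# the prompt, and ask a chat model to answer using only that context.
# Reuses `aoai`, `search_client`, and `VectorizedQuery` from the sketch above;
# the `gpt-4o` deployment name is an assumption.
def answer(question: str) -> str:
    # 1. Retrieve relevant content from the unified index (hybrid query).
    query_vector = aoai.embeddings.create(
        model="text-embedding-3-large", input=question
    ).data[0].embedding
    hits = search_client.search(
        search_text=question,
        vector_queries=[
            VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="contentVector")
        ],
        select=["content", "sourceType"],
        top=5,
    )
    context = "\n\n".join(f"[{h['sourceType']}] {h['content']}" for h in hits)

    # 2. Create a prompt that combines the question with the retrieved context.
    messages = [
        {"role": "system", "content": "Answer only from the provided context; name the source type you used."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

    # 3. Generate a grounded response with an Azure OpenAI chat model.
    completion = aoai.chat.completions.create(model="gpt-4o", messages=messages)
    return completion.choices[0].message.content

print(answer("What onboarding steps does the training video cover?"))
```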
The combination of Content Understanding's extraction capabilities, Azure AI Search's retrieval functions, and Azure OpenAI's generation abilities creates a powerful end-to-end RAG solution that can seamlessly work with all your content types.
(binary image file changed: -41.8 KB)