Commit 61cca43

update implementation
1 parent d0299ce commit 61cca43

2 files changed

+7
-11
lines changed

articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md

Lines changed: 7 additions & 11 deletions
@@ -54,24 +54,24 @@ Here’s an overview of the implementation process, beginning with data extracti
5454

5555
### 1. Content Extraction: The Foundation for RAG with Content Understanding
5656

57-
Content extraction is ideal for transforming raw multimodal data into structured, searchable formats:
57+
Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
5858
- **Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
5959
- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
6060
- **Video:** Segments video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.
6161

62-
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights.
62+
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
6363

6464
### 2. Field Extraction: Enhancing Knowledge Bases for Better Retrieval
6565

66-
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision:
66+
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
6767
- **Document:** Extracts key topics and fields to provide concise overviews of lengthy materials.
6868
- **Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
6969
- **Audio:** Extracts key topics or sentiment from conversations to provide additional context for queries.
7070
- **Video:** Generates scene-level summaries, identifies key topics, or analyzes brand presence and product associations within video footage.
7171

7272
By combining content extraction with field extraction, organizations can create a contextually rich knowledge base optimized for indexing, retrieval, and RAG scenarios, ensuring more accurate and meaningful responses to user queries.
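
As a rough illustration of combining the two outputs, the sketch below merges extracted content (markdown) and extracted fields into one index-ready record. The function name `build_search_document`, the record keys (`id`, `modality`, `content`, `metadata`), and the flat `fields` dictionary are all hypothetical and chosen for illustration; they are not the Content Understanding response schema.

```python
from typing import Any


def build_search_document(
    source_id: str,
    modality: str,
    markdown: str,
    fields: dict[str, Any],
) -> dict[str, Any]:
    """Merge extracted content and extracted fields into one record.

    All key names here are illustrative assumptions, not a service schema.
    """
    # Flatten non-empty field values into "name: value" lines so the
    # metadata can be searched as text alongside the main content.
    field_lines = [
        f"{name}: {value}"
        for name, value in sorted(fields.items())
        if value not in (None, "")
    ]
    return {
        "id": source_id,
        "modality": modality,
        "content": markdown.strip(),
        "metadata": "\n".join(field_lines),
    }


doc = build_search_document(
    source_id="vid-001",
    modality="video",
    markdown="# Scene 1\nIntro\n",
    fields={"summary": "Onboarding overview", "sentiment": None},
)
```

Keeping content and field-derived metadata in separate keys lets an index treat them differently, for example embedding the content while filtering on the metadata.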
7373

74-
Learn more about [content extraction](./capabilities.md#content-extraction) and [field extraction](./capabilities.md#field-extraction) capabilities.
74+
Learn more about [field extraction](./capabilities.md#field-extraction) capabilities.
7575

7676
#### Code Sample: Analyzer and Schema Configuration
7777
Below is an example of an analyzer and schema creation for various modalities in a multimodal RAG scenario.
@@ -443,15 +443,11 @@ After extracting data with Azure AI Content Understanding, the next steps focus
443443
> [View full code sample for RAG on GitHub.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
444444

445445
## 3. Create a Unified Search Index
446+
After processing multimodal content with Azure AI Content Understanding, create a comprehensive search infrastructure using your newly structured data. By embedding the markdown and JSON outputs with Azure OpenAI's embedding models and indexing them in Azure AI Search, you'll establish a unified knowledge repository spanning all content types.
446447

447-
After processing multimodal content with Azure AI Content Understanding, the next step is to create a comprehensive search infrastructure that leverages this richly structured data. By embedding the markdown and JSON outputs using Azure OpenAI's embedding models and indexing them with [Azure AI Search](https://docs.azure.cn/en-us/search/tutorial-rag-build-solution-index-schema), you can create a unified knowledge repository that seamlessly spans all content modalities.
448+
Azure AI Search offers advanced search strategies for multimodal content. In this implementation, [hybrid search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview) combines vector and full-text indexing to blend keyword precision with semantic understanding, which suits complex queries that require both exact matching and contextual relevance. This approach improves the quality of the information fed to generation models, producing more accurate, contextually appropriate responses.
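
To make the idea of blending keyword and vector results concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the method the Azure AI Search documentation describes for merging hybrid query results. It is a local illustration of the scoring idea only, not the service's implementation; the function name, the sample document IDs, and the constant `k=60` are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text query and a vector query.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists (`doc-a` and `doc-b` here) outrank those found by only one search, which is the behavior that lets hybrid search balance exact matching with semantic similarity.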
448449

449-
Azure AI Search provides advanced search strategies to maximize the value of multimodal content. In this scenario, we utilize [hybrid search](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview) approach which combines both vector and full text indexing to blend keyword precision with semantic understanding - an approach particularly effective for complex queries requiring both exact matching and contextual relevance. By combining traditional keyword matching with vector embeddings, hybrid search significantly enhances the quality and relevance of information fed to the generation model, resulting in more accurate and contextually appropriate responses.
450-
451-
> [!NOTE]
452-
> For comprehensive guidance on implementing different search techniques, visit the [Azure AI Search documentation](https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview).
453-
454-
Below is a minimal consolidated index that supports vector and hybrid search and enables cross-modal search capabilities, allowing users to discover relevant information regardless of the original content format:
450+
Below is a sample consolidated index that supports vector and hybrid search and enables cross-modal search capabilities, allowing users to discover relevant information regardless of the original content format:
455451

456452
```json
457453
{