Skip to content

Commit 36de402

Browse files
committed
resolve PR feedback
1 parent 74f17ea commit 36de402

File tree

4 files changed

+50
-38
lines changed

4 files changed

+50
-38
lines changed

articles/ai-services/content-understanding/concepts/retrieval-augmented-generation.md

Lines changed: 35 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -16,14 +16,14 @@ Retrieval-augmented Generation (**RAG**) is a method that enhances the capabilit
1616

1717
Azure AI Content Understanding addresses these challenges by offering advanced content extraction capabilities across diverse modalities. The service seamlessly integrates advanced natural language processing, computer vision, and speech recognition into a unified framework. This integration eliminates the complexities of managing separate extraction pipelines and workflows. A unified approach ensures superior data handling for documents, images, audio, and video, thus enhancing both precision and depth in information retrieval. Such innovation proves especially beneficial for **RAG** applications, where the accuracy and contextual relevance of responses depend on a deep understanding of interconnections, interrelationships, and context.
1818

19-
:::image type="content" source="../media/concepts/rag-architecture-1.png" alt-text="screenshot of Azure Content Understanding service architecture.":::
19+
:::image type="content" source="../media/concepts/rag-architecture-1.png" alt-text="screenshot of Azure Content Understanding service architecture." lightbox="../media/concepts/rag-architecture-1.png" :::
2020

2121
## Multimodal data and RAG
2222

2323
In traditional content processing, simple text extraction sufficed for many content processing use cases. Modern enterprise environments encompass a vast array of complex information across diverse formats:
2424

2525
* **Documents** featuring intricate layouts.
26-
* **Images** rich with visual details and insights.
26+
* **Images** rich with visual details and insights.
2727
* **Audio** recordings capturing pivotal conversations.
2828
* **Videos** that seamlessly integrate and unify multiple data types.
2929

@@ -52,22 +52,23 @@ Azure AI Content Understanding addresses the core challenges of multimodal **RAG
5252

5353
* **Optimized query performance:** Content Understanding mitigates modality bias and context fragmentation by providing structured, enriched data that supports advanced relevance ranking across modalities. This approach ensures that user queries yield the most relevant information, enhancing the coherence and precision of generated responses.
5454

55-
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI.":::
55+
:::image type="content" source="../media/concepts/rag-architecture-2.png" alt-text="Screenshot of Content Understanding RAG architecture overview, process, and workflow with Azure AI Search and Azure OpenAI." lightbox="../media/concepts/rag-architecture-2.png" :::
5656

5757
Content extraction forms the foundation of effective RAG systems by transforming raw multimodal data into structured, searchable formats optimized for retrieval. The implementation varies by content type:
5858
- **Document:** Extracts hierarchical structures, such as headers, paragraphs, tables, and page elements, preserving the logical organization of training materials.
59-
- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
59+
- **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content while automatically detecting and processing multiple languages.
6060
- **Video:** Divides video into meaningful units, transcribes spoken content, and provides scene descriptions while addressing context window limitations in generative AI models.
6161

6262
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address domain-specific needs or provide deeper contextual insights. Learn more about [content extraction](./capabilities.md#content-extraction) capabilities.
6363

6464
1. [Extract content](#content-extraction). Convert unstructured multimodal data into a structured representation.
6565

66-
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
67-
- **Document:** Extract key topics/fields to provide concise overviews of lengthy materials.
68-
- **Image:** Converts visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
69-
- **Audio:** Extract key topics or sentiment analysis from conversations and to provide added context for queries.
70-
- **Video:** Generate scene-level summaries, identify key topics, or analyze brand presence and product associations within video footage.
66+
Field extraction complements content extraction by generating targeted metadata that enriches the knowledge base and improves retrieval precision. The implementation varies by content type:
67+
68+
* **Document:** Extract key topics/fields to provide concise overviews of lengthy materials.
69+
* **Image:** Convert visual information into searchable text by verbalizing diagrams, extracting embedded text, and identifying graphical components.
70+
* **Audio:** Extract key topics or sentiment analysis from conversations and to provide added context for queries.
71+
* **Video:** Generate scene-level summaries, identify key topics, or analyze brand presence and product associations within video footage.
7172

7273
1. [Create a unified search index](#create-a-unified-search-index). Store the embedded vectors in a database or search index for efficient retrieval.
7374

@@ -82,7 +83,7 @@ The RAG implementation process starts with data extraction using Azure AI Conten
8283
* **Audio:** Generates speaker-aware transcriptions that accurately capture spoken content across multiple languages through automatic detection and processing.
8384
* **Video:** Segments video content into meaningful units using scene detection and key frame extraction. It creates descriptive summaries, transcribes spoken dialogue, identifies key topics, and analyzes sentiment indicators throughout the footage. Scene descriptions are provided while addressing context limitations inherent to generative AI models.
8485

85-
#### Field Extraction
86+
#### Field extraction
8687

8788
While content extraction provides a strong foundation for indexing and retrieval, it may not fully address specialized domain-specific requirements or deliver deeper contextual insights. Field extraction is a valuable complement to content extraction by producing targeted metadata that enriches the knowledge base and improves retrieval accuracy:
8889

@@ -94,9 +95,9 @@ While content extraction provides a strong foundation for indexing and retrieval
9495

9596
Integrating content extraction with field extraction enables organizations to develop a knowledge base that is context-rich and optimized for indexing, retrieval, and RAG scenarios. This approach enables more precise and relevant responses to user inquiries. To learn more, *see* [content extraction](./capabilities.md#content-extraction) and [field extraction](./capabilities.md#field-extraction) capabilities.
9697

97-
#### Code Sample: Analyzer and Schema Configuration
98+
#### Code sample: analyzer and schema configuration
9899

99-
The following code samples show an analyzer and schema creation for various modalities in a multimodal RAG scenario.
100+
The following code samples show an analyzer and schema creation for various modalities in a multimodal RAG scenario.
100101

101102
---
102103

@@ -219,7 +220,7 @@ The following code samples show an analyzer and schema creation for various moda
219220

220221
---
221222

222-
#### Code Sample: Extraction Response
223+
#### Code sample: extraction response
223224

224225
The following code sample showcases the results of content and field extraction using Azure AI Content Understanding. These results demonstrate how multimodal data is transformed into structured, enriched formats, ready for indexing and retrieval in RAG workflows.
225226

@@ -276,12 +277,12 @@ The following code sample showcases the results of content and field extraction
276277
"words": [
277278
{
278279
....
279-
},
280+
},
280281
],
281282
"lines": [
282283
{
283284
...
284-
},
285+
},
285286
]
286287
}
287288
],
@@ -433,7 +434,7 @@ The following code sample showcases the results of content and field extraction
433434
"height": 960,
434435
"markdown": "# Shot 0:0.0 => 0:1.800\n\n## Transcript\n\n```\n\nWEBVTT\n\n0:0.80 --> 0:10.560\n<v Speaker>When I was planning my trip...",
435436
"fields": {
436-
437+
437438
"description": {
438439
"type": "string",
439440
"valueString": "The video begins with a view from a glass floor, showing a person's feet in white sneakers standing on it. The scene captures a downward view of a structure, possibly a tower, with a grid pattern on the floor and a clear view of the ground below. The lighting is bright, suggesting a sunny day, and the colors are dominated by the orange of the structure and the gray of the floor."
@@ -482,17 +483,17 @@ The following JSON code sample shows a minimal consolidated index that support v
482483
# Document content fields
483484
{"name": "document_content", "type": "Edm.String", "searchable": true, "retrievable": true},
484485
{"name": "document_headers", "type": "Edm.String", "searchable": true, "retrievable": true},
485-
486+
486487
# Image-derived content
487-
{"name": "visual_descriptions", "type": "Edm.String", "searchable": true, "retrievable": true},
488-
{ "name": "chunked_content_vectorized", "type": "Edm.Single", "dimensions": 1536, "vectorSearchProfile": "my-vector-profile", "searchable": true, "retrievable": false, "stored": false },
488+
{"name": "visual_descriptions", "type": "Edm.String", "searchable": true, "retrievable": true},
489+
{ "name": "chunked_content_vectorized", "type": "Edm.Single", "dimensions": 1536, "vectorSearchProfile": "my-vector-profile", "searchable": true, "retrievable": false, "stored": false },
489490

490491
# Video content components
491492
{"name": "video_transcript", "type": "Edm.String", "searchable": true, "retrievable": true},
492493
{"name": "scene_descriptions", "type": "Edm.String", "searchable": true, "retrievable": true},
493494
{"name": "video_topics", "type": "Edm.String", "searchable": true, "retrievable": true},
494495
{ "name": "chunked_content_vectorized", "type": "Edm.Single", "dimensions": 1536, "vectorSearchProfile": "my-vector-profile", "searchable": true, "retrievable": false, "stored": false },
495-
496+
496497
# Audio processing results
497498
{"name": "audio_transcript", "type": "Edm.String", "searchable": true, "retrievable": true},
498499
{"name": "speaker_attribution", "type": "Edm.String", "searchable": true, "retrievable": true},
@@ -503,15 +504,15 @@ The following JSON code sample shows a minimal consolidated index that support v
503504
"algorithms": [
504505
{ "name": "my-algo-config", "kind": "hnsw", "hnswParameters": { } }
505506
],
506-
"profiles": [
507+
"profiles": [
507508
{ "name": "my-vector-profile", "algorithm": "my-algo-config" }
508509
]
509510
}
510511
}
511512
```
512513
---
513514

514-
### Utilize Azure OpenAI Models
515+
### Utilize Azure OpenAI models
515516

516517
Once your content is extracted and indexed, integrate [Azure OpenAI's embedding and chat models](../../openai/concepts/models.md) to create an interactive question-answering system:
517518

@@ -520,13 +521,21 @@ This approach grounds the response with your actual content, enabling the model
520521
The combination of Content Understanding's extraction capabilities, Azure AI Search's retrieval functions, and Azure OpenAI's generation abilities creates a powerful end-to-end RAG solution that can seamlessly work with all your content types.
521522

522523
## Get started
524+
523525
Content Understanding supports the following development options:
526+
524527
* [REST API](../quickstart/use-rest-api.md) Quickstart.
525-
* [Azure Foundry](../quickstart//use-ai-foundry.md) Portal Quickstart.
528+
529+
* [Azure Foundry](../quickstart//use-ai-foundry.md) Portal Quickstart.
526530

527531
## Next steps
528-
* Try our RAG [code samples.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
532+
533+
* Try our [RAG code samples.](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
534+
529535
* Follow our [RAG Tutorial](../tutorial/build-rag-solution.md)
530-
* Learn more about [document](../document/overview.md), [image](../image/overview.md), [audio](../audio/overview.md), [video](../video/overview.md) capabilities.
531-
* Learn more about Content Understanding [**best practices**](../concepts/best-practices.md) and [**capabilities**](../concepts/capabilities.md).
536+
537+
* Learn more about [document](../document/overview.md), [image](../image/overview.md), [audio](../audio/overview.md), and [video](../video/overview.md) capabilities
538+
539+
* Learn more about Content Understanding [**best practices**](../concepts/best-practices.md) and [**capabilities**](../concepts/capabilities.md)
540+
532541
* Review Content Understanding [**code samples**](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python?tab=readme-ov-file#azure-ai-search-with-content-understanding-samples-python)
Binary file not shown.
Binary file not shown.

articles/ai-services/content-understanding/tutorial/build-rag-solution.md

Lines changed: 15 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -16,12 +16,12 @@ This tutorial explains how to create a retrieval-augmented generation (RAG) solu
1616

1717
## Exercises included in this tutorial
1818

19-
1. **[Create an analyzer](#creating-an-analyzer)**. Learn how to create reusable analyzers to extract structured content from multimodal data using content extraction.
20-
1. **[Generate targeted metadata with field extraction](#content-and-field-extraction)**. Discover how to use AI to generate further metadata, such as summaries or key topics, to enrich extracted content.
21-
1. **[Preprocess extracted content](#preprocessing-output-from-content-understanding)**. Explore ways to transform extracted content into vector embeddings for semantic search and retrieval.
22-
1. **[Design a unified index](#embed-and-index-extracted-content)**. Develop a unified Azure AI Search index that integrates and organizes multimodal data for efficient retrieval.
23-
1. **[Semantic chunk retrieval](#semantic-chunk-retrieval)**. Extract contextually relevant information to deliver more precise and meaningful answers to user queries.
24-
1. **[Interact with data using chat models](#use-openai-to-interact-with-data)** Use Azure OpenAI chat models to engage with your indexed data, enabling conversational search, querying, and answering.
19+
* **[Create an analyzer](#creating-an-analyzer)**. Learn how to create reusable analyzers to extract structured content from multimodal data using content extraction.
20+
* **[Generate targeted metadata with field extraction](#content-and-field-extraction)**. Discover how to use AI to generate further metadata, such as summaries or key topics, to enrich extracted content.
21+
* **[Preprocess extracted content](#preprocessing-output-from-content-understanding)**. Explore ways to transform extracted content into vector embeddings for semantic search and retrieval.
22+
* **[Design a unified index](#embed-and-index-extracted-content)**. Develop a unified Azure AI Search index that integrates and organizes multimodal data for efficient retrieval.
23+
* **[Semantic chunk retrieval](#semantic-chunk-retrieval)**. Extract contextually relevant information to deliver more precise and meaningful answers to user queries.
24+
* **[Interact with data using chat models](#use-openai-to-interact-with-data)** Use Azure OpenAI chat models to engage with your indexed data, enabling conversational search, querying, and answering.
2525

2626
## Prerequisites
2727

@@ -46,7 +46,7 @@ To get started, you need **An active Azure subscription**. If you don't have an
4646

4747
## Extract data
4848

49-
Retrieval-augmented generation (*RAG**) is a method that enhances the functionality of Large Language Models (*LLM**) by integrating data from external knowledge sources. Building a robust multimodal RAG solution begins with extracting and structuring data from diverse content types. Azure AI Content Understanding provides three key components to facilitate this process: **content extraction**, **field extraction**, and **analyzers**. Together, these components form the foundation for creating a unified, reusable, and enhanced data pipeline for RAG workflows.
49+
Retrieval-augmented generation (*RAG**) is a method that enhances the functionality of Large Language Models (**LLM**) by integrating data from external knowledge sources. Building a robust multimodal RAG solution begins with extracting and structuring data from diverse content types. Azure AI Content Understanding provides three key components to facilitate this process: **content extraction**, **field extraction**, and **analyzers**. Together, these components form the foundation for creating a unified, reusable, and enhanced data pipeline for RAG workflows.
5050

5151
## Implementation steps
5252

@@ -58,7 +58,7 @@ To implement data extraction in Content Understanding, follow these steps:
5858

5959
1. **(Optional) Enhance with Field Extraction:** Optionally, specify AI-generated fields to enrich the extracted content with added metadata.
6060

61-
## Creating an analyzer
61+
## Create analyzers
6262

6363
Analyzers are reusable components in Content Understanding that streamline the data extraction process. Once an analyzer is created, it can be used repeatedly to process files and extract content or fields based on predefined schemas. An analyzer acts as a blueprint for how data should be processed, ensuring consistency and efficiency across multiple files and content types.
6464

@@ -113,7 +113,7 @@ sys.path.append(str(parent_dir))
113113
```
114114
---
115115

116-
#### Create analyzers
116+
#### Code sample: create analyzer
117117

118118
``` python
119119
from pathlib import Path
@@ -788,6 +788,9 @@ while True:
788788

789789

790790
## Next steps
791-
- [Explore our RAG Python code samples](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
792-
- [Try a multimodal content solution accelerator](https://github.com/microsoft/content-processing-solution-accelerator)
793-
- [Learn more about the capabilities of Content Understanding]()
791+
792+
* [Explore our RAG Python code samples](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python#samples)
793+
794+
* [Try a multimodal content solution accelerator](https://github.com/microsoft/content-processing-solution-accelerator)
795+
796+
* [Learn more Content Understanding capabilities](../concepts/capabilities.md)

0 commit comments

Comments
 (0)