Commit 533030a

committed
tonye 2746
1 parent c25143b commit 533030a

7 files changed, +25 -22 lines changed

articles/ai-services/content-understanding/capabilities/overview.md renamed to articles/ai-services/content-understanding/concepts/capabilities-overview.md

Lines changed: 24 additions & 21 deletions
@@ -7,7 +7,7 @@ ms.author: lajanuar
77
manager: nitinme
88
ms.service: azure-ai-content-understanding
99
ms.topic: overview
10-
ms.date: 02/03/2025
10+
ms.date: 02/25/2025
1111
ms.custom: 2025-understanding-release
1212
---
1313

@@ -19,7 +19,7 @@ ms.custom: 2025-understanding-release
1919
> * Features, approaches, and processes can change or have limited capabilities, before General Availability (GA).
2020
> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
2121
22-
Content Understanding offers a streamlined process and various capabilities to reason over large amounts of unstructured data, accelerating time-to-value by generating an output that can be integrated into analytical workflows and retrieval augmented generation (RAG) applications.
22+
Content Understanding provides an advanced approach to processing and interpreting vast amounts of unstructured data. It offers various capabilities that accelerate time-to-value, reducing the time required to derive meaningful insights. By generating outputs that seamlessly integrate into analytical workflows and Retrieval-Augmented Generation (RAG) applications, it enhances data-driven decision-making and boosts overall productivity.
2323

2424
## Overview of Key Capabilities in Content Understanding
2525

@@ -33,73 +33,75 @@ The service employs a customizable dual-pipeline architecture that combines [con
3333

3434
Content extraction in Content Understanding is a powerful feature that transforms unstructured data into structured data, powering advanced AI processing capabilities. The structured data enables efficient downstream processing while maintaining contextual relationships in the source content.
3535

36-
Content extraction provides foundational data that grounds the generative capabilities of Field Extraction, offering essential context about the input content. Users will find content extraction invaluable for converting diverse data formats into a structured format, this capability excels in scenarios requiring:
37-
* Document digitization, indexing and retrieval by structure
36+
Content extraction provides foundational data that grounds the generative capabilities of Field Extraction, offering essential context about the input content. Users find content extraction invaluable for converting diverse data formats into a structured format. This capability excels in scenarios requiring:
37+
38+
* Document digitization, indexing, and retrieval by structure
3839
* Audio/video transcription
3940
* Metadata generation at scale
4041

41-
Content Understanding enhances its core extraction capabilities through optional add-on features that provide deeper content analysis. These add-ons can extract additional elements like layout information, speaker roles and face grouping. While some add-ons may incur additional costs, they can be selectively enabled based on your specific requirements to optimize both functionality and cost-efficiency. The modular nature of these add-ons allows for customized processing pipelines tailored to your use case.
42+
Content Understanding enhances its core extraction capabilities through optional add-on features that provide deeper content analysis. These add-ons can extract ancillary elements like layout information, speaker roles, and face grouping. While some add-ons can incur added costs, they can be selectively enabled based on your specific requirements to optimize both functionality and cost-efficiency. The modular nature of these add-on features allows for customized processing pipelines tailored to your use case.
4243

43-
The following section details the content extraction capabilities and optional add-on features available for each supported modality. Select your target modality from the tabs below to view its specific capabilities.
44+
The following section details the content extraction capabilities and optional add-on features available for each supported modality. Select your target modality from the following tabs and view its specific capabilities.
4445

4546
# [Document](#tab/document)
4647

4748
|Content Extraction|Add-on Capabilities|
4849
|-------------|-------------|
49-
|&bullet; **Optical Character Recognition (OCR)**: Extract printed and handwritten text from documents in various file formats, converting it into structured data. </br>| &bullet; **Layout**:Extracts layout information such as paragraphs, sections, tables, and more.. </br> &bullet; **Barcode**: Identifies and decodes all barcodes in the documents. </br> &bullet; **Formula**: Recognizes all identified mathematical equations from the documents. </br> |
50+
|&bullet; **`Optical Character Recognition (OCR)`**: Extract printed and handwritten text from documents in various file formats, converting it into structured data. </br>| &bullet; **`Layout`**: Extracts layout information such as paragraphs, sections, and tables</br> &bullet; **`Barcode`**: Identifies and decodes all barcodes in the documents.</br> &bullet; **`Formula`**: Recognizes all identified mathematical equations from the documents. </br> |
5051

5152
# [Image](#tab/image)
5253
> [!NOTE]
53-
> Content extraction for images is currently not supported. At present, the Image modality supports field extraction capabilities only.
54+
> Content extraction for images is currently not fully supported. The image modality currently supports field extraction capabilities only.
5455
5556
# [Audio](#tab/audio)
5657

5758
|Content Extraction|Add-on Capabilities|
5859
|-------------|-------------|
59-
|&bullet; **Transcription**:Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request. </br> &bullet; **Diarization**: Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers. </br> &bullet; **Language detection**: Automatically detects the language spoken in the audio to be processed.</br>| &bullet; **Speaker role detection**: Identifies speaker roles based on diarization results and replaces generic labels like "Speaker 1" with specific role names, such as "Agent" or "Customer." </br>|
60+
|&bullet; **`Transcription`**: Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.</br> &bullet; **`Diarization`**: Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers. </br> &bullet; **`Language detection`**: Automatically detects the language spoken in the audio to be processed.</br>| &bullet; **`Speaker role detection`**: Identifies speaker roles based on diarization results and replaces generic labels like "Speaker 1" with specific role names, such as "Agent" or "Customer." </br>|
6061

6162
# [Video](#tab/video)
6263

6364
|Content Extraction|Add-on Capabilities|
6465
|-------------|-------------|
65-
|&bullet; **Transcription**: Converts speech to structured, searchable text via Azure AI Speech, allowing users to specify recognition languages. </br>&bullet; **Shot Detection**: Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and repackaging of content with breaks exactly on shot boundaries. </br> &bullet; **Key Frame Extraction**: Extracts key frames from videos to represent each shot completely, ensuring each shot has enough key frames to enable Field Extraction to work effectively.</br> | &bullet; **Face Grouping**: Grouped faces appearing in a video to extract one representative face image for each person and provides segments where each one is present. The grouped face data is available as metadata and can be used to generate customized metadata fields.This feature is limited access and involves face identification and grouping; customers need to register for access at Face Recognition. |
66+
|&bullet; **`Transcription`**: Converts speech to structured, searchable text via Azure AI Speech, allowing users to specify recognition languages. </br>&bullet; **`Shot Detection`**: Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and repackaging of content with breaks exactly on shot boundaries.</br> &bullet; **`Key Frame Extraction`**: Extracts key frames from videos to represent each shot completely, ensuring each shot has enough key frames to enable Field Extraction to work effectively.</br> | &bullet; **`Face Grouping`**: Grouped faces appearing in a video to extract one representative face image for each person and provides segments where each one is present. The grouped face data is available as metadata and can be used to generate customized metadata fields. This feature is limited access and involves face identification and grouping; customers need to register for access at Face Recognition. |
6667

6768
----
6869
### Field Extraction
69-
Field extraction in Content Understanding leverages generative AI models to define schemas that extract, infer, or abstract information from various data types into structured outputs. This capability is powerful because by defining schemas with natural language field descriptions it eliminates the need for complex prompt engineering, making it accessible for users to create standardized outputs.
70+
Field extraction in Content Understanding uses generative AI models to define schemas that extract, infer, or abstract information from various data types into structured outputs. This capability is powerful because by defining schemas with natural language field descriptions it eliminates the need for complex prompt engineering, making it accessible for users to create standardized outputs.
71+
72+
Field extraction is optimized for scenarios requiring:
7073

71-
Field extraction is particularly optimized for scenarios requiring:
7274
* Consistent metadata extraction across content types
7375
* Workflow automation with structured output
74-
* Compliance monitoring and validation
76+
* Compliance monitoring and validation
7577

76-
The value lies in its ability to handle multiple content types (text, audio, video, images) while maintaining accuracy and scalability through AI-powered schema extraction and confidence scoring.
78+
The value lies in its ability to handle multiple content types (text, audio, video, images) while maintaining accuracy and scalability through AI-powered schema extraction and confidence scoring.
7779

78-
Each modality supports specific generation approaches optimized for that content type. Review the tabs below to understand the generation capabilities and methods available for your target modality.
80+
Each modality supports specific generation approaches optimized for that content type. Review the following tabs to understand the generation capabilities and methods available for your target modality.
7981

8082
# [Document](#tab/document)
8183

8284
|Supported generation methods|
8385
|--------------|
8486
|&bullet; **Extract**: In document, users can extract field values from input content, such as dates from receipts or item details from invoices. |
8587

86-
:::image type="content" source="../media/capabilities/documentextraction.gif" alt-text="Illustration of Document extraction method workflow.":::
88+
:::image type="content" source="../media/capabilities/document-extraction.gif" alt-text="Illustration of Document extraction method workflow.":::
8789

8890
# [Image](#tab/image)
8991

9092
|Supported generation methods|
9193
|--------------|
9294
|&bullet; **Generate**: In images, users can derive values from the input content, such as generating titles, descriptions, and summaries for figures and charts. <br> &bullet; **Classify**: In images, users can categorize elements from the input content, such as identifying different types of charts like histograms, bar graphs, etc.<br> |
9395

94-
:::image type="content" source="../media/capabilities/chartanalysis.gif" alt-text="Illustration of Image Generation and Classification workflow.":::
96+
:::image type="content" source="../media/capabilities/chart-analysis.gif" alt-text="Illustration of Image Generation and Classification workflow.":::
9597

9698
# [Audio](#tab/audio)
9799

98100
|Supported generation methods|
99101
|--------------|
100102
|&bullet; **Generate**: In audio, users can derive values from the input content, such as conversation summaries and topics. <br> &bullet; **Classify**: In audio, users can categorize values from the input content, such as determining the sentiment of a conversation (positive, neutral, or negative).<br> |
101103

102-
:::image type="content" source="../media/capabilities/audioanalysis.gif" alt-text="Illustration of Audio Generation and Classification workflow.":::
104+
:::image type="content" source="../media/capabilities/audio-analysis.gif" alt-text="Illustration of Audio Generation and Classification workflow.":::
103105

104106
# [Video](#tab/video)
105107

@@ -116,7 +118,7 @@ Follow our quickstart guide [to build your first schema](../quickstart/use-ai-fo
116118

117119
#### Grounding and Confidence Scores
118120

119-
Content Understanding ensures that the results from field and content extraction are accurately grounded to the input content and provide confidence scores for the extracted data, making automation and validation more reliable.
121+
Content Understanding ensures that the results from field and content extraction are precisely aligned with the input content. It also provides confidence scores for the extracted data, enhancing the reliability of automation and validation processes.
120122

121123
### Analyzers
122124

@@ -133,15 +135,15 @@ Key benefits of analyzers include:
133135

134136
* **Reusability**: A single analyzer can be reused across multiple workflows and applications, reducing development overhead.
135137

136-
* **Customization**: While starting with prebuilt templates, analyzers can be fully customized to match your specific business requirements and use cases.
138+
* **Customization**: Start with prebuilt templates. You can then enhance their functionality with analyzers that can be fully customized to match your specific business requirements and use cases.
137139

138140
For example, you might create an analyzer for processing customer service calls that combines audio transcription (content extraction) with sentiment analysis and topic classification (field extraction). This analyzer can then consistently process thousands of calls, providing structured insights for your customer experience analytics.
139141

140142
Follow our quickstart guide to [build your first analyzer](../quickstart/use-ai-foundry.md#analyzer-templates).
141143

142144
### Best Practices
143145

144-
For guidance on optimizing your Content Understanding implementations, including schema design tips, see our detailed [Best practices guide](../best-practices.md). This guide helps you maximize the value of Content Understanding while avoiding common pitfalls.
146+
For guidance on optimizing your Content Understanding implementations, including schema design tips, see our detailed [Best practices guide](best-practices.md). This guide helps you maximize the value of Content Understanding while avoiding common pitfalls.
145147

146148
### Input requirements
147149
For detailed information on supported input document formats, refer to our [Service quotas and limits](../service-limits.md) page.
@@ -153,6 +155,7 @@ For a detailed list of supported languages and regions, visit our [Language and
153155
Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
154156

155157
## Next steps
158+
156159
* Try processing your document content using Content Understanding in [Azure ](https://ai.azure.com/).
157160
* Learn to analyze content [**analyzer templates**](../quickstart/use-ai-foundry.md).
158161
* Review code sample: [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).
Binary file not shown.

articles/ai-services/content-understanding/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ items:
3838
items:
3939
- name: Overview
4040
displayName: content understanding capabilities, document, text, images, video, audio, visual, structured, content, field, extraction
41-
href: capabilities/overview.md
41+
href: concepts/capabilities-overview.md
4242
- name: Document
4343
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
4444
href: document/overview.md
