Commit c3040c5

Merge pull request #4856 from laujan/jan-4727-overview-updates
Jan 4727 overview updates
2 parents 5161182 + 1bbad22 commit c3040c5

2 files changed: +47 -32 lines changed

articles/ai-services/content-understanding/audio/overview.md

Lines changed: 47 additions & 32 deletions
@@ -3,44 +3,54 @@ title: Azure AI Content Understanding audio overview
 titleSuffix: Azure AI services
 description: Learn about Azure AI Content Understanding audio solutions
 author: laujan
-ms.author: jagoerge
+ms.author: jagoerge
 manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: overview
 ms.date: 05/19/2025
 ---

-
 # Content Understanding audio solutions (preview)

 > [!IMPORTANT]
->
 > * Azure AI Content Understanding is available in preview. Public preview releases provide early access to features that are in active development.
 > * Features, approaches, and processes can change or have limited capabilities, before General Availability (GA).
 > * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).

-Content Understanding audio analyzers enable transcription and diarization of conversational audio, extracting structured fields such as summaries, sentiments, and key topics. Customize an audio analyzer template to your business needs using [Azure AI Foundry portal](https://ai.azure.com/) to start generating results.
+Audio analyzers enable transcription and diarization of conversational audio, extracting structured fields such as summaries, sentiments, and key topics. Customize an audio analyzer template to your business needs using [Azure AI Foundry portal](https://ai.azure.com/) to start generating results.

-Here are common scenarios for using Content Understanding with conversational audio data:
+Here are common scenarios for conversational audio data processing:

 * Gain customer insights through summarization and sentiment analysis.
 * Assess and verify call quality and compliance in call centers.
 * Create automated summaries and metadata for podcast publishing.

 ## Audio analyzer capabilities

-:::image type="content" source="../media/audio/overview/workflow-diagram.png" lightbox="../media/audio/overview/workflow-diagram.png" alt-text="Illustration of Content Understanding audio workflow.":::
+:::image type="content" source="../media/audio/overview/workflow-diagram-preview.png" lightbox="../media/audio/overview/workflow-diagram-preview.png" alt-text="Illustration of Content Understanding audio capabilities.":::

-Content Understanding serves as a cornerstone for Media Asset Management solutions, enabling the following capabilities for audio files:
+Content Understanding serves as a cornerstone for Speech Analytics solutions, enabling the following capabilities for audio files:

 ### Content extraction

-* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
+Audio content extraction is the process of isolating and retrieving specific elements or features from an audio file. This process can include separating individual audio sources; identifying specific segments within a sound file; or detecting and categorizing various characteristics of the audio content.

-> [!NOTE]
->
-> Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support.md).
-> For languages with fast transcriptions support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+#### Language handling
+We support different options to handle language processing during transcription.
+
+The following table provides an overview of the options controlled via the 'locales' configuration:
+
+|Locale setting|File size|Supported processing|Supported locales|Result latency|
+|--|--|--|--|--|
+|**auto or empty**|≤ 300 MB and/or ≤ 2 hours|Multilingual transcription|`de-DE`, `en-AU`,` en-CA`, `en-GB`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `hi-IN`, `it-IT`, `ja-JP`, `ko-KR`, and `zh-CN`|Near-real-time|
+|**auto or empty**|> 300 MB and >2 HR ≤ 4 hours|Multilingual transcription|`en-US`, `es-ES`, `es-MX`, `fr-FR`, `hi-IN`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN`|Regular|
+|**single locale**|≤ 1 GB and/or ≤ 4 hours|Single language transcription|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and >2 HR ≤ 4 hours: Regular|
+|**multiple locales**|≤ 1 GB and/or ≤ 4 hours|Single language transcription (based on language detection)|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and >2 HR ≤ 4 hours: Regular|
+
+[^1]: Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support.md).
+For languages with Fast transcriptions support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+
+* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.

 * **Diarization**. Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers.

@@ -49,24 +59,18 @@ Content Understanding serves as a cornerstone for Media Asset Management solutio
 * **Multilingual transcription**. Generates multilingual transcripts, applying language/locale per phrase. Deviating from language detection this feature is enabled when no language/locale is specified or language is set to `auto`.

 > [!NOTE]
->
-> The following locales are currently supported:
-> * **Files ≤ 300 MB and/or ≤ 2 hours**: de-DE, en-AU, en-CA, en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, and zh-CN.
-> * **Files larger than 300 MB and/or longer than 4 hours**: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, zh-CN.
+> When Multilingual transcription is used, a file with an unsupported locale produces a result. This result is based on the closest locale but most likely not correct.
+> This result is a known behavior. Make sure to configure locales when not using Multilingual transcription!

 * **Language detection**. Automatically detects the dominant language/locale which is used to transcribe the file. Set multiple languages/locales to enable language detection.

-> [!NOTE]
->
-> For files larger than 300 MB and/or longer than 2 hours and locales unsupported by Fast transcription, the file is processed generating a multilingual transcript based on the specified locales.
-> In case language detection fails, the first language/locale defined is used to transcribe the file.
-
 ### Field extraction

 Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.

-## Key Benefits
-Content Understanding offers advanced audio capabilities, including:
+## Key benefits
+
+Advanced audio capabilities, including:

 * **Customizable data extraction**. Tailor the output to your specific needs by modifying the field schema, allowing for precise data generation and extraction.

@@ -76,7 +80,7 @@ Content Understanding offers advanced audio capabilities, including:

 * **Scenario adaptability**. Adapt the service to your requirements by generating custom fields and extract relevant data.

-## Content Understanding prebuilt audio analyzers
+## Prebuilt audio analyzers

 The prebuilt analyzers allow extracting valuable insights into audio content without the need to create an analyzer setup.

@@ -86,7 +90,7 @@ All audio analyzers generate transcripts in standard WEBVTT format separated by
 >
 > Prebuilt analyzers are set to use multilingual transcription and `returnDetails` enabled.

-Content Understanding offers the following prebuilt analyzers:
+The following prebuilt analyzers are available:

 **Post-call analysis (prebuilt-callCenter)**. Analyze call recordings to generate:

@@ -268,20 +272,31 @@ You can also customize prebuilt analyzers for more fine-grained control of the o
 * Control the language of the field extraction output.
 * Configure the transcription behavior.

+## Conversational Knowledge Mining Solution Accelerator
+For an end-2-end quickstart for Speech Analytics solutions, refer to the [Conversation knowledge mining solution accelerator](https://aka.ms/Conversational-Knowledge-Mining).
+
+Gain actionable insights from large volumes of conversational data by identifying key themes, patterns, and relationships. By using Azure AI Foundry, Azure AI Content Understanding, Azure OpenAI Service, and Azure AI Search, this solution analyzes unstructured dialogue and maps it to meaningful, structured insights.
+
+Capabilities such as topic modeling, key phrase extraction, speech-to-text transcription, and interactive chat enable users to explore data naturally and make faster, more informed decisions.
+
+Analysts working with large volumes of conversational data can use this solution to extract insights through natural language interaction. It supports tasks like identifying customer support trends, improving contact center quality, and uncovering operational intelligence—enabling teams to spot patterns, act on feedback, and make informed decisions faster.
+
 ## Input requirements
-For a detailed list of supported audio formats, refer to our [Service limits and codecs](../service-limits.md) page.
+
+For a detailed list of supported audio formats, *see* [Service limits and codecs](../service-limits.md).

 ## Supported languages and regions

-For a complete list of supported regions, languages, and locales, see our [Language and region support](../language-region-support.md)) page.
+For a complete list of supported regions, languages, and locales, see [Language and region support](../language-region-support.md).

 ## Data privacy and security

-Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
+Developers using this service should review Microsoft's policies on customer data. For more information, *see* [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy).

 ## Next steps

-* Try processing your audio content using Content Understanding in [**Azure AI Foundry portal**](https://aka.ms/cu-landing).
-* Learn how to analyze audio content [**analyzer templates**](../quickstart/use-ai-foundry.md).
-* Review code sample: [**audio content extraction**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/blob/main/notebooks/content_extraction.ipynb).
-* Review code sample: [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).
+* Try processing your audio content in the [**Azure AI Foundry portal**](https://aka.ms/cu-landing).
+* Learn how to analyze audio content with [**analyzer templates**](../quickstart/use-ai-foundry.md).
+* Review code samples:
+* [**audio content extraction**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/blob/main/notebooks/content_extraction.ipynb).
+* [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).
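
Reviewer note: the "Language handling" table added in this commit can be read as a small decision procedure over the `locales` setting and the file's size and duration. The Python sketch below is one possible reading of that table and not the service's API. `Plan`, `plan_transcription`, and the constants are hypothetical names. Two points are assumptions: the table's "and/or" limits are treated as both having to hold for the near-real-time tier, and the 1 GB / 4 hour ceiling is applied to every row, although the table states the 1 GB limit only for explicitly configured locales.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Limits taken from the table; how "and/or" combines them is an assumption.
FAST_MAX_MB = 300
FAST_MAX_HOURS = 2
MAX_MB = 1024      # 1 GB, assumed to apply to all rows
MAX_HOURS = 4


@dataclass(frozen=True)
class Plan:
    """Hypothetical result type: processing mode and expected latency."""
    processing: str
    latency: str


def plan_transcription(locales: Optional[Sequence[str]],
                       size_mb: float,
                       hours: float) -> Plan:
    """Map a locales setting and file size/duration to a row of the table."""
    if size_mb > MAX_MB or hours > MAX_HOURS:
        raise ValueError("file exceeds the limits listed in the table")

    # Assumption: the near-real-time tier needs both limits to hold.
    fast = size_mb <= FAST_MAX_MB and hours <= FAST_MAX_HOURS
    latency = "near-real-time" if fast else "regular"

    requested = [loc for loc in (locales or []) if loc]
    if not requested or all(loc == "auto" for loc in requested):
        # "auto or empty": multilingual transcription
        processing = "multilingual transcription"
    elif len(requested) == 1:
        # "single locale": transcribe in that language
        processing = "single language transcription"
    else:
        # "multiple locales": detect the dominant language, then transcribe
        processing = "single language transcription (language detection)"
    return Plan(processing, latency)


if __name__ == "__main__":
    print(plan_transcription([], 100, 1.0))
    print(plan_transcription(["en-US", "de-DE"], 500, 3.0))
```

The sketch does not check a requested locale against the per-row supported-locale lists; per the note added in this commit, an unsupported locale under multilingual transcription still produces a (likely incorrect) result rather than an error.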
