description: Learn about Azure AI Content Understanding audio solutions
author: laujan
-ms.author: lajanuar
+ms.author: jagoerge
manager: nitinme
ms.service: azure-ai-content-understanding
ms.topic: overview
ms.date: 05/19/2025
---
-# Content Understanding audio solutions (preview)
+# Azure AI Content Understanding audio solutions (preview)
> [!IMPORTANT]
> * Azure AI Content Understanding is available in preview. Public preview releases provide early access to features that are in active development.
@@ -33,20 +33,22 @@ Content Understanding serves as a cornerstone for Speech Analytics solutions, en
### Content extraction
+Audio content extraction is the process of transcribing audio files. This process includes separating transcriptions by speaker and can involve optional features like role detection to update speaker results to meaningful speaker roles. It can also involve detailed results including word-level timestamps.
+
#### Language handling
We support different options to handle language processing during transcription.
The following table provides an overview of the options controlled via the 'locales' configuration:
|**multiple locales**|≤ 1 GB and/or ≤ 4 hours|Single language transcription (based on language detection)|All supported locales[^1]|• ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>• > 300 MB and > 2 hours (≤ 4 hours): Regular|
-[^1]: Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support?tabs=stt).
-For languages with Fast transcriptions support and for files ≤ 300MB and/or ≤ 2 hours, transcription time is reduced substantially.
+[^1]: Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support.md).
+For languages with Fast transcriptions support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
@@ -57,16 +59,17 @@ For languages with Fast transcriptions support and for files ≤ 300MB and/or
* **Multilingual transcription**. Generates multilingual transcripts, applying a language/locale per phrase. Unlike language detection, this feature is enabled when no language/locale is specified or when the language is set to `auto`.
> [!NOTE]
-> When Multilingual transcription is used, a file with an unsupported locale produces a result. This result is based on the closest locale but most likely not correct.
-> This result is a known behavior. Make sure to configure locales when not using Multilingual transcription!
+> When Multilingual transcription is used, any files with unsupported locales produce a result based on the closest supported locale, which is likely incorrect. This result is a known
+> behavior. Avoid transcription quality issues by ensuring that you configure locales when not using a multilingual transcription supported locale!
* **Language detection**. Automatically detects the dominant language/locale, which is then used to transcribe the file. Set multiple languages/locales to enable language detection.
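
The language-handling rules above can be summarized in a small decision function. This is only a sketch in Python, not part of the service or its SDK: the function name `transcription_mode` and the returned labels are hypothetical, and the single-locale case is an assumption, because the corresponding table row is not visible in this diff.

```python
from typing import Optional, Sequence


def transcription_mode(locales: Optional[Sequence[str]]) -> str:
    """Return a label for the language handling implied by a `locales` value.

    Hypothetical helper for illustration only; the labels are not service values.
    """
    # No locale specified, or the language set to "auto":
    # multilingual transcription (language/locale applied per phrase).
    if not locales or list(locales) == ["auto"]:
        return "multilingual"
    # Assumption: exactly one locale means plain single-language transcription.
    if len(locales) == 1:
        return "single-language"
    # Several locales: the dominant language is detected among them
    # and used to transcribe the whole file.
    return "language-detection"


if __name__ == "__main__":
    for value in (None, ["auto"], ["en-US"], ["en-US", "de-DE"]):
        print(value, "->", transcription_mode(value))
```

Keeping the check for an empty or `auto` value first mirrors the text: multilingual transcription applies only when no explicit locale list is given.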
### Field extraction
Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.
-## Key Benefits
+## Key benefits
+
Advanced audio capabilities, including:
* **Customizable data extraction**. Tailor the output to your specific needs by modifying the field schema, allowing for precise data generation and extraction.
* **Scenario adaptability**. Adapt the service to your requirements by generating custom fields and extracting relevant data.
-## Prebuild audio analyzers
+## Prebuilt audio analyzers
The prebuilt analyzers let you extract valuable insights from audio content without creating your own analyzer setup.
@@ -87,7 +90,7 @@ All audio analyzers generate transcripts in standard WEBVTT format separated by
>
> Prebuilt analyzers are set to use multilingual transcription and have `returnDetails` enabled.
-The following prebuild analyzers are available:
+The following prebuilt analyzers are available:
**Post-call analysis (prebuilt-callCenter)**. Analyze call recordings to generate:
@@ -279,19 +282,21 @@ Capabilities such as topic modeling, key phrase extraction, speech-to-text trans
Analysts working with large volumes of conversational data can use this solution to extract insights through natural language interaction. It supports tasks like identifying customer support trends, improving contact center quality, and uncovering operational intelligence—enabling teams to spot patterns, act on feedback, and make informed decisions faster.
## Input requirements
-For a detailed list of supported audio formats, refer to our [Service limits and codecs](../service-limits.md) page.
+
+For a detailed list of supported audio formats, *see* [Service limits and codecs](../service-limits.md).
## Supported languages and regions
-For a complete list of supported regions, languages, and locales, see our [Language and region support](../language-region-support.md)) page.
+For a complete list of supported regions, languages, and locales, see [Language and region support](../language-region-support.md).
## Data privacy and security
-Developers using this service should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
+Developers using this service should review Microsoft's policies on customer data. For more information, *see* [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy).
## Next steps
-* Try processing your audio content in [**Azure AI Foundry portal**](https://aka.ms/cu-landing).
-* Learn how to analyze audio content [**analyzer templates**](../quickstart/use-ai-foundry.md).