articles/ai-services/content-understanding/audio/overview.md
Lines changed: 16 additions & 15 deletions
@@ -11,11 +11,9 @@ ms.date: 05/06/2025
 ms.custom: release-preview-2-cu
 ---
 
-
 # Content Understanding audio solutions (preview)
 
 > [!IMPORTANT]
->
 > * Azure AI Content Understanding is available in preview. Public preview releases provide early access to features that are in active development.
 > * Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
 > * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
@@ -36,32 +34,35 @@ Content Understanding serves as a cornerstone for Speech Analytics solutions, en
 
 ### Content extraction
 
-* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
+#### Language handling
+We support different options to handle language processing during transcription.
 
-> [!NOTE]
-> Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support?tabs=stt).
-> For languages with Fast transcription support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+The following table provides an overview of the options controlled via the 'locales' configuration:
+|multiple locales|≤ 1 GB and/or ≤ 4 hours|Single language transcription<br>based on Language Detection|All supported locales[^1]|• ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>• > 300 MB and > 2 hours (≤ 4 hours): Regular|
+
+[^1]: Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support?tabs=stt).
+For languages with Fast transcription support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+
+* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
 
 * **Diarization**. Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers.
 
 * **Speaker role detection**. Identifies agent and customer roles within contact center call data.
 
 * **Multilingual transcription**. Generates multilingual transcripts, applying language/locale per phrase. Unlike language detection, this feature is enabled when no language/locale is specified or the language is set to 'auto'.
-<br>The following locales are currently supported for multilingual transcription:
-* **Files larger than 300 MB and/or longer than 4 hours**: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, zh-CN.
 
-> [!WARNING]
+> [!NOTE]
 > When Multilingual transcription is used, a file with an unsupported locale produces a result. This result is based on the closest locale but is most likely not correct.
 > This is known behavior. Make sure to configure locales when not using Multilingual transcription!
 
-
 * **Language detection**. Automatically detects the dominant language/locale, which is then used to transcribe the file. Set multiple languages/locales to enable language detection.
 
-> [!NOTE]
-> For files larger than 300 MB and/or longer than 2 hours and locales unsupported by Fast transcription, the file is processed generating a multilingual transcript based on the specified locales.
-> In case language detection fails, the first language/locale defined is used to transcribe the file.
-
 ### Field extraction
 
 Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.
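
As a reading aid for the language-handling changes in this diff, the sketch below shows one way the `locales` setting might map to a transcription mode. It is illustrative only: the function name `language_mode` and the returned labels are hypothetical and are not part of the Content Understanding API. The first two branches follow the text of the diff (no locale or 'auto' enables multilingual transcription; multiple locales enable language detection). The single-locale branch is an assumption, because the corresponding table row is not visible in this diff.

```python
from typing import Optional, Sequence


def language_mode(locales: Optional[Sequence[str]]) -> str:
    """Return a descriptive label for the mode implied by a locales setting.

    The labels are hypothetical names used for illustration, not service values.
    """
    # No locale specified, or the language set to 'auto':
    # the diff says multilingual transcription is enabled.
    if not locales or list(locales) == ["auto"]:
        return "multilingual"
    # Multiple locales: the diff says this enables language detection,
    # and the dominant detected locale is used for the whole file.
    if len(locales) > 1:
        return "language-detection"
    # Exactly one explicit locale: ASSUMPTION that the file is
    # transcribed in that locale without detection.
    return "single-locale"


if __name__ == "__main__":
    for setting in (None, ["auto"], ["en-US"], ["en-US", "es-ES"]):
        print(setting, "->", language_mode(setting))
```

A helper like this could be used in client-side validation to warn when a request would fall back to multilingual transcription unintentionally, which the note in the diff cautions against.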