Commit 362d154

Revamped transcription modes for clarity
1 parent 50f80a8 commit 362d154

File tree

  • articles/ai-services/content-understanding/audio

1 file changed: +16 -15 lines changed

articles/ai-services/content-understanding/audio/overview.md

Lines changed: 16 additions & 15 deletions
@@ -11,11 +11,9 @@ ms.date: 05/06/2025
 ms.custom: release-preview-2-cu
 ---
 
-
 # Content Understanding audio solutions (preview)
 
 > [!IMPORTANT]
->
 > * Azure AI Content Understanding is available in preview. Public preview releases provide early access to features that are in active development.
 > * Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
 > * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
@@ -36,32 +34,35 @@ Content Understanding serves as a cornerstone for Speech Analytics solutions, en
 
 ### Content extraction
 
-* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
+#### Language handling
+We support different options for handling language processing during transcription.
 
-> [!NOTE]
-> Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support?tabs=stt).
-> For languages with Fast transcriptions support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+The following table provides an overview of the options controlled via the `locales` configuration:
+
+|Locale setting|File size and duration|Supported processing|Supported locales|Result latency|
+|--|--|--|--|--|
+|auto or empty|≤ 300 MB and/or ≤ 2 hours|Multilingual transcription|de-DE, en-AU, en-CA, en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, and zh-CN|Near-real-time|
+|auto or empty|> 300 MB and > 2 hours, ≤ 4 hours|Multilingual transcription|en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, zh-CN|Regular|
+|single locale|≤ 1 GB and/or ≤ 4 hours|Single-language transcription|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and > 2 hours, ≤ 4 hours: Regular|
+|multiple locales|≤ 1 GB and/or ≤ 4 hours|Single-language transcription<br>based on language detection|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and > 2 hours, ≤ 4 hours: Regular|
+
+[^1]: Content Understanding supports the full set of [Azure AI Speech speech to text languages](../../speech-service/language-support?tabs=stt).
+For languages with Fast transcription support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
+
+* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
 
 * **Diarization**. Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers.
 
 * **Speaker role detection**. Identifies agent and customer roles within contact center call data.
 
 * **Multilingual transcription**. Generates multilingual transcripts, applying a language/locale per phrase. Unlike language detection, this feature is enabled when no language/locale is specified or the language is set to 'auto'.
-  <br>The following locales are currently supported for multilingual transcription:
-  * **Files ≤ 300 MB and/or ≤ 2 hours**: de-DE, en-AU, en-CA, en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, and zh-CN.
-  * **Files larger than 300 MB and/or longer than 4 hours**: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, zh-CN.
 
-> [!WARNING]
+> [!NOTE]
 > When Multilingual transcription is used, a file with an unsupported locale produces a result. This result is based on the closest locale but is most likely not correct.
 > This is a known behavior. Make sure to configure locales when not using Multilingual transcription!
 
-
 * **Language detection**. Automatically detects the dominant language/locale, which is used to transcribe the file. Set multiple languages/locales to enable language detection.
 
-> [!NOTE]
-> For files larger than 300 MB and/or longer than 2 hours and locales unsupported by Fast transcription, the file is processed generating a multilingual transcript based on the specified locales.
-> In case language detection fails, the first language/locale defined is used to transcribe the file.
-
 ### Field extraction
 
 Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.
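The `locales` setting added in the table above supports three usage patterns: leave it empty (or `auto`) for multilingual transcription, pass one locale for single-language transcription, or pass several locales to enable language detection. The following is a minimal sketch of exercising those modes; the endpoint path, `api-version`, request headers, and `baseAnalyzerId` value are illustrative assumptions for this sketch, not details confirmed by this change.

```python
# A minimal sketch of the three `locales` modes described in the table above (assumed REST shape).
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # assumption: your resource endpoint
API_VERSION = "2024-12-01-preview"                                # assumption: check the current preview version
HEADERS = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Content-Type": "application/json",
}

def create_audio_analyzer(analyzer_id: str, locales: list[str] | None) -> None:
    """Create an audio analyzer whose transcription mode follows the `locales` table.

    locales=None or []         -> multilingual transcription ('auto')
    locales=["en-US"]          -> single-language transcription
    locales=["en-US", "es-MX"] -> language detection across the listed locales
    """
    body = {
        "baseAnalyzerId": "prebuilt-audioAnalyzer",  # assumption: name of the prebuilt audio analyzer
        "config": {"locales": locales or []},
    }
    url = f"{ENDPOINT}/contentunderstanding/analyzers/{analyzer_id}?api-version={API_VERSION}"
    response = requests.put(url, headers=HEADERS, json=body, timeout=30)
    response.raise_for_status()

# Multilingual transcription (auto or empty), near-real-time for files ≤ 300 MB and/or ≤ 2 hours:
create_audio_analyzer("callcenter-multilingual", None)
# Single-language transcription, any supported locale, files up to 1 GB and/or 4 hours:
create_audio_analyzer("callcenter-en", ["en-US"])
# Language detection: the dominant locale among these is used to transcribe the file:
create_audio_analyzer("callcenter-detect", ["en-US", "es-MX", "fr-FR"])
```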

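Content extraction returns the transcript in WebVTT format. The sketch below shows one way to consume such output, assuming standard WebVTT cues with optional `<v Speaker>` voice spans carrying diarization labels; the service's exact cue formatting isn't specified in this change.

```python
# A sketch of reading a WebVTT transcript with optional <v Speaker> voice spans.
import re

CUE_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
VOICE = re.compile(r"<v\s+([^>]+)>")

def parse_webvtt(vtt_text: str) -> list[tuple[str, str, str | None, str]]:
    """Return (start, end, speaker, text) for each cue; speaker is None if untagged."""
    cues = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIME.search(line)
            if not match:
                continue
            text = " ".join(lines[i + 1:]).strip()
            voice = VOICE.search(text)
            speaker = voice.group(1) if voice else None
            cues.append((match.group(1), match.group(2), speaker, VOICE.sub("", text).strip()))
            break  # one timing line per cue block
    return cues

sample = """WEBVTT

00:00:01.000 --> 00:00:04.200
<v Agent>Thank you for calling, how can I help?

00:00:04.500 --> 00:00:07.900
<v Customer>I'd like to check my order status."""

for start, end, speaker, text in parse_webvtt(sample):
    print(f"[{start} - {end}] {speaker}: {text}")
```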