Commit fd96efc

committed
Updating audio after docs bug bash. Fixing description of audio content extraction!
1 parent 39d7d44 commit fd96efc

File tree: 1 file changed, +25 −20 lines
  • articles/ai-services/content-understanding/audio

articles/ai-services/content-understanding/audio/overview.md

Lines changed: 25 additions & 20 deletions
@@ -3,14 +3,14 @@ title: Azure AI Content Understanding audio overview
 titleSuffix: Azure AI services
 description: Learn about Azure AI Content Understanding audio solutions
 author: laujan
-ms.author: lajanuar
+ms.author: jagoerge
 manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: overview
 ms.date: 05/19/2025
 ---
 
-# Content Understanding audio solutions (preview)
+# Azure AI Content Understanding audio solutions (preview)
 
 > [!IMPORTANT]
 > * Azure AI Content Understanding is available in preview. Public preview releases provide early access to features that are in active development.
@@ -33,20 +33,22 @@ Content Understanding serves as a cornerstone for Speech Analytics solutions, en
 
 ### Content extraction
 
+Audio content extraction is the process of transcribing audio files. The transcription is separated by speaker and can include optional features, such as role detection, which maps speaker results to meaningful speaker roles, and detailed results with word-level timestamps.
+
 #### Language handling
 We support different options to handle language processing during transcription.
 
 The following table provides an overview of the options controlled via the `locales` configuration:
 
 |Locale setting|File size|Supported processing|Supported locales|Result latency|
 |--|--|--|--|--|
-|auto or empty|300MB and/or ≤ 2 hours|Multilingual transcription|de-DE, en-AU, en-CA, en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, hi-IN, it-IT, ja-JP, ko-KR and zh-CN|Near-real-time|
-|auto or empty|> 300MB and >2hr ≤ 4 hours|Multilingual transcription|en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, zh-CN|Regular|
-|single locale|1GB and/or ≤ 4 hours|Single language transcription|All supported locales[^1]|&bullet;300MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300MB and >2hr ≤ 4 hours: Regular|
-|multiple locales|1GB and/or ≤ 4 hours|Single language transcription<br>based on Language Detection|All supported locales[^1]|&bullet;300MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300MB and >2hr ≤ 4 hours: Regular|
+|**auto or empty**|≤ 300 MB and/or ≤ 2 hours|Multilingual transcription|`de-DE`, `en-AU`, `en-CA`, `en-GB`, `en-IN`, `en-US`, `es-ES`, `es-MX`, `fr-CA`, `fr-FR`, `hi-IN`, `it-IT`, `ja-JP`, `ko-KR`, and `zh-CN`|Near-real-time|
+|**auto or empty**|> 300 MB and > 2 hours ≤ 4 hours|Multilingual transcription|`en-US`, `es-ES`, `es-MX`, `fr-FR`, `hi-IN`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN`|Regular|
+|**single locale**|≤ 1 GB and/or ≤ 4 hours|Single language transcription|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and > 2 hours ≤ 4 hours: Regular|
+|**multiple locales**|≤ 1 GB and/or ≤ 4 hours|Single language transcription (based on language detection)|All supported locales[^1]|&bullet; ≤ 300 MB and/or ≤ 2 hours: Near-real-time<br>&bullet; > 300 MB and > 2 hours ≤ 4 hours: Regular|
 
-[^1]: Content Understanding supports the full set of [Azure AI Speech Speech to text languages](../../speech-service/language-support?tabs=stt).
-For languages with Fast transcriptions support and for files ≤ 300MB and/or ≤ 2 hours, transcription time is reduced substantially.
+[^1]: Content Understanding supports the full set of [Azure AI Speech speech-to-text languages](../../speech-service/language-support.md).
+For languages with fast transcription support and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.
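Read as a decision rule, the latency column of the table above can be sketched in a few lines of Python. The thresholds come from the table; the function name and the strict and-logic on the size/duration limits are illustrative assumptions, since the table's "and/or" leaves some combinations open:

```python
# Illustrative sketch of the latency tiers implied by the table above.
# Treating both limits as a strict "and" is an assumption; the table
# says "and/or", so verify against the service's actual behavior.
def expected_latency(size_mb: float, duration_hours: float) -> str:
    """Return the result-latency tier suggested by the table."""
    if size_mb <= 300 and duration_hours <= 2:
        return "near-real-time"
    if size_mb <= 1024 and duration_hours <= 4:
        return "regular"
    raise ValueError("exceeds the documented 1 GB / 4 hour limits")

print(expected_latency(250, 1.5))   # near-real-time
print(expected_latency(800, 3.5))   # regular
```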
 
 * **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.

@@ -57,16 +59,17 @@ For languages with Fast transcriptions support and for files ≤ 300MB and/or
 
 * **Multilingual transcription**. Generates multilingual transcripts, applying a language/locale per phrase. Unlike language detection, this feature is enabled when no language/locale is specified or when the language is set to `auto`.
 
 > [!NOTE]
-> When Multilingual transcription is used, a file with an unsupported locale produces a result. This result is based on the closest locale but most likely not correct.
-> This result is a known behavior. Make sure to configure locales when not using Multilingual transcription!
+> When multilingual transcription is used, a file with an unsupported locale produces a result based on the closest supported locale, which is most likely incorrect. This is a known
+> behavior. To avoid transcription quality issues, make sure to configure locales when your content isn't in a locale supported by multilingual transcription.
 
 * **Language detection**. Automatically detects the dominant language/locale, which is then used to transcribe the file. Set multiple languages/locales to enable language detection.
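Because the transcripts above are speaker-separated WebVTT, downstream code often needs to split cues by speaker. A minimal stdlib-only sketch, assuming the common `<v Speaker N>` voice-tag convention (the service's exact cue format may differ):

```python
import re

# Illustrative speaker-separated WebVTT; not an exact sample of the
# service's output.
SAMPLE_VTT = """WEBVTT

00:00:00.000 --> 00:00:02.500
<v Speaker 1>Hello, thanks for calling.

00:00:02.600 --> 00:00:04.000
<v Speaker 2>Hi, I have a billing question.
"""

# One cue: "start --> end" on one line, "<v Name>text" on the next.
CUE = re.compile(r"(\d[\d:.]+) --> (\d[\d:.]+)\n<v ([^>]+)>(.*)")

def parse_cues(vtt: str) -> list[dict]:
    """Return each cue as a dict with start, end, speaker, and text."""
    return [
        {"start": m[0], "end": m[1], "speaker": m[2], "text": m[3].strip()}
        for m in CUE.findall(vtt)
    ]

for cue in parse_cues(SAMPLE_VTT):
    print(cue["speaker"], "-", cue["text"])
```

A production parser should use a WebVTT library instead of a regex, since real cues may carry settings, identifiers, and multi-line payloads.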
 
 ### Field extraction
 
 Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.
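For example, a field schema for call-log analysis might declare a summary, a sentiment label, and an entity list. The shape below is a hypothetical illustration of such a schema, not the service's exact contract; consult the suggested analyzer templates for real definitions:

```python
# Hypothetical field schema for a call-log analyzer. Field names and
# the overall shape are illustrative, not the service's exact contract.
call_log_schema = {
    "fields": {
        "summary": {
            "type": "string",
            "description": "One-paragraph summary of the call.",
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
            "description": "Overall caller sentiment.",
        },
        "entities": {
            "type": "array",
            "items": {"type": "string"},
            "description": "People, products, and organizations mentioned.",
        },
    }
}

print(sorted(call_log_schema["fields"]))  # ['entities', 'sentiment', 'summary']
```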
 
-## Key Benefits
+## Key benefits
+
 Advanced audio capabilities, including:
 
 * **Customizable data extraction**. Tailor the output to your specific needs by modifying the field schema, allowing for precise data generation and extraction.
@@ -77,7 +80,7 @@ Advanced audio capabilities, including:
 
 * **Scenario adaptability**. Adapt the service to your requirements by generating custom fields and extracting relevant data.
 
-## Prebuild audio analyzers
+## Prebuilt audio analyzers
 
 The prebuilt analyzers let you extract valuable insights from audio content without creating an analyzer setup.

@@ -87,7 +90,7 @@ All audio analyzers generate transcripts in standard WEBVTT format separated by
 >
 > Prebuilt analyzers are set to use multilingual transcription with `returnDetails` enabled.
 
-The following prebuild analyzers are available:
+The following prebuilt analyzers are available:
 
 **Post-call analysis (prebuilt-callCenter)**. Analyze call recordings to generate:

@@ -279,19 +282,21 @@ Capabilities such as topic modeling, key phrase extraction, speech-to-text trans
 Analysts working with large volumes of conversational data can use this solution to extract insights through natural language interaction. It supports tasks like identifying customer support trends, improving contact center quality, and uncovering operational intelligence, enabling teams to spot patterns, act on feedback, and make informed decisions faster.
 
 ## Input requirements
-For a detailed list of supported audio formats, refer to our [Service limits and codecs](../service-limits.md) page.
+
+For a detailed list of supported audio formats, see [Service limits and codecs](../service-limits.md).
 
 ## Supported languages and regions
 
-For a complete list of supported regions, languages, and locales, see our [Language and region support](../language-region-support.md)) page.
+For a complete list of supported regions, languages, and locales, see [Language and region support](../language-region-support.md).
 
 ## Data privacy and security
 
-Developers using this service should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
+Developers using this service should review Microsoft's policies on customer data. For more information, see [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy).
 
 ## Next steps
 
-* Try processing your audio content in [**Azure AI Foundry portal**](https://aka.ms/cu-landing).
-* Learn how to analyze audio content [**analyzer templates**](../quickstart/use-ai-foundry.md).
-* Review code sample: [**audio content extraction**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/blob/main/notebooks/content_extraction.ipynb).
-* Review code sample: [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).
+* Try processing your audio content in the [**Azure AI Foundry portal**](https://aka.ms/cu-landing).
+* Learn how to analyze audio content with [**analyzer templates**](../quickstart/use-ai-foundry.md).
+* Review code samples:
+  * [**audio content extraction**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/blob/main/notebooks/content_extraction.ipynb)
+  * [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates)
