> * Features, approaches, and processes may change or have constrained capabilities, prior to General Availability (GA).
21
21
> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
22
22
23
-
Content Understanding audio capabilities enable you to transcribe and diarize conversational audio. It can generate enhanced outputs like summaries, special industry record formats, captioning data. Content Understanding audio and audio capabilities enable you to extract valuable information such as key topics, sentiment, and more. To get started, use one of the provided out-of-box prebuilt extraction schemas and start generating results. You can also customize Content Understanding capabilities to meet your business needs as necessary.
23
+
Content Understanding audio analyzers enable transcription and diarization of conversational audio, extracting structured fields such as summaries, sentiments, and key topics. Customize an audio analyzer template to your business needs using [Azure AI Foundry](https://ai.azure.com/) to start generating results.
24
24
25
-
Here are some of the common scenarios for Content Understanding extracted conversational audio data:
25
+
Here are common scenarios for using Content Understanding with conversational audio data:
26
26
27
-
* Get customer insights through summarization and sentiment.
27
+
* Gain customer insights through summarization and sentiment analysis.
28
+
* Assess and verify call quality and compliance in call centers.
29
+
* Create automated summaries and metadata for podcast publishing.
28
30
29
-
* Generate contact center call analytics results.
31
+
## Audio analyzer capabilities
30
32
31
-
* Assess and verify contact center call quality and compliance for improved processing coverage.
33
+
:::image type="content" source="../media/audio/overview/workflow-diagram.png" lightbox="../media/audio/overview/workflow-diagram.png" alt-text="Illustration of Content Understanding audio workflow.":::
32
34
33
-
* Generate automated summaries and metadata for podcast platform publishing.
35
+
Content Understanding serves as a cornerstone for Media Asset Management solutions, enabling the following capabilities for audio files:
36
+
37
+
### Content extraction
34
38
35
-
* Create a redacted version of the transcript with personal data removed.
39
+
* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.
36
40
37
-
* Analyze recordings to find valuable information like most desired topics.
41
+
* **`Diarization`**. Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers.
38
42
39
-
* Generate rich outputs based on conversational audio such as dictated documents.
43
+
* **Speaker role detection**. Identifies agent and customer roles within contact center call data.
40
44
41
-
## Content Understanding in AI Studio
45
+
* **Language detection**. Automatically detects the language in the audio or uses specified language/locale hints.
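
To make the transcription and diarization output concrete, the following minimal Python sketch parses a WebVTT transcript and groups cue text by speaker. It is not part of the Content Understanding SDK. It assumes speakers are marked with standard WebVTT voice spans (for example, `<v Speaker 1>`); the sample transcript, the `Cue` class, and the helper names `parse_webvtt` and `words_per_speaker` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: str    # cue start timestamp, kept as the raw WebVTT string
    end: str      # cue end timestamp, kept as the raw WebVTT string
    speaker: str  # voice-span annotation, or "" when the cue has none
    text: str     # cue payload with the opening voice tag removed


# A cue timing line looks like "00:00:00.080 --> 00:00:02.500".
TIMING_RE = re.compile(r"^(\S+)\s+-->\s+(\S+)")
# A WebVTT voice span looks like "<v Speaker 1>text" with an optional "</v>".
VOICE_RE = re.compile(r"<v(?:\.[^\s>]+)*\s+([^>]+)>(.*?)(?:</v>|$)", re.S)

# Hypothetical sample; real service output may differ in detail.
SAMPLE = """WEBVTT

00:00:00.080 --> 00:00:02.500
<v Speaker 1>Thank you for calling. How can I help?

00:00:03.000 --> 00:00:05.250
<v Speaker 2>I have a question about my bill.
"""


def parse_webvtt(vtt: str) -> list[Cue]:
    """Return the cues found in a WebVTT document, in order."""
    cues = []
    # Blocks (header, cues) are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            timing = TIMING_RE.match(line.strip())
            if not timing:
                continue  # header line or optional cue identifier
            payload = "\n".join(lines[i + 1:]).strip()
            voice = VOICE_RE.search(payload)
            if voice:
                speaker, text = voice.group(1).strip(), voice.group(2).strip()
            else:
                speaker, text = "", payload
            cues.append(Cue(timing.group(1), timing.group(2), speaker, text))
            break
    return cues


def words_per_speaker(cues: list[Cue]) -> dict[str, int]:
    """Count whitespace-separated words attributed to each speaker."""
    counts: dict[str, int] = {}
    for cue in cues:
        counts[cue.speaker] = counts.get(cue.speaker, 0) + len(cue.text.split())
    return counts


if __name__ == "__main__":
    for cue in parse_webvtt(SAMPLE):
        print(f"[{cue.start} - {cue.end}] {cue.speaker}: {cue.text}")
    print(words_per_speaker(SAMPLE and parse_webvtt(SAMPLE)))
```

A per-speaker word count like this is one simple way to derive talk-time style metrics from a diarized transcript before any field extraction is applied.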
42
46
43
-
AI studio enables you to set up, test, and manage Content Understanding solutions. You can use prebuilt schemas that can be customized to analyze your audio transcripts to easily generate results matching your specific business needs. A typical scenario is to automatically process files uploaded into a blob storage account and write the analytics results back to it. Based on the single file analysis, you can then easily index and add these results to a database or an Azure AI Search Index to easily generate more cross-recording insights and dashboards.
47
+
### Field extraction
44
48
45
-
* Get insights from audio recordings of meetings, calls, and conversations. Review insights from summaries, sentiment results, action items, meeting notes, and `PII` redacted transcripts.
49
+
Field extraction allows you to extract structured data from audio files, such as summaries, sentiments, and mentioned entities from call logs. You can begin by customizing a suggested analyzer template or creating one from scratch.
46
50
47
-
* Customize the results according to your specific needs and scenarios to modify the output of the workflow.
48
-
49
-
* Test and deploy customized workflows easily and quickly, without having to write any code or use any external tools.
50
-
51
-
* Access and manage your Content Understanding projects and resources in one place, along with other AI services that you use in AI Studio.
52
-
53
-
You can use the AI Studio to manage audio analytics projects and resources.
54
-
55
-
* Content Understanding in AI studio offers a user-friendly interface and a seamless setup experience to generate insights from audio data. You can also test and deploy different versions of the output schema directly in AI studio.
56
-
57
-
* Developers can use the `SDK`s and `REST API`s to process data at scale in production and integrate Content Understanding into Azure Pipelines as needed.
58
-
59
-
## Content Understanding features for audio processing
60
-
61
-
Content Understanding is serves as a cornerstone for Media Asset Management solutions and enables the following capabilities for audio files:
62
-
63
-
* **Extracting content**:
64
-
65
-
* **Transcription**. Convert audio within conversational audio files into text-based transcripts that can be searched and analyzed. This transcription data is also used as grounding for generating customizable fields.
66
-
67
-
* **Diarization**. Speaker diarization distinguishes between the speakers participating in a conversation. The Content Understanding service provides information about which part of a transcribed conversation is attributed to a particular speaker.
68
-
69
-
* **Speaker Role Detection**. Detect and identify agent and customer speaker roles within contact center call data.
70
-
71
-
* **Supported languages**. Content Understanding audio capabilities support automatic language detection for [**supported languages**](../language-region-support.md#language-support). The feature is automatically active if no locale or multiple locales are selected.
72
-
73
-
* **Supported audio formats**. Content Understanding audio capabilities support a broad variety of [audio file formats and codes](../language-region-support.md).
74
-
75
-
* **Audio transcription detailed output**. The complete output from the audio transcription process is returned including, if needed, sentence-level and word-level-timestamps.
76
-
77
-
* **Generating fields**:
78
-
79
-
* **Field generation**. Content Understanding enables you to define custom fields and extract and generate data from your audio by including them in the schema definition.
80
-
81
-
* **Multi-language results**. Content Understanding can generate field schema results in multiple languages when you include a field description in the desired output language.
82
-
83
-
* **Support for `generate` and `classify` methods for field extraction**. Customize your output formats using user-specified extraction methods.
84
-
85
-
## Content Understanding audio workflow
86
-
87
-
The following diagram provides a high-level overview of a typical Content Understanding Audio processing workflow.
88
-
89
-
:::image type="content" source="../media/audio/overview/workflow-diagram.png" lightbox="../media/audio/overview/workflow-diagram.png" alt-text="Illustration of Content Understanding audio workflow.":::
90
-
91
-
A typical Content Understanding Audio workflow consists of the following steps:
92
-
93
-
1. You send audio or transcription files to the Content Understanding API wither as single file or providing settings to process from a connected blob storage account.
1. Content Understanding Content Extraction generates a conversation transcript incl. speaker separation in webVTT format and optionally recognizes speaker roles or names to replace generic 'Speaker n' results.
54
+
* **Customizable data extraction**. Tailor the output to your specific needs by modifying the field schema, allowing for precise data generation and extraction.
96
55
97
-
1. The Content Understanding Field Extraction then generates added insights based on the generated conversation transcript.
56
+
* **Generative models**. Use generative AI models to describe in natural language the content you want to extract, and the service generates the desired output.
98
57
99
-
1. The Content Understanding service returns an audio file results containing the conversation transcript including added generated insights in JSON format. The results are either directly returned from the API or can be written into a connected blob storage account.
58
+
* **Integrated preprocessing**. Benefit from built-in preprocessing steps like transcription, diarization, and role detection, providing rich context for generative models.
100
59
101
-
## Content Understanding prebuilt audio scenarios
60
+
* **Scenario adaptability**. Adapt the service to your requirements by generating custom fields and extracting relevant data.
102
61
103
-
Content Understanding provides the following customizable prebuilt scenario templates:
62
+
## Content Understanding audio analyzer templates
104
63
105
-
* **Post call analytics**. Analyze call recordings and generate outputs such as conversation transcript, call summary, sentiment assessment and more.
* **Conversation summarization**. Generate transcriptions from conversation audio recordings, generate a summary, and assess sentiment.
66
+
* **Post-call analytics**. Analyze call recordings to generate conversation transcripts, call summaries, sentiment assessments, and more.
108
67
109
-
You can start with any prebuilt scenario or start from scratch to get started and customize as needed to meet your business needs.
68
+
* **Conversation summarization**. Generate transcriptions, summaries, and sentiment assessments from conversation audio recordings.
110
69
111
-
## Audio format support and input requirements
70
+
Start with a template or create a custom analyzer to meet your specific business needs.
112
71
113
-
For a complete list of Content Understanding supported audio formats, *see* our [Service limits and codecs](../service-limits.md) page.
72
+
## Input requirements
73
+
For a detailed list of supported audio formats, refer to our [Service limits and codecs](../service-limits.md) page.
114
74
115
-
## Supported regions, languages, and locales
75
+
## Supported languages and regions
116
76
117
77
For a complete list of supported regions, languages, and locales, see our [Language and region support](../language-region-support.md) page.
118
78
119
-
120
-
## Content Understanding audio capability limits
121
-
122
-
|Attribute|Limit|
123
-
|-----|-----|
124
-
|Time|Maximum of 2 hours in length|
125
-
|Size|Maximum of 200 MB in size|
126
-
|Speakers|Maximum number of 36 speakers|
127
-
128
-
## Key Benefits
129
-
130
-
Content Understanding provides a specific set of capabilities for audio including:
131
-
132
-
* **Highly customizable data extraction**. Unlike traditional audio analysis services, Content Understanding allows you to customize the data you want to generate or extract. By modifying the schema, you can tailor the output to match your specific use cases.
133
-
134
-
* **Generative Models**. You can use our generative AI models to describe in natural language what content you want to extract, and the service generates that output.
135
-
136
-
* **Integrated Pre-processing**. The service performs several preprocessing steps, such as transcription, diarization and role detection, to provide rich context to the generative models.
137
-
138
-
* **Scenarios adaptability**. The service can adapt to your needs by generating custom fields to extract the right data.
139
-
140
79
## Data privacy and security
141
80
142
-
As with all the Azure AI services, developers using the Content Understanding service should be aware of Microsoft's policies on customer data. See our [**Data, protection and privacy**](https://www.microsoft.com/trust-center/privacy) page to learn more.
81
+
Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
143
82
144
83
## Next steps
145
84
146
-
To get started using Content Understanding audio capabilities, try our [post-call analytics prebuilt scenario template](../prebuilt-template/post-call-analytics.md).
147
-
85
+
* Try processing your audio content using Content Understanding in [Azure AI Foundry](https://ai.azure.com/).
86
+
* Learn more about audio [**analyzer templates**](../quickstart/use-ai-foundry.md).