description: Learn about Azure AI Content Understanding audio solutions
author: laujan
ms.author: jagoerge
manager: nitinme
ms.service: azure-ai-content-understanding
ms.topic: overview
ms.date: 05/19/2025
---
:::image type="content" source="../media/audio/overview/workflow-diagram.png" lightbox="../media/audio/overview/workflow-diagram.png" alt-text="Illustration of Content Understanding audio workflow.":::

Content Understanding serves as a cornerstone for Media Asset Management solutions, enabling the following capabilities for audio files:

### Content extraction

* **Transcription**. Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request.

> [!NOTE]
> Content Understanding supports the full set of [Azure AI Speech speech to text languages](../../speech-service/language-support.md).
> For languages with fast transcription support, and for files ≤ 300 MB and/or ≤ 2 hours, transcription time is reduced substantially.

* **Diarization**. Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers.

* **Speaker role detection**. Identifies agent and customer roles within contact center call data.

* **Multilingual transcription**. Generates multilingual transcripts, applying a language/locale per phrase. Unlike language detection, this feature is enabled when no language/locale is specified or the language is set to `auto`.

* **Language detection**. Automatically detects the dominant language/locale, which is then used to transcribe the file. Set multiple languages/locales to enable language detection.

> [!NOTE]
> For files larger than 300 MB and/or longer than 2 hours, and for locales that fast transcription doesn't support, the file is processed to generate a multilingual transcript based on the specified locales.
> If language detection fails, the first language/locale defined is used to transcribe the file.
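To make these options concrete, the following is a minimal sketch of a custom analyzer definition that pins explicit locales for language detection. The `baseAnalyzerId`, `config`, and `locales` property names are assumptions modeled on the preview API used in the examples below; verify them against the current API reference.

```json
{
  "description": "Audio analyzer with explicit locale hints",
  "baseAnalyzerId": "prebuilt-audioAnalyzer",
  "config": {
    "locales": ["en-US", "de-DE"],
    "returnDetails": true
  }
}
```

Leaving `locales` unset, or setting the language to `auto`, would instead enable multilingual transcription as described earlier.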
Start with a template or create a custom analyzer to meet your specific business needs.

## Content Understanding prebuilt audio analyzers

Prebuilt analyzers let you extract valuable insights from audio content without the need to create a custom analyzer setup.

All audio analyzers generate transcripts in standard WebVTT format, separated by speaker.

> [!NOTE]
> Prebuilt analyzers are configured to use multilingual transcription with `returnDetails` enabled.

Content Understanding offers the following prebuilt analyzers:

**Post-call analysis (prebuilt-callCenter)**. Analyze call recordings to generate:

* conversation transcripts with speaker role detection results
* call summary
* call sentiment
* top five topics mentioned
* list of companies mentioned
* list of people (name and title/role) mentioned
* list of relevant call categories

**Example result:**
```json
{
  "id": "bc36da27-004f-475e-b808-8b8aead3b566",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-callCenter",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-05-06T22:53:28Z",
    "stringEncoding": "utf8",
    "warnings": [],
    "contents": [
      {
        "markdown": "# Audio: 00:00.000 => 00:32.183\n\nTranscript\n```\nWEBVTT\n\n00:00.080 --> 00:00.640\n<v Agent>Good day.\n\n00:00.960 --> 00:02.240\n<v Agent>Welcome to Contoso.\n\n00:02.560 --> 00:03.760\n<v Agent>My name is John Doe.\n\n00:03.920 --> 00:05.120\n<v Agent>How can I help you today?\n\n00:05.440 --> 00:06.320\n<v Agent>Yes, good day.\n\n00:06.720 --> 00:08.160\n<v Agent>My name is Maria Smith.\n\n00:08.560 --> 00:11.280\n<v Agent>I would like to inquire about my current point balance.\n\n00:11.680 --> 00:12.560\n<v Agent>No problem.\n\n00:12.880 --> 00:13.920\n<v Agent>I am happy to help.\n\n00:14.240 --> 00:16.720\n<v Agent>I need your date of birth to confirm your identity.\n\n00:17.120 --> 00:19.600\n<v Agent>It is April 19th, 1988.\n\n00:20.000 --> 00:20.480\n<v Agent>Great.\n\n00:20.800 --> 00:24.160\n<v Agent>Your current point balance is 599 points.\n\n00:24.560 --> 00:26.160\n<v Agent>Do you need any more information?\n\n00:26.480 --> 00:27.200\n<v Agent>No, thank you.\n\n00:27.600 --> 00:28.320\n<v Agent>That was all.\n\n00:28.720 --> 00:29.280\n<v Agent>Goodbye.\n\n00:29.680 --> 00:30.320\n<v Agent>You're welcome.\n\n00:30.640 --> 00:31.840\n<v Agent>Goodbye at Contoso.\n```",
        "fields": {
          "Summary": {
            "type": "string",
            "valueString": "Maria Smith contacted Contoso to inquire about her current point balance. After confirming her identity with her date of birth, the agent, John Doe, informed her that her balance was 599 points. Maria did not require any further assistance, and the call concluded politely."
          },
          "Topics": {
            "type": "array",
            "valueArray": [
              {
                "type": "string",
                "valueString": "Point balance inquiry"
              },
              {
                "type": "string",
                "valueString": "Identity confirmation"
              },
              {
                "type": "string",
                "valueString": "Customer service"
              }
            ]
          },
          "Companies": {
            "type": "array",
            "valueArray": [
              {
                "type": "string",
                "valueString": "Contoso"
              }
            ]
          },
          "People": {
            "type": "array",
            "valueArray": [
              {
                "type": "object",
                "valueObject": {
                  "Name": {
                    "type": "string",
                    "valueString": "John Doe"
                  },
                  "Role": {
                    "type": "string",
                    "valueString": "Agent"
                  }
                }
              },
              {
                "type": "object",
                "valueObject": {
                  "Name": {
                    "type": "string",
                    "valueString": "Maria Smith"
                  },
                  "Role": {
                    "type": "string",
                    "valueString": "Customer"
                  }
                }
              }
            ]
          },
          "Sentiment": {
            "type": "string",
            "valueString": "Positive"
          },
          "Categories": {
            "type": "array",
            "valueArray": [
              {
                "type": "string",
                "valueString": "Business"
              }
            ]
          }
        },
        "kind": "audioVisual",
        "startTimeMs": 0,
        "endTimeMs": 32183,
        "transcriptPhrases": [
          {
            "speaker": "Agent",
            "startTimeMs": 80,
            "endTimeMs": 640,
            "text": "Good day.",
            "words": []
          }, ...
          {
            "speaker": "Customer",
            "startTimeMs": 5440,
            "endTimeMs": 6320,
            "text": "Yes, good day.",
            "words": []
          }, ...
        ]
      }
    ]
  }
}
```
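A result like the one above is produced by submitting an analyze request against the prebuilt analyzer and polling the asynchronous operation until its `status` is `Succeeded`. The request body below is a minimal sketch, assuming the preview route `POST {endpoint}/contentunderstanding/analyzers/prebuilt-callCenter:analyze?api-version=2025-05-01-preview` and a hypothetical, accessible audio file URL; confirm the exact route and payload shape in the quickstart.

```json
{
  "url": "https://example.blob.core.windows.net/recordings/sample-call.wav"
}
```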

**Conversation analysis (prebuilt-audioAnalyzer)**. Analyze recordings to generate:

- conversation transcripts
- conversation summary

**Example result:**
```json
{
  "id": "9624cc49-b6b3-4ce5-be6c-e895d8c2484d",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-audioAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-05-06T23:00:12Z",
    "stringEncoding": "utf8",
    "warnings": [],
    "contents": [
      {
        "markdown": "# Audio: 00:00.000 => 00:32.183\n\nTranscript\n```\nWEBVTT\n\n00:00.080 --> 00:00.640\n<v Speaker 1>Good day.\n\n00:00.960 --> 00:02.240\n<v Speaker 1>Welcome to Contoso.\n\n00:02.560 --> 00:03.760\n<v Speaker 1>My name is John Doe.\n\n00:03.920 --> 00:05.120\n<v Speaker 1>How can I help you today?\n\n00:05.440 --> 00:06.320\n<v Speaker 1>Yes, good day.\n\n00:06.720 --> 00:08.160\n<v Speaker 1>My name is Maria Smith.\n\n00:08.560 --> 00:11.280\n<v Speaker 1>I would like to inquire about my current point balance.\n\n00:11.680 --> 00:12.560\n<v Speaker 1>No problem.\n\n00:12.880 --> 00:13.920\n<v Speaker 1>I am happy to help.\n\n00:14.240 --> 00:16.720\n<v Speaker 1>I need your date of birth to confirm your identity.\n\n00:17.120 --> 00:19.600\n<v Speaker 1>It is April 19th, 1988.\n\n00:20.000 --> 00:20.480\n<v Speaker 1>Great.\n\n00:20.800 --> 00:24.160\n<v Speaker 1>Your current point balance is 599 points.\n\n00:24.560 --> 00:26.160\n<v Speaker 1>Do you need any more information?\n\n00:26.480 --> 00:27.200\n<v Speaker 1>No, thank you.\n\n00:27.600 --> 00:28.320\n<v Speaker 1>That was all.\n\n00:28.720 --> 00:29.280\n<v Speaker 1>Goodbye.\n\n00:29.680 --> 00:30.320\n<v Speaker 1>You're welcome.\n\n00:30.640 --> 00:31.840\n<v Speaker 1>Goodbye at Contoso.\n```",
        "fields": {
          "Summary": {
            "type": "string",
            "valueString": "Maria Smith contacted Contoso to inquire about her current point balance. John Doe assisted her by confirming her identity using her date of birth and informed her that her balance was 599 points. Maria expressed no further inquiries, and the conversation concluded politely."
          }
        },
        "kind": "audioVisual",
        "startTimeMs": 0,
        "endTimeMs": 32183,
        "transcriptPhrases": [
          {
            "speaker": "Speaker 1",
            "startTimeMs": 80,
            "endTimeMs": 640,
            "text": "Good day.",
            "words": []
          }, ...
          {
            "speaker": "Speaker 2",
            "startTimeMs": 5440,
            "endTimeMs": 6320,
            "text": "Yes, good day.",
            "words": []
          }, ...
        ]
      }
    ]
  }
}
```

You can also customize prebuilt analyzers for finer-grained control of the output by defining custom fields. Customization lets you use the full power of generative models to extract deep insights from the audio (see the sketch after this list). For example, customization allows you to:

* Generate other insights.
* Control the language of the field extraction output.
* Configure the transcription behavior.
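As an illustration, a customized analyzer that builds on `prebuilt-callCenter` and adds custom fields might be defined as follows. This is a minimal sketch: the `fieldSchema` layout, the `method` values, and the field names `CustomerEscalated` and `FollowUpActions` are illustrative assumptions rather than a definitive schema; consult the analyzer templates documentation for the exact shape.

```json
{
  "description": "Call center analyzer with custom fields",
  "baseAnalyzerId": "prebuilt-callCenter",
  "config": {
    "locales": ["en-US"],
    "returnDetails": true
  },
  "fieldSchema": {
    "fields": {
      "CustomerEscalated": {
        "type": "boolean",
        "method": "generate",
        "description": "True if the customer asked for a supervisor or escalation."
      },
      "FollowUpActions": {
        "type": "array",
        "method": "generate",
        "items": {
          "type": "string",
          "description": "Follow-up action items agreed on during the call."
        }
      }
    }
  }
}
```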
## Input requirements
For a detailed list of supported audio formats, refer to our [Service limits and codecs](../service-limits.md) page.