Merge pull request #314 from eric-urban/eur/speech-containers

prmerger-automator[bot] · web-flow · commit cb750a9dbd4d · 2024-09-17T17:57:27.000Z
refresh release notes
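
Reviewer note: the `fast-transcription-create.md` hunk below describes the request body properties (`locales`, `profanityFilterMode`, `channels`, `diarizationSettings`) and the rule that diarization isn't available when `channels` is `[0,1]`. The following is a minimal sketch of those rules, not the service's implementation. The function name `build_request_body`, the one-element list shape for `locales`, and the `minSpeakers <= maxSpeakers` check are assumptions made for illustration; the sketch only builds and validates a dictionary and doesn't call the service.

```python
import json

# Values taken from the article text in the hunk below.
SUPPORTED_LOCALES = {
    "en-US", "es-ES", "es-MX", "fr-FR", "hi-IN",
    "it-IT", "ja-JP", "ko-KR", "pt-BR", "zh-CN",
}
PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_request_body(locale, profanity_filter_mode="Masked",
                       channels=None, min_speakers=None, max_speakers=None):
    """Hypothetical helper: build the request body properties as a dict."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {locale}")
    if profanity_filter_mode not in PROFANITY_MODES:
        raise ValueError(f"unknown profanityFilterMode: {profanity_filter_mode}")

    # Assumption: `locales` is sent as a list holding exactly one locale.
    body = {"locales": [locale], "profanityFilterMode": profanity_filter_mode}

    if channels is not None:
        # Zero-based indices, and only up to two channels are supported.
        if len(channels) > 2 or any(c not in (0, 1) for c in channels):
            raise ValueError("channels must contain at most the indices 0 and 1")
        body["channels"] = list(channels)

    wants_diarization = min_speakers is not None or max_speakers is not None
    if wants_diarization:
        if min_speakers is None or max_speakers is None:
            raise ValueError("specify both minSpeakers and maxSpeakers")
        # Assumption: the minimum must be at least 1 and not exceed the maximum.
        if not 1 <= min_speakers <= max_speakers:
            raise ValueError("expected 1 <= minSpeakers <= maxSpeakers")
        # Diarization isn't available when both stereo channels are split out.
        if channels is not None and sorted(channels) == [0, 1]:
            raise ValueError("diarizationSettings can't be combined with channels [0,1]")
        body["diarizationSettings"] = {
            "minSpeakers": min_speakers,
            "maxSpeakers": max_speakers,
        }
    return body


if __name__ == "__main__":
    print(json.dumps(build_request_body("en-US", min_speakers=1, max_speakers=4)))
```

Rejecting the conflicting combination on the client side is a design choice for the sketch; the article only states that the feature isn't available in that case.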
diff --git a/articles/ai-services/speech-service/fast-transcription-create.md b/articles/ai-services/speech-service/fast-transcription-create.md
@@ -7,7 +7,7 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 7/12/2024
+ms.date: 9/17/2024
 # Customer intent: As a user who implements audio transcription, I want create transcriptions as quickly as possible.
 ---
 
@@ -40,7 +40,7 @@ Construct the request body according to the following instructions:
 - Set the required `locales` property. This value should match the expected locale of the audio data to transcribe. The supported locales are: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN. You can only specify one locale per transcription request.
 - Optionally, set the `profanityFilterMode` property to specify how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. The `profanityFilterMode` property works the same way as via the [batch transcription API](./batch-transcription.md).
 - Optionally, set the `channels` property to specify the zero-based indices of the channels to be transcribed separately. If not specified, multiple channels are merged and transcribed jointly. Only up to two channels are supported. If you want to transcribe the channels from a stereo audio file separately, you need to specify `[0,1]` here. Otherwise, stereo audio will be merged to mono, mono audio will be left as is, and only a single channel will be transcribed. In either of the latter cases, the output has no channel indices for the transcribed text, since only a single audio stream is transcribed.
-- Optionally, set the `diarizationSettings` to recognize and separate multiple speakers on mono channel audio file. You need to specify the minimum and maximum number of people who might be speaking in the audio file (for example, specify `"diarizationSettings": {"minSpeakers": 1, "maxSpeakers": 4}`). Then the transcription file will contain a `speaker` entry for each transcribed phrase. The feature isn't available with stereo audio when you set the `channels` property as `[0,1]`.
+- Optionally, set the `diarizationSettings` property to recognize and separate multiple speakers in a mono channel audio file. You need to specify the minimum and maximum number of people who might be speaking in the audio file (for example, specify `"diarizationSettings": {"minSpeakers": 1, "maxSpeakers": 4}`). The transcription file then contains a `speaker` entry for each transcribed phrase. The feature isn't available with stereo audio when you set the `channels` property to `[0,1]`.
 
 Make a multipart/form-data POST request to the `transcriptions` endpoint with the audio file and the request body properties. The following example shows how to create a transcription using the fast transcription API.
 
diff --git a/articles/ai-services/speech-service/includes/release-notes/release-notes-containers.md b/articles/ai-services/speech-service/includes/release-notes/release-notes-containers.md
@@ -2,21 +2,36 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 8/7/2024
+ms.date: 9/17/2024
 ms.author: eur
 ---
 
+### 2024-September release
+
+Add support for the latest model versions:
+- Speech language identification 1.15.0
+    - Mitigate Vulnerabilities
+- Neural text to speech 3.4.0
+    - New voices: `en-us-andrewmultilingualneural`, `en-us-jessaneural`, `es-us-alonsoneural`, `es-us-palomaneural`, `it-it-isabellamultilingualneural`
+    - Mitigate Vulnerabilities
+- Speech to text 4.9.0
+    - New Locales: `ar-YE`, `af-ZA`, `am-ET`, `ar-MA`, `ar-TN`, `sw-KE`, `sw-TZ`, `zu-ZA`
+    - Mitigate Vulnerabilities
+    - Update Deprecated Models
+- Custom speech to text 4.9.0
+    - Mitigate Vulnerabilities
+
 ### 2024-August release
 
 Add support for the latest model versions:
 - Speech language identification 1.14.0
-    - Upgrade .Net 8.0
+    - Upgrade .NET 8.0
     - Mitigate Vulnerabilities
 - Neural text to speech 3.3.0
-    - Upgrade .Net 8.0
+    - Upgrade .NET 8.0
     - Mitigate Vulnerabilities
-- Speech to text 4.18.0    
-    - Upgrade .Net 8.0
+- Speech to text 4.8.0    
+    - Upgrade .NET 8.0
     - Mitigate Vulnerabilities
     - Upgrade Recognition Engine
     - Fix the issue where `PropertyId.Speech_SegmentationSilenceTimeoutMs` was being ignored.
@@ -80,7 +95,7 @@ Add support for the latest model versions:
 
 Fix the issue of running speech to text container via `docker` mount options with local custom model files.
 
-Fix the issue that in some cases the `RECOGNIZING` event does not show up in response through the Speech SDK.
+Fix the issue that in some cases the `RECOGNIZING` event doesn't show up in response through the Speech SDK.
 
 Fix vulnerability issues.
 
@@ -241,7 +256,7 @@ Regular monthly updates including security upgrades and vulnerability fixes.
 
 Regular monthly updates including security upgrades and vulnerability fixes.
 
-#### Neural Neural text to speech v2.5.0
+#### Neural text to speech v2.5.0
 
 Add support for these [prebuilt neural voices](../../language-support.md?tabs=tts):
    * `az-az-babekneural`
@@ -279,7 +294,7 @@ Add support for using containers in [disconnected environments](../../../contain
 Regular monthly updates including security upgrades and vulnerability fixes.
 
 #### Neural-Neural text to speech Container v1.12.0
-Add support for these prebuilt neural voices: `am-et-amehaneural`, `am-et-mekdesneural`, `so-so-muuseneural` and `so-so-ubaxneural`.
+Add support for these prebuilt neural voices: `am-et-amehaneural`, `am-et-mekdesneural`, `so-so-muuseneural`, and `so-so-ubaxneural`.
 
 Regular monthly updates including security upgrades and vulnerability fixes.
 
diff --git a/articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md b/articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md
@@ -9,7 +9,7 @@ ms.author: eur
 ### September 2024 release
 
 #### Fast transcription (Preview)
-Fast transcription now supports Diarization to recognize and separate multiple speakers on mono channel audio file. For more information, see [fast transcription API guide](../../fast-transcription-create.md#use-the-fast-transcription-api).
+Fast transcription now supports diarization to recognize and separate multiple speakers in a mono channel audio file. For more information, see the [fast transcription API guide](../../fast-transcription-create.md#use-the-fast-transcription-api).
 
 ### August 2024 release
 
@@ -62,7 +62,7 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
 
 #### Fast Transcription API (Preview)
 
-Fast transcription is now available in public preview. Fast transcription allows you to transcribe audio file to text accurately and synchronously, with a high speed factor. It can transcribe a 30-minutes audio in less than 1 minute. For more information, see the [fast transcription API guide](../../fast-transcription-create.md).
+Fast transcription is now available in public preview. Fast transcription allows you to transcribe an audio file to text accurately and synchronously, with a high speed factor. It can transcribe audio much faster than the actual audio length. For more information, see the [fast transcription API guide](../../fast-transcription-create.md).
 
 > [!TIP]
 > Try out fast transcription in [Azure AI Studio](https://aka.ms/fasttranscription/studio).
diff --git a/articles/ai-services/speech-service/releasenotes.md b/articles/ai-services/speech-service/releasenotes.md
@@ -7,7 +7,7 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: release-notes
-ms.date: 6/6/2024
+ms.date: 9/17/2024
 ms.custom: references_regions
 ---
 
@@ -17,6 +17,7 @@ Azure AI Speech is updated on an ongoing basis. To stay up-to-date with recent d
 
 ## Recent highlights
 
+* Fast transcription is now available in public preview. Fast transcription allows you to transcribe an audio file to text accurately and synchronously, and supports diarization to recognize and separate multiple speakers in mono channel audio. It can transcribe audio much faster than the actual audio length. For more information, see the [fast transcription API guide](fast-transcription-create.md).
 * Video translation is now available in the Azure AI Speech service. For more information, see [What is video translation?](./video-translation-overview.md).
 * Personal voice is now generally available. For more information, see [What is personal voice?](./personal-voice-overview.md).
 * The Azure AI Speech service supports OpenAI text to speech voices. For more information, see [What are OpenAI text to speech voices?](./openai-voices.md). 
diff --git a/articles/ai-services/speech-service/speech-container-cstt.md b/articles/ai-services/speech-service/speech-container-cstt.md
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
 ms.topic: how-to
-ms.date: 1/21/2024
+ms.date: 9/17/2024
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-sdk-cli
 keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
 | Version | Path |
 |-----------|------------|
 | Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:latest` |
-| 4.6.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:4.6.0-amd64` |
+| 4.9.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:4.9.0-amd64` |
 
 All tags, except for `latest`, are in the following format and are case sensitive:
 
@@ -47,11 +47,13 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
 {
   "name": "azure-cognitive-services/speechservices/custom-speech-to-text",
   "tags": [
-    "2.10.0-amd64",
-    "2.11.0-amd64",
-    "2.12.0-amd64",
-    "2.12.1-amd64",
     <--redacted for brevity-->
+    "4.4.0-amd64",
+    "4.5.0-amd64",
+    "4.6.0-amd64",
+    "4.7.0-amd64",
+    "4.8.0-amd64",
+    "4.9.0-amd64",
     "latest"
   ]
 }
diff --git a/articles/ai-services/speech-service/speech-container-lid.md b/articles/ai-services/speech-service/speech-container-lid.md
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
 ms.topic: how-to
-ms.date: 1/22/2024
+ms.date: 9/17/2024
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-sdk-cli
 keywords: on-premises, Docker, container
@@ -36,7 +36,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
 | Version | Path |
 |-----------|------------|
 | Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:latest` |
-| 1.12.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:1.12.0-amd64-preview` |
+| 1.15.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:1.15.0-amd64-preview` |
 
 All tags, except for `latest`, are in the following format and are case sensitive:
 
@@ -53,9 +53,13 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
     "1.1.0-amd64-preview",
     "1.11.0-amd64-preview",
     "1.12.0-amd64-preview",
+    "1.13.0-amd64-preview",
+    "1.14.0-amd64-preview",
+    "1.15.0-amd64-preview",
     "1.3.0-amd64-preview",
     "1.5.0-amd64-preview",
-    <--redacted for brevity-->
+    "1.6.1-amd64-preview",
+    "1.7.0-amd64-preview",
     "1.8.0-amd64-preview",
     "latest"
   ]
diff --git a/articles/ai-services/speech-service/speech-container-ntts.md b/articles/ai-services/speech-service/speech-container-ntts.md
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
 ms.topic: how-to
-ms.date: 1/22/2024
+ms.date: 9/17/2024
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-sdk-cli
 keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
 | Version | Path |
 |-----------|------------|
 | Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:latest`<br/><br/>The `latest` tag pulls the `en-US` locale and `en-us-arianeural` voice. |
-| 3.1.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:3.1.0-amd64-en-us-arianeural` |
+| 3.4.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:3.4.0-amd64-en-us-arianeural` |
 
 All tags, except for `latest`, are in the following format and are case sensitive:
 
@@ -45,17 +45,19 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
   "name": "azure-cognitive-services/speechservices/neural-text-to-speech",
   "tags": [
     <--redacted for brevity-->
-    "3.1.0-amd64-en-us-arianeural",
-    "3.1.0-amd64-en-us-guyneural",
-    "3.1.0-amd64-en-us-jennymultilingualneural",
-    "3.1.0-amd64-en-us-jennyneural",
-    "3.1.0-amd64-en-us-michelleneural",
-    "3.1.0-amd64-es-es-alvaroneural",
-    "3.1.0-amd64-es-es-elviraneural",
-    "3.1.0-amd64-es-mx-candelaneural",
-    "3.1.0-amd64-es-mx-dalianeural",
-    "3.1.0-amd64-es-mx-jorgeneural",
-    <--redacted for brevity-->
+    "3.4.0-amd64-uk-ua-ostapneural",
+    "3.4.0-amd64-zh-cn-xiaochenneural-preview",
+    "3.4.0-amd64-zh-cn-xiaohanneural",
+    "3.4.0-amd64-zh-cn-xiaomoneural",
+    "3.4.0-amd64-zh-cn-xiaoqiuneural-preview",
+    "3.4.0-amd64-zh-cn-xiaoruineural",
+    "3.4.0-amd64-zh-cn-xiaoshuangneural-preview",
+    "3.4.0-amd64-zh-cn-xiaoxiaoneural",
+    "3.4.0-amd64-zh-cn-xiaoyanneural-preview",
+    "3.4.0-amd64-zh-cn-xiaoyouneural",
+    "3.4.0-amd64-zh-cn-yunxineural",
+    "3.4.0-amd64-zh-cn-yunyangneural",
+    "3.4.0-amd64-zh-cn-yunyeneural",
     "latest"
   ]
 }
diff --git a/articles/ai-services/speech-service/speech-container-overview.md b/articles/ai-services/speech-service/speech-container-overview.md
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 8/7/2024
+ms.date: 9/17/2024
 ms.author: eur
 keywords: on-premises, Docker, container
 ---
@@ -21,10 +21,10 @@ The following table lists the Speech containers available in the Microsoft Conta
 
 | Container | Features | Supported versions and locales |
 |--|--|--|
-| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results.  | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
+| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results.  | Latest: 4.9.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
 | [Custom speech to text](speech-container-cstt.md) | Using a custom model from the [custom speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/custom-speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list). |
-| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.14.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
-| [Neural text to speech](speech-container-ntts.md) | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | Latest: 3.3.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/neural-text-to-speech/tags/list). |
+| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.15.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
+| [Neural text to speech](speech-container-ntts.md) | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | Latest: 3.4.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/neural-text-to-speech/tags/list). |
 
 <sup>1</sup> The container is available in public preview. Containers in preview are still under development and don't meet Microsoft's stability and support requirements.
 <sup>2</sup> Not available as a disconnected container.
diff --git a/articles/ai-services/speech-service/speech-container-stt.md b/articles/ai-services/speech-service/speech-container-stt.md
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
 ms.topic: how-to
-ms.date: 1/22/2024
+ms.date: 9/17/2024
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-sdk-cli
 keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
 | Version | Path |
 |-----------|------------|
 | Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest`<br/><br/>The `latest` tag pulls the latest image for the `en-US` locale. |
-| 4.6.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:4.6.0-amd64-mr-in` |
+| 4.9.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:4.9.0-amd64-mr-in` |
 
 All tags, except for `latest`, are in the following format and are case sensitive:
 
@@ -44,12 +44,19 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
 {
   "name": "azure-cognitive-services/speechservices/speech-to-text",
   "tags": [
-    "2.10.0-amd64-ar-ae",
-    "2.10.0-amd64-ar-bh",
-    "2.10.0-amd64-ar-eg",
-    "2.10.0-amd64-ar-iq",
-    "2.10.0-amd64-ar-jo",
-    <--redacted for brevity-->
+    <--redacted for brevity-->
+    "4.9.0-amd64-sw-tz",
+    "4.9.0-amd64-ta-in",
+    "4.9.0-amd64-th-th",
+    "4.9.0-amd64-tr-tr",
+    "4.9.0-amd64-vi-vn",
+    "4.9.0-amd64-wuu-cn",
+    "4.9.0-amd64-yue-cn",
+    "4.9.0-amd64-zh-cn",
+    "4.9.0-amd64-zh-cn-sichuan",
+    "4.9.0-amd64-zh-hk",
+    "4.9.0-amd64-zh-tw",
+    "4.9.0-amd64-zu-za",
     "latest"
   ]
 }
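
Reviewer note: the JSON tag lists in the hunks above are ordered lexicographically (for example `1.1.0`, `1.11.0`, `1.12.0`, ..., `1.3.0`, `1.8.0`), so the last entry before `latest` isn't necessarily the newest version. The following is a minimal sketch of picking the newest version by comparing the numeric parts instead. The helper name `newest_version` and the regular expression are assumptions made for illustration; the sample list is copied from the `speech-container-lid.md` hunk.

```python
import re

# Matches the leading "major.minor.patch-" part of a tag such as
# "1.15.0-amd64-preview"; tags like "latest" don't match and are skipped.
VERSION_PREFIX = re.compile(r"^(\d+)\.(\d+)\.(\d+)-")


def newest_version(tags):
    """Hypothetical helper: return the highest version found in a tag list."""
    versions = []
    for tag in tags:
        match = VERSION_PREFIX.match(tag)
        if match:
            # Compare as integer tuples so that 1.15.0 sorts after 1.8.0.
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return None
    return ".".join(str(part) for part in max(versions))


if __name__ == "__main__":
    lid_tags = [
        "1.1.0-amd64-preview",
        "1.11.0-amd64-preview",
        "1.12.0-amd64-preview",
        "1.13.0-amd64-preview",
        "1.14.0-amd64-preview",
        "1.15.0-amd64-preview",
        "1.3.0-amd64-preview",
        "1.5.0-amd64-preview",
        "1.6.1-amd64-preview",
        "1.7.0-amd64-preview",
        "1.8.0-amd64-preview",
        "latest",
    ]
    print(newest_version(lid_tags))  # prints 1.15.0
```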