Commit cb750a9

Merge pull request #314 from eric-urban/eur/speech-containers
refresh release notes
2 parents ec85c8e + a5e17f7 commit cb750a9

9 files changed: +78 -47 lines changed

articles/ai-services/speech-service/fast-transcription-create.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: how-to
10-
ms.date: 7/12/2024
10+
ms.date: 9/17/2024
1111
# Customer intent: As a user who implements audio transcription, I want create transcriptions as quickly as possible.
1212
---
1313

@@ -40,7 +40,7 @@ Construct the request body according to the following instructions:
4040
- Set the required `locales` property. This value should match the expected locale of the audio data to transcribe. The supported locales are: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN. You can only specify one locale per transcription request.
4141
- Optionally, set the `profanityFilterMode` property to specify how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. The `profanityFilterMode` property works the same way as via the [batch transcription API](./batch-transcription.md).
4242
- Optionally, set the `channels` property to specify the zero-based indices of the channels to be transcribed separately. If not specified, multiple channels are merged and transcribed jointly. Only up to two channels are supported. If you want to transcribe the channels from a stereo audio file separately, you need to specify `[0,1]` here. Otherwise, stereo audio will be merged to mono, mono audio will be left as is, and only a single channel will be transcribed. In either of the latter cases, the output has no channel indices for the transcribed text, since only a single audio stream is transcribed.
43-
- Optionally, set the `diarizationSettings` to recognize and separate multiple speakers on mono channel audio file. You need to specify the minimum and maximum number of people who might be speaking in the audio file (for example, specify `"diarizationSettings": {"minSpeakers": 1, "maxSpeakers": 4}`). Then the transcription file will contain a `speaker` entry for each transcribed phrase. The feature isn't available with stereo audio when you set the `channels` property as `[0,1]`.
43+
- Optionally, set the `diarizationSettings` property to recognize and separate multiple speakers on mono channel audio file. You need to specify the minimum and maximum number of people who might be speaking in the audio file (for example, specify `"diarizationSettings": {"minSpeakers": 1, "maxSpeakers": 4}`). Then the transcription file will contain a `speaker` entry for each transcribed phrase. The feature isn't available with stereo audio when you set the `channels` property as `[0,1]`.
4444

4545
Make a multipart/form-data POST request to the `transcriptions` endpoint with the audio file and the request body properties. The following example shows how to create a transcription using the fast transcription API.
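The request body rules in this hunk can be sketched as a small builder. This is a minimal illustration, not the service's client code: the helper name `build_definition` and its parameters are hypothetical, and representing `locales` as a one-element JSON array is an assumption. The property names, accepted values, and constraints are the ones listed above.

```python
import json

# Accepted values for profanityFilterMode, as listed in the instructions above.
ALLOWED_PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_definition(locale, profanity_filter_mode="Masked", channels=None,
                     min_speakers=None, max_speakers=None):
    """Build the request body properties (hypothetical helper, for illustration)."""
    if profanity_filter_mode not in ALLOWED_PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode}")

    # Only one locale is allowed per request; the array shape is an assumption.
    definition = {"locales": [locale], "profanityFilterMode": profanity_filter_mode}

    if channels is not None:
        channels = list(channels)
        if len(channels) > 2:
            raise ValueError("only up to two channels are supported")
        definition["channels"] = channels

    if min_speakers is not None or max_speakers is not None:
        if min_speakers is None or max_speakers is None:
            raise ValueError("specify both minSpeakers and maxSpeakers")
        # Diarization isn't available when stereo channels are transcribed separately.
        if channels == [0, 1]:
            raise ValueError("diarizationSettings can't be combined with channels [0,1]")
        definition["diarizationSettings"] = {
            "minSpeakers": min_speakers,
            "maxSpeakers": max_speakers,
        }

    return definition


if __name__ == "__main__":
    print(json.dumps(build_definition("en-US", min_speakers=1, max_speakers=4), indent=2))
```

Validating on the client side like this only mirrors the documented constraints; the service remains the authority on what it accepts.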
4646

articles/ai-services/speech-service/includes/release-notes/release-notes-containers.md

Lines changed: 23 additions & 8 deletions
@@ -2,21 +2,36 @@
22
author: eric-urban
33
ms.service: azure-ai-speech
44
ms.topic: include
5-
ms.date: 8/7/2024
5+
ms.date: 9/17/2024
66
ms.author: eur
77
---
88

9+
### 2024-September release
10+
11+
Add support for the latest model versions:
12+
- Speech language identification 1.15.0
13+
- Mitigate Vulnerabilities
14+
- Neural text to speech 3.4.0
15+
- New voices: `en-us-andrewmultilingualneural`, `en-us-jessaneural`, `es-us-alonsoneural`, `es-us-palomaneural`, `it-it-isabellamultilingualneural`
16+
- Mitigate Vulnerabilities
17+
- Speech to text 4.9.0
18+
- New Locales: `ar-YE`, `af-ZA`, `am-ET`, `ar-MA`, `ar-TN`, `sw-KE`, `sw-TZ`, `zu-ZA`
19+
- Mitigate Vulnerabilities
20+
- Update Deprecated Models
21+
- Custom speech to text 4.9.0
22+
- Mitigate Vulnerabilities
23+
924
### 2024-August release
1025

1126
Add support for the latest model versions:
1227
- Speech language identification 1.14.0
13-
- Upgrade .Net 8.0
28+
- Upgrade .NET 8.0
1429
- Mitigate Vulnerabilities
1530
- Neural text to speech 3.3.0
16-
- Upgrade .Net 8.0
31+
- Upgrade .NET 8.0
1732
- Mitigate Vulnerabilities
18-
- Speech to text 4.18.0
19-
- Upgrade .Net 8.0
33+
- Speech to text 4.8.0
34+
- Upgrade .NET 8.0
2035
- Mitigate Vulnerabilities
2136
- Upgrade Recognition Engine
2237
- Fix the issue where `PropertyId.Speech_SegmentationSilenceTimeoutMs` was being ignored.
@@ -80,7 +95,7 @@ Add support for the latest model versions:
8095

8196
Fix the issue of running speech to text container via `docker` mount options with local custom model files.
8297

83-
Fix the issue that in some cases the `RECOGNIZING` event does not show up in response through the Speech SDK.
98+
Fix the issue that in some cases the `RECOGNIZING` event doesn't show up in response through the Speech SDK.
8499

85100
Fix vulnerability issues.
86101

@@ -241,7 +256,7 @@ Regular monthly updates including security upgrades and vulnerability fixes.
241256

242257
Regular monthly updates including security upgrades and vulnerability fixes.
243258

244-
#### Neural Neural text to speech v2.5.0
259+
#### Neural text to speech v2.5.0
245260

246261
Add support for these [prebuilt neural voices](../../language-support.md?tabs=tts):
247262
* `az-az-babekneural`
@@ -279,7 +294,7 @@ Add support for using containers in [disconnected environments](../../../contain
279294
Regular monthly updates including security upgrades and vulnerability fixes.
280295

281296
#### Neural-Neural text to speech Container v1.12.0
282-
Add support for these prebuilt neural voices: `am-et-amehaneural`, `am-et-mekdesneural`, `so-so-muuseneural` and `so-so-ubaxneural`.
297+
Add support for these prebuilt neural voices: `am-et-amehaneural`, `am-et-mekdesneural`, `so-so-muuseneural`, and `so-so-ubaxneural`.
283298

284299
Regular monthly updates including security upgrades and vulnerability fixes.
285300

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.author: eur
99
### September 2024 release
1010

1111
#### Fast transcription (Preview)
12-
Fast transcription now supports Diarization to recognize and separate multiple speakers on mono channel audio file. For more information, see [fast transcription API guide](../../fast-transcription-create.md#use-the-fast-transcription-api).
12+
Fast transcription now supports diarization to recognize and separate multiple speakers on mono channel audio file. For more information, see [fast transcription API guide](../../fast-transcription-create.md#use-the-fast-transcription-api).
1313

1414
### August 2024 release
1515

@@ -62,7 +62,7 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
6262

6363
#### Fast Transcription API (Preview)
6464

65-
Fast transcription is now available in public preview. Fast transcription allows you to transcribe audio file to text accurately and synchronously, with a high speed factor. It can transcribe a 30-minutes audio in less than 1 minute. For more information, see the [fast transcription API guide](../../fast-transcription-create.md).
65+
Fast transcription is now available in public preview. Fast transcription allows you to transcribe audio file to text accurately and synchronously, with a high speed factor. It can transcribe audio much faster than the actual audio length. For more information, see the [fast transcription API guide](../../fast-transcription-create.md).
6666

6767
> [!TIP]
6868
> Try out fast transcription in [Azure AI Studio](https://aka.ms/fasttranscription/studio).

articles/ai-services/speech-service/releasenotes.md

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: release-notes
10-
ms.date: 6/6/2024
10+
ms.date: 9/17/2024
1111
ms.custom: references_regions
1212
---
1313

@@ -17,6 +17,7 @@ Azure AI Speech is updated on an ongoing basis. To stay up-to-date with recent d
1717

1818
## Recent highlights
1919

20+
* Fast transcription is now available in public preview. Fast transcription allows you to transcribe audio file to text accurately and synchronously, and supports diarization to recognize and separate multiple speakers on mono channel audio. It can transcribe audio much faster than the actual audio length. For more information, see the [fast transcription API guide](fast-transcription-create.md).
2021
* Video translation is now available in the Azure AI Speech service. For more information, see [What is video translation?](./video-translation-overview.md).
2122
* Personal voice is now generally available. For more information, see [What is personal voice?](./personal-voice-overview.md).
2223
* The Azure AI Speech service supports OpenAI text to speech voices. For more information, see [What are OpenAI text to speech voices?](./openai-voices.md).

articles/ai-services/speech-service/speech-container-cstt.md

Lines changed: 8 additions & 6 deletions
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.service: azure-ai-speech
88
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
99
ms.topic: how-to
10-
ms.date: 1/21/2024
10+
ms.date: 9/17/2024
1111
ms.author: eur
1212
zone_pivot_groups: programming-languages-speech-sdk-cli
1313
keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
3030
| Version | Path |
3131
|-----------|------------|
3232
| Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:latest` |
33-
| 4.6.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:4.6.0-amd64` |
33+
| 4.9.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:4.9.0-amd64` |
3434

3535
All tags, except for `latest`, are in the following format and are case sensitive:
3636

@@ -47,11 +47,13 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
4747
{
4848
"name": "azure-cognitive-services/speechservices/custom-speech-to-text",
4949
"tags": [
50-
"2.10.0-amd64",
51-
"2.11.0-amd64",
52-
"2.12.0-amd64",
53-
"2.12.1-amd64",
5450
<--redacted for brevity-->
51+
"4.4.0-amd64",
52+
"4.5.0-amd64",
53+
"4.6.0-amd64",
54+
"4.7.0-amd64",
55+
"4.8.0-amd64",
56+
"4.9.0-amd64",
5557
"latest"
5658
]
5759
}

articles/ai-services/speech-service/speech-container-lid.md

Lines changed: 7 additions & 3 deletions
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.service: azure-ai-speech
88
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
99
ms.topic: how-to
10-
ms.date: 1/22/2024
10+
ms.date: 9/17/2024
1111
ms.author: eur
1212
zone_pivot_groups: programming-languages-speech-sdk-cli
1313
keywords: on-premises, Docker, container
@@ -36,7 +36,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
3636
| Version | Path |
3737
|-----------|------------|
3838
| Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:latest` |
39-
| 1.12.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:1.12.0-amd64-preview` |
39+
| 1.15.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/language-detection:1.15.0-amd64-preview` |
4040

4141
All tags, except for `latest`, are in the following format and are case sensitive:
4242

@@ -53,9 +53,13 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
5353
"1.1.0-amd64-preview",
5454
"1.11.0-amd64-preview",
5555
"1.12.0-amd64-preview",
56+
"1.13.0-amd64-preview",
57+
"1.14.0-amd64-preview",
58+
"1.15.0-amd64-preview",
5659
"1.3.0-amd64-preview",
5760
"1.5.0-amd64-preview",
58-
<--redacted for brevity-->
61+
"1.6.1-amd64-preview",
62+
"1.7.0-amd64-preview",
5963
"1.8.0-amd64-preview",
6064
"latest"
6165
]
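The tag list in this hunk is sorted as plain strings, so `1.8.0-amd64-preview` appears after `1.15.0-amd64-preview` even though it is the older version. The sketch below shows one way to pick the newest tag by comparing version numbers numerically. The helper names `version_key` and `newest_tag` are hypothetical, and the tags are copied from the listing above.

```python
# Tags copied from the language-detection listing above (string-sorted order).
TAGS = [
    "1.1.0-amd64-preview",
    "1.11.0-amd64-preview",
    "1.12.0-amd64-preview",
    "1.13.0-amd64-preview",
    "1.14.0-amd64-preview",
    "1.15.0-amd64-preview",
    "1.3.0-amd64-preview",
    "1.5.0-amd64-preview",
    "1.6.1-amd64-preview",
    "1.7.0-amd64-preview",
    "1.8.0-amd64-preview",
    "latest",
]


def version_key(tag):
    """Turn the leading 'major.minor.patch' of a tag into a tuple of ints."""
    version = tag.split("-", 1)[0]
    return tuple(int(part) for part in version.split("."))


def newest_tag(tags):
    """Return the versioned tag with the highest version, ignoring 'latest'."""
    versioned = [tag for tag in tags if tag != "latest"]
    return max(versioned, key=version_key)


if __name__ == "__main__":
    print(newest_tag(TAGS))
```

A plain string `max()` over the same list would return `1.8.0-amd64-preview`, which is why the numeric key matters here.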

articles/ai-services/speech-service/speech-container-ntts.md

Lines changed: 15 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.service: azure-ai-speech
88
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
99
ms.topic: how-to
10-
ms.date: 1/22/2024
10+
ms.date: 9/17/2024
1111
ms.author: eur
1212
zone_pivot_groups: programming-languages-speech-sdk-cli
1313
keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
3030
| Version | Path |
3131
|-----------|------------|
3232
| Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:latest`<br/><br/>The `latest` tag pulls the `en-US` locale and `en-us-arianeural` voice. |
33-
| 3.1.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:3.1.0-amd64-en-us-arianeural` |
33+
| 3.4.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:3.4.0-amd64-en-us-arianeural` |
3434

3535
All tags, except for `latest`, are in the following format and are case sensitive:
3636

@@ -45,17 +45,19 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
4545
"name": "azure-cognitive-services/speechservices/neural-text-to-speech",
4646
"tags": [
4747
<--redacted for brevity-->
48-
"3.1.0-amd64-en-us-arianeural",
49-
"3.1.0-amd64-en-us-guyneural",
50-
"3.1.0-amd64-en-us-jennymultilingualneural",
51-
"3.1.0-amd64-en-us-jennyneural",
52-
"3.1.0-amd64-en-us-michelleneural",
53-
"3.1.0-amd64-es-es-alvaroneural",
54-
"3.1.0-amd64-es-es-elviraneural",
55-
"3.1.0-amd64-es-mx-candelaneural",
56-
"3.1.0-amd64-es-mx-dalianeural",
57-
"3.1.0-amd64-es-mx-jorgeneural",
58-
<--redacted for brevity-->
48+
"3.4.0-amd64-uk-ua-ostapneural",
49+
"3.4.0-amd64-zh-cn-xiaochenneural-preview",
50+
"3.4.0-amd64-zh-cn-xiaohanneural",
51+
"3.4.0-amd64-zh-cn-xiaomoneural",
52+
"3.4.0-amd64-zh-cn-xiaoqiuneural-preview",
53+
"3.4.0-amd64-zh-cn-xiaoruineural",
54+
"3.4.0-amd64-zh-cn-xiaoshuangneural-preview",
55+
"3.4.0-amd64-zh-cn-xiaoxiaoneural",
56+
"3.4.0-amd64-zh-cn-xiaoyanneural-preview",
57+
"3.4.0-amd64-zh-cn-xiaoyouneural",
58+
"3.4.0-amd64-zh-cn-yunxineural",
59+
"3.4.0-amd64-zh-cn-yunyangneural",
60+
"3.4.0-amd64-zh-cn-yunyeneural",
5961
"latest"
6062
]
6163
}

articles/ai-services/speech-service/speech-container-overview.md

Lines changed: 4 additions & 4 deletions
@@ -6,7 +6,7 @@ author: eric-urban
66
manager: nitinme
77
ms.service: azure-ai-speech
88
ms.topic: how-to
9-
ms.date: 8/7/2024
9+
ms.date: 9/17/2024
1010
ms.author: eur
1111
keywords: on-premises, Docker, container
1212
---
@@ -21,10 +21,10 @@ The following table lists the Speech containers available in the Microsoft Conta
2121

2222
| Container | Features | Supported versions and locales |
2323
|--|--|--|
24-
| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results. | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
24+
| [Speech to text](speech-container-stt.md) | Transcribes continuous real-time speech or batch audio recordings with intermediate results. | Latest: 4.9.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list).|
2525
| [Custom speech to text](speech-container-cstt.md) | Using a custom model from the [custom speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | Latest: 4.8.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/custom-speech-to-text/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/speech-to-text/tags/list). |
26-
| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.14.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
27-
| [Neural text to speech](speech-container-ntts.md) | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | Latest: 3.3.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/neural-text-to-speech/tags/list). |
26+
| [Speech language identification](speech-container-lid.md)<sup>1, 2</sup> | Detects the language spoken in audio files. | Latest: 1.15.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/language-detection/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/language-detection/tags/list). |
27+
| [Neural text to speech](speech-container-ntts.md) | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | Latest: 3.4.0<br/><br/>For all supported versions and locales, see the [Microsoft Container Registry (MCR)](https://mcr.microsoft.com/product/azure-cognitive-services/speechservices/neural-text-to-speech/tags) and [JSON tags](https://mcr.microsoft.com/v2/azure-cognitive-services/speechservices/neural-text-to-speech/tags/list). |
2828

2929
<sup>1</sup> The container is available in public preview. Containers in preview are still under development and don't meet Microsoft's stability and support requirements.
3030
<sup>2</sup> Not available as a disconnected container.

articles/ai-services/speech-service/speech-container-stt.md

Lines changed: 15 additions & 8 deletions
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.service: azure-ai-speech
88
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
99
ms.topic: how-to
10-
ms.date: 1/22/2024
10+
ms.date: 9/17/2024
1111
ms.author: eur
1212
zone_pivot_groups: programming-languages-speech-sdk-cli
1313
keywords: on-premises, Docker, container
@@ -30,7 +30,7 @@ The fully qualified container image name is, `mcr.microsoft.com/azure-cognitive-
3030
| Version | Path |
3131
|-----------|------------|
3232
| Latest | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:latest`<br/><br/>The `latest` tag pulls the latest image for the `en-US` locale. |
33-
| 4.6.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:4.6.0-amd64-mr-in` |
33+
| 4.9.0 | `mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text:4.9.0-amd64-mr-in` |
3434

3535
All tags, except for `latest`, are in the following format and are case sensitive:
3636

@@ -44,12 +44,19 @@ The tags are also available [in JSON format](https://mcr.microsoft.com/v2/azure-
4444
{
4545
"name": "azure-cognitive-services/speechservices/speech-to-text",
4646
"tags": [
47-
"2.10.0-amd64-ar-ae",
48-
"2.10.0-amd64-ar-bh",
49-
"2.10.0-amd64-ar-eg",
50-
"2.10.0-amd64-ar-iq",
51-
"2.10.0-amd64-ar-jo",
52-
<--redacted for brevity-->
47+
<--redacted for brevity-->
48+
"4.9.0-amd64-sw-tz",
49+
"4.9.0-amd64-ta-in",
50+
"4.9.0-amd64-th-th",
51+
"4.9.0-amd64-tr-tr",
52+
"4.9.0-amd64-vi-vn",
53+
"4.9.0-amd64-wuu-cn",
54+
"4.9.0-amd64-yue-cn",
55+
"4.9.0-amd64-zh-cn",
56+
"4.9.0-amd64-zh-cn-sichuan",
57+
"4.9.0-amd64-zh-hk",
58+
"4.9.0-amd64-zh-tw",
59+
"4.9.0-amd64-zu-za",
5360
"latest"
5461
]
5562
}
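As a rough illustration of how the image paths in this file are put together, the sketch below composes a fully qualified speech-to-text image reference from a version and a locale. The pattern `<version>-amd64-<locale in lowercase>` is inferred from the example tags shown in this diff (such as `4.9.0-amd64-mr-in`), not from the elided format description, and the helper name `stt_image` is hypothetical.

```python
# Registry path as it appears in the image names in this diff.
REGISTRY = "mcr.microsoft.com/azure-cognitive-services/speechservices"


def stt_image(version, locale, arch="amd64"):
    """Compose a speech-to-text image reference (pattern inferred from the tags above)."""
    # Tags are case sensitive and the examples use lowercase locales.
    return f"{REGISTRY}/speech-to-text:{version}-{arch}-{locale.lower()}"


if __name__ == "__main__":
    print(stt_image("4.9.0", "mr-IN"))
```

Check the composed name against the published tag list before pulling, since not every locale is necessarily published for every version.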
