Commit 75a0b4e

release notes and updates
1 parent 359abb6 commit 75a0b4e

5 files changed: +23 -12 lines changed

articles/ai-services/speech-service/batch-transcription-audio-data.md

Lines changed: 18 additions & 8 deletions
@@ -25,17 +25,27 @@ Audio files that are stored in Azure Blob storage can be accessed via one of two
 
 You can specify one or multiple audio files when creating a transcription. We recommend that you provide multiple files per request or point to an Azure Blob storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.
 
-## Supported audio formats
+## Supported audio formats and codecs
 
-The batch transcription API supports the following formats:
+The batch transcription API supports a number of different formats and codecs, such as:
 
-| Format | Codec | Bits per sample | Sample rate |
-|--------|-------|---------|---------------------------------|
-| WAV | PCM | 16-bit | 8 kHz or 16 kHz, mono or stereo |
-| MP3 | PCM | 16-bit | 8 kHz or 16 kHz, mono or stereo |
-| OGG | OPUS | 16-bit | 8 kHz or 16 kHz, mono or stereo |
+- WAV
+- MP3
+- OPUS/OGG
+- AAC
+- FLAC
+- WMA
+- ALAW in WAV container
+- MULAW in WAV container
+- AMR
+- WebM
+- MP4
+- M4A
+- SPEEX
 
-For stereo audio streams, the left and right channels are split during the transcription. A JSON result file is created for each input audio file. To create an ordered final transcript, use the timestamps that are generated per utterance.
+
+> [!NOTE]
+> Batch transcription service integrates GStreamer and may accept more formats and codecs without returning errors, while we suggest to use lossless formats such as WAV (PCM encoding) and FLAC to ensure best transcription quality.
 
 ## Azure Blob Storage upload
 

articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ For more information, see [personal voice](../../personal-voice-overview.md).
 
 Text to speech avatar is available in preview in the following regions: West US 2, West Europe, and Southeast Asia.
 
-Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](#custom-text-to-speech-avatar)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.
+Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](../../text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.
 
 For more information, see [text to speech avatar](../../text-to-speech-avatar/what-is-text-to-speech-avatar.md), [transparency notes](/legal/cognitive-services/speech-service/text-to-speech/transparency-note?context=/azure/ai-services/speech-service/context/context), and [disclosure for voice and avatar talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/ai-services/speech-service/context/context).
 

articles/ai-services/speech-service/language-support.md

Lines changed: 2 additions & 1 deletion
@@ -46,7 +46,7 @@ To improve Speech to text recognition accuracy, customization is available for s
 
 The table in this section summarizes the locales and voices supported for Text to speech. See the table footnotes for more details.
 
-Additional remarks for Text to speech locales are included in the [Voice styles and roles](#voice-styles-and-roles), [Prebuilt neural voices](#prebuilt-neural-voices), and [Custom Neural Voice](#custom-neural-voice) sections below.
+Additional remarks for text to speech locales are included in the [voice styles and roles](#voice-styles-and-roles), [prebuilt neural voices](#prebuilt-neural-voices), [Custom Neural Voice](#custom-neural-voice), and [personal voice](#personal-voice) sections below.
 
 > [!TIP]
 > Check the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery) and determine the right voice for your business needs.
@@ -100,6 +100,7 @@ With the cross-lingual feature, you can transfer your custom neural voice model
 
 [!INCLUDE [Language support include](includes/language-support/personal-voice.md)]
 
+
 # [Pronunciation assessment](#tab/pronunciation-assessment)
 
 The table in this section summarizes the 24 locales supported for pronunciation assessment, and each language is available on all [Speech to text regions](regions.md#speech-service). Latest update extends support from English to 23 additional languages and quality enhancements to existing features, including accuracy, fluency and miscue assessment. You should specify the language that you're learning or practicing improving pronunciation. The default language is set as `en-US`. If you know your target learning language, [set the locale](how-to-pronunciation-assessment.md#get-pronunciation-assessment-results) accordingly. For example, if you're learning British English, you should specify the language as `en-GB`. If you're teaching a broader language, such as Spanish, and are uncertain about which locale to select, you can run various accent models (`es-ES`, `es-MX`) to determine the one that achieves the highest score to suit your specific scenario.

articles/ai-services/speech-service/power-automate-batch-transcription.md

Lines changed: 1 addition & 1 deletion
@@ -127,7 +127,7 @@ To trigger the test flow, upload an audio file to the Azure Blob Storage contain
 
 ## Upload files to the container
 
-Follow these steps to upload [wav, mp3, or ogg](batch-transcription-audio-data.md#supported-audio-formats) files from your local directory to the Azure Storage container that you [created previously](#create-the-azure-blob-storage-container).
+Follow these steps to upload [wav, mp3, or ogg](batch-transcription-audio-data.md#supported-audio-formats-and-codecs) files from your local directory to the Azure Storage container that you [created previously](#create-the-azure-blob-storage-container).
 
 1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
 1. <a href="https://portal.azure.com/#create/Microsoft.StorageAccount-ARM" title="Create a Storage account resource" target="_blank">Create a Storage account resource</a> in the Azure portal. Use the same subscription and resource group as your Speech resource.

articles/ai-services/speech-service/speech-services-quotas-and-limits.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ You can use real-time speech to text with the [Speech SDK](speech-sdk.md) or the
 | Max audio input file size | N/A | 1 GB |
 | Max number of blobs per container | N/A | 10000 |
 | Max number of files per transcription request (when you're using multiple content URLs as input). | N/A | 1000 |
-| Max audio length for transcriptions with diarizaion enabled. | N/A | 240 minutes per file |
+| Max audio length for transcriptions with diarization enabled. | N/A | 240 minutes per file |
 
 #### Model customization
 
