release notes and updates

eric-urban · eric-urban · commit 75a0b4edc19b · 2023-11-15T22:29:10.000-08:00
diff --git a/articles/ai-services/speech-service/batch-transcription-audio-data.md b/articles/ai-services/speech-service/batch-transcription-audio-data.md
@@ -25,17 +25,27 @@ Audio files that are stored in Azure Blob storage can be accessed via one of two
 
 You can specify one or multiple audio files when creating a transcription. We recommend that you provide multiple files per request or point to an Azure Blob storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time. 
 
-## Supported audio formats
+## Supported audio formats and codecs
 
-The batch transcription API supports the following formats:
+The batch transcription API supports multiple formats and codecs, such as:
 
-| Format | Codec | Bits per sample | Sample rate             |
-|--------|-------|---------|---------------------------------|
-| WAV    | PCM   | 16-bit  | 8 kHz or 16 kHz, mono or stereo |
-| MP3    | PCM   | 16-bit  | 8 kHz or 16 kHz, mono or stereo |
-| OGG    | OPUS  | 16-bit  | 8 kHz or 16 kHz, mono or stereo |
+- WAV
+- MP3
+- OPUS/OGG
+- AAC
+- FLAC
+- WMA
+- ALAW in WAV container
+- MULAW in WAV container
+- AMR
+- WebM
+- MP4
+- M4A
+- SPEEX
 
-For stereo audio streams, the left and right channels are split during the transcription. A JSON result file is created for each input audio file. To create an ordered final transcript, use the timestamps that are generated per utterance.
+
+> [!NOTE]
+> The batch transcription service integrates GStreamer and might accept more formats and codecs without returning errors. We suggest that you use lossless formats such as WAV (PCM encoding) and FLAC to ensure the best transcription quality.
 
 ## Azure Blob Storage upload
 
diff --git a/articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md b/articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md
@@ -18,7 +18,7 @@ For more information, see [personal voice](../../personal-voice-overview.md).
 
 Text to speech avatar is available in preview in the following regions: West US 2, West Europe, and Southeast Asia. 
 
-Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](#custom-text-to-speech-avatar)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.
+Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](../../text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.
 
 For more information, see [text to speech avatar](../../text-to-speech-avatar/what-is-text-to-speech-avatar.md), [transparency notes](/legal/cognitive-services/speech-service/text-to-speech/transparency-note?context=/azure/ai-services/speech-service/context/context), and [disclosure for voice and avatar talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/ai-services/speech-service/context/context).
 
diff --git a/articles/ai-services/speech-service/language-support.md b/articles/ai-services/speech-service/language-support.md
@@ -46,7 +46,7 @@ To improve Speech to text recognition accuracy, customization is available for s
 
 The table in this section summarizes the locales and voices supported for Text to speech. See the table footnotes for more details.
 
-Additional remarks for Text to speech locales are included in the [Voice styles and roles](#voice-styles-and-roles), [Prebuilt neural voices](#prebuilt-neural-voices), and [Custom Neural Voice](#custom-neural-voice) sections below. 
+Additional remarks for text to speech locales are included in the [voice styles and roles](#voice-styles-and-roles), [prebuilt neural voices](#prebuilt-neural-voices), [Custom Neural Voice](#custom-neural-voice), and [personal voice](#personal-voice) sections below. 
 
 > [!TIP]
 > Check the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery) and determine the right voice for your business needs. 
@@ -100,6 +100,7 @@ With the cross-lingual feature, you can transfer your custom neural voice model
 
 [!INCLUDE [Language support include](includes/language-support/personal-voice.md)]
 
+
 # [Pronunciation assessment](#tab/pronunciation-assessment)
 
 The table in this section summarizes the 24 locales supported for pronunciation assessment, and each language is available on all [Speech to text regions](regions.md#speech-service). Latest update extends support from English to 23 additional languages and quality enhancements to existing features, including accuracy, fluency and miscue assessment. You should specify the language that you're learning or practicing improving pronunciation. The default language is set as `en-US`. If you know your target learning language, [set the locale](how-to-pronunciation-assessment.md#get-pronunciation-assessment-results) accordingly. For example, if you're learning British English, you should specify the language as `en-GB`. If you're teaching a broader language, such as Spanish, and are uncertain about which locale to select, you can run various accent models (`es-ES`, `es-MX`) to determine the one that achieves the highest score to suit your specific scenario. 
diff --git a/articles/ai-services/speech-service/power-automate-batch-transcription.md b/articles/ai-services/speech-service/power-automate-batch-transcription.md
@@ -127,7 +127,7 @@ To trigger the test flow, upload an audio file to the Azure Blob Storage contain
 
 ## Upload files to the container
 
-Follow these steps to upload [wav, mp3, or ogg](batch-transcription-audio-data.md#supported-audio-formats) files from your local directory to the Azure Storage container that you [created previously](#create-the-azure-blob-storage-container). 
+Follow these steps to upload [wav, mp3, or ogg](batch-transcription-audio-data.md#supported-audio-formats-and-codecs) files from your local directory to the Azure Storage container that you [created previously](#create-the-azure-blob-storage-container). 
 
 1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
 1. <a href="https://portal.azure.com/#create/Microsoft.StorageAccount-ARM"  title="Create a Storage account resource"  target="_blank">Create a Storage account resource</a> in the Azure portal. Use the same subscription and resource group as your Speech resource.
diff --git a/articles/ai-services/speech-service/speech-services-quotas-and-limits.md b/articles/ai-services/speech-service/speech-services-quotas-and-limits.md
@@ -51,7 +51,7 @@ You can use real-time speech to text with the [Speech SDK](speech-sdk.md) or the
 | Max audio input file size | N/A | 1 GB |
 | Max number of blobs per container | N/A | 10000 |
 | Max number of files per transcription request (when you're using multiple content URLs as input). | N/A | 1000  |
-| Max audio length for transcriptions with diarizaion enabled. | N/A | 240 minutes per file  |
+| Max audio length for transcriptions with diarization enabled. | N/A | 240 minutes per file  |
 
 #### Model customization