
Commit 633c896

multi-lingual speech translation
1 parent 84425a8 commit 633c896

5 files changed: +162 -16 lines changed

5 files changed

+162
-16
lines changed

articles/ai-services/speech-service/includes/how-to/translate-speech/csharp.md

Lines changed: 88 additions & 1 deletion
@@ -197,6 +197,64 @@ static async Task TranslateSpeechAsync()

For more information about speech to text, see [the basics of speech recognition](../../../get-started-speech-to-text.md).

## Event-based translation

The `TranslationRecognizer` object exposes a `Recognizing` event. The event fires several times as audio is processed and provides a way to retrieve intermediate translation results.

> [!NOTE]
> Intermediate translation results aren't available when you use [multi-lingual speech translation](#multi-lingual-speech-translation-without-source-language-candidates).

The following example prints the intermediate translation results to the console:

```csharp
using (var audioInput = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
{
    using (var translationRecognizer = new TranslationRecognizer(config, audioInput))
    {
        // Subscribes to events.
        translationRecognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
            }
        };

        translationRecognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.TranslatedSpeech)
            {
                Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                }
            }
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Speech not translated.");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
        Console.WriteLine("Start translation...");
        await translationRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopTranslation.Task });

        // Stops translation.
        await translationRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}
```

## Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The [`Translations`][translations] dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated and then synthesized in a different language (speech-to-speech).
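
For example, given a `TranslationRecognitionResult` named `result` (a minimal sketch; the variable name is illustrative), you can enumerate the dictionary like this:

```csharp
// Each key is a target language code (such as "de" or "fr");
// each value is the translated text in that language.
foreach (var translation in result.Translations)
{
    Console.WriteLine($"Translated into '{translation.Key}': {translation.Value}");
}
```
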
@@ -314,11 +372,40 @@ The following example anticipates that `en-US` or `zh-CN` should be detected bec
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "zh-CN" });
-var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig)
+var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig);
```

For a complete code sample, see [language identification](../../../language-identification.md?pivots=programming-language-csharp#run-speech-translation).

## Multi-lingual speech translation without source language candidates

Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

Currently, when you use language ID with speech translation, you must create the `SpeechTranslationConfig` object from the v2 endpoint. Replace the string "YourServiceRegion" with your Speech resource region (such as "westus").

```csharp
var v2EndpointInString = String.Format("wss://{0}.stt.speech.microsoft.com/speech/universal/v2", "YourServiceRegion");
var v2EndpointUrl = new Uri(v2EndpointInString);
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(v2EndpointUrl, "YourSubscriptionKey");
```

Specify the translation target languages. Replace them with languages of your choice; you can add more lines.

```csharp
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
```

A key differentiator of multi-lingual speech translation is that you don't need to specify the source language, because the service detects it automatically. Create the `AutoDetectSourceLanguageConfig` object with the `FromOpenRange` method to let the service know that you want to use multi-lingual speech translation with no specified source language.

```csharp
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();
var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig);
```
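
Putting these pieces together, here's a minimal end-to-end sketch. It's illustrative only (not the official sample): it reuses the WAV file and the `stopTranslation` task pattern from earlier in this article, and it must run inside an async method.

```csharp
var v2EndpointUrl = new Uri(String.Format("wss://{0}.stt.speech.microsoft.com/speech/universal/v2", "YourServiceRegion"));
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(v2EndpointUrl, "YourSubscriptionKey");
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");

// No source language candidates: the service detects (and switches between) input languages.
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();

using (var audioConfig = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
using (var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig))
{
    var stopTranslation = new TaskCompletionSource<int>();

    translationRecognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.TranslatedSpeech)
        {
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"TRANSLATED into '{element.Key}': {element.Value}");
            }
        }
    };

    translationRecognizer.SessionStopped += (s, e) => stopTranslation.TrySetResult(0);

    await translationRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
    Task.WaitAny(new[] { stopTranslation.Task });
    await translationRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
```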

For a complete code sample with the Speech SDK, see [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).

[speechtranslationconfig]: /dotnet/api/microsoft.cognitiveservices.speech.speechtranslationconfig
[audioconfig]: /dotnet/api/microsoft.cognitiveservices.speech.audio.audioconfig
[translationrecognizer]: /dotnet/api/microsoft.cognitiveservices.speech.translation.translationrecognizer

articles/ai-services/speech-service/includes/language-support/speech-translation.md

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
---
author: eric-urban
ms.service: azure-ai-speech
-ms.date: 08/22/2022
+ms.date: 4/24/2024
ms.topic: include
ms.author: eur
---

-| Text language| Language code |
+| Text language | Language code |
|:------------------------|:-------------:|
| Afrikaans | `af` |
| Albanian | `sq` |

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 11 additions & 5 deletions
@@ -2,23 +2,29 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 3/13/2024
+ms.date: 4/22/2024
ms.author: eur
---

### April 2024 release

#### Multi-lingual speech translation (Preview)

Multi-lingual speech translation is available in public preview. It introduces a new level of speech translation technology: it requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

For more information about multi-lingual speech translation, see [the multi-lingual speech translation overview](../../speech-translation.md#multi-lingual-speech-translation-preview).

#### Real-time speech to text with diarization (GA)

Real-time speech to text with diarization is now generally available.

-Check out [Real-time diarization quickstart](../../get-started-stt-diarization.md) to learn more about how to create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation.
+You can create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation. For more information about real-time diarization, see the [real-time diarization quickstart](../../get-started-stt-diarization.md).

-#### Speech to Text model Update
+#### Speech to text model update

-[Real-time Speech to Text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now support both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
+[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy, and call center support.

-[Batch transcription](../../batch-transcription.md) has launched models with new architecture for `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN`. These models significantly enhance readability and entity recognition.
+[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.

### March 2024 release


articles/ai-services/speech-service/releasenotes.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
ms.author: eur
ms.service: azure-ai-speech
ms.topic: release-notes
-ms.date: 1/21/2024
+ms.date: 4/22/2024
ms.custom: references_regions
---

articles/ai-services/speech-service/speech-translation.md

Lines changed: 60 additions & 7 deletions
@@ -6,29 +6,82 @@ author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: overview
-ms.date: 1/22/2024
+ms.date: 4/22/2024
ms.author: eur
ms.custom: devx-track-csharp
---

# What is speech translation?

-In this article, you learn about the benefits and capabilities of the speech translation service, which enables real-time, multi-language speech to speech and speech to text translation of audio streams.
+In this article, you learn about the benefits and capabilities of translation with Azure AI Speech. The Speech service supports real-time, multi-language speech to speech and speech to text translation of audio streams.

By using the Speech SDK or Speech CLI, you can give your applications, tools, and devices access to source transcriptions and translation outputs for the provided audio. Interim transcription and translation results are returned as speech is detected, and the final results can be converted into synthesized speech.

For a list of languages supported for speech translation, see [Language and voice support](language-support.md?tabs=speech-translation).

> [!TIP]
> Go to [Speech Studio](https://aka.ms/speechstudio/speechtranslation) to quickly try translating speech into other languages of your choice with low latency.

## Core features

-* Speech to text translation with recognition results.
-* Speech-to-speech translation.
-* Support for translation to multiple target languages.
-* Interim recognition and translation results.

The core features of speech translation include:

- [Speech to text translation](#speech-to-text-translation)
- [Speech to speech translation](#speech-to-speech-translation)
- [Multi-lingual speech translation](#multi-lingual-speech-translation-preview)
- [Multiple target languages translation](#multiple-target-languages-translation)

## Speech to text translation

The standard feature of the Speech service takes an input audio stream in your specified source language and returns the translation as text in your specified target language.
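
For example, here's a minimal sketch of this flow with the Speech SDK for C# (the key, region, and languages are placeholders):

```csharp
var speechTranslationConfig = SpeechTranslationConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechTranslationConfig.SpeechRecognitionLanguage = "en-US"; // source language
speechTranslationConfig.AddTargetLanguage("de");             // target language

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

// Translate a single utterance from the microphone.
var result = await translationRecognizer.RecognizeOnceAsync();
if (result.Reason == ResultReason.TranslatedSpeech)
{
    Console.WriteLine($"Recognized: {result.Text}");
    Console.WriteLine($"Translated into 'de': {result.Translations["de"]}");
}
```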

## Speech to speech translation

Building on speech to text translation, the Speech service can also read the translated text aloud by using its large set of prebuilt voices, producing natural speech output for the input audio.
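
Continuing the sketch above, you can set a synthesis voice on the translation config and receive the synthesized audio through the recognizer's `Synthesizing` event (the voice name is a placeholder; pick one that matches your target language):

```csharp
speechTranslationConfig.VoiceName = "de-DE-KatjaNeural"; // prebuilt voice for the target language

translationRecognizer.Synthesizing += (s, e) =>
{
    // GetAudio() returns the synthesized audio of the translated text.
    byte[] audio = e.Result.GetAudio();
    Console.WriteLine($"Received {audio.Length} bytes of synthesized audio.");
};
```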

## Multi-lingual speech translation (Preview)

Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

- Unspecified input language: Multi-lingual speech translation can receive audio in a wide range of languages; there's no need to specify the expected input language.
- Language switching: Multi-lingual speech translation allows multiple languages to be spoken during the same session and translates them all into the same target language. You don't need to restart the session or take any other action when the input language changes.
- Transcription: The service outputs a transcription in the specified target language. Source language transcription isn't available yet.

Some use cases for multi-lingual speech translation include:

- Travel interpreter: When customers travel abroad, multi-lingual speech translation lets you build a solution that translates any input audio to and from the local language, so customers can communicate with locals and better understand their surroundings.
- Business meeting: In a meeting with people who speak different languages, multi-lingual speech translation lets all participants communicate naturally, as if there were no language barrier.

For multi-lingual speech translation, the Speech service can automatically detect and switch between these input languages: Arabic (ar), Basque (eu), Bosnian (bs), Bulgarian (bg), Chinese Simplified (zh), Chinese Traditional (zhh), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (nb), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi), and Welsh (cy).

For a list of the supported output (target) languages, see the *Translate to text language* table in the [language and voice support documentation](language-support.md?tabs=speech-translation).

For more information about multi-lingual speech translation, see [the speech translation how-to guide](./how-to-translate-speech.md#multi-lingual-speech-translation-without-source-language-candidates) and [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).

## Multiple target languages translation

In scenarios where you want output in multiple languages, the Speech service can translate the input audio into two target languages with a single API call, so you can share the translations with a wider audience.

If you need translation into more than two target languages, you need to either [create a multi-service resource](../multi-service-resource.md) or use separate translation services for the languages beyond the second. If you call the speech translation service with a multi-service resource, translation fees apply for each language beyond the second, based on the character count of the translation.

To calculate the translation fee, see [Azure AI Translator pricing](https://azure.microsoft.com/products/ai-services/ai-translator#Pricing).

### Multiple target languages translation pricing

The speech translation service operates in real time, and intermediate recognition results are translated to generate intermediate translation results. As a result, the total amount of translated text is greater than the character count of the final transcription. You're charged for the speech to text transcription and for the text translation into each target language.

For example, let's say that you want text translations from a one-hour audio file into three target languages. If the speech to text transcription contains 10,000 characters, you might be charged $2.80.

> [!WARNING]
> The prices in this example are for illustrative purposes only. Refer to the [Azure AI Speech pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) and [Azure AI Translator pricing](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.

The example price of $2.80 combines the speech to text transcription cost and the text translation cost:

- The speech to text list price is $2.50 per hour. See **Pay as You Go** > **Speech translation** > **Standard** in the [Azure AI Speech pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) for the most up-to-date pricing information.
- The cost of the third target language translation is $0.30 in this example. The translation list price is $10 per million characters. Because the transcription contains 10,000 characters, the translation cost is $10 * 10,000 / 1,000,000 * 3 = $0.30. The number 3 in this equation represents a weighting coefficient for intermediate traffic, which can vary depending on the languages involved. See **Pay as You Go** > **Standard translation** > **Text translation** in the [Azure AI Translator pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.
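
The following snippet reproduces that calculation. It's an illustrative sketch only: the list prices and the intermediate-traffic coefficient (3 here) are the example values from this article and can change.

```csharp
// Rough cost estimate for speech translation with more than two target languages.
double EstimateCost(double audioHours, int transcriptChars, int targetLanguages)
{
    const double speechTranslationPerHour = 2.50;       // example list price per audio hour
    const double textTranslationPerMillionChars = 10.0; // example list price per million characters
    const double intermediateTrafficCoefficient = 3.0;  // example weighting for intermediate results

    int billableLanguages = Math.Max(0, targetLanguages - 2); // the first two target languages are included
    double speechCost = speechTranslationPerHour * audioHours;
    double translationCost = textTranslationPerMillionChars * transcriptChars / 1_000_000.0
        * intermediateTrafficCoefficient * billableLanguages;
    return speechCost + translationCost;
}

// One hour of audio, 10,000 transcribed characters, three target languages:
// 2.50 + 10 * 10,000 / 1,000,000 * 3 = 2.80
Console.WriteLine(EstimateCost(1.0, 10_000, 3)); // 2.8
```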

## Get started

-As your first step, try the [Speech translation quickstart](get-started-speech-translation.md). The speech translation service is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
+As your first step, try the [speech translation quickstart](get-started-speech-translation.md). The speech translation service is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).

You can find [Speech SDK speech to text and translation samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk) on GitHub. These samples cover common scenarios, such as reading audio from a file or stream, continuous and single-shot recognition and translation, and working with custom models.