For more information about speech to text, see [the basics of speech recognition](../../../get-started-speech-to-text.md).
## Event-based translation
The `TranslationRecognizer` object exposes a `Recognizing` event. The event fires several times and provides a mechanism to retrieve the intermediate translation results.
> [!NOTE]
> Intermediate translation results aren't available when you use [multi-lingual speech translation](#multi-lingual-speech-translation-without-source-language-candidates).
The following example prints the intermediate translation results to the console:
```csharp
using (var audioInput = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
{
    using (var translationRecognizer = new TranslationRecognizer(config, audioInput))
    {
        // Subscribes to events.
        translationRecognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
            }
        };

        translationRecognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.TranslatedSpeech)
            {
                Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                }
            }
        };
    }
}
```
After a successful speech recognition and translation, the result contains all the translations in a dictionary. The [`Translations`][translations] dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated and then synthesized in a different language (speech-to-speech).
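As a minimal sketch of reading that dictionary (assuming a `translationRecognizer` configured as in the earlier examples), a single recognition might be handled like this:

```csharp
// Recognize a single utterance and read each target-language translation.
var result = await translationRecognizer.RecognizeOnceAsync();
if (result.Reason == ResultReason.TranslatedSpeech)
{
    Console.WriteLine($"Recognized: {result.Text}");
    foreach (var translation in result.Translations)
    {
        // The key is the target language code; the value is the translated text.
        Console.WriteLine($"Translated into '{translation.Key}': {translation.Value}");
    }
}
```

Each entry in `result.Translations` corresponds to one target language added with `AddTargetLanguage`.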
The following example anticipates that `en-US` or `zh-CN` should be detected because they're defined as candidate languages.
For a complete code sample, see [language identification](../../../language-identification.md?pivots=programming-language-csharp#run-speech-translation).
## Multi-lingual speech translation without source language candidates
Multi-lingual speech translation introduces a new level of speech translation technology that unlocks various capabilities, including having no specified input language and handling language switches within the same session. These capabilities can be built directly into your products.
Currently, when you use language identification with speech translation, you must create the `SpeechTranslationConfig` object from the v2 endpoint. Replace the string "YourServiceRegion" with your Speech resource region (such as "westus"), and replace "YourSubscriptionKey" with your Speech resource key.
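As a minimal sketch of that setup (the endpoint string below follows the v2 WebSocket endpoint format and is an assumption; the placeholders are yours to replace):

```csharp
// Create the translation config from the v2 endpoint for your region.
// Replace the placeholder region and key with your own values.
var v2Endpoint = new Uri("wss://YourServiceRegion.stt.speech.microsoft.com/speech/universal/v2");
var config = SpeechTranslationConfig.FromEndpoint(v2Endpoint, "YourSubscriptionKey");
```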
Specify the translation target languages, replacing them with languages of your choice. You can add more lines to target additional languages.
```csharp
config.AddTargetLanguage("de");
config.AddTargetLanguage("fr");
```
A key differentiator of multi-lingual speech translation is that you don't need to specify the source language, because the service detects it automatically. Create the `AutoDetectSourceLanguageConfig` object with the `fromOpenRange` method to let the service know that you want to use multi-lingual speech translation with no specified source language.
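A minimal sketch, assuming `config` and an `audioConfig` created as in the earlier examples:

```csharp
// Open-range detection: no source language candidates are specified;
// the service detects the spoken language automatically.
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();
using var translationRecognizer =
    new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioConfig);
```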
For a complete code sample with the Speech SDK, see [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).
Automatic multi-lingual speech translation is available in public preview. It helps overcome language barriers by enabling seamless communication across diverse linguistic landscapes.
##### Key highlights
- Unspecified input language: Multi-lingual speech translation can receive audio in a wide range of languages, without the need to specify the expected input language. This makes it invaluable for understanding and collaborating across global contexts without presetting languages.
- Language switching: Multi-lingual speech translation allows multiple languages to be spoken during the same session and has them all translated into the same target language. You don't need to restart the session or take any other action when the input language changes.
##### Main use cases
- Travel interpreter: Multi-lingual speech translation can enhance the experience of tourists visiting foreign destinations by providing them with information and assistance in their preferred language. Hotel concierge services, guided tours, and visitor centers can use this technology to cater to diverse linguistic needs.
- International conferences: Multi-lingual speech translation can facilitate communication among participants from different regions who might speak various languages, using live translated captions. Attendees can speak in their native languages without needing to specify them, ensuring seamless understanding and collaboration.
- Educational meetings: In multi-cultural classrooms or online learning environments, multi-lingual speech translation can support language diversity among students and teachers. It allows for seamless communication and participation without the need to specify each student's or instructor's language.
##### How to access
For a detailed introduction, visit [Speech translation overview](../../speech-translation.md). Additionally, you can refer to the code samples at [how to translate speech](../../how-to-translate-speech.md). This new feature is fully supported by all SDK versions from 1.37.0 onwards.
#### Real-time speech to text with diarization (GA)
Real-time speech to text with diarization is now generally available.
You can create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation. For more information about real-time diarization, check out the [real-time diarization quickstart](../../get-started-stt-diarization.md).
#### Speech to text model update
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy, and call center support.
[Batch transcription](../../batch-transcription.md) provides models with a new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
### March 2024 release
How to use:
Choose `es-US` (Spanish and English) or `fr-CA` (French and English) when you call the Speech service API, or try it out in Speech Studio. Feel free to speak either language or mix them together; the model adapts dynamically, providing accurate and context-aware responses in both languages.
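For example, with the Speech SDK you might select the bilingual locale like this (a sketch; `YourSubscriptionKey` and `YourServiceRegion` are placeholders to replace with your own values):

```csharp
// Select the es-US bilingual model; speakers can mix Spanish and English.
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechConfig.SpeechRecognitionLanguage = "es-US";
```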
It's time to elevate your communication game with our latest feature release—seamless, multi-lingual communication at your fingertips!