
Commit 633c896

multi-lingual speech translation
1 parent 84425a8 commit 633c896

5 files changed: +162 -16 lines changed

5 files changed

+162
-16
lines changed

articles/ai-services/speech-service/includes/how-to/translate-speech/csharp.md

Lines changed: 88 additions & 1 deletion
@@ -197,6 +197,64 @@ static async Task TranslateSpeechAsync()

For more information about speech to text, see [the basics of speech recognition](../../../get-started-speech-to-text.md).

## Event-based translation

The `TranslationRecognizer` object exposes a `Recognizing` event. The event fires several times as audio is processed and provides a way to retrieve intermediate translation results.

> [!NOTE]
> Intermediate translation results aren't available when you use [multi-lingual speech translation](#multi-lingual-speech-translation-without-source-language-candidates).

The following example prints the intermediate translation results to the console:

```csharp
using (var audioInput = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
{
    using (var translationRecognizer = new TranslationRecognizer(config, audioInput))
    {
        // Subscribes to events.
        translationRecognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
            }
        };

        translationRecognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.TranslatedSpeech)
            {
                Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                }
            }
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Speech not translated.");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
        Console.WriteLine("Start translation...");
        await translationRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopTranslation.Task });

        // Stops translation.
        await translationRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}
```

## Synthesize translations

After a successful speech recognition and translation, the result contains all the translations in a dictionary. The [`Translations`][translations] dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated and then synthesized in a different language (speech-to-speech).
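
For example, given a `TranslationRecognitionResult` named `result` (a minimal sketch; the variable name is illustrative), you can enumerate the dictionary like this:

```csharp
// Each key is a target language code (such as "de" or "fr");
// each value is the translated text in that language.
foreach (var translation in result.Translations)
{
    Console.WriteLine($"Translated into '{translation.Key}': {translation.Value}");
}
```
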
@@ -314,11 +372,40 @@ The following example anticipates that `en-US` or `zh-CN` should be detected bec
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "zh-CN" });
-var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig)
+var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig);
```

For a complete code sample, see [language identification](../../../language-identification.md?pivots=programming-language-csharp#run-speech-translation).

## Multi-lingual speech translation without source language candidates

Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

Currently, when you use language ID with speech translation, you must create the `SpeechTranslationConfig` object from the v2 endpoint. Replace the string "YourServiceRegion" with your Speech resource region (such as "westus").

```csharp
var v2EndpointInString = String.Format("wss://{0}.stt.speech.microsoft.com/speech/universal/v2", "YourServiceRegion");
var v2EndpointUrl = new Uri(v2EndpointInString);
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(v2EndpointUrl, "YourSubscriptionKey");
```

Specify the translation target languages. Replace them with languages of your choice; you can add more lines.

```csharp
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");
```

A key differentiator of multi-lingual speech translation is that you don't need to specify the source language, because the service detects it automatically. Create the `AutoDetectSourceLanguageConfig` object with the `FromOpenRange` method to let the service know that you want to use multi-lingual speech translation with no specified source language.

```csharp
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();
var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig);
```
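
Putting these pieces together, here's a minimal end-to-end sketch. It's illustrative only (not the official sample): it reuses the WAV file and the `stopTranslation` task pattern from earlier in this article, and it must run inside an async method.

```csharp
var v2EndpointUrl = new Uri(String.Format("wss://{0}.stt.speech.microsoft.com/speech/universal/v2", "YourServiceRegion"));
var speechTranslationConfig = SpeechTranslationConfig.FromEndpoint(v2EndpointUrl, "YourSubscriptionKey");
speechTranslationConfig.AddTargetLanguage("de");
speechTranslationConfig.AddTargetLanguage("fr");

// No source language candidates: the service detects (and switches between) input languages.
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();

using (var audioConfig = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
using (var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, autoDetectSourceLanguageConfig, audioConfig))
{
    var stopTranslation = new TaskCompletionSource<int>();

    translationRecognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.TranslatedSpeech)
        {
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"TRANSLATED into '{element.Key}': {element.Value}");
            }
        }
    };

    translationRecognizer.SessionStopped += (s, e) => stopTranslation.TrySetResult(0);

    await translationRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
    Task.WaitAny(new[] { stopTranslation.Task });
    await translationRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
```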

For a complete code sample with the Speech SDK, see [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).

[speechtranslationconfig]: /dotnet/api/microsoft.cognitiveservices.speech.speechtranslationconfig
[audioconfig]: /dotnet/api/microsoft.cognitiveservices.speech.audio.audioconfig
[translationrecognizer]: /dotnet/api/microsoft.cognitiveservices.speech.translation.translationrecognizer

articles/ai-services/speech-service/includes/language-support/speech-translation.md

Lines changed: 2 additions & 2 deletions
@@ -1,12 +1,12 @@
---
author: eric-urban
ms.service: azure-ai-speech
-ms.date: 08/22/2022
+ms.date: 4/24/2024
ms.topic: include
ms.author: eur
---

-| Text language| Language code |
+| Text language | Language code |
|:------------------------|:-------------:|
| Afrikaans | `af` |
| Albanian | `sq` |

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 11 additions & 5 deletions
@@ -2,23 +2,29 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 3/13/2024
+ms.date: 4/22/2024
ms.author: eur
---

### April 2024 release

#### Multi-lingual speech translation (Preview)

Multi-lingual speech translation is available in public preview. It introduces a new level of speech translation technology: it requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

For more information about multi-lingual speech translation, see [the multi-lingual speech translation overview](../../speech-translation.md#multi-lingual-speech-translation-preview).

#### Real-time speech to text with diarization (GA)

Real-time speech to text with diarization is now generally available.

-Check out [Real-time diarization quickstart](../../get-started-stt-diarization.md) to learn more about how to create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation.
+You can create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation. For more information about real-time diarization, see the [real-time diarization quickstart](../../get-started-stt-diarization.md).

-#### Speech to Text model Update
+#### Speech to text model update

-[Real-time Speech to Text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now support both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
+[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy, and call center support.

-[Batch transcription](../../batch-transcription.md) has launched models with new architecture for `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, `zh-CN`. These models significantly enhance readability and entity recognition.
+[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.

### March 2024 release


articles/ai-services/speech-service/releasenotes.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
ms.author: eur
ms.service: azure-ai-speech
ms.topic: release-notes
-ms.date: 1/21/2024
+ms.date: 4/22/2024
ms.custom: references_regions
---

articles/ai-services/speech-service/speech-translation.md

Lines changed: 60 additions & 7 deletions
@@ -6,29 +6,82 @@ author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: overview
-ms.date: 1/22/2024
+ms.date: 4/22/2024
ms.author: eur
ms.custom: devx-track-csharp
---

# What is speech translation?

-In this article, you learn about the benefits and capabilities of the speech translation service, which enables real-time, multi-language speech to speech and speech to text translation of audio streams.
+In this article, you learn about the benefits and capabilities of translation with Azure AI Speech. The Speech service supports real-time, multi-language speech to speech and speech to text translation of audio streams.

By using the Speech SDK or Speech CLI, you can give your applications, tools, and devices access to source transcriptions and translation outputs for the provided audio. Interim transcription and translation results are returned as speech is detected, and the final results can be converted into synthesized speech.

For a list of languages supported for speech translation, see [Language and voice support](language-support.md?tabs=speech-translation).

> [!TIP]
> Go to [Speech Studio](https://aka.ms/speechstudio/speechtranslation) to quickly try translating speech into other languages of your choice with low latency.

## Core features

-* Speech to text translation with recognition results.
-* Speech-to-speech translation.
-* Support for translation to multiple target languages.
-* Interim recognition and translation results.

The core features of speech translation include:

- [Speech to text translation](#speech-to-text-translation)
- [Speech to speech translation](#speech-to-speech-translation)
- [Multi-lingual speech translation](#multi-lingual-speech-translation-preview)
- [Multiple target languages translation](#multiple-target-languages-translation)

## Speech to text translation

The standard feature of the Speech service takes an input audio stream in your specified source language and returns the translation as text in your specified target language.
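
For example, here's a minimal sketch of this flow with the Speech SDK for C# (the key, region, and languages are placeholders):

```csharp
var speechTranslationConfig = SpeechTranslationConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
speechTranslationConfig.SpeechRecognitionLanguage = "en-US"; // source language
speechTranslationConfig.AddTargetLanguage("de");             // target language

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var translationRecognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);

// Translate a single utterance from the microphone.
var result = await translationRecognizer.RecognizeOnceAsync();
if (result.Reason == ResultReason.TranslatedSpeech)
{
    Console.WriteLine($"Recognized: {result.Text}");
    Console.WriteLine($"Translated into 'de': {result.Translations["de"]}");
}
```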

## Speech to speech translation

Building on speech to text translation, the Speech service can also read the translated text aloud by using its large set of prebuilt voices, producing natural speech output for the input audio.
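
Continuing the sketch above, you can set a synthesis voice on the translation config and receive the synthesized audio through the recognizer's `Synthesizing` event (the voice name is a placeholder; pick one that matches your target language):

```csharp
speechTranslationConfig.VoiceName = "de-DE-KatjaNeural"; // prebuilt voice for the target language

translationRecognizer.Synthesizing += (s, e) =>
{
    // GetAudio() returns the synthesized audio of the translated text.
    byte[] audio = e.Result.GetAudio();
    Console.WriteLine($"Received {audio.Length} bytes of synthesized audio.");
};
```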

## Multi-lingual speech translation (Preview)

Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translation into English. You can use these capabilities to build speech translation features into your products.

- Unspecified input language: Multi-lingual speech translation can receive audio in a wide range of languages; there's no need to specify the expected input language.
- Language switching: Multi-lingual speech translation allows multiple languages to be spoken during the same session and translates them all into the same target language. You don't need to restart the session or take any other action when the input language changes.
- Transcription: The service outputs a transcription in the specified target language. Source language transcription isn't available yet.

Some use cases for multi-lingual speech translation include:

- Travel interpreter: When customers travel abroad, multi-lingual speech translation lets you build a solution that translates any input audio to and from the local language, so customers can communicate with locals and better understand their surroundings.
- Business meeting: In a meeting with people who speak different languages, multi-lingual speech translation lets all participants communicate naturally, as if there were no language barrier.

For multi-lingual speech translation, the Speech service can automatically detect and switch between these input languages: Arabic (ar), Basque (eu), Bosnian (bs), Bulgarian (bg), Chinese Simplified (zh), Chinese Traditional (zhh), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (nb), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi), and Welsh (cy).

For a list of the supported output (target) languages, see the *Translate to text language* table in the [language and voice support documentation](language-support.md?tabs=speech-translation).

For more information about multi-lingual speech translation, see [the speech translation how-to guide](./how-to-translate-speech.md#multi-lingual-speech-translation-without-source-language-candidates) and [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).

## Multiple target languages translation

In scenarios where you want output in multiple languages, the Speech service can translate the input audio into two target languages with a single API call, so you can share the translations with a wider audience.

If you need translation into more than two target languages, you need to either [create a multi-service resource](../multi-service-resource.md) or use separate translation services for the languages beyond the second. If you call the speech translation service with a multi-service resource, translation fees apply for each language beyond the second, based on the character count of the translation.

To calculate the translation fee, see [Azure AI Translator pricing](https://azure.microsoft.com/products/ai-services/ai-translator#Pricing).

### Multiple target languages translation pricing

The speech translation service operates in real time, and intermediate recognition results are translated to generate intermediate translation results. As a result, the total amount of translated text is greater than the character count of the final transcription. You're charged for the speech to text transcription and for the text translation into each target language.

For example, let's say that you want text translations from a one-hour audio file into three target languages. If the speech to text transcription contains 10,000 characters, you might be charged $2.80.

> [!WARNING]
> The prices in this example are for illustrative purposes only. Refer to the [Azure AI Speech pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) and [Azure AI Translator pricing](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.

The example price of $2.80 combines the speech to text transcription cost and the text translation cost:

- The speech to text list price is $2.50 per hour. See **Pay as You Go** > **Speech translation** > **Standard** in the [Azure AI Speech pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) for the most up-to-date pricing information.
- The cost of the third target language translation is $0.30 in this example. The translation list price is $10 per million characters. Because the transcription contains 10,000 characters, the translation cost is $10 * 10,000 / 1,000,000 * 3 = $0.30. The number 3 in this equation represents a weighting coefficient for intermediate traffic, which can vary depending on the languages involved. See **Pay as You Go** > **Standard translation** > **Text translation** in the [Azure AI Translator pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.
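
The following snippet reproduces that calculation. It's an illustrative sketch only: the list prices and the intermediate-traffic coefficient (3 here) are the example values from this article and can change.

```csharp
// Rough cost estimate for speech translation with more than two target languages.
double EstimateCost(double audioHours, int transcriptChars, int targetLanguages)
{
    const double speechTranslationPerHour = 2.50;       // example list price per audio hour
    const double textTranslationPerMillionChars = 10.0; // example list price per million characters
    const double intermediateTrafficCoefficient = 3.0;  // example weighting for intermediate results

    int billableLanguages = Math.Max(0, targetLanguages - 2); // the first two target languages are included
    double speechCost = speechTranslationPerHour * audioHours;
    double translationCost = textTranslationPerMillionChars * transcriptChars / 1_000_000.0
        * intermediateTrafficCoefficient * billableLanguages;
    return speechCost + translationCost;
}

// One hour of audio, 10,000 transcribed characters, three target languages:
// 2.50 + 10 * 10,000 / 1,000,000 * 3 = 2.80
Console.WriteLine(EstimateCost(1.0, 10_000, 3)); // 2.8
```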

## Get started

-As your first step, try the [Speech translation quickstart](get-started-speech-translation.md). The speech translation service is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
+As your first step, try the [speech translation quickstart](get-started-speech-translation.md). The speech translation service is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).

You can find [Speech SDK speech to text and translation samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk) on GitHub. These samples cover common scenarios, such as reading audio from a file or stream, continuous and single-shot recognition and translation, and working with custom models.