For more information about speech to text, see [the basics of speech recognition](../../../get-started-speech-to-text.md).
## Event-based translation
The `TranslationRecognizer` object exposes a `Recognizing` event. The event fires several times and provides a mechanism to retrieve the intermediate translation results.
> [!NOTE]
> Intermediate translation results aren't available when you use [multi-lingual speech translation](#multi-lingual-speech-translation-without-source-language-candidates).
The following example prints the intermediate translation results to the console:
```csharp
using (var audioInput = AudioConfig.FromWavFileInput(@"whatstheweatherlike.wav"))
{
    using (var translationRecognizer = new TranslationRecognizer(config, audioInput))
    {
        // Subscribes to events.
        translationRecognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING in '{fromLanguage}': Text={e.Result.Text}");
            foreach (var element in e.Result.Translations)
            {
                Console.WriteLine($"    TRANSLATING into '{element.Key}': {element.Value}");
            }
        };

        translationRecognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.TranslatedSpeech)
            {
                Console.WriteLine($"RECOGNIZED in '{fromLanguage}': Text={e.Result.Text}");
                foreach (var element in e.Result.Translations)
                {
                    Console.WriteLine($"    TRANSLATED into '{element.Key}': {element.Value}");
                }
            }
        };
    }
}
```
After a successful speech recognition and translation, the result contains all the translations in a dictionary. The [`Translations`][translations] dictionary key is the target translation language, and the value is the translated text. Recognized speech can be translated and then synthesized in a different language (speech-to-speech).
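As a sketch of reading one entry from that dictionary (assuming a result object named `result` and that `"de"` was added as a target language; both names are illustrative):

```csharp
if (result.Reason == ResultReason.TranslatedSpeech)
{
    // Key is the target language code; value is the translated text.
    if (result.Translations.TryGetValue("de", out string germanText))
    {
        Console.WriteLine($"German: {germanText}");
    }
}
```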
The following example anticipates that `en-US` or `zh-CN` should be detected because they're specified as candidate languages.
For a complete code sample, see [language identification](../../../language-identification.md?pivots=programming-language-csharp#run-speech-translation).
## Multi-lingual speech translation without source language candidates
Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translations into English. You can build these capabilities into your products.
Currently, when you use language identification with speech translation, you must create the `SpeechTranslationConfig` object from the v2 endpoint. Replace the string "YourServiceRegion" with your Speech resource region (such as "westus").
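In C#, the v2 endpoint configuration might look like the following sketch. The endpoint format and the `FromEndpoint` overload follow Speech SDK samples; verify them against the current SDK before relying on this.

```csharp
// Sketch: create a SpeechTranslationConfig from the v2 endpoint.
// "YourServiceRegion" and "YourSubscriptionKey" are placeholders.
var v2EndpointUrl = new Uri("wss://YourServiceRegion.stt.speech.microsoft.com/speech/universal/v2");
var config = SpeechTranslationConfig.FromEndpoint(v2EndpointUrl, "YourSubscriptionKey");
```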
Specify the translation target languages. Replace with languages of your choice. You can add more lines.

```csharp
config.AddTargetLanguage("de");
config.AddTargetLanguage("fr");
```
A key differentiator of multi-lingual speech translation is that you don't need to specify the source language, because the service automatically detects it. Create the `AutoDetectSourceLanguageConfig` object with the `fromOpenRange` method to let the service know that you want to use multi-lingual speech translation with no specified source language.
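A minimal sketch of this setup, assuming `config` is the `SpeechTranslationConfig` created from the v2 endpoint and `audioInput` is an `AudioConfig` (the constructor overload shown here follows Speech SDK samples, not this article):

```csharp
// Open range: no source language candidates are provided.
var autoDetectSourceLanguageConfig = AutoDetectSourceLanguageConfig.FromOpenRange();

using (var translationRecognizer = new TranslationRecognizer(config, autoDetectSourceLanguageConfig, audioInput))
{
    var result = await translationRecognizer.RecognizeOnceAsync();
}
```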
For a complete code sample with the Speech SDK, see [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).
**File:** `articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md`
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
ms.date: 4/22/2024
ms.author: eur
---
### April 2024 release
#### Multi-lingual speech translation (Preview)
Multi-lingual speech translation is available in public preview. It introduces a new level of speech translation technology: no specified input language is required, language switches within the same session are handled automatically, and live streaming translations into English are supported. You can build these capabilities into your products.
For more information about multi-lingual speech translation, see [the multi-lingual speech translation overview](../../speech-translation.md#multi-lingual-speech-translation-preview).
#### Real-time speech to text with diarization (GA)
Real-time speech to text with diarization is now generally available.
You can create speech to text applications that use diarization to distinguish between the different speakers who participate in the conversation. For more information about real-time diarization, check out the [real-time diarization quickstart](../../get-started-stt-diarization.md).
#### Speech to text model update
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy, and call center support.
[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
**File:** `articles/ai-services/speech-service/speech-translation.md`
author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: overview
ms.date: 4/22/2024
ms.author: eur
ms.custom: devx-track-csharp
---
# What is speech translation?
In this article, you learn about the benefits and capabilities of translation with Azure AI Speech. The Speech service supports real-time, multi-language speech to speech and speech to text translation of audio streams.
By using the Speech SDK or Speech CLI, you can give your applications, tools, and devices access to source transcriptions and translation outputs for the provided audio. Interim transcription and translation results are returned as speech is detected, and the final results can be converted into synthesized speech.
For a list of languages supported for speech translation, see [Language and voice support](language-support.md?tabs=speech-translation).
> [!TIP]
> Go to the [Speech Studio](https://aka.ms/speechstudio/speechtranslation) to quickly test and translate speech into other languages of your choice with low latency.
## Core features
The core features of speech translation include:

- [Speech to text translation](#speech-to-text-translation)
- [Speech to speech translation](#speech-to-speech-translation)
- [Multi-lingual speech translation (preview)](#multi-lingual-speech-translation-preview)
- [Multiple target languages translation](#multiple-target-languages-translation)
## Speech to text translation
The standard feature offered by the Speech service is the ability to take an input audio stream in your specified source language and have it translated and output as text in your specified target language.
## Speech to speech translation
As a supplement to speech to text translation, the Speech service can also read the translated text aloud using a large database of pretrained voices, allowing for natural-sounding output of the input speech.
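As an illustrative sketch only (the voice name and event-handler shape are assumptions based on Speech SDK samples, not part of this article): a translation recognizer raises a synthesis event whose payload carries the translated audio.

```csharp
// Assumes a SpeechTranslationConfig named config with a target language added.
config.VoiceName = "de-DE-KatjaNeural"; // hypothetical voice choice

translationRecognizer.Synthesizing += (s, e) =>
{
    byte[] audio = e.Result.GetAudio(); // synthesized audio of the translation
    if (audio.Length > 0)
    {
        File.WriteAllBytes("translation.wav", audio);
    }
};
```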
## Multi-lingual speech translation (Preview)
Multi-lingual speech translation introduces a new level of speech translation technology. It requires no specified input language, handles language switches within the same session, and supports live streaming translations into English. You can build these capabilities into your products.
- Unspecified input language: Multi-lingual speech translation can receive audio in a wide range of languages, and you don't need to specify the expected input language.
- Language switching: Multi-lingual speech translation allows multiple languages to be spoken during the same session and translates them all into the same target language. There's no need to restart a session when the input language changes, and no other action is required from you.
- Transcription: The service outputs a transcription in the specified target language. Source language transcription isn't available yet.
Some use cases for multi-lingual speech translation include:
- Travel interpreter: When traveling abroad, multi-lingual speech translation lets you build a solution that translates any input audio to and from the local language, so customers can communicate with locals and better understand their surroundings.
- Business meeting: In a meeting with people who speak different languages, multi-lingual speech translation allows all attendees to communicate naturally, as if there were no language barrier.
For multi-lingual speech translation, these are the languages the Speech service can automatically detect and switch between from the input: Arabic (ar), Basque (eu), Bosnian (bs), Bulgarian (bg), Chinese Simplified (zh), Chinese Traditional (zhh), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (nb), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi), and Welsh (cy).
For a list of the supported output (target) languages, see the *Translate to text language* table in the [language and voice support documentation](language-support.md?tabs=speech-translation).
For more information on multi-lingual speech translation, see [the speech translation how to guide](./how-to-translate-speech.md#multi-lingual-speech-translation-without-source-language-candidates) and [speech translation samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/translation_samples.cs#L472).
## Multiple target languages translation
In scenarios where you want output in multiple languages, the Speech service offers the ability to translate the input language into two target languages directly. This enables you to receive two outputs and share the translations with a wider audience in a single API call.
If you need translation into more than two target languages, you need to either [create a multi-service resource](../multi-service-resource.md) or use separate translation services for the languages beyond the second. If you call the speech translation service with a multi-service resource, note that translation fees apply for each language beyond the second, based on the character count of the translation.
To calculate the applied translation fee, please refer to [Azure AI Translator pricing](https://azure.microsoft.com/products/ai-services/ai-translator#Pricing).
### Multiple target languages translation pricing
It's important to note that the speech translation service operates in real time, and intermediate speech results are translated to generate intermediate translation results. Therefore, the actual amount of translated text is greater than the character count of the input audio's final transcription. You're charged for the speech to text transcription and for the text translation into each target language.
For example, let's say that you want text translations from a one-hour audio file to three target languages. If the initial speech to text transcription contains 10,000 characters, you might be charged $2.80.
> [!WARNING]
> The prices in this example are for illustrative purposes only. Please refer to the [Azure AI Speech pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) and [Azure AI Translator pricing](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.
The previous example price of $2.80 was calculated by combining the speech to text transcription and the text translation costs. Here's how the calculation was done:
- The speech to text list price is $2.50 per hour. See **Pay as You Go** > **Speech translation** > **Standard** in the [Azure AI Speech pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) for the most up-to-date pricing information.
- The cost for the third language translation is 30 cents in this example. The translation list price is $10 per million characters. Because the transcription contains 10,000 characters, the translation cost is $10 * 10,000 / 1,000,000 * 3 = $0.30. The number 3 in this equation represents a weighting coefficient for intermediate traffic, which might vary depending on the languages involved. See **Pay as You Go** > **Standard translation** > **Text translation** in the [Azure AI Translator pricing table](https://azure.microsoft.com/pricing/details/cognitive-services/translator/) for the most up-to-date pricing information.
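The arithmetic above can be sketched as a small calculation. The prices and the weighting coefficient are the illustrative values from this example, not authoritative rates:

```csharp
// Illustrative pricing calculation; values come from the example above.
double speechToTextCost = 2.50;          // $2.50 per hour of audio (one hour here)
double translationListPrice = 10.0;      // $10 per million characters
int transcriptChars = 10_000;            // characters in the transcript
double intermediateWeight = 3.0;         // weighting coefficient for intermediate traffic

double thirdLanguageCost = translationListPrice * transcriptChars / 1_000_000 * intermediateWeight;
double totalCost = speechToTextCost + thirdLanguageCost;

Console.WriteLine($"Third language translation: ${thirdLanguageCost:F2}"); // $0.30
Console.WriteLine($"Total: ${totalCost:F2}");                              // $2.80
```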
## Get started
As your first step, try the [speech translation quickstart](get-started-speech-translation.md). The speech translation service is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
You can find [Speech SDK speech to text and translation samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk) on GitHub. These samples cover common scenarios, such as reading audio from a file or stream, continuous and single-shot recognition and translation, and working with custom models.