
Commit 9798759

authored
Merge pull request #275758 from eric-urban/eur/baolian-patch-246
video translation for AI Speech
2 parents 4813b1a + b5a8a21 commit 9798759

9 files changed

+183
-2
lines changed

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
---
2+
author: sally-baolian
3+
ms.service: azure-ai-speech
4+
ms.date: 05/17/2024
5+
ms.topic: include
6+
ms.author: v-baolianzou
7+
---
8+
9+
| Source language | Source locale | Target language | Target locale |
10+
|---------------|-----------------|------------------|----------------------|
11+
| Chinese (Mandarin, Simplified) |`zh-CN` | English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) | `en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`|
12+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`| Chinese (Mandarin, Simplified)|`zh-CN` |
13+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`| German (Austria)<br/>German (Switzerland)<br/>German (Germany) |`de-AT`<br/>`de-CH`<br/>`de-DE` |
14+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`| Hindi (India) |`hi-IN` |
15+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`| Italian (Switzerland)<br/>Italian (Italy) |`it-CH`<br/>`it-IT`|
16+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`| Russian (Russia) |`ru-RU`|
17+
| English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) |`en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`|Spanish (Argentina)<br/>Spanish (Bolivia)<br/>Spanish (Chile)<br/>Spanish (Colombia)<br/>Spanish (Costa Rica)<br/>Spanish (Cuba)<br/>Spanish (Dominican Republic)<br/>Spanish (Ecuador)<br/>Spanish (Spain)<br/>Spanish (Equatorial Guinea)<br/>Spanish (Guatemala)<br/>Spanish (Honduras)<br/>Spanish (Mexico)<br/>Spanish (Nicaragua)<br/>Spanish (Panama)<br/>Spanish (Peru)<br/>Spanish (Puerto Rico)<br/>Spanish (Paraguay)<br/>Spanish (El Salvador)<br/>Spanish (United States)<br/>Spanish (Uruguay)<br/>Spanish (Venezuela)|`es-AR`<br/>`es-BO`<br/>`es-CL`<br/>`es-CO`<br/>`es-CR`<br/>`es-CU`<br/>`es-DO`<br/>`es-EC`<br/>`es-ES`<br/>`es-GQ`<br/>`es-GT`<br/>`es-HN`<br/>`es-MX`<br/>`es-NI`<br/>`es-PA`<br/>`es-PE`<br/>`es-PR`<br/>`es-PY`<br/>`es-SV`<br/>`es-US`<br/>`es-UY`<br/>`es-VE` |
18+
| Hindi (India) |`hi-IN` | English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) | `en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`|
19+
| Korean (Korea) |`ko-KR` | English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) | `en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`|
20+
| Spanish (Argentina)<br/>Spanish (Bolivia)<br/>Spanish (Chile)<br/>Spanish (Colombia)<br/>Spanish (Costa Rica)<br/>Spanish (Cuba)<br/>Spanish (Dominican Republic)<br/>Spanish (Ecuador)<br/>Spanish (Spain)<br/>Spanish (Equatorial Guinea)<br/>Spanish (Guatemala)<br/>Spanish (Honduras)<br/>Spanish (Mexico)<br/>Spanish (Nicaragua)<br/>Spanish (Panama)<br/>Spanish (Peru)<br/>Spanish (Puerto Rico)<br/>Spanish (Paraguay)<br/>Spanish (El Salvador)<br/>Spanish (United States)<br/>Spanish (Uruguay)<br/>Spanish (Venezuela)|`es-AR`<br/>`es-BO`<br/>`es-CL`<br/>`es-CO`<br/>`es-CR`<br/>`es-CU`<br/>`es-DO`<br/>`es-EC`<br/>`es-ES`<br/>`es-GQ`<br/>`es-GT`<br/>`es-HN`<br/>`es-MX`<br/>`es-NI`<br/>`es-PA`<br/>`es-PE`<br/>`es-PR`<br/>`es-PY`<br/>`es-SV`<br/>`es-US`<br/>`es-UY`<br/>`es-VE` | English (Australia)<br/>English (Canada)<br/>English (United Kingdom)<br/>English (Ghana)<br/>English (Hong Kong SAR)<br/>English (Ireland)<br/>English (India)<br/>English (Kenya)<br/>English (Nigeria)<br/>English (New Zealand)<br/>English (Philippines)<br/>English (Singapore)<br/>English (Tanzania)<br/>English (United States)<br/>English (South Africa) | `en-AU`<br/>`en-CA`<br/>`en-GB`<br/>`en-GH`<br/>`en-HK`<br/>`en-IE`<br/>`en-IN`<br/>`en-KE`<br/>`en-NG`<br/>`en-NZ`<br/>`en-PH`<br/>`en-SG`<br/>`en-TZ`<br/>`en-US`<br/>`en-ZA`|
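The fixed source-to-target mapping in this table can be checked programmatically before you submit a video. The following is a minimal sketch that transcribes the table into plain Python data. The names `LOCALES`, `SUPPORTED_PAIRS`, and `is_supported` are hypothetical helpers for illustration and aren't part of any Azure SDK or API.

```python
# Hypothetical lookup transcribed from the table above; not an Azure API.
LOCALES = {
    "zh": {"zh-CN"},
    "en": {"en-AU", "en-CA", "en-GB", "en-GH", "en-HK", "en-IE", "en-IN",
           "en-KE", "en-NG", "en-NZ", "en-PH", "en-SG", "en-TZ", "en-US",
           "en-ZA"},
    "de": {"de-AT", "de-CH", "de-DE"},
    "hi": {"hi-IN"},
    "it": {"it-CH", "it-IT"},
    "ru": {"ru-RU"},
    "ko": {"ko-KR"},
    "es": {"es-AR", "es-BO", "es-CL", "es-CO", "es-CR", "es-CU", "es-DO",
           "es-EC", "es-ES", "es-GQ", "es-GT", "es-HN", "es-MX", "es-NI",
           "es-PA", "es-PE", "es-PR", "es-PY", "es-SV", "es-US", "es-UY",
           "es-VE"},
}

# Source language -> target languages, one entry per source row group.
SUPPORTED_PAIRS = {
    "zh": {"en"},
    "en": {"zh", "de", "hi", "it", "ru", "es"},
    "hi": {"en"},
    "ko": {"en"},
    "es": {"en"},
}


def is_supported(source_locale: str, target_locale: str) -> bool:
    """Return True if the table lists this source/target locale pair."""
    src_lang = source_locale.split("-")[0]
    tgt_lang = target_locale.split("-")[0]
    return (
        source_locale in LOCALES.get(src_lang, set())
        and target_locale in LOCALES.get(tgt_lang, set())
        and tgt_lang in SUPPORTED_PAIRS.get(src_lang, set())
    )
```

Note that the mapping isn't symmetric: for example, `is_supported("en-US", "de-DE")` is true, but the table has no German source row, so `is_supported("de-DE", "en-US")` is false.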

articles/ai-services/speech-service/language-support.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@ This table lists all the locales supported for [Viseme](speech-synthesis-markup-
8888
Each prebuilt neural voice supports a specific language and dialect, identified by locale. You can try the demo and hear the voices in the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery).
8989

9090
> [!IMPORTANT]
91-
> Pricing varies for Prebuilt Neural Voice (see *Neural* on the pricing page) and custom neural voice (see *Custom Neural* on the pricing page). For more information, see the [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) page.
91+
> Pricing varies for Prebuilt Neural Voice (see *Neural* on the pricing page) and custom neural voice (see *Custom Neural* on the pricing page). For more information, see the [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) page.
9292
9393
Each prebuilt neural voice model is available at 24kHz and high-fidelity 48kHz. Other sample rates can be obtained through upsampling or downsampling when synthesizing.
9494

@@ -138,6 +138,12 @@ To set the translation target language, with few exceptions you only specify the
138138

139139
[!INCLUDE [Language support include](includes/language-support/speech-translation.md)]
140140

141+
### Video translation
142+
143+
The following table lists the fixed mapping between source and target locales, along with the full set of locales associated with each language.
144+
145+
[!INCLUDE [Language support include](includes/language-support/video-translation.md)]
146+
141147
# [Language identification](#tab/language-identification)
142148

143149
The table in this section summarizes the locales supported for [Language identification](language-identification.md).
4 binary image files changed (11.6 KB, 52.4 KB, 39.2 KB, 27.1 KB); previews not shown.

articles/ai-services/speech-service/toc.yml

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -227,7 +227,7 @@ items:
227227
- name: Deploy your custom text to speech avatar model as an endpoint
228228
href: text-to-speech-avatar/custom-avatar-endpoint.md
229229
displayName: avatar
230-
- name: Audio Content Creation
230+
- name: Audio content creation
231231
href: how-to-audio-content-creation.md
232232
displayName: acc
233233
- name: OpenAI text to speech voices
@@ -246,6 +246,12 @@ items:
246246
href: get-started-speech-translation.md
247247
- name: How to recognize and translate speech
248248
href: how-to-translate-speech.md
249+
- name: Video translation (preview)
250+
items:
251+
- name: Video translation overview
252+
href: video-translation-overview.md
253+
- name: Video translation in the studio
254+
href: video-translation-studio.md
249255
- name: Intent recognition
250256
items:
251257
- name: Intent recognition overview
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
---
2+
title: Video translation overview - Speech service
3+
titleSuffix: Azure AI services
4+
description: With video translation, you can seamlessly integrate multi-language voice-over capabilities into your videos.
5+
manager: nitinme
6+
ms.service: azure-ai-speech
7+
ms.topic: overview
8+
ms.date: 5/21/2024
9+
ms.reviewer: sally-baolian
10+
ms.author: eur
11+
author: eric-urban
12+
ms.custom: references_regions
13+
---
14+
15+
# What is video translation (preview)?
16+
17+
[!INCLUDE [Feature preview](../includes/preview-feature.md)]
18+
19+
Video translation is a feature in Azure AI Speech that enables you to seamlessly translate and generate videos in multiple languages automatically. This feature is designed to help you localize your video content to cater to diverse audiences around the globe. You can efficiently create immersive, localized videos across various use cases such as vlogs, education, news, enterprise training, advertising, film, TV shows, and more.
20+
21+
Dubbing, the process of replacing the original language of a video with audio recorded in a different language, is often relied upon to cater to diverse audiences. Traditionally achieved through human recording and manual post-production, dubbing is essential for ensuring that viewers can enjoy video content in their native language. However, this process has key pain points: high cost, long turnaround time, and an inability to replicate the original speaker's voice accurately. Video translation in Azure AI Speech addresses these challenges by providing an automated, efficient, and cost-effective solution for creating localized videos.
22+
23+
## Use cases
24+
25+
Video translation provided by Azure AI Speech has a wide range of use cases across various industries and content types. Here are some key applications:
26+
27+
- **News + interviews**: News organizations can translate and dub news segments and interviews to provide accurate and timely information to audiences worldwide.
28+
29+
- **Advertisement + marketing**: Businesses can localize their advertising and marketing videos to resonate with target audiences in different markets, enhancing brand awareness and customer engagement.
30+
31+
- **Education + learning**: Educational institutions and e-learning platforms can dub their instructional videos and lectures into different languages, making learning more accessible and inclusive.
32+
33+
- **Film + TV show**: Film studios and production companies can dub their movies and TV shows for international distribution, reaching a broader audience and maximizing revenue potential.
34+
35+
- **Vlog + short video**: Content owners can easily translate and dub their vlogs and short videos to reach international audiences, expanding their viewership and engagement.
36+
37+
- **Enterprise training**: Corporations can localize their training videos for employees in different regions, ensuring consistent and effective communication across their workforce.
38+
39+
## Supported regions and languages
40+
41+
Currently, video translation in Azure AI Speech is only supported in the East US region.
42+
43+
We support video translation between various languages, enabling you to tailor your content to specific linguistic preferences. For the languages supported for video translation, refer to the [supported source and target languages](language-support.md?tabs=speech-translation#video-translation).
44+
45+
## Core features
46+
47+
- **Dialogue audio extraction and spoken content transcription.**
48+
49+
Automatically extracts dialogue audio from the source video and transcribes the spoken content.
50+
- **Translation from language A to B and large language model (LLM) reformulation.**
51+
52+
Translates the transcribed content from the original language (Language A) to the target language (Language B) using advanced language processing techniques. Enhances translation quality and refines gender-aware translated text through LLM reformulation.
53+
- **Automatic dubbing – voice generation in other languages.**
54+
55+
Utilizes AI-powered text-to-speech technology to automatically generate human-like voices in the target language. These voices are precisely synchronized with the video, ensuring a flawless dubbing experience. This includes utilizing prebuilt neural voices for high-quality output and offering options for personal voice.
56+
- **Human in the loop for content editing.**
57+
58+
Allows for human intervention to review and edit the translated content, ensuring accuracy and cultural appropriateness before finalizing the dubbed video.
59+
- **Subtitles generation.**
60+
61+
Delivers the fully dubbed video with translated dialogue, synchronized subtitles, and generated voices, ready for download and distribution across various platforms. You can also set the subtitle length on each screen for optimal display.
62+
63+
## Get started
64+
65+
To get started with video translation, refer to [video translation in the studio](video-translation-studio.md). The video translation API will be available soon.
66+
67+
## Price
68+
69+
Pricing details for video translation will be effective from June 2024.
70+
71+
## Related content
72+
73+
* Try the [video translation in the studio](video-translation-studio.md)
74+
Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
---
2+
title: How to use video translation in the studio
3+
titleSuffix: Azure AI services
4+
description: With Azure AI Speech video translation, you can seamlessly translate and generate videos in multiple languages automatically.
5+
manager: nitinme
6+
ms.service: azure-ai-speech
7+
ms.topic: how-to
8+
ms.date: 5/21/2024
9+
ms.reviewer: sally-baolian
10+
ms.author: eur
11+
author: eric-urban
12+
ms.custom: references_regions
13+
---
14+
15+
# Video translation in the studio
16+
17+
[!INCLUDE [Feature preview](../includes/preview-feature.md)]
18+
19+
In this article, you learn how to use Azure AI Speech video translation in the studio.
20+
21+
All it takes to get started is an original video. See if video translation supports your [language](language-support.md?tabs=speech-translation#video-translation) and [region](video-translation-overview.md#supported-regions-and-languages).
22+
23+
## Create a video translation project
24+
25+
To create a video translation project, follow these steps:
26+
27+
1. Sign in to the [Speech Studio](https://aka.ms/speechstudio).
28+
29+
1. Select the subscription and Speech resource to work with.
30+
31+
1. Select **Video translation**.
32+
33+
1. On the **Create and Manage Projects** page, select **Upload file**.
34+
35+
1. On the **Video file** page, upload your video file by dragging and dropping the video file or selecting the file manually.
36+
37+
Ensure the video is in .mp4 format, less than 500 MB, and shorter than 60 minutes.
38+
39+
1. Provide the **File name** and **Description**, and then select the **Voice type**, **Language of the video**, and **Translate to** language.
40+
41+
You can select **Prebuilt neural voice** or **Personal voice** for **Voice type**. For prebuilt neural voice, the system automatically selects the most suitable prebuilt voice by matching the speaker's voice in the video with prebuilt voices. For personal voice, the system provides a model with superior voice cloning similarity. To use personal voice, you need to apply for access. The application form will be available soon.
42+
43+
:::image type="content" source="media/video-translation/upload-video-file.png" alt-text="Screenshot of uploading your video file on the video file page.":::
44+
45+
1. Review the pricing information and code of conduct, and then proceed to create the project.
46+
47+
While the video file is being processed, you can check the processing status on the project tab.
48+
49+
Once the upload is complete, the project is created. You can then select the project to review detailed settings and make adjustments according to your preferences.
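The file requirements from the upload step (.mp4 format, less than 500 MB, shorter than 60 minutes) can be checked locally before you open the studio. The following is a minimal sketch, not an official tool. The function `check_video` is hypothetical, the caller supplies the size and duration (for example, from a media-inspection tool of your choice), and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024
MAX_SECONDS = 60 * 60  # 60 minutes


def check_video(path: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(path).suffix.lower() != ".mp4":
        problems.append("file must be in .mp4 format")
    if size_bytes >= MAX_BYTES:
        problems.append("file must be less than 500 MB")
    if duration_seconds >= MAX_SECONDS:
        problems.append("video must be shorter than 60 minutes")
    return problems
```

Returning a list of problems rather than a single Boolean lets you report every failed requirement at once.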
50+
51+
## Check and adjust voice settings
52+
53+
On the project details page, the project offers two tabs, **Translated** and **Original**, under **Video**, allowing you to compare them side by side.
54+
55+
On the right side of the video, you can view both the original script and the translated script. Hovering over each part of the original script triggers the video to automatically jump to the corresponding segment of the original video, while hovering over each part of the translated script triggers the video to jump to the corresponding translated segment.
56+
57+
You can also add or remove segments as needed. When you add a segment, ensure that the new segment's timestamps don't overlap with the previous or next segment, and that the segment end time is later than the start time. Timestamps must use the format `hh:mm:ss.ms`. Otherwise, you can't apply the changes.
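The segment rules in the preceding paragraph (correct timestamp format, end after start, no overlap with neighboring segments) can be expressed as a small check. The following is a sketch under stated assumptions, not the studio's actual validation logic: the helper names are hypothetical, the `ms` part is assumed to be exactly three digits, and a segment that merely touches its neighbor is assumed not to overlap.

```python
import re

# Assumption: hh, mm, and ss are two digits each and ms is exactly three digits.
_TIMESTAMP = re.compile(r"(\d{2}):([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(text: str) -> int:
    """Convert an hh:mm:ss.ms timestamp to total milliseconds."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"timestamp not in hh:mm:ss.ms format: {text!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def can_add_segment(start, end, previous_end=None, next_start=None) -> bool:
    """Check a new segment against the format, ordering, and overlap rules."""
    try:
        start_ms = parse_timestamp(start)
        end_ms = parse_timestamp(end)
        prev_ms = None if previous_end is None else parse_timestamp(previous_end)
        next_ms = None if next_start is None else parse_timestamp(next_start)
    except ValueError:
        return False  # at least one timestamp is malformed
    if end_ms <= start_ms:
        return False  # end time must be later than start time
    if prev_ms is not None and start_ms < prev_ms:
        return False  # overlaps the previous segment
    if next_ms is not None and end_ms > next_ms:
        return False  # overlaps the next segment
    return True
```

For example, a segment from `00:00:05.000` to `00:00:08.500` passes when the previous segment ends at `00:00:04.000` and the next one starts at `00:00:09.000`, and fails if the previous segment ends at `00:00:06.000`.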
58+
59+
If you encounter segments with an "unidentified" voice name, it might be because the system couldn't accurately detect the voice, especially in situations where speaker voices overlap. In such cases, it's advisable to manually change the voice name.
60+
61+
:::image type="content" source="media/video-translation/voice-unidentified.png" alt-text="Screenshot of one segment with unidentified voice name.":::
62+
63+
If you want to adjust the voice, select **Voice settings**. On the **Voice settings** page, you can adjust the voice type, gender, and the voice. Select the voice sample on the right of **Voice** to preview it and confirm your selection. If a voice is missing, you can add a new voice name by selecting **Add speaker**. After changing the settings, select **Update**.
64+
65+
:::image type="content" source="media/video-translation/voice-settings.png" alt-text="Screenshot of adjusting voice settings on the voice settings page.":::
66+
67+
If you're making several changes and haven't finished, select **Save** to keep the changes you've made so far. After making all changes, select **Apply changes** to apply them to the video. You're charged only after you select **Apply changes**.
68+
69+
:::image type="content" source="media/video-translation/apply-changes.png" alt-text="Screenshot of selecting apply changes button after making all changes.":::
70+
71+
You can translate the original video into a new language by selecting **New language**. On the **Translate** page, you can choose a new translated language and voice type. Once the video file has been translated, a new project is automatically created.
72+
73+
## Related content
74+
75+
- [Video translation overview](video-translation-overview.md)
