
Commit 899aea9

rm more preview tags

1 parent: 8e36f85

1 file changed: 6 additions, 6 deletions


articles/ai-foundry/responsible-ai/speech-service/text-to-speech/transparency-note.md

@@ -99,13 +99,13 @@ In addition to the common terms from prebuilt neural voice, custom neural voice,
 |---|---|
 | Avatar talent | Custom text to speech avatar model building requires training on a video recording of a real human speaking. This person is the avatar talent. Customers must get sufficient consent under all relevant laws and regulations from the avatar talent to use their image/likeness to create a custom avatar. |
 
-#### [Video translation (preview)](#tab/video)
+#### [Video translation](#tab/video)
 
 ### Introduction
 
 Video translation can efficiently localize your video content to cater to diverse audiences around the globe. This service empowers you to create immersive, localized content efficiently and effectively across various use cases such as vlogs, education, news, advertising, and more.
 
-Video translation using prebuilt neural voices is available in preview for all users. Video translation with personal voice is a Limited Access feature in preview and is subject to use case and eligibility restrictions.
+Video translation using prebuilt neural voices is available for all users.
 
 ### Key terms
 
@@ -169,7 +169,7 @@ Text to speech avatar adopts Coalition for Content Provenance and Authenticity (
 In addition, avatar outputs are automatically watermarked. Watermarks allow approved users to identify whether a video is synthesized using the avatar feature of Azure AI Speech. To request watermark detection, please contact [avatarvoice[at]microsoft.com](mailto:avatarvoice@microsoft.com).
 
 
-### Video translation (preview)
+### Video translation
 
 Video translation can efficiently localize your video content to cater to diverse audiences around the globe. Video translation will automatically extract dialogue audio, transcribe, translate and dub the content with prebuilt or personal voice to the target language, with accurate subtitles for better accessibility. Multi-speaker features will help identify the number of individuals speaking and recommend suitable voices. Content editing with human in the loop allows for precise alignment with customer preference. Enhanced translation quality ensures precise audio and video alignment with GPT integration. Video translation enables authentic and personalized dubbing experiences with personal voice.
 
@@ -209,7 +209,7 @@ All other uses of custom neural voice, including Custom Neural Voice Pro, Custom
 
 Prebuilt neural voice may also be used for the custom neural voice use cases above, as well as additional use cases selected by customers and consistent with the Azure Acceptable Use Policy and the [Code of conduct for Azure AI Speech text to speech](/legal/ai-code-of-conduct?context=%2Fazure%2Fai-services%2Fspeech-service%2Fcontext%2Fcontext). No registration or pre-approval is required for additional use cases for prebuilt neural voice that meet all applicable terms and conditions.
 
-### Intended use cases for video translation (preview)
+### Intended use cases for video translation
 
 Video translation could be used for films, TV, and other visual (including but not limited to video or animation) and audio applications, where customers maintain sole control over the creation of, access to, and use of the voice models and their output. Personal voice and lip syncing are subject to the Limited Access framework, and eligible customers may use these capabilities with Video translation. The following are the approved use cases for Video translation service:
 - **Education & learning**: To translate audio in educational visuals, online courses, training modules, simulation-based learning, or guided museum tour visuals for multilingual learners.
@@ -301,7 +301,7 @@ Technical limitations to consider are the accuracy of lip sync alignment with th
 - **Gestures**: Avatars may use hand gestures during speaking to deliver a natural speaking experience, but the gestures are not pre-programmed. Instead, they are learned from video clips in the training data and are included in synthetic video regardless of the input text. Also, avatars cannot make gestures that were not made by the avatar talent and captured in the training data. Avatars are not able to tailor gestures according to contextual information and emotions, so customers should be mindful of the avatar system’s inability to automatically play a gesture appropriate for the context.
 - **Privacy and data protection**: When utilizing text to speech avatars, customers should adhere to all applicable privacy laws and regulations and ensure that sensitive or personal information is handled securely. It is important to be cautious when processing and storing data, and to follow best practices for data protection and consent management.
 
-#### [Video translation (preview)](#tab/video)
+#### [Video translation](#tab/video)
 
 * **Translation quality**: Translation quality will depend on the transcription accuracy and translation accuracy. If the input video is mixed with background music or noise, this will impact the quality of the translation. Translation results will be dependent on context.
 * **Dubbing voice similarity and intonation**: When you choose prebuilt neural voices for dubbing, the voice output characteristics may not be similar to the original voice characteristics. If you use the personal voice feature, the voice output will more closely resemble the original voice, but the speaking style may not closely resemble the user’s speaking style including tones and prosodies. It’s also possible the voice output will not sound equally natural across all supported languages.
@@ -395,7 +395,7 @@ The quality of the resulting avatar heavily depends on the recorded video used f
 
 The appearance and performance of the avatar talent are also key factors impacting the system performance; please see our guidance [How to record video samples for custom text to speech avatar](/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples).
 
-#### [Video translation (preview)](#tab/video)
+#### [Video translation](#tab/video)
 ---
 
 ## Evaluation of text to speech
