articles/ai-services/speech-service/text-to-speech-avatar/avatar-gestures-with-ssml.md
Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
author: eric-urban
---

-# Customize text to speech avatar gestures with SSML (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Customize text to speech avatar gestures with SSML

The [Speech Synthesis Markup Language (SSML)](../speech-synthesis-markup-structure.md) with input text determines the structure, content, and other characteristics of the text to speech output. Most SSML tags also work with text to speech avatar. In addition, text to speech avatar batch mode lets you insert avatar gestures by using the SSML bookmark element with the format `<bookmark mark='gesture.*'/>`.
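
As a rough illustration of the bookmark format, the following shell sketch assembles an SSML document that places a gesture bookmark between two phrases. The helper functions and the gesture name `your-gesture-name` are hypothetical placeholders, and the voice name is only an example; substitute values that your avatar actually supports.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the gesture name are illustrative assumptions.

# Print an SSML bookmark element in the gesture.* format.
gesture_bookmark() {
  printf "<bookmark mark='gesture.%s'/>" "$1"
}

# Wrap body text in a minimal SSML document for the given voice.
ssml_document() {
  local voice="$1" body="$2"
  printf "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='%s'>%s</voice></speak>\n" \
    "$voice" "$body"
}

# Example (hypothetical gesture name):
# ssml_document "en-US-JennyNeural" "Hello, $(gesture_bookmark your-gesture-name) welcome to the demo."
```

The bookmark is placed inline at the point in the text where the gesture should start, which is why the sketch builds it as a fragment that can be spliced into the spoken text.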
articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md
Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
author: eric-urban
---

-# Batch synthesis properties for text to speech avatar (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Batch synthesis properties for text to speech avatar

Batch synthesis properties can be grouped as: avatar related properties, batch job related properties, and text to speech related properties, which are described in the following tables.
articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar.md
Lines changed: 11 additions & 13 deletions
@@ -11,11 +11,9 @@ ms.author: eur
author: eric-urban
---

-# How to use batch synthesis for text to speech avatar (preview)
+# How to use batch synthesis for text to speech avatar

-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
-
-The batch synthesis API for text to speech avatar (preview) allows for the asynchronous synthesis of text into a talking avatar as a video file. Publishers and video content platforms can utilize this API to create avatar video content in a batch. That approach can be suitable for various use cases such as training materials, presentations, or advertisements.
+The batch synthesis API for text to speech avatar allows for the asynchronous synthesis of text into a talking avatar as a video file. Publishers and video content platforms can utilize this API to create avatar video content in a batch. That approach can be suitable for various use cases such as training materials, presentations, or advertisements.

The synthetic avatar video will be generated asynchronously after the system receives text input. The generated video output can be downloaded in batch mode synthesis. You submit text for synthesis, poll for the synthesis status, and download the video output when the status indicates success. The text input formats must be plain text or Speech Synthesis Markup Language (SSML) text.
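
The submit, poll, and download flow described above could be sketched in shell roughly as follows. This is a minimal sketch under stated assumptions: the response is assumed to carry a JSON `status` field, and `Succeeded` and `Failed` are assumed to be its terminal values. The function names are hypothetical helpers, not part of the service.

```bash
#!/usr/bin/env bash
# Sketch only: the "status" field name and the values "Succeeded"/"Failed"
# are assumptions about the response body; helper names are illustrative.

# Build the status URL for one batch synthesis job.
status_url() {
  local region="$1" synthesis_id="$2" api_version="${3:-2024-08-01}"
  printf 'https://%s.api.cognitive.microsoft.com/avatar/batchsyntheses/%s?api-version=%s\n' \
    "$region" "$synthesis_id" "$api_version"
}

# Read a JSON body on stdin and print the first "status" string value found.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll the job until it reaches an assumed terminal status.
poll_until_done() {
  local region="$1" synthesis_id="$2" key="$3" interval="${4:-10}"
  local url status
  url="$(status_url "$region" "$synthesis_id")"
  while true; do
    status="$(curl -s -X GET "$url" -H "Ocp-Apim-Subscription-Key: $key" | extract_status)"
    echo "status: ${status:-unknown}" >&2
    case "$status" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Example (not run here):
# poll_until_done YourSpeechRegion YourSynthesisId YourSpeechKey 10
```

A JSON-aware tool is a sturdier choice than `sed` for parsing the response; `sed` is used here only to keep the sketch free of extra dependencies.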
@@ -27,10 +25,10 @@ To perform batch synthesis, you can use the following REST API operations.
You should receive a response body in the following format:
@@ -106,7 +104,7 @@ To retrieve the status of a batch synthesis job, make an HTTP GET request using
Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.

```azurecli-interactive
-curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-04-15-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
+curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
```

You should receive a response body in the following format:
@@ -157,7 +155,7 @@ To list all batch synthesis jobs for your Speech resource, make an HTTP GET requ
Replace `YourSpeechKey` with your Speech resource key and `YourSpeechRegion` with your Speech resource region. Optionally, you can set the `skip` and `maxpagesize` (page size) query parameters in the URL. The default value for `skip` is 0, and the default value for `maxpagesize` is 100.

```azurecli-interactive
-curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-04-15-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
+curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
```

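To page through a long job list, the `skip` and `maxpagesize` parameters can be combined as in the sketch below. It assumes that `skip` counts jobs rather than pages; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, and skip is assumed to count jobs.

# Build the list URL for one page of batch synthesis jobs.
# Defaults mirror the documented defaults: skip=0, maxpagesize=100.
list_url() {
  local region="$1" skip="${2:-0}" page_size="${3:-100}" api_version="${4:-2024-08-01}"
  printf 'https://%s.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=%s&maxpagesize=%s&api-version=%s\n' \
    "$region" "$skip" "$page_size" "$api_version"
}

# Compute the skip value for a zero-based page index.
page_skip() {
  local page_index="$1" page_size="$2"
  echo $(( page_index * page_size ))
}

# Example: request the third page with two jobs per page (not run here).
# curl -v -X GET "$(list_url YourSpeechRegion "$(page_skip 2 2)" 2)" \
#   -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
```
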
You receive a response body in the following format:
@@ -232,7 +230,7 @@ You receive a response body in the following format:
@@ -283,7 +281,7 @@ After you have retrieved the audio output results and no longer need the batch s
To delete a batch synthesis job, make an HTTP DELETE request using the following URI format. Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.
articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create.md
Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
author: eric-urban
---

-# How to create a custom text to speech avatar (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# How to create a custom text to speech avatar

Getting started with a custom text to speech avatar is a straightforward process. All it takes is a few video files. If you'd like to train a [custom neural voice](../custom-neural-voice.md) for the same actor, you can do so separately.
articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md
Lines changed: 3 additions & 5 deletions
@@ -1,5 +1,5 @@
---
-title: Real-time synthesis for text to speech avatar (preview) - Speech service
+title: Real-time synthesis for text to speech avatar - Speech service
titleSuffix: Azure AI services
description: Learn how to use text to speech avatar with real-time synthesis.
manager: nitinme
@@ -11,11 +11,9 @@ ms.author: eur
author: eric-urban
---

-# How to do real-time synthesis for text to speech avatar (preview)
+# How to do real-time synthesis for text to speech avatar

-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
-
-In this how-to guide, you learn how to use text to speech avatar (preview) with real-time synthesis. The synthetic avatar video will be generated in almost real time after the system receives the text input.
+In this how-to guide, you learn how to use text to speech avatar with real-time synthesis. The synthetic avatar video will be generated in almost real time after the system receives the text input.
articles/ai-services/speech-service/text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md
Lines changed: 2 additions & 4 deletions
@@ -11,9 +11,7 @@ ms.author: eur
author: eric-urban
---

-# What is custom text to speech avatar? (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# What is custom text to speech avatar?

Custom text to speech avatar allows you to create a customized, one-of-a-kind synthetic talking avatar for your application. With custom text to speech avatar, you can build a unique and natural-looking avatar for your product or brand by providing video recording data of your selected actors. If you also create a [custom neural voice](#custom-voice-and-custom-text-to-speech-avatar) for the same actor and use it as the avatar's voice, the avatar will be even more realistic.
@@ -41,7 +39,7 @@ Here's an overview of the steps to create a custom text to speech avatar:
1. **Prepare training data:** Ensure that the video recording is in the right format. It's a good idea to shoot the video recording in a professional-quality video shooting studio to get a clean background image. The quality of the resulting avatar heavily depends on the recorded video used for training. Factors like speaking rate, body posture, facial expression, hand gestures, consistency in the actor's position, and lighting of the video recording are essential to create an engaging custom text to speech avatar.

-1. **Train the avatar model:** We'll start training the custom text to speech model after verifying the consent statement of the avatar talent. In the preview stage of this service, this step will be done manually by Microsoft. You'll be notified after the model is successfully trained.
+1. **Train the avatar model:** We'll start training the custom text to speech model after verifying the consent statement of the avatar talent. This step is currently done manually by Microsoft. You'll be notified after the model is successfully trained.

1. **Deploy and use your avatar model in your apps**
articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md
Lines changed: 2 additions & 4 deletions
@@ -12,9 +12,7 @@ author: eric-urban
ms.custom: references_regions
---

-# Text to speech avatar overview (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Text to speech avatar overview

Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](#custom-text-to-speech-avatar)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.
@@ -44,7 +42,7 @@ The voice in the synthetic video could be a prebuilt neural voice available on A
Both batch synthesis and real-time synthesis produce video at a resolution of 1920 x 1080 and 25 frames per second (FPS). For batch synthesis, the codec can be h264 or h265 when the format is `mp4`, and vp9 when the format is `webm`; only `webm` can contain an alpha channel. For real-time synthesis, the codec is h264. You can configure the video bitrate in the request for both batch synthesis and real-time synthesis; the default value is 2000000. More detailed configurations can be found in the sample code.
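
The format and codec constraints for batch synthesis can be checked before submitting a job. The following shell sketch encodes only the combinations stated above; the function names are hypothetical, and the lowercase strings `mp4`, `webm`, `h264`, `h265`, and `vp9` are assumed to match the values used in a request.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.

# Succeed when the codec is allowed for the given batch synthesis format:
# mp4 allows h264 or h265, and webm allows vp9.
batch_codec_allowed() {
  local format="$1" codec="$2"
  case "$format:$codec" in
    mp4:h264|mp4:h265|webm:vp9) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed when the format can contain an alpha channel (webm only).
supports_alpha() {
  [ "$1" = "webm" ]
}

# Example:
# batch_codec_allowed webm vp9 && supports_alpha webm && echo "transparent background is possible"
```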