Commit 702aad2

Merge pull request #281710 from sally-baolian/patch-273
TTS Avatar GA release on July 25th in Pacific time
2 parents 78435c7 + ddc370e commit 702aad2

8 files changed, 22 additions and 38 deletions

articles/ai-services/speech-service/text-to-speech-avatar/avatar-gestures-with-ssml.md

Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
 author: eric-urban
 ---

-# Customize text to speech avatar gestures with SSML (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Customize text to speech avatar gestures with SSML

 The [Speech Synthesis Markup Language (SSML)](../speech-synthesis-markup-structure.md) with input text determines the structure, content, and other characteristics of the text to speech output. Most SSML tags can also work in text to speech avatar. Furthermore, text to speech avatar batch mode provides avatar gestures insertion ability by using the SSML bookmark element with the format `<bookmark mark='gesture.*'/>`.
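A minimal sketch, not taken from the commit, of how the `<bookmark mark='gesture.*'/>` format in the context line above could be assembled into an SSML document. The gesture name `show-front-1`, the voice name, and the helper names `gesture_bookmark` and `build_ssml` are illustrative assumptions; check the gesture list for the avatar you use.

```python
import re
from xml.sax.saxutils import escape

# Assumption: gesture names are simple tokens of letters, digits, and hyphens.
_GESTURE_NAME = re.compile(r"^[A-Za-z0-9-]+$")


def gesture_bookmark(name: str) -> str:
    """Return an SSML bookmark element in the `gesture.*` format."""
    if not _GESTURE_NAME.match(name):
        raise ValueError(f"unexpected gesture name: {name!r}")
    return f"<bookmark mark='gesture.{name}'/>"


def build_ssml(segments, voice="en-US-JennyNeural", lang="en-US"):
    """Build SSML from (gesture_or_None, text) pairs.

    A gesture bookmark is placed immediately before the text it accompanies.
    The voice and language defaults are placeholders.
    """
    parts = []
    for gesture, text in segments:
        if gesture:
            parts.append(gesture_bookmark(gesture))
        parts.append(escape(text))  # escape &, <, > in the spoken text
    body = "".join(parts)
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}"><voice name="{voice}">{body}</voice></speak>'
    )


ssml = build_ssml([(None, "Hello. "), ("show-front-1", "Look at this & that.")])
```

The resulting string would be sent as the SSML input of a batch synthesis request; the source states that gesture insertion is a batch mode feature.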

articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md

Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
 author: eric-urban
 ---

-# Batch synthesis properties for text to speech avatar (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Batch synthesis properties for text to speech avatar

 Batch synthesis properties can be grouped as: avatar related properties, batch job related properties, and text to speech related properties, which are described in the following tables.

articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar.md

Lines changed: 11 additions & 13 deletions
@@ -11,11 +11,9 @@ ms.author: eur
 author: eric-urban
 ---

-# How to use batch synthesis for text to speech avatar (preview)
+# How to use batch synthesis for text to speech avatar

-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
-
-The batch synthesis API for text to speech avatar (preview) allows for the asynchronous synthesis of text into a talking avatar as a video file. Publishers and video content platforms can utilize this API to create avatar video content in a batch. That approach can be suitable for various use cases such as training materials, presentations, or advertisements.
+The batch synthesis API for text to speech avatar allows for the asynchronous synthesis of text into a talking avatar as a video file. Publishers and video content platforms can utilize this API to create avatar video content in a batch. That approach can be suitable for various use cases such as training materials, presentations, or advertisements.

 The synthetic avatar video will be generated asynchronously after the system receives text input. The generated video output can be downloaded in batch mode synthesis. You submit text for synthesis, poll for the synthesis status, and download the video output when the status indicates success. The text input formats must be plain text or Speech Synthesis Markup Language (SSML) text.
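A minimal sketch, not taken from the commit, of the poll step in the submit, poll, download flow described in the context paragraph above, written against the `2024-08-01` URL format this commit introduces. The status strings `Succeeded` and `Failed`, the helper names, and the injectable `fetch` parameter are assumptions for illustration; the diff only shows the URL shape and the `Ocp-Apim-Subscription-Key` header.

```python
import json
import time
import urllib.request

API_VERSION = "2024-08-01"  # version string introduced by this commit


def job_url(region: str, synthesis_id: str) -> str:
    """Build the per-job URL shown in the curl examples of the diff."""
    return (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/avatar/batchsyntheses/{synthesis_id}?api-version={API_VERSION}"
    )


def http_get_json(url: str, key: str) -> dict:
    """GET a URL with the Speech key header and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(region, synthesis_id, key,
                 fetch=http_get_json, interval=5.0, max_polls=120):
    """Poll the job until it reports a terminal status, then return its JSON.

    Assumption: terminal statuses are "Succeeded" and "Failed".
    """
    url = job_url(region, synthesis_id)
    for _ in range(max_polls):
        job = fetch(url, key)
        if job.get("status") in ("Succeeded", "Failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {synthesis_id} did not finish in {max_polls} polls")
```

Passing `fetch` as a parameter keeps the loop testable without network access; a caller would normally rely on the default and then download the video from the location reported in the returned job.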

@@ -27,10 +25,10 @@ To perform batch synthesis, you can use the following REST API operations.

 | Operation | Method | REST API call |
 |----------------------|---------|---------------------------------------------------|
-| [Create batch synthesis](#create-a-batch-synthesis-request) | PUT | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-15-preview |
-| [Get batch synthesis](#get-batch-synthesis) | GET | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-15-preview |
-| [List batch synthesis](#list-batch-synthesis) | GET | avatar/batchsyntheses/?api-version=2024-04-15-preview |
-| [Delete batch synthesis](#delete-batch-synthesis) | DELETE | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-15-preview |
+| [Create batch synthesis](#create-a-batch-synthesis-request) | PUT | avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01 |
+| [Get batch synthesis](#get-batch-synthesis) | GET | avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01 |
+| [List batch synthesis](#list-batch-synthesis) | GET | avatar/batchsyntheses/?api-version=2024-08-01 |
+| [Delete batch synthesis](#delete-batch-synthesis) | DELETE | avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01 |

 You can refer to the code samples on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch-avatar).
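For callers migrating from `2024-04-15-preview`, the four rows of the updated operations table can be captured in one lookup so the version string lives in a single place. This is a hedged sketch: the `OPERATIONS` mapping and `build_call` helper are hypothetical names, and the host pattern is copied from the curl examples elsewhere in this diff.

```python
API_VERSION = "2024-08-01"  # replaces 2024-04-15-preview in this commit

# Operation name -> (HTTP method, path template), mirroring the table rows.
OPERATIONS = {
    "create": ("PUT", "avatar/batchsyntheses/{synthesis_id}"),
    "get": ("GET", "avatar/batchsyntheses/{synthesis_id}"),
    "list": ("GET", "avatar/batchsyntheses/"),
    "delete": ("DELETE", "avatar/batchsyntheses/{synthesis_id}"),
}


def build_call(operation: str, region: str, synthesis_id: str = ""):
    """Return (method, url) for one of the four batch synthesis operations."""
    method, path = OPERATIONS[operation]
    if "{synthesis_id}" in path and not synthesis_id:
        raise ValueError(f"{operation!r} requires a synthesis ID")
    path = path.format(synthesis_id=synthesis_id)
    url = (
        f"https://{region}.api.cognitive.microsoft.com/"
        f"{path}?api-version={API_VERSION}"
    )
    return method, url
```

With this shape, a later version bump is a one-line change to `API_VERSION` instead of an edit to every request site.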

@@ -67,7 +65,7 @@ curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourSpeechKey" -H "Content-Type: a
 "talkingAvatarCharacter": "lisa",
 "talkingAvatarStyle": "graceful-sitting"
 }
-}' "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/my-job-01?api-version=2024-04-15-preview"
+}' "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/my-job-01?api-version=2024-08-01"
 ```

 You should receive a response body in the following format:
@@ -106,7 +104,7 @@ To retrieve the status of a batch synthesis job, make an HTTP GET request using
 Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.

 ```azurecli-interactive
-curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-04-15-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
+curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
 ```

 You should receive a response body in the following format:
@@ -157,7 +155,7 @@ To list all batch synthesis jobs for your Speech resource, make an HTTP GET requ
 Replace `YourSpeechKey` with your Speech resource key and `YourSpeechRegion` with your Speech resource region. Optionally, you can set the `skip` and `top` (page size) query parameters in the URL. The default value for `skip` is 0, and the default value for `maxpagesize` is 100.

 ```azurecli-interactive
-curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-04-15-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
+curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
 ```

 You receive a response body in the following format:
@@ -232,7 +230,7 @@ You receive a response body in the following format:
 }
 }
 ],
-"nextLink": "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/?api-version=2024-04-15-preview&skip=2&maxpagesize=2"
+"nextLink": "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/?api-version=2024-08-01&skip=2&maxpagesize=2"
 }
 ```
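Because the list response carries a `nextLink` that already embeds the API version, a client can follow it verbatim and avoid rebuilding the query string. The sketch below is illustrative only: it assumes the job array is stored under a `value` key (the hunk shows only the closing `],` of that array), and `list_url`, `iter_jobs`, and the `fetch` parameter are hypothetical names.

```python
import json
import urllib.request


def list_url(region: str, skip: int = 0, maxpagesize: int = 100) -> str:
    """First-page URL in the same shape as the list curl example."""
    return (
        f"https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses"
        f"?skip={skip}&maxpagesize={maxpagesize}&api-version=2024-08-01"
    )


def http_get_json(url: str, key: str) -> dict:
    """GET a URL with the Speech key header and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_jobs(first_url: str, key: str, fetch=http_get_json):
    """Yield every job, following `nextLink` until a page omits it."""
    url = first_url
    while url:
        page = fetch(url, key)
        # Assumption: jobs are listed under "value".
        yield from page.get("value", [])
        url = page.get("nextLink")
```

A caller could iterate with `for job in iter_jobs(list_url("YourSpeechRegion", maxpagesize=2), "YourSpeechKey")`.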

@@ -283,7 +281,7 @@ After you have retrieved the audio output results and no longer need the batch s
 To delete a batch synthesis job, make an HTTP DELETE request using the following URI format. Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.

 ```azurecli-interactive
-curl -v -X DELETE "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-04-15-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
+curl -v -X DELETE "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
 ```

 The response headers include `HTTP/1.1 204 No Content` if the delete request was successful.
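The same delete call can be made from Python with the `204 No Content` check from the context line above. This is a sketch under assumptions: `delete_job` and its `opener` parameter are hypothetical, and error handling is left to the `urllib.error.HTTPError` that `urlopen` raises for 4xx and 5xx responses.

```python
import urllib.request


def delete_job(region: str, synthesis_id: str, key: str,
               opener=urllib.request.urlopen) -> bool:
    """Send the DELETE request and report whether the service returned 204."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/avatar/batchsyntheses/{synthesis_id}?api-version=2024-08-01"
    )
    request = urllib.request.Request(
        url,
        method="DELETE",
        headers={"Ocp-Apim-Subscription-Key": key},
    )
    # urlopen raises HTTPError on 4xx/5xx, so only success codes reach here.
    with opener(request) as response:
        return response.status == 204
```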

articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create.md

Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: eur
 author: eric-urban
 ---

-# How to create a custom text to speech avatar (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# How to create a custom text to speech avatar

 Getting started with a custom text to speech avatar is a straightforward process. All it takes are a few of video files. If you'd like to train a [custom neural voice](../custom-neural-voice.md) for the same actor, you can do so separately.
1917

articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md

Lines changed: 1 addition & 3 deletions
@@ -11,9 +11,7 @@ ms.author: v-baolianzou
 keywords: how to record video samples for custom text to speech avatar
 ---

-# How to record video samples for custom text to speech avatar (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# How to record video samples for custom text to speech avatar

 This article provides instructions on preparing high-quality video samples for creating a custom text to speech avatar.
1917

articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md

Lines changed: 3 additions & 5 deletions
@@ -1,5 +1,5 @@
 ---
-title: Real-time synthesis for text to speech avatar (preview) - Speech service
+title: Real-time synthesis for text to speech avatar - Speech service
 titleSuffix: Azure AI services
 description: Learn how to use text to speech avatar with real-time synthesis.
 manager: nitinme
@@ -11,11 +11,9 @@ ms.author: eur
 author: eric-urban
 ---

-# How to do real-time synthesis for text to speech avatar (preview)
+# How to do real-time synthesis for text to speech avatar

-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
-
-In this how-to guide, you learn how to use text to speech avatar (preview) with real-time synthesis. The synthetic avatar video will be generated in almost real time after the system receives the text input.
+In this how-to guide, you learn how to use text to speech avatar with real-time synthesis. The synthetic avatar video will be generated in almost real time after the system receives the text input.

 ## Prerequisites
2119

articles/ai-services/speech-service/text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md

Lines changed: 2 additions & 4 deletions
@@ -11,9 +11,7 @@ ms.author: eur
 author: eric-urban
 ---

-# What is custom text to speech avatar? (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# What is custom text to speech avatar?

 Custom text to speech avatar allows you to create a customized, one-of-a-kind synthetic talking avatar for your application. With custom text to speech avatar, you can build a unique and natural-looking avatar for your product or brand by providing video recording data of your selected actors. If you also create a [custom neural voice](#custom-voice-and-custom-text-to-speech-avatar) for the same actor and use it as the avatar's voice, the avatar will be even more realistic.

@@ -41,7 +39,7 @@ Here's an overview of the steps to create a custom text to speech avatar:

 1. **Prepare training data:** Ensure that the video recording is in the right format. It's a good idea to shoot the video recording in a professional-quality video shooting studio to get a clean background image. The quality of the resulting avatar heavily depends on the recorded video used for training. Factors like speaking rate, body posture, facial expression, hand gestures, consistency in the actor's position, and lighting of the video recording are essential to create an engaging custom text to speech avatar.

-1. **Train the avatar model:** We'll start training the custom text to speech model after verifying the consent statement of the avatar talent. In the preview stage of this service, this step will be done manually by Microsoft. You'll be notified after the model is successfully trained.
+1. **Train the avatar model:** We'll start training the custom text to speech model after verifying the consent statement of the avatar talent. This step is currently manually done by Microsoft. You'll be notified after the model is successfully trained.

 1. **Deploy and use your avatar model in your APPs**

articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md

Lines changed: 2 additions & 4 deletions
@@ -12,9 +12,7 @@ author: eric-urban
 ms.custom: references_regions
 ---

-# Text to speech avatar overview (preview)
-
-[!INCLUDE [Text to speech avatar preview](../includes/text-to-speech-avatar-preview.md)]
+# Text to speech avatar overview

 Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a [custom text to speech avatar](#custom-text-to-speech-avatar)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.

@@ -44,7 +42,7 @@ The voice in the synthetic video could be a prebuilt neural voice available on A

 Both batch synthesis and real-time synthesis resolution are 1920 x 1080, and the frames per second (FPS) are 25. Batch synthesis codec can be h264 or h265 if the format is mp4 and can set codec as vp9 if the format is `webm`; only `webm` can contain an alpha channel. Real-time synthesis codec is h264. Video bitrate can be configured for both batch synthesis and real-time synthesis in the request; the default value is 2000000; more detailed configurations can be found in the sample code.

-| | Batch synthesis | Real-Time synthesis |
+| | Batch synthesis | Real-time synthesis |
 |------------------|------------------|----------------------|
 | **Resolution** | 1920 x 1080 | 1920 x 1080 |
 | **FPS** | 25 | 25 |
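
The batch format and codec constraints stated in the context paragraph of the hunk above (mp4 with h264 or h265, `webm` with vp9, alpha channel only in `webm`, default bitrate 2000000) can be checked on the client before a request is submitted. This is a rough sketch: the function name and the keys of the returned dictionary are hypothetical and are not the service's request property names.

```python
# Codec options per container for batch synthesis, as stated in the paragraph.
ALLOWED_CODECS = {"mp4": {"h264", "h265"}, "webm": {"vp9"}}
DEFAULT_BITRATE = 2_000_000  # default video bitrate from the paragraph


def check_batch_video_settings(video_format: str, codec: str,
                               alpha: bool = False,
                               bitrate: int = DEFAULT_BITRATE) -> dict:
    """Validate a format/codec/alpha combination and return it normalized.

    The returned keys are illustrative only, not API property names.
    """
    video_format = video_format.lower()
    codec = codec.lower()
    if video_format not in ALLOWED_CODECS:
        raise ValueError(f"unsupported format: {video_format}")
    if codec not in ALLOWED_CODECS[video_format]:
        raise ValueError(f"codec {codec} is not valid for {video_format}")
    if alpha and video_format != "webm":
        raise ValueError("only webm can contain an alpha channel")
    return {"format": video_format, "codec": codec,
            "alpha": alpha, "bitrate": bitrate}
```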
