Commit 2fba62a

Author: Yulin Li
[avatar] add new supported av1 codec
1 parent d1269aa commit 2fba62a

2 files changed: +17 -17 lines changed

articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 ---
 title: Batch synthesis properties - Speech service
 titleSuffix: Azure AI services
-description: Learn about the batch synthesis properties that are available for text to speech avatar.
+description: Learn about the batch synthesis properties that are available for text to speech avatar.
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
@@ -15,7 +15,7 @@ author: eric-urban
 
 Batch synthesis properties can be grouped as: avatar related properties, batch job related properties, and text to speech related properties, which are described in the following tables.
 
-Some properties in JSON format are required when you create a new batch synthesis job. Other properties are optional. The batch synthesis response includes other properties to provide information about the synthesis status and results. For example, the `outputs.result` property contains the location from where you can download a video file containing the avatar video. From `outputs.summary`, you can access the summary and debug details.
+Some properties in JSON format are required when you create a new batch synthesis job. Other properties are optional. The batch synthesis response includes other properties to provide information about the synthesis status and results. For example, the `outputs.result` property contains the location from where you can download a video file containing the avatar video. From `outputs.summary`, you can access the summary and debug details.
 
 ## Avatar properties
 
@@ -27,7 +27,7 @@ The following table describes the avatar properties.
 | avatarConfig.talkingAvatarStyle | The style name of the talking avatar.<br/><br/>The supported avatar styles can be found [here](avatar-gestures-with-ssml.md#supported-prebuilt-avatar-characters-styles-and-gestures).<br/><br/>This property is required for prebuilt avatar, and optional for customized avatar.|
 | avatarConfig.customized | A bool value indicating whether the avatar to be used is customized avatar or not. True for customized avatar, and false for prebuilt avatar.<br/><br/>This property is optional, and the default value is `false`.|
 | avatarConfig.videoFormat | The format for output video file, could be mp4 or webm.<br/><br/>The `webm` format is required for transparent background.<br/><br/>This property is optional, and the default value is mp4.|
-| avatarConfig.videoCodec | The codec for output video, could be h264, hevc or vp9.<br/><br/>Vp9 is required for transparent background. The synthesis speed will be slower with vp9 codec, as vp9 encoding is slower.<br/><br/>This property is optional, and the default value is hevc.|
+| avatarConfig.videoCodec | The codec for output video, could be h264, hevc, vp9 or av1.<br/><br/>Vp9 is required for transparent background. The synthesis speed will be slower with vp9 codec, as vp9 encoding is slower.<br/><br/>This property is optional, and the default value is hevc.|
 | avatarConfig.bitrateKbps | The bitrate for output video, which is integer value, with unit kbps.<br/><br/>This property is optional, and the default value is 2000.|
 | avatarConfig.videoCrop | This property allows you to crop the video output, which means, to output a rectangle subarea of the original video. This property has two fields, which define the top-left vertex and bottom-right vertex of the rectangle.<br/><br/>This property is optional, and the default behavior is to output the full video.|
 | avatarConfig.videoCrop.topLeft |The top-left vertex of the rectangle for video crop. This property has two fields x and y, to define the horizontal and vertical position of the vertex.<br/><br/>This property is required when properties.videoCrop is set.|
@@ -50,7 +50,7 @@ The following table describes the batch synthesis job properties.
 | lastActionDateTime | The most recent date and time when the status property value changed.<br/><br/>This property is read-only.|
 | properties | A defined set of optional batch synthesis configuration settings. |
 | properties.destinationContainerUrl | The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-| properties.timeToLiveInHours |A duration in hours after the synthesis job is created, when the synthesis results will be automatically deleted. The maximum time to live is 744 hours. The date and time of automatic deletion, for synthesis jobs with a status of "Succeeded" or "Failed" is calculated as the sum of the lastActionDateTime and timeToLive properties.<br/><br/>Otherwise, you can call the [delete synthesis method](../batch-synthesis.md#delete-batch-synthesis) to remove the job sooner. |
+| properties.timeToLiveInHours |A duration in hours after the synthesis job is created, when the synthesis results will be automatically deleted. The maximum time to live is 744 hours. The date and time of automatic deletion, for synthesis jobs with a status of "Succeeded" or "Failed" is calculated as the sum of the lastActionDateTime and timeToLive properties.<br/><br/>Otherwise, you can call the [delete synthesis method](../batch-synthesis.md#delete-batch-synthesis) to remove the job sooner. |
 | status | The batch synthesis processing status.<br/><br/>The status should progress from "NotStarted" to "Running", and finally to either "Succeeded" or "Failed".<br/><br/>This property is read-only.|
 
 
@@ -83,7 +83,7 @@ To generate a transparent background video, you must set the following propertie
 |-------------------------|---------------------------------------------|
 | properties.videoFormat | webm |
 | properties.videoCodec | vp9 |
-| properties.backgroundColor | #00000000 (or transparent) |
+| properties.backgroundColor | #00000000 (or `transparent`) |
 
 Clipchamp is one example of a video editing tool that supports the transparent background video generated by the batch synthesis API.
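
As a minimal sketch of the transparent-background table in the hunk above, the three required values can be applied to a settings dictionary in Python. The helper name `with_transparent_background` and the flat dictionary layout are assumptions made for illustration; this diff names the properties but does not show the request body they sit in.

```python
import json

# Values taken from the transparent-background table in the hunk above.
TRANSPARENT_BACKGROUND = {
    "videoFormat": "webm",
    "videoCodec": "vp9",
    "backgroundColor": "#00000000",  # the table also accepts "transparent"
}


def with_transparent_background(settings: dict) -> dict:
    """Return a copy of `settings` with the three required values applied.

    Hypothetical helper: the key names come from the table, but where they
    nest inside a real request is not shown in this diff.
    """
    merged = dict(settings)  # copy so the caller's dictionary is not mutated
    merged.update(TRANSPARENT_BACKGROUND)
    return merged


if __name__ == "__main__":
    # hevc and 2000 kbps are the documented defaults; both codec and format
    # are overridden here because vp9 in webm is required for transparency.
    base = {"videoCodec": "hevc", "bitrateKbps": 2000}
    print(json.dumps(with_transparent_background(base), indent=2))
```

Returning a copy keeps the caller's original settings intact, so the same base settings can be reused for an opaque render.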

articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md

Lines changed: 12 additions & 12 deletions
@@ -1,7 +1,7 @@
 ---
 title: Text to speech avatar overview - Speech service
 titleSuffix: Azure AI services
-description: Get an overview of the Text to speech avatar feature of speech service, which allows users to create synthetic videos featuring avatars speaking based on text input.
+description: Get an overview of the Text to speech avatar feature of speech service, which allows users to create synthetic videos featuring avatars speaking based on text input.
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
@@ -36,25 +36,25 @@ With text to speech avatar's advanced neural network models, the feature empower
 
 ## Avatar voice and language
 
-You can choose from a range of prebuilt voices for the avatar. The language support for text to speech avatar is the same as the language support for text to speech. For details, see [Language and voice support for the Speech service](../language-support.md?tabs=tts). Prebuilt text to speech avatars can be accessed through the [Speech Studio portal](https://speech.microsoft.com/portal/talkingavatar) or via API.
+You can choose from a range of prebuilt voices for the avatar. The language support for text to speech avatar is the same as the language support for text to speech. For details, see [Language and voice support for the Speech service](../language-support.md?tabs=tts). Prebuilt text to speech avatars can be accessed through the [Speech Studio portal](https://speech.microsoft.com/portal/talkingavatar) or via API.
 
-The voice in the synthetic video could be a prebuilt neural voice available on Azure AI Speech or the [custom neural voice](../custom-neural-voice.md) of voice talent selected by you.
+The voice in the synthetic video could be a prebuilt neural voice available on Azure AI Speech or the [custom neural voice](../custom-neural-voice.md) of voice talent selected by you.
 
 ## Avatar video output
 
-Both batch synthesis and real-time synthesis resolution are 1920 x 1080, and the frames per second (FPS) are 25. Batch synthesis codec can be h264 or h265 if the format is mp4 and can set codec as vp9 if the format is `webm`; only `webm` can contain an alpha channel. Real-time synthesis codec is h264. Video bitrate can be configured for both batch synthesis and real-time synthesis in the request; the default value is 2000000; more detailed configurations can be found in the sample code.
+Both batch synthesis and real-time synthesis resolution are 1920 x 1080, and the frames per second (FPS) are 25. Batch synthesis codec can be h264, hevc or av1 if the format is `mp4` and can set codec as vp9 or av1 if the format is `webm`; only `vp9` can contain an alpha channel. Real-time synthesis codec is h264. Video bitrate can be configured for both batch synthesis and real-time synthesis in the request; the default value is 2000000; more detailed configurations can be found in the sample code.
 
-| | Batch synthesis | Real-time synthesis |
-|------------------|------------------|----------------------|
-| **Resolution** | 1920 x 1080 | 1920 x 1080 |
-| **FPS** | 25 | 25 |
-| **Codec** | h264/h265/vp9 | h264 |
+| | Batch synthesis | Real-time synthesis |
+|------------------|-------------------|----------------------|
+| **Resolution** | 1920 x 1080 | 1920 x 1080 |
+| **FPS** | 25 | 25 |
+| **Codec** | h264/hevc/vp9/av1 | h264 |
 
 ## Custom text to speech avatar
 
 You can create custom text to speech avatars that are unique to your product or brand. All it takes to get started is taking 10 minutes of video recordings. If you're also creating a custom neural voice for the actor, the avatar can be highly realistic. For more information, see [What is custom text to speech avatar](./what-is-custom-text-to-speech-avatar.md).
 
-[Custom neural voice](../custom-neural-voice.md) and [custom text to speech avatar](what-is-custom-text-to-speech-avatar.md) are separate features. You can use them independently or together. If you plan to also use [custom neural voice](../custom-neural-voice.md) with a text to speech avatar, you need to deploy or [copy](../professional-voice-train-voice.md#copy-your-voice-model-to-another-project) your custom neural voice model to one of the [avatar supported regions](#available-locations).
+[Custom neural voice](../custom-neural-voice.md) and [custom text to speech avatar](what-is-custom-text-to-speech-avatar.md) are separate features. You can use them independently or together. If you plan to also use [custom neural voice](../custom-neural-voice.md) with a text to speech avatar, you need to deploy or [copy](../professional-voice-train-voice.md#copy-your-voice-model-to-another-project) your custom neural voice model to one of the [avatar supported regions](#available-locations).
 
 ## Sample code
 
@@ -73,9 +73,9 @@ Sample code for text to speech avatar is available on [GitHub](https://github.co
 
 ## Available locations
 
-The text to speech avatar feature is only available in the following service regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, and West US 2.
+The text to speech avatar feature is only available in the following service regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, and West US 2.
 
-### Responsible AI
+### Responsible AI
 
 We care about the people who use AI and the people who will be affected by it as much as we care about technology. For more information, see the Responsible AI [transparency notes](/legal/cognitive-services/speech-service/text-to-speech/transparency-note?context=/azure/ai-services/speech-service/context/context) and [disclosure for voice and avatar talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/ai-services/speech-service/context/context).
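
The format and codec rules in the updated "Avatar video output" paragraph of this file can be sketched as a small lookup in Python. This is an illustrative client-side check only: the function name `check_batch_video_settings` and the `transparent` flag are hypothetical, and the service itself may validate requests differently.

```python
# Container -> codecs for batch synthesis, as listed in the updated
# "Avatar video output" paragraph (the "+" side of this diff).
BATCH_CODECS = {
    "mp4": {"h264", "hevc", "av1"},
    "webm": {"vp9", "av1"},
}

# Per the updated text, only vp9 can contain an alpha channel.
ALPHA_CODECS = {"vp9"}


def check_batch_video_settings(video_format: str, video_codec: str,
                               transparent: bool = False) -> list:
    """Return a list of problems; an empty list means the combination is
    consistent with the documented rules. Hypothetical helper."""
    problems = []
    allowed = BATCH_CODECS.get(video_format)
    if allowed is None:
        problems.append(f"unknown format {video_format!r}")
    elif video_codec not in allowed:
        problems.append(
            f"codec {video_codec!r} is not listed for format {video_format!r}"
        )
    if transparent and video_codec not in ALPHA_CODECS:
        problems.append("a transparent background requires the vp9 codec")
    return problems


if __name__ == "__main__":
    print(check_batch_video_settings("mp4", "av1"))
    print(check_batch_video_settings("webm", "h264"))
    print(check_batch_video_settings("webm", "av1", transparent=True))
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch in a single pass.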
