[avatar] add new supported av1 codec

Yulin Li · Yulin Li · commit 2fba62a2c531 · 2024-10-31T14:45:55.000+08:00
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md b/articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md
@@ -1,7 +1,7 @@
 ---
 title: Batch synthesis properties - Speech service
 titleSuffix: Azure AI services
-description: Learn about the batch synthesis properties that are available for text to speech avatar. 
+description: Learn about the batch synthesis properties that are available for text to speech avatar.
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
@@ -15,7 +15,7 @@ author: eric-urban
 
 Batch synthesis properties can be grouped as: avatar related properties, batch job related properties, and text to speech related properties,  which are described in the following tables.
 
-Some properties in JSON format are required when you create a new batch synthesis job. Other properties are optional. The batch synthesis response includes other properties to provide information about the synthesis status and results. For example, the `outputs.result` property contains the location from where you can download a video file containing the avatar video. From `outputs.summary`, you can access the summary and debug details. 
+Some properties in JSON format are required when you create a new batch synthesis job. Other properties are optional. The batch synthesis response includes other properties to provide information about the synthesis status and results. For example, the `outputs.result` property contains the location from where you can download a video file containing the avatar video. From `outputs.summary`, you can access the summary and debug details.
 
 ## Avatar properties
 
@@ -27,7 +27,7 @@ The following table describes the avatar properties.
 | avatarConfig.talkingAvatarStyle          | The style name of the talking avatar.<br/><br/>The supported avatar styles can be found [here](avatar-gestures-with-ssml.md#supported-prebuilt-avatar-characters-styles-and-gestures).<br/><br/>This property is required for prebuilt avatar, and optional for customized avatar.|
 | avatarConfig.customized                  | A bool value indicating whether the avatar to be used is customized avatar or not. True for customized avatar, and false for prebuilt avatar.<br/><br/>This property is optional, and the default value is `false`.|
 | avatarConfig.videoFormat                 | The format for output video file, could be mp4 or webm.<br/><br/>The `webm` format is required for transparent background.<br/><br/>This property is optional, and the default value is mp4.|
-| avatarConfig.videoCodec                  | The codec for output video, could be h264, hevc or vp9.<br/><br/>Vp9 is required for transparent background. The synthesis speed will be slower with vp9 codec, as vp9 encoding is slower.<br/><br/>This property is optional, and the default value is hevc.|
+| avatarConfig.videoCodec                  | The codec for the output video: h264, hevc, vp9, or av1.<br/><br/>The vp9 codec is required for a transparent background. Synthesis is slower with vp9 because vp9 encoding is slower.<br/><br/>This property is optional, and the default value is hevc.|
 | avatarConfig.bitrateKbps                 | The bitrate for output video, which is integer value, with unit kbps.<br/><br/>This property is optional, and the default value is 2000.|
 | avatarConfig.videoCrop                   | This property allows you to crop the video output, which means, to output a rectangle subarea of the original video. This property has two fields, which define the top-left vertex and bottom-right vertex of the rectangle.<br/><br/>This property is optional, and the default behavior is to output the full video.|
 | avatarConfig.videoCrop.topLeft           |The top-left vertex of the rectangle for video crop. This property has two fields x and y, to define the horizontal and vertical position of the vertex.<br/><br/>This property is required when properties.videoCrop is set.|
@@ -50,7 +50,7 @@ The following table describes the batch synthesis job properties.
 | lastActionDateTime       | The most recent date and time when the status property value changed.<br/><br/>This property is read-only.|
 | properties               | A defined set of optional batch synthesis configuration settings.  |
 | properties.destinationContainerUrl | The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-| properties.timeToLiveInHours    |A duration in hours after the synthesis job is created, when the synthesis results will be automatically deleted.  The maximum time to live is 744 hours. The date and time of automatic deletion, for synthesis jobs with a status of "Succeeded" or "Failed" is calculated as the sum of the lastActionDateTime and timeToLive properties.<br/><br/>Otherwise, you can call the [delete synthesis method](../batch-synthesis.md#delete-batch-synthesis) to remove the job sooner. | 
+| properties.timeToLiveInHours    |A duration in hours, after the synthesis job is created, when the synthesis results are automatically deleted. The maximum time to live is 744 hours. For synthesis jobs with a status of "Succeeded" or "Failed", the date and time of automatic deletion is calculated as the sum of the lastActionDateTime and timeToLiveInHours properties.<br/><br/>Otherwise, you can call the [delete synthesis method](../batch-synthesis.md#delete-batch-synthesis) to remove the job sooner. |
 | status                   | The batch synthesis processing status.<br/><br/>The status should progress from "NotStarted" to "Running", and finally to either "Succeeded" or "Failed".<br/><br/>This property is read-only.|
 
 
@@ -83,7 +83,7 @@ To generate a transparent background video, you must set the following propertie
 |-------------------------|---------------------------------------------|
 | properties.videoFormat  | webm                                        |
 | properties.videoCodec   | vp9                                         |
-| properties.backgroundColor | #00000000 (or transparent)               |
+| properties.backgroundColor | #00000000 (or `transparent`)               |
 
 Clipchamp is one example of a video editing tool that supports the transparent background video generated by the batch synthesis API.
 
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md b/articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md
@@ -1,7 +1,7 @@
 ---
 title: Text to speech avatar overview - Speech service
 titleSuffix: Azure AI services
-description: Get an overview of the Text to speech avatar feature of speech service, which allows users to create synthetic videos featuring avatars speaking based on text input. 
+description: Get an overview of the Text to speech avatar feature of speech service, which allows users to create synthetic videos featuring avatars speaking based on text input.
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
@@ -36,25 +36,25 @@ With text to speech avatar's advanced neural network models, the feature empower
 
 ## Avatar voice and language
 
-You can choose from a range of prebuilt voices for the avatar. The language support for text to speech avatar is the same as the language support for text to speech. For details, see [Language and voice support for the Speech service](../language-support.md?tabs=tts). Prebuilt text to speech avatars can be accessed through the [Speech Studio portal](https://speech.microsoft.com/portal/talkingavatar) or via API. 
+You can choose from a range of prebuilt voices for the avatar. The language support for text to speech avatar is the same as the language support for text to speech. For details, see [Language and voice support for the Speech service](../language-support.md?tabs=tts). Prebuilt text to speech avatars can be accessed through the [Speech Studio portal](https://speech.microsoft.com/portal/talkingavatar) or via API.
 
-The voice in the synthetic video could be a prebuilt neural voice available on Azure AI Speech or the [custom neural voice](../custom-neural-voice.md) of voice talent selected by you. 
+The voice in the synthetic video could be a prebuilt neural voice available on Azure AI Speech or the [custom neural voice](../custom-neural-voice.md) of voice talent selected by you.
 
 ## Avatar video output
 
-Both batch synthesis and real-time synthesis resolution are 1920 x 1080, and the frames per second (FPS) are 25. Batch synthesis codec can be h264 or h265 if the format is mp4 and can set codec as vp9 if the format is `webm`; only `webm` can contain an alpha channel. Real-time synthesis codec is h264. Video bitrate can be configured for both batch synthesis and real-time synthesis in the request; the default value is 2000000; more detailed configurations can be found in the sample code.
+Batch synthesis and real-time synthesis both output video at a resolution of 1920 x 1080 and 25 frames per second (FPS). For batch synthesis, the codec can be h264, hevc, or av1 if the format is `mp4`, and vp9 or av1 if the format is `webm`; only `vp9` can contain an alpha channel. The real-time synthesis codec is h264. You can configure the video bitrate in the request for both batch synthesis and real-time synthesis; the default value is 2000000. More detailed configurations can be found in the sample code.
 
-|                  | Batch synthesis  | Real-time synthesis |
-|------------------|------------------|----------------------|
-| **Resolution**   | 1920 x 1080      | 1920 x 1080          |
-| **FPS**          | 25               | 25                   |
-| **Codec**        | h264/h265/vp9    | h264                 |
+|                  | Batch synthesis   | Real-time synthesis |
+|------------------|-------------------|----------------------|
+| **Resolution**   | 1920 x 1080       | 1920 x 1080          |
+| **FPS**          | 25                | 25                   |
+| **Codec**        | h264/hevc/vp9/av1 | h264                 |
 
 ## Custom text to speech avatar
 
 You can create custom text to speech avatars that are unique to your product or brand. All it takes to get started is taking 10 minutes of video recordings. If you're also creating a custom neural voice for the actor, the avatar can be highly realistic. For more information, see [What is custom text to speech avatar](./what-is-custom-text-to-speech-avatar.md).
 
-[Custom neural voice](../custom-neural-voice.md) and [custom text to speech avatar](what-is-custom-text-to-speech-avatar.md) are separate features. You can use them independently or together. If you plan to also use [custom neural voice](../custom-neural-voice.md) with a text to speech avatar, you need to deploy or [copy](../professional-voice-train-voice.md#copy-your-voice-model-to-another-project) your custom neural voice model to one of the [avatar supported regions](#available-locations). 
+[Custom neural voice](../custom-neural-voice.md) and [custom text to speech avatar](what-is-custom-text-to-speech-avatar.md) are separate features. You can use them independently or together. If you plan to also use [custom neural voice](../custom-neural-voice.md) with a text to speech avatar, you need to deploy or [copy](../professional-voice-train-voice.md#copy-your-voice-model-to-another-project) your custom neural voice model to one of the [avatar supported regions](#available-locations).
 
 ## Sample code
 
@@ -73,9 +73,9 @@ Sample code for text to speech avatar is available on [GitHub](https://github.co
 
 ## Available locations
 
-The text to speech avatar feature is only available in the following service regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, and West US 2. 
+The text to speech avatar feature is only available in the following service regions: Southeast Asia, North Europe, West Europe, Sweden Central, South Central US, and West US 2.
 
-### Responsible AI 
+### Responsible AI
 
 We care about the people who use AI and the people who will be affected by it as much as we care about technology. For more information, see the Responsible AI [transparency notes](/legal/cognitive-services/speech-service/text-to-speech/transparency-note?context=/azure/ai-services/speech-service/context/context) and [disclosure for voice and avatar talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/ai-services/speech-service/context/context).
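
The format and codec pairs documented in this change (h264, hevc, or av1 for `mp4`; vp9 or av1 for `webm`; transparency only with `webm` plus vp9) can be sketched as a small client-side check. This is a minimal illustration, not part of any Azure SDK: `SUPPORTED_CODECS` and `build_video_settings` are hypothetical names, and where the resulting keys sit inside a real batch synthesis request body is an assumption that should be confirmed against the API reference.

```python
# Hypothetical helper: the mapping mirrors the format/codec pairs stated in the
# updated docs. It is an illustration only and not part of any Azure SDK.
SUPPORTED_CODECS = {
    "mp4": {"h264", "hevc", "av1"},
    "webm": {"vp9", "av1"},
}


def build_video_settings(video_format="mp4", video_codec="hevc",
                         transparent=False, bitrate_kbps=2000):
    """Return a dict of video settings, or raise ValueError for an unsupported combination."""
    allowed = SUPPORTED_CODECS.get(video_format)
    if allowed is None:
        raise ValueError(f"unsupported video format: {video_format!r}")
    if video_codec not in allowed:
        raise ValueError(
            f"codec {video_codec!r} is not available for format {video_format!r}"
        )
    # Per the docs, a transparent background needs webm together with vp9.
    if transparent and (video_format, video_codec) != ("webm", "vp9"):
        raise ValueError(
            "a transparent background requires videoFormat=webm and videoCodec=vp9"
        )

    settings = {
        "videoFormat": video_format,
        "videoCodec": video_codec,
        "bitrateKbps": bitrate_kbps,
    }
    if transparent:
        # Documented value for a transparent background.
        settings["backgroundColor"] = "#00000000"
    return settings


if __name__ == "__main__":
    print(build_video_settings("mp4", "av1"))
    print(build_video_settings("webm", "vp9", transparent=True))
```

Rejecting invalid combinations before submitting a job keeps mistakes such as `mp4` with vp9, or a transparent background with av1, from reaching the service. The defaults (`mp4`, hevc, 2000 kbps) follow the property table in this diff.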