
Commit 3bf83ab

Merge pull request #6054 from zhifzhan/zhifzhan/update-batch-synthesis-doc
[Batch Synthesis API] Update batch synthesis properties
2 parents 21f8889 + 549f8f3 commit 3bf83ab

2 files changed: +8 −6 lines changed

articles/ai-services/speech-service/batch-synthesis-properties.md

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ Batch synthesis properties are described in the following table.
 |`customVoices`|The map of a custom voice name and its deployment ID.<br/><br/>For example: `"customVoices": {"your-custom-voice-name": "502ac834-6537-4bc3-9fd6-140114daa66d"}`<br/><br/>You can use the voice name in your `synthesisConfig.voice` (when the `inputKind` is set to `"PlainText"`) or within the SSML text of `inputs` (when the `inputKind` is set to `"SSML"`).<br/><br/>This property is required to use a custom voice. If you try to use a custom voice that isn't defined here, the service returns an error.|
 |`description`|The description of the batch synthesis.<br/><br/>This property is optional.|
 |`id`|The batch synthesis job ID you passed in path.<br/><br/>This property is required in path.|
-|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"text": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"text": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"text": "synthesize this to a file"},{"text": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"text": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
+|`inputs`|The plain text or SSML to be synthesized.<br/><br/>When the `inputKind` is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"content": "The rainbow has seven colors."}]`. When the `inputKind` is set to `"SSML"`, provide text in the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) as shown here: `"inputs": [{"content": "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple audio output files. Here's example input text that should be synthesized to two audio output files: `"inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}]`. However, if the `properties.concatenateResult` property is set to `true`, then each synthesized result is written to the same audio output file.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>There are no paragraph limits, but the maximum JSON payload size (including all text inputs and other properties) is 2 megabytes.<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
 |`lastActionDateTime`|The most recent date and time when the `status` property value changed.<br/><br/>This property is read-only.|
 |`outputs.result`|The location of the batch synthesis result files with audio output and logs.<br/><br/>This property is read-only.|
 |`properties`|A defined set of optional batch synthesis configuration settings.|
@@ -40,7 +40,7 @@ Batch synthesis properties are described in the following table.
 |`properties.concatenateResult`|Determines whether to concatenate the result. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.decompressOutputFiles`|Determines whether to unzip the synthesis result files in the destination container. This property can only be set when the `destinationContainerUrl` property is set. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.destinationContainerUrl`|The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
-|`properties.destinationPath`|The prefix path where batch synthesis results can be stored with. If you don't specify a prefix path, the default prefix path is `YourSpeechResourceId/YourSynthesisId`.<br/><br/>This optional property can only be set when the `destinationContainerUrl` property is set.|
+|`properties.destinationPath`|The prefix path for storing batch synthesis results. If no prefix path is provided, a system-generated path will be used.<br/><br/>This property is optional and can only be set when the `destinationContainerUrl` property is specified.|
 |`properties.durationInMilliseconds`|The audio output duration in milliseconds.<br/><br/>This property is read-only.|
 |`properties.failedAudioCount`|The count of batch synthesis inputs to audio output failed.<br/><br/>This property is read-only.|
 |`properties.outputFormat`|The audio output format.<br/><br/>For information about the accepted values, see [audio output formats](rest-text-to-speech.md#audio-outputs). The default output format is `riff-24khz-16bit-mono-pcm`.|
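The change above renames each input object's key from `"text"` to `"content"`. A minimal sketch of assembling a request body under the updated schema, enforcing the documented 1,000-input cap and 2-megabyte payload limit; the helper name and the default voice are illustrative, not part of the API:

```python
import json

def build_batch_synthesis_body(texts, voice="en-US-AvaMultilingualNeural"):
    """Assemble a PlainText batch synthesis body using the updated
    `inputs` schema, where each text object uses a "content" key."""
    if len(texts) > 1000:
        raise ValueError("at most 1,000 text objects are allowed")
    body = {
        "inputKind": "PlainText",
        "synthesisConfig": {"voice": voice},
        "inputs": [{"content": t} for t in texts],
    }
    # The maximum JSON payload size, including all inputs and other
    # properties, is 2 megabytes.
    if len(json.dumps(body).encode("utf-8")) > 2 * 1024 * 1024:
        raise ValueError("payload exceeds the 2-megabyte limit")
    return body

# Two inputs produce two audio output files (unless
# properties.concatenateResult is true); "\r\n" inside one input
# starts a new paragraph in the same file.
body = build_batch_synthesis_body([
    "synthesize this to a file",
    "first paragraph\r\nsecond paragraph in the same file",
])
```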

articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar-properties.md

Lines changed: 6 additions & 4 deletions
@@ -23,8 +23,8 @@ The following table describes the avatar properties.
 
 | Property | Description |
 |------------------------------------------|------------------------------------------|
-| avatarConfig.talkingAvatarCharacter | The character name of the talking avatar.<br/><br/>The supported avatar characters can be found [here](avatar-gestures-with-ssml.md#supported-standard-avatar-characters-styles-and-gestures).<br/><br/>This property is required.|
-| avatarConfig.talkingAvatarStyle | The style name of the talking avatar.<br/><br/>The supported avatar styles can be found [here](avatar-gestures-with-ssml.md#supported-standard-avatar-characters-styles-and-gestures).<br/><br/>This property is required for standard avatar, and optional for customized avatar.|
+| avatarConfig.talkingAvatarCharacter | The character name of the talking avatar.<br/><br/>For standard avatar, the supported avatar characters can be found [here](avatar-gestures-with-ssml.md#supported-standard-avatar-characters-styles-and-gestures).<br/>For custom avatar, specify the avatar model name.<br/><br/>This property is required.|
+| avatarConfig.talkingAvatarStyle | The style name of the talking avatar.<br/><br/>For standard avatar, the supported avatar styles can be found [here](avatar-gestures-with-ssml.md#supported-standard-avatar-characters-styles-and-gestures).<br/>For custom avatar, this property should be omitted.<br/><br/>This property is required for standard avatar.|
 | avatarConfig.customized | A bool value indicating whether the avatar to be used is customized avatar or not. True for customized avatar, and false for standard avatar.<br/><br/>This property is optional, and the default value is `false`.|
 | avatarConfig.videoFormat | The format for output video file could be mp4 or webm.<br/><br/>The `webm` format is required for transparent background.<br/><br/>This property is optional, and the default value is mp4.|
 | avatarConfig.videoCodec | The codec for output video, could be h264, hevc, vp9 or av1.<br/><br/>Vp9 is required for transparent background. The synthesis speed is slower with vp9 codec, as vp9 encoding is slower.<br/><br/>This property is optional, and the default value is hevc.|
@@ -35,6 +35,7 @@ The following table describes the avatar properties.
 | avatarConfig.subtitleType | Type of subtitle for the avatar video file could be `external_file`, `soft_embedded`, `hard_embedded`, or `none`.<br/><br/>This property is optional, and the default value is `soft_embedded`.|
 | avatarConfig.backgroundImage | Add a background image using the `avatarConfig.backgroundImage` property. The value of the property should be a URL pointing to the desired image. This property is optional. |
 | avatarConfig.backgroundColor | Background color of the avatar video, which is a string in #RRGGBBAA format. In this string: RR, GG, BB and AA mean the red, green, blue, and alpha channels, with hexadecimal value range 00~FF. Alpha channel controls the transparency, with value 00 for transparent, value FF for non-transparent, and value between 00 and FF for semi-transparent.<br/><br/>This property is optional, and the default value is #FFFFFFFF (white).|
+| avatarConfig.useBuiltInVoice | A boolean value indicating whether to use the voice sync for avatar as the synthesis voice. This property can only be used with a custom avatar trained with voice sync for avatar. When set to `true`, other voices specified in `synthesisConfig` or in SSML are ignored.<br/><br/>This property is optional, and the default value is `false`. |
 | outputs.result | The location of the batch synthesis result file, which is a video file containing the synthesized avatar.<br/><br/>This property is read-only.|
 | properties.DurationInMilliseconds | The video output duration in milliseconds.<br/><br/>This property is read-only. |
 
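The avatar configuration rules above (character always required, style required only for a standard avatar and omitted for a custom one, `useBuiltInVoice` only for a custom avatar trained with voice sync for avatar) can be sketched as a small validation helper. The helper itself is illustrative; only the property rules come from the table:

```python
def build_avatar_config(character, style=None, customized=False,
                        use_built_in_voice=False):
    """Assemble an avatarConfig dict per the documented property rules.
    Character and style values here are caller-supplied examples."""
    config = {"talkingAvatarCharacter": character, "customized": customized}
    if customized:
        # For a custom avatar, talkingAvatarStyle is omitted; useBuiltInVoice
        # applies only to custom avatars trained with voice sync for avatar.
        if use_built_in_voice:
            config["useBuiltInVoice"] = True
    else:
        # talkingAvatarStyle is required for a standard avatar.
        if style is None:
            raise ValueError("talkingAvatarStyle is required for standard avatar")
        config["talkingAvatarStyle"] = style
        if use_built_in_voice:
            raise ValueError("useBuiltInVoice applies only to custom avatars")
    return config
```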
@@ -49,7 +50,8 @@ The following table describes the batch synthesis job properties.
 | ID | The batch synthesis job ID.<br/><br/>This property is read-only.|
 | lastActionDateTime | The most recent date and time when the status property value changed.<br/><br/>This property is read-only.|
 | properties | A defined set of optional batch synthesis configuration settings. |
-| properties.destinationContainerUrl | The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
+| properties.destinationContainerUrl | The batch synthesis results can be stored in a writable Azure Blob Storage container. If you don't specify a container URI with [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This property is required when generating multiple videos in one job. For single video generation, this property is optional.<br/><br/>This property isn't included in the response when you get the synthesis job.|
+| properties.destinationPath | The prefix path for storing batch synthesis results. If no prefix path is provided, a system-generated path will be used.<br/><br/>This property is optional and can only be set when the `destinationContainerUrl` property is specified.|
 | properties.timeToLiveInHours |A duration in hours after the synthesis job is created, when the synthesis results will be automatically deleted. The maximum time to live is 744 hours. The date and time of automatic deletion, for synthesis jobs with a status of "Succeeded" or "Failed" is calculated as the sum of the lastActionDateTime and timeToLive properties.<br/><br/>Otherwise, you can call the [delete synthesis method](../batch-synthesis.md#delete-batch-synthesis) to remove the job sooner. |
 | status | The batch synthesis processing status.<br/><br/>The status should progress from "NotStarted" to "Running", and finally to either "Succeeded" or "Failed".<br/><br/>This property is read-only.|
 
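The `timeToLiveInHours` rule above (automatic deletion at `lastActionDateTime` plus the time to live, capped at 744 hours) amounts to simple date arithmetic. A sketch, assuming an ISO-8601 timestamp as returned by the service; the lower-bound check is this helper's assumption, not a documented limit:

```python
from datetime import datetime, timedelta

MAX_TTL_HOURS = 744  # documented maximum time to live

def auto_deletion_time(last_action_date_time: str,
                       time_to_live_in_hours: int) -> datetime:
    """For jobs with status "Succeeded" or "Failed", results are deleted
    at lastActionDateTime + timeToLiveInHours."""
    if not 0 < time_to_live_in_hours <= MAX_TTL_HOURS:
        raise ValueError(f"timeToLiveInHours must be 1..{MAX_TTL_HOURS}")
    last = datetime.fromisoformat(last_action_date_time.replace("Z", "+00:00"))
    return last + timedelta(hours=time_to_live_in_hours)
```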
@@ -61,7 +63,7 @@ The following table describes the text to speech properties.
 | Property | Description |
 |--------------------------|--------------------------|
 | customVoices | A custom voice is associated with a name and its deployment ID, like this: "customVoices": {"your-custom-voice-name": "502ac834-6537-4bc3-9fd6-140114daa66d"}<br/><br/>You can use the voice name in your `synthesisConfig.voice` when `inputKind` is set to "PlainText", or within SSML text of inputs when `inputKind` is set to "SSML".<br/><br/>This property is required to use a custom voice. If you try to use a custom voice that isn't defined here, the service returns an error.|
-| inputs | The plain text or SSML to be synthesized.<br/><br/>When the inputKind is set to "PlainText", provide plain text as shown here: "inputs": [{"content": "The rainbow has seven colors."}]. When the inputKind is set to "SSML", provide text in the Speech Synthesis Markup Language (SSML) as shown here: "inputs": [{"content": "<speak version='\'1.0'\'' xml:lang='\'en-US'\''><voice xml:lang='\'en-US'\'' xml:gender='\'Female'\'' name='\'en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"}].<br/><br/>Include up to 1,000 text objects if you want multiple video output files. Here's example input text that should be synthesized to two video output files: "inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}].<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: "inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
+| inputs | The plain text or SSML to be synthesized.<br/><br/>When the inputKind is set to `"PlainText"`, provide plain text as shown here: `"inputs": [{"content": "The rainbow has seven colors."}]`.<br/>When the inputKind is set to `"SSML"`, provide text in the Speech Synthesis Markup Language (SSML) as shown here: `"inputs": [{"content": "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"}]`.<br/><br/>Include up to 1,000 text objects if you want multiple video output files. Here's example input text that should be synthesized to two video output files: `"inputs": [{"content": "synthesize this to a file"},{"content": "synthesize this to another file"}]`.<br/>To generate multiple videos, the output must be stored in an Azure Blob Storage container by specifying `properties.destinationContainerUrl`.<br/><br/>You don't need separate text inputs for new paragraphs. Within any of the (up to 1,000) text inputs, you can specify new paragraphs using the "\r\n" (newline) string. Here's example input text with two paragraphs that should be synthesized to the same audio output file: `"inputs": [{"content": "synthesize this to a file\r\nsynthesize this to another paragraph in the same file"}]`<br/><br/>This property is required when you create a new batch synthesis job. This property isn't included in the response when you get the synthesis job.|
 | properties.billingDetails | The number of words that were processed and billed by `customNeural` (custom voice) versus `neural` (standard voice).<br/><br/>This property is read-only.|
 | synthesisConfig | The configuration settings to use for batch synthesis of plain text.<br/><br/>This property is only applicable when inputKind is set to "PlainText".|
 | synthesisConfig.pitch | The pitch of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](../speech-synthesis-markup-voice.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when inputKind is set to "PlainText".|
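The new constraint in the avatar `inputs` description (multiple videos require `properties.destinationContainerUrl`) can be captured in a small pre-flight check. A sketch only; the function name and error messages are illustrative:

```python
def validate_avatar_job(body: dict) -> None:
    """Check an avatar batch synthesis body against the documented rules:
    inputs is required, each input uses a "content" key, and jobs that
    generate multiple videos must set properties.destinationContainerUrl."""
    inputs = body.get("inputs", [])
    if not inputs:
        raise ValueError("inputs is required when creating a synthesis job")
    if any("content" not in item for item in inputs):
        raise ValueError('each input object needs a "content" key')
    props = body.get("properties", {})
    if len(inputs) > 1 and not props.get("destinationContainerUrl"):
        raise ValueError(
            "destinationContainerUrl is required when generating "
            "multiple videos in one job"
        )
```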
