Commit 0be5fc2

Update batch-synthesis-avatar.md
1 parent 85ac10a commit 0be5fc2

1 file changed

+117
-106
lines changed

articles/ai-services/speech-service/text-to-speech-avatar/batch-synthesis-avatar.md

Lines changed: 117 additions & 106 deletions
@@ -27,10 +27,10 @@ To perform batch synthesis, you can use the following REST API operations.
2727

2828
| Operation | Method | REST API call |
2929
|----------------------|---------|---------------------------------------------------|
30-
| [Create batch synthesis](#create-a-batch-synthesis-request) | POST | texttospeech/3.1-preview1/batchsynthesis/talkingavatar |
31-
| [Get batch synthesis](#get-batch-synthesis) | GET | texttospeech/3.1-preview1/batchsynthesis/talkingavatar/{SynthesisId} |
32-
| [List batch synthesis](#list-batch-synthesis) | GET | texttospeech/3.1-preview1/batchsynthesis/talkingavatar |
33-
| [Delete batch synthesis](#delete-batch-synthesis) | DELETE | texttospeech/3.1-preview1/batchsynthesis/talkingavatar/{SynthesisId} |
30+
| [Create batch synthesis](#create-a-batch-synthesis-request) | PUT | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-01-preview |
31+
| [Get batch synthesis](#get-batch-synthesis) | GET | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-01-preview |
32+
| [List batch synthesis](#list-batch-synthesis) | GET | avatar/batchsyntheses/?api-version=2024-04-01-preview |
33+
| [Delete batch synthesis](#delete-batch-synthesis) | DELETE | avatar/batchsyntheses/{SynthesisId}?api-version=2024-04-01-preview |
3434

3535
You can refer to the code samples on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch-avatar).
3636

@@ -40,9 +40,9 @@ Some properties in JSON format are required when you create a new batch synthesi
4040

4141
To submit a batch synthesis request, construct the HTTP POST request body following these instructions:
4242

43-
- Set the required `textType` property.
44-
- If the `textType` property is set to `PlainText`, you must also set the `voice` property in the `synthesisConfig`. In the example below, the `textType` is set to `SSML`, so the `speechSynthesis` isn't set.
45-
- Set the required `displayName` property. Choose a name for reference, and it doesn't have to be unique.
43+
- Set the required `inputKind` property.
44+
- If the `inputKind` property is set to `PlainText`, you must also set the `voice` property in the `synthesisConfig`. In the example below, the `inputKind` is set to `SSML`, so the `synthesisConfig` isn't set.
45+
- Set the required `SynthesisId` property. Choose a unique `SynthesisId` for the same speech resource. The `SynthesisId` can be a string of 3 to 64 characters, including letters, numbers, '-', or '_', with the condition that it must start and end with a letter or number.
4646
- Set the required `talkingAvatarCharacter` and `talkingAvatarStyle` properties. You can find supported avatar characters and styles [here](./avatar-gestures-with-ssml.md#supported-pre-built-avatar-characters-styles-and-gestures).
4747
- Optionally, you can set the `videoFormat`, `backgroundColor`, and other properties. For more information, see [batch synthesis properties](batch-synthesis-avatar-properties.md).
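
The `SynthesisId` rule in the list above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or its SDK: the function name `is_valid_synthesis_id` is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Pattern derived from the rule stated above: 3 to 64 characters drawn from
# letters, digits, '-' and '_', starting and ending with a letter or digit.
# Assumption: "letters" means ASCII letters; the service may accept more.
_SYNTHESIS_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,62}[A-Za-z0-9]")


def is_valid_synthesis_id(candidate: str) -> bool:
    """Return True if candidate satisfies the documented SynthesisId rule."""
    return _SYNTHESIS_ID.fullmatch(candidate) is not None
```

For example, `is_valid_synthesis_id("my-job-01")` passes, while an ID that starts with `-` or is shorter than three characters does not.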
4848

@@ -53,47 +53,46 @@ To submit a batch synthesis request, construct the HTTP POST request body follow
5353
>
5454
> The maximum length for the output video is currently 20 minutes, with potential increases in the future.
5555
56-
To make an HTTP POST request, use the URI format shown in the following example. Replace `YourSpeechKey` with your Speech resource key, `YourSpeechRegion` with your Speech resource region, and set the request body properties as described above.
56+
To make an HTTP PUT request, use the URI format shown in the following example. Replace `YourSpeechKey` with your Speech resource key, `YourSpeechRegion` with your Speech resource region, and set the request body properties as described above.
5757

5858
```azurecli-interactive
59-
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSpeechKey" -H "Content-Type: application/json" -d '{
60-
"displayName": "avatar batch synthesis sample",
61-
"textType": "SSML",
59+
curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourSpeechKey" -H "Content-Type: application/json" -d '{
60+
"inputKind": "SSML",
6261
"inputs": [
6362
{
64-
"text": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''>
65-
<voice name='\''en-US-JennyNeural'\''>
66-
The rainbow has seven colors.
67-
</voice>
68-
</speak>"
63+
"content": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice name='\''en-US-JennyNeural'\''>The rainbow has seven colors.</voice></speak>"
6964
}
7065
],
71-
"properties": {
66+
"avatarConfig": {
7267
"talkingAvatarCharacter": "lisa",
7368
"talkingAvatarStyle": "graceful-sitting"
7469
}
75-
}' "https://YourSpeechRegion.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar"
70+
}' "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/my-job-01?api-version=2024-04-01-preview"
7671
```
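
The same PUT request can be assembled in Python. This is a minimal sketch using only the standard library; the helper name `build_create_request` is hypothetical, and the URL, headers, and body mirror the curl example above. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request


def build_create_request(region: str, key: str, synthesis_id: str,
                         ssml: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request for a new avatar batch job."""
    url = (f"https://{region}.api.cognitive.microsoft.com/avatar/"
           f"batchsyntheses/{synthesis_id}?api-version=2024-04-01-preview")
    # Body fields copied from the curl example; the avatar character and
    # style are the sample values, not defaults.
    body = {
        "inputKind": "SSML",
        "inputs": [{"content": ssml}],
        "avatarConfig": {
            "talkingAvatarCharacter": "lisa",
            "talkingAvatarStyle": "graceful-sitting",
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job.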
7772

7873
You should receive a response body in the following format:
7974

8075
```json
8176
{
82-
"textType": "SSML",
77+
"id": "my-job-01",
78+
"internalId": "5a25b929-1358-4e81-a036-33000e788c46",
79+
"status": "NotStarted",
80+
"createdDateTime": "2024-03-06T07:34:08.9487009Z",
81+
"lastActionDateTime": "2024-03-06T07:34:08.9487012Z",
82+
"inputKind": "SSML",
8383
"customVoices": {},
8484
"properties": {
85-
"timeToLive": "P31D",
86-
"outputFormat": "riff-24khz-16bit-mono-pcm",
85+
"timeToLiveInHours": 744
86+
},
87+
"avatarConfig": {
8788
"talkingAvatarCharacter": "lisa",
8889
"talkingAvatarStyle": "graceful-sitting",
89-
"kBitrate": 2000,
90+
"videoFormat": "Mp4",
91+
"videoCodec": "hevc",
92+
"subtitleType": "soft_embedded",
93+
"bitrateKbps": 2000,
9094
"customized": false
91-
},
92-
"lastActionDateTime": "2023-10-19T12:23:03.348Z",
93-
"status": "NotStarted",
94-
"id": "c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6",
95-
"createdDateTime": "2023-10-19T12:23:03.348Z",
96-
"displayName": "avatar batch synthesis sample"
95+
}
9796
}
9897
```
9998

@@ -107,40 +106,45 @@ To retrieve the status of a batch synthesis job, make an HTTP GET request using
107106
Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.
108107

109108
```azurecli-interactive
110-
curl -v -X GET "https://YourSpeechRegion.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar/YourSynthesisId" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
109+
curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-04-01-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
111110
```
112111

113112
You should receive a response body in the following format:
114113

115114
```json
116115
{
117-
"textType": "SSML",
116+
"id": "my-job-01",
117+
"internalId": "5a25b929-1358-4e81-a036-33000e788c46",
118+
"status": "Succeeded",
119+
"createdDateTime": "2024-03-06T07:34:08.9487009Z",
120+
"lastActionDateTime": "2024-03-06T07:34:12.5698769",
121+
"inputKind": "SSML",
118122
"customVoices": {},
119123
"properties": {
120-
"audioSize": 336780,
121-
"durationInTicks": 25200000,
122-
"succeededAudioCount": 1,
123-
"duration": "PT2.52S",
124+
"timeToLiveInHours": 744,
125+
"sizeInBytes": 344460,
126+
"durationInMilliseconds": 2520,
127+
"succeededCount": 1,
128+
"failedCount": 0,
124129
"billingDetails": {
130+
"neural": 29,
125131
"customNeural": 0,
126-
"neural": 29
127-
},
128-
"timeToLive": "P31D",
129-
"outputFormat": "riff-24khz-16bit-mono-pcm",
132+
"talkingAvatarDurationInSeconds": 2
133+
}
134+
},
135+
"avatarConfig": {
130136
"talkingAvatarCharacter": "lisa",
131137
"talkingAvatarStyle": "graceful-sitting",
132-
"kBitrate": 2000,
138+
"videoFormat": "Mp4",
139+
"videoCodec": "hevc",
140+
"subtitleType": "soft_embedded",
141+
"bitrateKbps": 2000,
133142
"customized": false
134143
},
135144
"outputs": {
136-
"result": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6/0001.mp4?SAS_Token",
137-
"summary": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6/summary.json?SAS_Token"
138-
},
139-
"lastActionDateTime": "2023-10-19T12:23:06.320Z",
140-
"status": "Succeeded",
141-
"id": "c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6",
142-
"createdDateTime": "2023-10-19T12:23:03.350Z",
143-
"displayName": "avatar batch synthesis sample"
145+
"result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/0001.mp4?SAS_Token",
146+
"summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/summary.json?SAS_Token"
147+
}
144148
}
145149
```
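
Because a new job starts as `NotStarted` and only later reports `Succeeded`, a client typically repeats the GET request until the status settles. The sketch below is one possible polling loop, not the documented method: `wait_for_job` and its parameters are hypothetical, and `fetch_job` stands for any function that performs the GET request and returns the parsed JSON. It treats only `Succeeded` and `Failed` as final, which is an assumption based on the statuses named in this article.

```python
import time
from typing import Callable

# Terminal states named in this article; any other status is treated as pending.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_job(fetch_job: Callable[[], dict], interval_seconds: float = 5.0,
                 max_attempts: int = 60) -> dict:
    """Call fetch_job until the job reports a terminal status, then return it."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError("job did not reach a terminal status in time")
```

Injecting `fetch_job` keeps the loop independent of any particular HTTP client.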
146150

@@ -151,86 +155,93 @@ From the `outputs.result` field, you can download a video file containing the av
151155

152156
To list all batch synthesis jobs for your Speech resource, make an HTTP GET request using the URI as shown in the following example.
153157

154-
Replace `YourSpeechKey` with your Speech resource key and `YourSpeechRegion` with your Speech resource region. Optionally, you can set the `skip` and `top` (page size) query parameters in the URL. The default value for `skip` is 0, and the default value for `top` is 100.
158+
Replace `YourSpeechKey` with your Speech resource key and `YourSpeechRegion` with your Speech resource region. Optionally, you can set the `skip` and `maxpagesize` (page size) query parameters in the URL. The default value for `skip` is 0, and the default value for `maxpagesize` is 100.
155159

156160
```azurecli-interactive
157-
curl -v -X GET "https://YourSpeechRegion.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar?skip=0&top=2" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
161+
curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-04-01-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
158162
```
159163

160164
You receive a response body in the following format:
161165

162166
```json
163167
{
164-
"values": [
168+
"value": [
165169
{
166-
"textType": "PlainText",
167-
"synthesisConfig": {
168-
"voice": "en-US-JennyNeural"
169-
},
170+
"id": "my-job-02",
171+
"internalId": "14c25fcf-3cb6-4f46-8810-ecad06d956df",
172+
"status": "Succeeded",
173+
"createdDateTime": "2024-03-06T07:52:23.9054709Z",
174+
"lastActionDateTime": "2024-03-06T07:52:29.3416944",
175+
"inputKind": "SSML",
170176
"customVoices": {},
171177
"properties": {
172-
"audioSize": 339371,
173-
"durationInTicks": 25200000,
174-
"succeededAudioCount": 1,
175-
"duration": "PT2.52S",
178+
"timeToLiveInHours": 744,
179+
"sizeInBytes": 502676,
180+
"durationInMilliseconds": 2950,
181+
"succeededCount": 1,
182+
"failedCount": 0,
176183
"billingDetails": {
184+
"neural": 32,
177185
"customNeural": 0,
178-
"neural": 29
179-
},
180-
"timeToLive": "P31D",
181-
"outputFormat": "riff-24khz-16bit-mono-pcm",
186+
"talkingAvatarDurationInSeconds": 2
187+
}
188+
},
189+
"avatarConfig": {
182190
"talkingAvatarCharacter": "lisa",
183-
"talkingAvatarStyle": "graceful-sitting",
184-
"kBitrate": 2000,
191+
"talkingAvatarStyle": "casual-sitting",
192+
"videoFormat": "Mp4",
193+
"videoCodec": "h264",
194+
"subtitleType": "soft_embedded",
195+
"bitrateKbps": 2000,
185196
"customized": false
186197
},
187198
"outputs": {
188-
"result": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/8e3fea5f-4021-4734-8c24-77d3be594633/0001.mp4?SAS_Token",
189-
"summary": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/8e3fea5f-4021-4734-8c24-77d3be594633/summary.json?SAS_Token"
190-
},
191-
"lastActionDateTime": "2023-10-19T12:57:45.557Z",
192-
"status": "Succeeded",
193-
"id": "8e3fea5f-4021-4734-8c24-77d3be594633",
194-
"createdDateTime": "2023-10-19T12:57:42.343Z",
195-
"displayName": "avatar batch synthesis sample"
199+
"result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/14c25fcf-3cb6-4f46-8810-ecad06d956df/0001.mp4?SAS_Token",
200+
"summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/14c25fcf-3cb6-4f46-8810-ecad06d956df/summary.json?SAS_Token"
201+
}
196202
},
197203
{
198-
"textType": "SSML",
204+
"id": "my-job-01",
205+
"internalId": "5a25b929-1358-4e81-a036-33000e788c46",
206+
"status": "Succeeded",
207+
"createdDateTime": "2024-03-06T07:34:08.9487009Z",
208+
"lastActionDateTime": "2024-03-06T07:34:12.5698769",
209+
"inputKind": "SSML",
199210
"customVoices": {},
200211
"properties": {
201-
"audioSize": 336780,
202-
"durationInTicks": 25200000,
203-
"succeededAudioCount": 1,
204-
"duration": "PT2.52S",
212+
"timeToLiveInHours": 744,
213+
"sizeInBytes": 344460,
214+
"durationInMilliseconds": 2520,
215+
"succeededCount": 1,
216+
"failedCount": 0,
205217
"billingDetails": {
218+
"neural": 29,
206219
"customNeural": 0,
207-
"neural": 29
208-
},
209-
"timeToLive": "P31D",
210-
"outputFormat": "riff-24khz-16bit-mono-pcm",
220+
"talkingAvatarDurationInSeconds": 2
221+
}
222+
},
223+
"avatarConfig": {
211224
"talkingAvatarCharacter": "lisa",
212225
"talkingAvatarStyle": "graceful-sitting",
213-
"kBitrate": 2000,
226+
"videoFormat": "Mp4",
227+
"videoCodec": "hevc",
228+
"subtitleType": "soft_embedded",
229+
"bitrateKbps": 2000,
214230
"customized": false
215231
},
216232
"outputs": {
217-
"result": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6/0001.mp4?SAS_Token",
218-
"summary": "https://cvoiceprodwus2.blob.core.windows.net/batch-synthesis-output/c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6/summary.json?SAS_Token"
219-
},
220-
"lastActionDateTime": "2023-10-19T12:23:06.320Z",
221-
"status": "Succeeded",
222-
"id": "c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6",
223-
"createdDateTime": "2023-10-19T12:23:03.350Z",
224-
"displayName": "avatar batch synthesis sample"
233+
"result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/0001.mp4?SAS_Token",
234+
"summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/summary.json?SAS_Token"
235+
}
225236
}
226237
],
227-
"@nextLink": "https://{region}.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar?skip=2&top=2"
238+
"nextLink": "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/?api-version=2024-04-01-preview&skip=2&maxpagesize=2"
228239
}
229240
```
230241

231242
From `outputs.result`, you can download a video file containing the avatar video. From `outputs.summary`, you can access the summary and debug details. For more information, see [batch synthesis results](#get-batch-synthesis-results-file).
232243

233-
The `values` property in the JSON response lists your synthesis requests. The list is paginated, with a maximum page size of 100. The `@nextLink` property is provided as needed to get the next page of the paginated list.
244+
The `value` property in the JSON response lists your synthesis requests. The list is paginated, with a maximum page size of 100. The `nextLink` property is provided as needed to get the next page of the paginated list.
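
Following `nextLink` until it is absent walks the whole list. The sketch below illustrates that pattern under stated assumptions: `iter_jobs` is a hypothetical helper, and `fetch_page` stands for any function that issues the authenticated GET request for a URL and returns the parsed JSON page.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(first_url: str,
              fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every job by following `nextLink` until no further page exists."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        # Each page carries its jobs in the `value` array.
        yield from page.get("value", [])
        # `nextLink` is only present when another page is available.
        url = page.get("nextLink")
```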
234245

235246
## Get batch synthesis results file
236247

@@ -252,34 +263,34 @@ The summary file contains the synthesis results for each text input. Here's an e
252263

253264
```json
254265
{
255-
"jobID": "c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6",
256-
"status": "Succeeded",
257-
"results": [
266+
"jobID": "5a25b929-1358-4e81-a036-33000e788c46",
267+
"status": "Succeeded",
268+
"results": [
258269
{
259-
"texts": [
260-
"<speak version='1.0' xml:lang='en-US'>\n\t\t\t\t<voice name='en-US-JennyNeural'>\n\t\t\t\t\tThe rainbow has seven colors.\n\t\t\t\t</voice>\n\t\t\t</speak>"
270+
"texts": [
271+
"<speak version='1.0' xml:lang='en-US'><voice name='en-US-JennyNeural'>The rainbow has seven colors.</voice></speak>"
261272
],
262-
"status": "Succeeded",
263-
"billingDetails": {
264-
"Neural": "29",
265-
"TalkingAvatarDuration": "2"
273+
"status": "Succeeded",
274+
"billingDetails": {
275+
"Neural": "29",
276+
"TalkingAvatarDuration": "2"
266277
},
267-
"videoFileName": "c48b4cf5-957f-4a0f-96af-a4e3e71bd6b6/0001.mp4",
268-
"TalkingAvatarCharacter": "lisa",
269-
"TalkingAvatarStyle": "graceful-sitting"
278+
"videoFileName": "244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/0001.mp4",
279+
"TalkingAvatarCharacter": "lisa",
280+
"TalkingAvatarStyle": "graceful-sitting"
270281
}
271282
]
272283
}
273284
```
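
Once downloaded, the summary file can be read with any JSON parser. As a rough sketch, the hypothetical helper below totals the `billingDetails` counters across all results. Note that the summary file stores these counters as strings (for example `"29"`), so they are converted to integers first.

```python
import json


def total_billing(summary_text: str) -> dict:
    """Sum the billingDetails counters across all results in a summary file."""
    summary = json.loads(summary_text)
    totals: dict = {}
    for result in summary.get("results", []):
        for name, value in result.get("billingDetails", {}).items():
            # Counters are strings in the summary file, so convert before adding.
            totals[name] = totals.get(name, 0) + int(value)
    return totals
```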
274285

275286
## Delete batch synthesis
276287

277-
After you have retrieved the audio output results and no longer need the batch synthesis job history, you can delete it. The Speech service retains each synthesis history for up to 31 days or the duration specified by the request's `timeToLive` property, whichever comes sooner. The date and time of automatic deletion, for synthesis jobs with a status of "Succeeded" or "Failed" is calculated as the sum of the `lastActionDateTime` and `timeToLive` properties.
288+
After you have retrieved the audio output results and no longer need the batch synthesis job history, you can delete it. The Speech service retains each synthesis history for up to 31 days or the duration specified by the request's `timeToLiveInHours` property, whichever comes sooner. For synthesis jobs with a status of "Succeeded" or "Failed", the date and time of automatic deletion is calculated as the sum of the `lastActionDateTime` and `timeToLiveInHours` properties.
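
The automatic deletion time can be estimated from a job's JSON. The sketch below assumes the time-to-live is the `timeToLiveInHours` value shown in the sample responses, and that timestamps without a trailing `Z` are in UTC; both helper names are hypothetical. The sample timestamps carry seven fractional digits, which Python's `datetime` cannot store, so the fraction is truncated to microseconds.

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z?")


def parse_service_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2024-03-06T07:34:12.5698769Z as UTC."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # Keep at most six fractional digits, padding shorter fractions with zeros.
    fraction = (match.group(2) or "")[:6].ljust(6, "0")
    return base.replace(microsecond=int(fraction), tzinfo=timezone.utc)


def deletion_time(job: dict) -> datetime:
    """Return lastActionDateTime plus timeToLiveInHours for a finished job."""
    last_action = parse_service_timestamp(job["lastActionDateTime"])
    return last_action + timedelta(hours=job["properties"]["timeToLiveInHours"])
```

With the sample job above, 744 hours corresponds to 31 days after the last action.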
278289

279290
To delete a batch synthesis job, make an HTTP DELETE request using the following URI format. Replace `YourSynthesisId` with your batch synthesis ID, `YourSpeechKey` with your Speech resource key, and `YourSpeechRegion` with your Speech resource region.
280291

281292
```azurecli-interactive
282-
curl -v -X DELETE "https://YourSpeechRegion.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis/talkingavatar/YourSynthesisId" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
293+
curl -v -X DELETE "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-04-01-preview" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"
283294
```
284295

285296
The response headers include `HTTP/1.1 204 No Content` if the delete request was successful.
