|
| 1 | +--- |
| 2 | +title: include file |
| 3 | +description: include file |
| 4 | +author: sally-baolian |
| 5 | +ms.author: v-baolianzou |
| 6 | +ms.service: azure-ai-speech |
| 7 | +ms.topic: include |
| 8 | +ms.date: 10/11/2024 |
| 9 | +ms.custom: include |
| 10 | +--- |
| 11 | + |
| 12 | +The video translation REST API lets you integrate video translation into your applications. It supports uploading, managing, and refining video translations, with multiple iterations for continuous improvement. In this article, you learn how to use video translation through the REST API. |
| 13 | + |
| 14 | +This diagram provides a high-level overview of the workflow. |
| 15 | + |
| 16 | + |
| 17 | + |
| 18 | +You can use the following REST API operations for video translation: |
| 19 | + |
| 20 | +| Operation | Method | REST API call | |
| 21 | +| ----------------------------------------------------- | ------ | ------------------------------------------------- | |
| 22 | +| [Create a translation](/rest/api/aiservices/videotranslation/translation-operations/create-translation) | `PUT` | `/translations/{translationId}` | |
| 23 | +| [List translations](/rest/api/aiservices/videotranslation/translation-operations/list-translation) | `GET` | `/translations` | |
| 24 | +| [Get a translation by translation ID](/rest/api/aiservices/videotranslation/translation-operations/get-translation) | `GET` | `/translations/{translationId}` | |
| 25 | +| [Create an iteration](/rest/api/aiservices/videotranslation/iteration-operations/create-iteration) | `PUT` | `/translations/{translationId}/iterations/{iterationId}` | |
| 26 | +| [List iterations](/rest/api/aiservices/videotranslation/iteration-operations/list-iteration) | `GET` | `/translations/{translationId}/iterations` | |
| 27 | +| [Get an iteration by iteration ID](/rest/api/aiservices/videotranslation/iteration-operations/get-iteration) | `GET` | `/translations/{translationId}/iterations/{iterationId}` | |
| 28 | +| [Get operation by operation ID](/rest/api/aiservices/videotranslation/operation-operations/get-operation) | `GET` | `/operations/{operationId}` | |
| 29 | +| [Delete a translation by translation ID](/rest/api/aiservices/videotranslation/translation-operations/delete-translation) | `DELETE`| `/translations/{translationId}` | |
| 30 | + |
| 31 | +For code samples, see [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/video-translation/csharp). |
| 32 | + |
| 33 | +This article outlines the primary steps of the API process, including creating a translation, creating an iteration, checking the status of each operation, getting an iteration by iteration ID, and deleting a translation by translation ID. For complete details, refer to the links provided for each API in the table. |
| 34 | + |
| 35 | +## Create a translation |
| 36 | + |
| 37 | +To submit a video translation request, you need to construct an HTTP PUT request path and body according to the following instructions: |
| 38 | + |
| 39 | +- Specify `Operation-Id`: The `Operation-Id` must be unique for each operation. It ensures that each operation is tracked separately. Replace `[operationId]` with an operation ID. |
| 40 | +- Specify `translationId`: The `translationId` must be unique. Replace `[translationId]` with a translation ID. |
| 41 | +- Set the required input: Include details like `sourceLocale`, `targetLocale`, `voiceKind`, and `videoFileUrl`. Ensure that you have the video URL from Azure Blob Storage. For the languages supported for video translation, refer to the [supported source and target languages](../../../language-support.md?tabs=speech-translation#video-translation). You can set the `voiceKind` parameter to either `PlatformVoice` or `PersonalVoice`. For `PlatformVoice`, the system automatically selects the most suitable prebuilt voice by matching the speaker's voice in the video with prebuilt voices. For `PersonalVoice`, the system offers a model that generates high-quality voice replication in a few seconds. |
| 42 | + |
| 43 | + >[!NOTE] |
| 44 | + > To use personal voice, you need to apply for [access](https://aka.ms/customneural). |
| 45 | +
|
| 46 | +- Replace `[YourResourceKey]` with your Speech resource key and replace `[YourSpeechRegion]` with your Speech resource region. |
| 47 | + |
| 48 | +Creating a translation doesn't initiate the translation process. You can start translating the video by [creating an iteration](#create-an-iteration). The following example is for the Windows shell. If the URL contains `&`, escape each `&` as `^&`. |
| 49 | + |
| 50 | +```azurecli-interactive |
| 51 | +curl -v -X PUT -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" -H "Operation-Id: [operationId]" -H "Content-Type: application/json" -d "{\"displayName\": \"[YourDisplayName]\",\"description\": \"[OptionalYourDescription]\",\"input\": {\"sourceLocale\": \"[VideoSourceLocale]\",\"targetLocale\": \"[TranslationTargetLocale]\",\"voiceKind\": \"[PlatformVoice/PersonalVoice]\",\"speakerCount\": [OptionalVideoSpeakerCount],\"subtitleMaxCharCountPerSegment\": [OptionalYourPreferredSubtitleMaxCharCountPerSegment],\"exportSubtitleInVideo\": [Optional true/false],\"videoFileUrl\": \"[AzureBlobUrlWithSas]\"}}" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]?api-version=2024-05-20-preview" |
| 52 | +``` |
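If you script this call instead of typing it, you can assemble the request programmatically. The following Python sketch only builds the URL, headers, and JSON body shown in the preceding curl command; it doesn't send anything. The function name `build_create_translation_request` and its parameter names are illustrative assumptions, not part of the API. Optional fields are left out of the body when you don't provide them.

```python
import json

API_VERSION = "2024-05-20-preview"


def build_create_translation_request(region, resource_key, operation_id, translation_id,
                                     display_name, source_locale, target_locale,
                                     voice_kind, video_file_url, description=None,
                                     speaker_count=None, subtitle_max_chars=None,
                                     export_subtitle_in_video=None):
    """Return (url, headers, body) for the Create a translation PUT request.

    Hypothetical helper: it mirrors the curl example and sends nothing.
    """
    if voice_kind not in ("PlatformVoice", "PersonalVoice"):
        raise ValueError("voiceKind must be PlatformVoice or PersonalVoice")

    url = (f"https://{region}.api.cognitive.microsoft.com/videotranslation"
           f"/translations/{translation_id}?api-version={API_VERSION}")
    headers = {
        "Ocp-Apim-Subscription-Key": resource_key,
        "Operation-Id": operation_id,  # must be unique for each operation
        "Content-Type": "application/json",
    }

    # Required input fields.
    request_input = {
        "sourceLocale": source_locale,
        "targetLocale": target_locale,
        "voiceKind": voice_kind,
        "videoFileUrl": video_file_url,
    }
    # Optional input fields are added only when a value was supplied.
    optional = {
        "speakerCount": speaker_count,
        "subtitleMaxCharCountPerSegment": subtitle_max_chars,
        "exportSubtitleInVideo": export_subtitle_in_video,
    }
    request_input.update({k: v for k, v in optional.items() if v is not None})

    payload = {"displayName": display_name, "input": request_input}
    if description is not None:
        payload["description"] = description
    return url, headers, json.dumps(payload)
```

You can pass the returned values to any HTTP client. Because the URL and body are ordinary strings that never pass through the Windows shell, the `^&` escaping isn't needed in this approach.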
| 53 | + |
| 54 | +>[!IMPORTANT] |
| 55 | +> Data created through the API won't appear in Speech Studio, and the data between the API and Speech Studio isn't synchronized. |
| 56 | +
|
| 57 | +You should receive a response body in the following format: |
| 58 | + |
| 59 | +```json |
| 60 | +{ |
| 61 | + "input": { |
| 62 | + "sourceLocale": "zh-CN", |
| 63 | + "targetLocale": "en-US", |
| 64 | + "voiceKind": "PlatformVoice", |
| 65 | + "speakerCount": 1, |
| 66 | + "subtitleMaxCharCountPerSegment": 30, |
| 67 | + "exportSubtitleInVideo": true |
| 68 | + }, |
| 69 | + "status": "NotStarted", |
| 70 | + "lastActionDateTime": "2024-09-20T06:25:05.058Z", |
| 71 | + "id": "mytranslation0920", |
| 72 | + "displayName": "demo", |
| 73 | + "description": "for testing", |
| 74 | + "createdDateTime": "2024-09-20T06:25:05.058Z" |
| 75 | +} |
| 76 | +``` |
| 77 | +The `status` property should progress from `NotStarted` to `Running`, and finally to `Succeeded` or `Failed`. You can call the [Get operation by operation ID](#get-operation-by-operation-id) API periodically until the returned status is `Succeeded` or `Failed`. This operation lets you monitor the progress of creating the translation. |
| 78 | + |
| 79 | +## Get operation by operation ID |
| 80 | + |
| 81 | +Check the status of a specific operation using its operation ID. The operation ID is unique for each operation, so you can track each operation separately. |
| 82 | + |
| 83 | +Replace `[YourResourceKey]` with your Speech resource key, `[YourSpeechRegion]` with your Speech resource region, and `[operationId]` with the operation ID you want to check. |
| 84 | + |
| 85 | +```azurecli-interactive |
| 86 | +curl -v -X GET -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/operations/[operationId]?api-version=2024-05-20-preview" |
| 87 | +``` |
| 88 | + |
| 89 | +You should receive a response body in the following format: |
| 90 | + |
| 91 | +```json |
| 92 | +{ |
| 93 | + "id": "createtranslation0920-1", |
| 94 | + "status": "Running" |
| 95 | +} |
| 96 | +``` |
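The periodic status check can be sketched as a small polling loop. In this Python sketch, `get_status` is a hypothetical callable that you supply: it's assumed to send the GET request shown earlier and return the `status` string from the response. The five-second interval and the limit of 60 checks are arbitrary values chosen for illustration, not service recommendations.

```python
import time

# Statuses after which the operation no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_operation(get_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or attempts run out.

    get_status is a caller-supplied function (an assumption of this sketch)
    that returns the "status" value of the operation.
    """
    for attempt in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Wait before the next check, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation still not finished after {max_attempts} checks")
```

The `sleep` parameter is injectable so that you can exercise the loop without real delays.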
| 97 | + |
| 98 | +## Create an iteration |
| 99 | + |
| 100 | +To start translating your video or update an iteration for an existing translation, you need to construct an HTTP PUT request path and body according to the following instructions: |
| 101 | + |
| 102 | +- Specify `Operation-Id`: The `Operation-Id` must be unique for each operation, such as creating each iteration. Replace `[operationId]` with a unique ID for this operation. |
| 103 | +- Specify `translationId`: If multiple iterations are performed under a single translation, the translation ID remains unchanged. |
| 104 | +- Specify `iterationId`: The `iterationId` must be unique for each operation. Replace `[iterationId]` with an iteration ID. |
| 105 | +- Set the required input: Include details like `speakerCount`, `subtitleMaxCharCountPerSegment`, `exportSubtitleInVideo`, or `webvttFile`. No subtitles are embedded in the output video by default. |
| 106 | +- Replace `[YourResourceKey]` with your Speech resource key and replace `[YourSpeechRegion]` with your Speech resource region. |
| 107 | + |
| 108 | +The following example is for the Windows shell. If the URL contains `&`, escape each `&` as `^&`. |
| 109 | + |
| 110 | +```azurecli-interactive |
| 111 | +curl -v -X PUT -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" -H "Operation-Id: [operationId]" -H "Content-Type: application/json" -d "{\"input\": {\"speakerCount\": [OptionalVideoSpeakerCount],\"subtitleMaxCharCountPerSegment\": [OptionalYourPreferredSubtitleMaxCharCountPerSegment],\"exportSubtitleInVideo\": [Optional true/false],\"webvttFile\": {\"Kind\": \"[SourceLocaleSubtitle/TargetLocaleSubtitle/MetadataJson]\", \"url\": \"[AzureBlobUrlWithSas]\"}}}" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]/iterations/[iterationId]?api-version=2024-05-20-preview" |
| 112 | +``` |
| 113 | + |
| 114 | +> [!NOTE] |
| 115 | +> When you create an iteration, you don't need to specify the optional parameters `speakerCount`, `subtitleMaxCharCountPerSegment`, and `exportSubtitleInVideo` again if you already specified them when you created the translation. The values are inherited from the translation settings. If you define these parameters when you create an iteration, the new values override the original settings. |
| 116 | +> |
| 117 | +> The `webvttFile` parameter isn't required when creating the first iteration. However, starting from the second iteration, you must specify the `webvttFile` parameter. Download the WebVTT file, make the necessary edits, upload it to your Azure Blob Storage, and then specify the blob URL in the curl command. |
| 118 | +> |
| 119 | +> Data created through the API won't appear in Speech Studio, and the data between the API and Speech Studio isn't synchronized. |
| 120 | +
|
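The rules in the preceding note can be sketched as a small body builder. This Python sketch is illustrative only: the function name `build_iteration_body` and the `is_first_iteration` flag are assumptions, and the key names (including the capitalized `Kind`) are copied from the curl example rather than verified against the service. It omits optional parameters that you don't set, so that they can be inherited from the translation, and it rejects a later iteration that has no `webvttFile`.

```python
import json

# Values listed for "Kind" in the curl example.
WEBVTT_KINDS = ("SourceLocaleSubtitle", "TargetLocaleSubtitle", "MetadataJson")


def build_iteration_body(is_first_iteration, webvtt_kind=None, webvtt_url=None,
                         speaker_count=None, subtitle_max_chars=None,
                         export_subtitle_in_video=None):
    """Return the JSON body for the Create an iteration PUT request (hypothetical helper)."""
    has_webvtt = bool(webvtt_kind and webvtt_url)
    if not is_first_iteration and not has_webvtt:
        raise ValueError("webvttFile is required from the second iteration onward")
    if webvtt_kind is not None and webvtt_kind not in WEBVTT_KINDS:
        raise ValueError(f"Kind must be one of {WEBVTT_KINDS}")

    # Only include optional parameters that were explicitly set; unset ones
    # are left out so the translation's settings apply.
    optional = {
        "speakerCount": speaker_count,
        "subtitleMaxCharCountPerSegment": subtitle_max_chars,
        "exportSubtitleInVideo": export_subtitle_in_video,
    }
    iteration_input = {k: v for k, v in optional.items() if v is not None}

    if has_webvtt:
        iteration_input["webvttFile"] = {"Kind": webvtt_kind, "url": webvtt_url}
    return json.dumps({"input": iteration_input})
```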
| 121 | +The subtitle file can be in WebVTT or JSON format. If you're unsure about how to prepare a WebVTT file, refer to the following sample formats. |
| 122 | + |
| 123 | +#### [WebVTT](#tab/webvtt) |
| 124 | + |
| 125 | +```WebVTT |
| 126 | +
|
| 127 | +00:00:01.010 --> 00:00:06.030 |
| 128 | +Hello this is a sample subtitle. |
| 129 | +
|
| 130 | +00:00:07.030 --> 00:00:09.030 |
| 131 | +Hello this is a sample subtitle. |
| 132 | +
|
| 133 | +``` |
| 134 | + |
| 135 | +#### [WebVTT with JSON Metadata](#tab/webvttwithjsonmetadata) |
| 136 | + |
| 137 | +```WebVTT with JSON Metadata |
| 138 | +
|
| 139 | +00:00:01.010 --> 00:00:02.030 |
| 140 | +{ |
| 141 | + "globalMetadata": { |
| 142 | + "locale": "zh-CN", |
| 143 | + "changeTtsVoiceNameIfConflictWithGender": true, |
| 144 | + "speakers": { |
| 145 | + "PlatformVoice_zh-CN-XiaoxiaoNeural": { |
| 146 | + "defaultSsmlProperties": { |
| 147 | + "voiceKind": "PlatformVoice", |
| 148 | + "voiceName": "zh-CN-XiaoxiaoNeural" |
| 149 | + } |
| 150 | + }, |
| 151 | + "PlatformVoice_zh-CN-YunxiNeural": { |
| 152 | + "defaultSsmlProperties": { |
| 153 | + "voiceKind": "PlatformVoice", |
| 154 | + "voiceName": "zh-CN-YunxiNeural" |
| 155 | + } |
| 156 | + }, |
| 157 | + "PlatformVoice_ja-JP-NanamiNeural": { |
| 158 | + "defaultSsmlProperties": { |
| 159 | + "voiceKind": "PlatformVoice", |
| 160 | + "voiceName": "ja-JP-NanamiNeural" |
| 161 | + } |
| 162 | + } |
| 163 | + } |
| 164 | + }, |
| 165 | + "speakerId": "PlatformVoice_zh-CN-XiaoxiaoNeural", |
| 166 | + "translatedText": "中文晓晓" |
| 167 | +} |
| 168 | +
|
| 169 | +00:00:02.510 --> 00:00:04.030 |
| 170 | +{ |
| 171 | + "gender": "Female", |
| 172 | + "speakerId": "PlatformVoice_ja-JP-NanamiNeural", |
| 173 | + "translatedText": "こんにちは、これはサンプルの字幕です", |
| 174 | + "comment": "Keep using jaJP voice." |
| 175 | +} |
| 176 | +
|
| 177 | +00:00:05.010 --> 00:00:07.030 |
| 178 | +{ |
| 179 | + "gender": "Male", |
| 180 | + "speakerId": "PlatformVoice_ja-JP-NanamiNeural", |
| 181 | + "translatedText": "こんにちは、これはサンプルの字幕です", |
| 182 | + "comment": "Not correct gender, will auto be replaced by enUS auto voice selection." |
| 183 | +} |
| 184 | +
|
| 185 | +``` |
| 186 | +--- |
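If you generate or edit subtitle cues in code, the timestamp layout in the plain WebVTT sample can be produced as follows. This is a minimal Python sketch under stated assumptions: the helper names `format_timestamp` and `render_cues` are hypothetical, cue times are given in seconds, and the output mirrors only what the sample shows (a timing line, the text, and a blank line between cues).

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm, as used on the cue timing line."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_cues(cues):
    """Render (start_seconds, end_seconds, text) tuples as cue blocks."""
    blocks = []
    for start, end, text in cues:
        if end <= start:
            raise ValueError("a cue must end after it starts")
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Cues are separated by a blank line, as in the sample.
    return "\n\n".join(blocks) + "\n"
```

For example, `render_cues([(1.01, 6.03, "Hello this is a sample subtitle.")])` produces the first cue of the WebVTT sample.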
| 187 | + |
| 188 | +You should receive a response body in the following format: |
| 189 | + |
| 190 | +```json |
| 191 | +{ |
| 192 | + "input": { |
| 193 | + "speakerCount": 1, |
| 194 | + "subtitleMaxCharCountPerSegment": 30, |
| 195 | + "exportSubtitleInVideo": true |
| 196 | + }, |
| 197 | + "status": "NotStarted", |
| 198 | + "lastActionDateTime": "2024-09-20T06:31:53.760Z", |
| 199 | + "id": "firstiteration0920", |
| 200 | + "createdDateTime": "2024-09-20T06:31:53.760Z" |
| 201 | +} |
| 202 | +``` |
| 203 | + |
| 204 | +You can use the `operationId` that you specified to call the [Get operation by operation ID](#get-operation-by-operation-id) API periodically until the returned status is `Succeeded` or `Failed`. This operation lets you monitor the progress of creating the iteration. |
| 205 | + |
| 206 | +## Get an iteration by iteration ID |
| 207 | + |
| 208 | +To retrieve details of a specific iteration by its ID, use the HTTP GET request. Replace `[YourResourceKey]` with your Speech resource key, `[YourSpeechRegion]` with your Speech resource region, `[translationId]` with the translation ID you want to check, and `[iterationId]` with the iteration ID you want to check. |
| 209 | + |
| 210 | +```azurecli-interactive |
| 211 | +curl -v -X GET -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]/iterations/[iterationId]?api-version=2024-05-20-preview" |
| 212 | +``` |
| 213 | + |
| 214 | +You should receive a response body in the following format: |
| 215 | + |
| 216 | +```json |
| 217 | +{ |
| 218 | + "input": { |
| 219 | + "speakerCount": 1, |
| 220 | + "subtitleMaxCharCountPerSegment": 30, |
| 221 | + "exportSubtitleInVideo": true |
| 222 | + }, |
| 223 | + "result": { |
| 224 | + "translatedVideoFileUrl": "https://xxx.blob.core.windows.net/container1/video.mp4?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx", |
| 225 | + "sourceLocaleSubtitleWebvttFileUrl": "https://xxx.blob.core.windows.net/container1/sourceLocale.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx", |
| 226 | + "targetLocaleSubtitleWebvttFileUrl": "https://xxx.blob.core.windows.net/container1/targetLocale.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx", |
| 227 | + "metadataJsonWebvttFileUrl": "https://xxx.blob.core.windows.net/container1/metadataJsonLocale.vtt?sv=2023-01-03&st=2024-05-20T08%3A27%3A15Z&se=2024-05-21T08%3A27%3A15Z&sr=b&sp=r&sig=xxx" |
| 228 | + }, |
| 229 | + "status": "Succeeded", |
| 230 | + "lastActionDateTime": "2024-09-20T06:32:59.933Z", |
| 231 | + "id": "firstiteration0920", |
| 232 | + "createdDateTime": "2024-09-20T06:31:53.760Z" |
| 233 | +} |
| 234 | +``` |
| 235 | + |
| 236 | +## Delete a translation by translation ID |
| 237 | + |
| 238 | +Remove a specific translation identified by `translationId`. This operation also removes all iterations associated with this translation. Replace `[YourResourceKey]` with your Speech resource key, `[YourSpeechRegion]` with your Speech resource region, and `[translationId]` with the translation ID you want to delete. |
| 239 | + |
| 240 | +```azurecli-interactive |
| 241 | +curl -v -X DELETE -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]?api-version=2024-05-20-preview" |
| 242 | +``` |
| 243 | + |
| 244 | +The response headers include `HTTP/1.1 204 No Content` if the delete request was successful. |
| 245 | + |
| 246 | +## Additional information |
| 247 | + |
| 248 | +This section provides curl commands for other API calls that aren't described in detail above. You can explore each API using the following commands. |
| 249 | + |
| 250 | +### List translations |
| 251 | + |
| 252 | +To list all video translations that have been uploaded and processed in your resource account, make an HTTP GET request as shown in the following example. Replace `[YourResourceKey]` with your Speech resource key and replace `[YourSpeechRegion]` with your Speech resource region. |
| 253 | + |
| 254 | +```azurecli-interactive |
| 255 | +curl -v -X GET -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations?api-version=2024-05-20-preview" |
| 256 | +``` |
| 257 | + |
| 258 | +### Get a translation by translation ID |
| 259 | + |
| 260 | +This operation retrieves detailed information about a specific translation, identified by its unique `translationId`. Replace `[YourResourceKey]` with your Speech resource key, `[YourSpeechRegion]` with your Speech resource region, and `[translationId]` with the translation ID you want to check. |
| 261 | + |
| 262 | +```azurecli-interactive |
| 263 | +curl -v -X GET -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]?api-version=2024-05-20-preview" |
| 264 | +``` |
| 265 | + |
| 266 | +### List iterations |
| 267 | + |
| 268 | +List all iterations for a specific translation. This request lists all iterations without detailed information. Replace `[YourResourceKey]` with your Speech resource key, `[YourSpeechRegion]` with your Speech resource region, and `[translationId]` with the translation ID you want to check. |
| 269 | + |
| 270 | +```azurecli-interactive |
| 271 | +curl -v -X GET -H "Ocp-Apim-Subscription-Key: [YourResourceKey]" "https://[YourSpeechRegion].api.cognitive.microsoft.com/videotranslation/translations/[translationId]/iterations?api-version=2024-05-20-preview" |
| 272 | +``` |
| 273 | + |
| 274 | +## HTTP status codes |
| 275 | + |
| 276 | +This section details the HTTP response codes and messages from the video translation REST API. |
| 277 | + |
| 278 | +### HTTP 200 OK |
| 279 | + |
| 280 | +HTTP 200 OK indicates that the request was successful. |
| 281 | + |
| 282 | +### HTTP 204 No Content |
| 283 | + |
| 284 | +HTTP 204 No Content indicates that the request was successful, but there's no content to return. For example: |
| 285 | + |
| 286 | +- You tried to get or delete a translation that doesn't exist. |
| 287 | +- You successfully deleted a translation. |
| 288 | + |
| 289 | +### HTTP 400 error |
| 290 | + |
| 291 | +Here are examples that can result in the 400 error: |
| 292 | + |
| 293 | +- The source or target locale you specified isn't among the [supported locales](../../../language-support.md?tabs=speech-translation#video-translation). |
| 294 | +- You tried to use an _F0_ Speech resource, but the region only supports the _Standard_ Speech resource pricing tier. |
| 295 | + |
| 296 | +### HTTP 500 error |
| 297 | + |
| 298 | +HTTP 500 Internal Server Error indicates that the request failed. The response body contains the error message. |
| 299 | + |
| 300 | +## Related content |
| 301 | + |
| 302 | +- [Video translation overview](../../../video-translation-overview.md) |