 * This quickstart shows you how to use the [Content Understanding REST API](/rest/api/contentunderstanding/operation-groups?view=rest-contentunderstanding-2025-05-01-preview&preserve-view=true) to get structured data from multimodal content in document, image, audio, and video files.
 
-* Try the [Content Understanding API with no code on Azure AI Foundry](https://ai.azure.com/explore/aiservices/vision/contentunderstanding)
+* Try [Content Understanding with no code on Azure AI Foundry](https://ai.azure.com/explore/aiservices/vision/contentunderstanding)
 
 ## Prerequisites
@@ -113,20 +113,20 @@ The 200 (`OK`) JSON response includes a `status` field indicating the status of
 
 ```json
 {
-  "id": "a8ccf3ea-e4ad-4302-9ac5-b40e69768053",
+  "id": {resultId},
   "status": "Succeeded",
   "result": {
     "analyzerId": "prebuilt-documentAnalyzer",
     "apiVersion": "2025-05-01-preview",
-    "createdAt": "2025-05-05T17:55:35Z",
+    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
     "warnings": [],
     "contents": [
       {
-        "markdown": "# WEB HOSTING AGREEMENT\n\nThis web Hosting Agreement is entered as of this 15 day of October ...",
-            "valueString": "This document is a Web Hosting Agreement between Contoso Corporation and AdventureWorks Cycles, both based in Washington. It outlines the terms of their agreement, including payment terms for software and bandwidth usage, technical support requirements, and governing laws. The agreement nullifies any previous agreements between the parties and is signed by representatives of both companies."
+            "valueString": "This document is an invoice issued by Contoso Ltd. to Microsoft Corporation for services rendered during the period of 10/14/2019 to 11/14/2019..."
           }
         },
         "kind": "document",
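A reader consuming this response would typically check `status` before touching `result`. The following is a minimal Python sketch of that check, not code from the quickstart: the helper name `succeeded_contents` and the sample payload are hypothetical, and only the `"Succeeded"` status value shown in the example is assumed.

```python
from typing import Any


def succeeded_contents(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return result.contents, but only when status is "Succeeded".

    Hypothetical helper: the field names come from the sample response,
    and any other status value is treated as "not ready".
    """
    status = response.get("status")
    if status != "Succeeded":
        raise RuntimeError(f"analysis is not complete: status={status!r}")
    result = response.get("result", {})
    return result.get("contents", [])


# Hypothetical payload trimmed to the fields shown in the sample response.
SAMPLE_RESPONSE = {
    "id": "example-result-id",
    "status": "Succeeded",
    "result": {
        "analyzerId": "prebuilt-documentAnalyzer",
        "apiVersion": "2025-05-01-preview",
        "warnings": [],
        "contents": [{"kind": "document"}],
    },
}

if __name__ == "__main__":
    for content in succeeded_contents(SAMPLE_RESPONSE):
        print(content["kind"])
```

Raising on any status other than `"Succeeded"` keeps the sketch from guessing at status values that the excerpt does not list.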
@@ -136,10 +136,66 @@ The 200 (`OK`) JSON response includes a `status` field indicating the status of
@@ -187,20 +243,41 @@ The 200 (`OK`) JSON response includes a `status` field indicating the status of
 
 ```json
 {
-  "id": "2e77aecc-b5f0-4652-b91c-4790b655ce01",
+  "id": {resultId},
   "status": "Succeeded",
   "result": {
     "analyzerId": "prebuilt-audioAnalyzer",
     "apiVersion": "2025-05-01-preview",
-    "createdAt": "2025-05-05T18:06:24Z",
+    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
     "stringEncoding": "utf8",
     "warnings": [],
     "contents": [
       {
-        "markdown": "# Audio: 00:00.000 => 01:54.670\n\nTranscript\n```\nWEBVTT\n\n00:00.080 --> 00:02.160\n<v Speaker 1>Thank you for calling Woodgrove Travel.\n\n00:02.960 --> 00:04.560\n<v Speaker 1>My name is Isabella Taylor ...",
+        "markdown": "# Audio: 00:00.000 => 01:54.670\n\nTranscript\n```\nWEBVTT\n\n00:00.080 --> 00:02.160\n<v Speaker 1>Thank you for calling Woodgrove Travel...",
+        "fields": {
+          "Summary": {
+            "type": "string",
+            "valueString": "John Smith contacted Woodgrove Travel to report a negative experience with his flight from New York City to Los Angeles..."
+          }
+        },
         "kind": "audioVisual",
         "startTimeMs": 0,
-        "endTimeMs": 114670
+        "endTimeMs": 114670,
+        "transcriptPhrases": [
+          {
+            "speaker": "Speaker 1",
+            "startTimeMs": 80,
+            "endTimeMs": 2160,
+            "text": "Thank you for calling Woodgrove Travel.",
+            "words": [
+              {
+                "startTimeMs": 80,
+                "endTimeMs": 280,
+                "text": "Thank"
+              }, ...
+            ]
+          }, ...
+        ]
       }
     ]
   }
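The new `transcriptPhrases` array carries the same timing as the WEBVTT text in `markdown`, but in milliseconds. As a rough Python sketch (the helper names `format_ms` and `transcript_lines` are hypothetical, and the sample is trimmed to the fields shown in the response), the phrases could be turned back into readable lines like this:

```python
from typing import Any


def format_ms(ms: int) -> str:
    """Render milliseconds as MM:SS.mmm, the stamp style used in the markdown."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def transcript_lines(content: dict[str, Any]) -> list[str]:
    """Build one 'start --> end speaker: text' line per transcript phrase."""
    lines = []
    for phrase in content.get("transcriptPhrases", []):
        start = format_ms(phrase["startTimeMs"])
        end = format_ms(phrase["endTimeMs"])
        lines.append(f"{start} --> {end} {phrase['speaker']}: {phrase['text']}")
    return lines


# Hypothetical content object, reduced to fields present in the sample.
SAMPLE_CONTENT = {
    "kind": "audioVisual",
    "startTimeMs": 0,
    "endTimeMs": 114670,
    "transcriptPhrases": [
        {
            "speaker": "Speaker 1",
            "startTimeMs": 80,
            "endTimeMs": 2160,
            "text": "Thank you for calling Woodgrove Travel.",
        }
    ],
}

if __name__ == "__main__":
    for line in transcript_lines(SAMPLE_CONTENT):
        print(line)
    # prints: 00:00.080 --> 00:02.160 Speaker 1: Thank you for calling Woodgrove Travel.
```

With this formatting, `endTimeMs` of 114670 renders as `01:54.670`, which matches the range in the `# Audio:` heading of the `markdown` value.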
@@ -211,21 +288,56 @@ The 200 (`OK`) JSON response includes a `status` field indicating the status of
 
 ```json
 {
-  "id": "3fb3cca1-4cf1-4f2f-9155-8d1db4ef9541",
+  "id": {resultId},
   "status": "Succeeded",
   "result": {
     "analyzerId": "prebuilt-videoAnalyzer",
     "apiVersion": "2025-05-01-preview",
-    "createdAt": "2025-05-05T18:24:03Z",
+    "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
     "warnings": [],
     "contents": [
       {
-        "markdown": "# Video: 00:00.000 => 00:43.866\nWidth: 1080\nHeight: 608\n\nTranscript\n```\nWEBVTT\n\n00:01.400 --> 00:06.560\n<Speaker 1 Speaker>When it comes to the neural TTS, in order to get a good voice, it's better to have good data ..."
+        "markdown": "# Video: 00:00.000 => 00:43.866\nWidth: 1080\nHeight: 608\n\n## Segment 1: 00:00.000 => 00:07.367\nThe video begins with a scenic aerial view featuring the Flight Simulator and Microsoft Azure AI logos...\n\nTranscript\n```\nWEBVTT\n\n00:01.400 --> 00:06.560\n<Speaker 1 Speaker>When it comes to the neural TTS, in order to get a good voice, it's better to have good data.\n```\n\nKey Frames\n- 00:00.726 ...",
+        "fields": {
+          "Segments": {
+            "type": "array",
+            "valueArray": [
+              {
+                "type": "object",
+                "valueObject": {
+                  "SegmentId": {
+                    "type": "string",
+                    "valueString": "1"
+                  }
+                }
+              }, ...
+            ]
+          }
+        },
         "kind": "audioVisual",
         "startTimeMs": 0,
         "endTimeMs": 43866,
         "width": 1080,
-        "height": 608
+        "height": 608,
+        "KeyFrameTimesMs": [ 726, 2046, ... ],
+        "transcriptPhrases": [
+          {
+            "speaker": "Speaker 1",
+            "startTimeMs": 1400,
+            "endTimeMs": 6560,
+            "text": "When it comes to the neural TTS, in order to get a good voice, it's better to have good data.",
+            "words": []
+          }, ...
+        ],
+        "cameraShotTimesMs": [ 1467, 3233, ... ],
+        "segments": [
+          {
+            "startTimeMs": 0,
+            "endTimeMs": 7367,
+            "description": "The video begins with a scenic aerial view featuring the Flight Simulator and Microsoft Azure AI logos...",
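The `fields` object in these responses wraps every value in a `type` tag plus a matching `valueString`, `valueArray`, or `valueObject` key. The sketch below shows one way to flatten that wrapper into plain Python values. It is an illustration only: `unwrap_field` is a hypothetical helper, it handles just the three types visible in the samples, and it returns any other field unchanged rather than guessing at key names.

```python
from typing import Any


def unwrap_field(field: dict[str, Any]) -> Any:
    """Recursively strip the type/value wrapper from one field.

    Covers only "string", "array", and "object", the types that appear
    in the sample responses; anything else is returned as-is.
    """
    field_type = field.get("type")
    if field_type == "string":
        return field.get("valueString")
    if field_type == "array":
        return [unwrap_field(item) for item in field.get("valueArray", [])]
    if field_type == "object":
        return {
            name: unwrap_field(value)
            for name, value in field.get("valueObject", {}).items()
        }
    return field  # type not shown in the samples: leave untouched


def unwrap_fields(fields: dict[str, Any]) -> dict[str, Any]:
    """Apply unwrap_field to every entry of a contents[].fields object."""
    return {name: unwrap_field(value) for name, value in fields.items()}


# Sample taken from the video response, without the elided entries.
SAMPLE_FIELDS = {
    "Segments": {
        "type": "array",
        "valueArray": [
            {
                "type": "object",
                "valueObject": {
                    "SegmentId": {"type": "string", "valueString": "1"}
                },
            }
        ],
    }
}

if __name__ == "__main__":
    print(unwrap_fields(SAMPLE_FIELDS))
    # prints: {'Segments': [{'SegmentId': '1'}]}
```

The same helper applies to the audio response, where `fields.Summary` is a single `string` field.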