articles/ai-services/content-understanding/video/overview.md
Lines changed: 28 additions & 24 deletions
@@ -31,11 +31,13 @@ The **pre-built video analyzer** outputs RAG-ready Markdown that includes:
 
 This format can drop straight into a vector store to enable an agent or RAG workflows—no post-processing required.
 
-From there you can **customize the analyzer** for more fine-grained control of the output. You can define custom fields, segments, or enable face identification. Customization allows you to use the full power of generative models to extract deep insights from the visual and audio details of the video. For example, customization allows you to:
+From there you can **customize the analyzer** for more fine-grained control of the output. You can define custom fields, segments, or enable face identification. Customization allows you to use the full power of generative models to extract deep insights from the visual and audio details of the video.
 
-- Identify what products and brands are seen or mentioned in the video.
-- Segment a news broadcast into chapters based on the topics or news stories discussed.
-- Use face identification to label speakers as executives, for example, `CEO John Doe`, `CFO Jane Smith`.
+For example, customization allows you to:
+
+- **Define custom fields:** to identify what products and brands are seen or mentioned in the video.
+- **Generate custom segments:** to segment a news broadcast into chapters based on the topics or news stories discussed.
+- **Identify people using a person directory** enabling a customer to label conference speakers in footage using face identification, for example, `CEO John Doe`, `CFO Jane Smith`.
 
 ## Why use Content Understanding for video?
 
@@ -49,50 +51,52 @@ Content understanding for video has broad potential uses. For example, you can c
 ## Prebuilt video analyzer example
 
 With the prebuilt video analyzer (prebuilt-videoAnalyzer), you can upload a video and get an immediately usable knowledge asset. The service packages every clip into both richly formatted Markdown and JSON. This process allows your search index or chat agent to ingest without custom glue code.
-Calling prebuilt-video with no custom schema returns a document like the following (abridged) example:
 
-```markdown
+For example, creating the base prebuilt-videoAnalyzer like this:
+
+```
+{
+"config": {},
+"BaseAnalyzerId": "prebuilt-videoAnalyzer",
+}
+```
+
+Then analyzing a 30-second advertising video, would result in the following output:
+
+````markdown
 # Video: 00:00.000 => 00:30.000
 Width: 1280
 Height: 720
 
 ## Segment 1: 00:00.000 => 00:06.000
-A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere. The AliExpress logo is prominently displayed, suggesting a connection to the ongoing event.
+A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.
 
 Transcript
 ```
 WEBVTT
 
 00:03.600 --> 00:06.000
-<Speaker 1 Speaker>Get Euro ready with AliExpress.
+<Speaker 1 Speaker>Get new years ready.
 ```
 
 Key Frames
 - 00:00.600
 - 00:01.200
-- 00:02.560
-- 00:03.280
-- 00:04.560
-- 00:05.600
 
 ## Segment 2: 00:06.000 => 00:10.080
-The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment. The AliExpress branding continues to be visible, emphasizing the theme of shopping and celebration.
+The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.
 
 Transcript
 ```
 WEBVTT
 
 00:03.600 --> 00:06.000
-<Speaker 1 Speaker>Get Euro ready with AliExpress.
+<Speaker 1 Speaker>Go team!
 ```
 
 Key Frames
 - 00:06.200
 - 00:07.080
-- 00:07.760
-- 00:08.560
-- 00:09.360
-
 
 *…additional data omitted for brevity…*
 ````
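
Editor's sketch (not part of the commit): the sample output above is organized under `## Segment N: start => end` headings, so one plausible way to prepare it for a vector store is to split it into one chunk per segment. The following is a minimal Python sketch under stated assumptions: the heading pattern and the `MM:SS.mmm` timestamp shape are inferred only from the sample shown here, and `Segment`, `to_seconds`, and `split_segments` are hypothetical helper names, not part of any Content Understanding SDK.

```python
import re
from dataclasses import dataclass, field

# Matches headings such as "## Segment 1: 00:00.000 => 00:06.000".
# Assumption: timestamps always look like MM:SS.mmm, as in the sample output.
SEGMENT_HEADING = re.compile(
    r"^## Segment (\d+): (\d{2}:\d{2}\.\d{3}) => (\d{2}:\d{2}\.\d{3})\s*$"
)


def to_seconds(timestamp: str) -> float:
    """Convert an "MM:SS.mmm" timestamp to seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + float(seconds)


@dataclass
class Segment:
    index: int
    start: float
    end: float
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        """Segment body (description, transcript, key frames) as one string."""
        return "\n".join(self.lines).strip()


def split_segments(markdown: str) -> list[Segment]:
    """Split analyzer Markdown into one chunk per "## Segment" heading."""
    segments: list[Segment] = []
    for line in markdown.splitlines():
        match = SEGMENT_HEADING.match(line)
        if match:
            number, start, end = match.groups()
            segments.append(
                Segment(int(number), to_seconds(start), to_seconds(end))
            )
        elif segments:
            # Lines before the first segment heading (the video header)
            # are skipped; everything else belongs to the current segment.
            segments[-1].lines.append(line)
    return segments


sample = """# Video: 00:00.000 => 00:30.000
Width: 1280
Height: 720

## Segment 1: 00:00.000 => 00:06.000
A lively room filled with people is shown.

## Segment 2: 00:06.000 => 00:10.080
The scene transitions to a more vibrant and energetic setting.
"""

for segment in split_segments(sample):
    print(segment.index, segment.start, segment.end, segment.text)
```

Each `Segment` keeps its start and end time in seconds, so a chunk stored in an index can still be traced back to the matching part of the video.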
@@ -102,7 +106,7 @@ Key Frames
 We recently published a walk-through for RAG on Video using Content Understanding.
@@ -178,6 +182,10 @@ Shape the output to match your business vocabulary. Use a `fieldSchema` object w
 
 
 ### Segmentation mode
+> [!NOTE]
+>
+> Setting segmentation triggers field extraction even if no fields are defined.
+
 
 Content Understanding offers three ways to slice a video, letting you get the output you need for whole videos or short clips. You can use these options by setting the `SegmentationMode` property on a custom analyzer.
 
@@ -206,11 +214,7 @@ Content Understanding offers three ways to slice a video, letting you get the ou
 "segmentationDefinition":"news broadcasts divided by individual stories"
 }
 ```
-
-> [!NOTE]
->
-> Setting segmentation triggers field extraction even if no fields are defined.
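
Editor's sketch (not part of the commit): the hunks above mention a `SegmentationMode` property, a `segmentationDefinition` string, and a base analyzer ID of `prebuilt-videoAnalyzer`. The Python sketch below shows one way to assemble such a definition and serialize it with `json.dumps`, which guarantees syntactically valid JSON (for example, no trailing commas). The key casing (`baseAnalyzerId`, `segmentationMode`), the `"custom"` mode value, and the nesting under `config` are assumptions for illustration, and `build_analyzer_definition` is a hypothetical helper. Check the service's API reference before relying on any of them.

```python
import json


def build_analyzer_definition(segmentation_definition: str) -> dict:
    """Assemble a custom analyzer definition as a plain dictionary.

    Assumptions (not confirmed by the diff above): the key casing, the
    "custom" mode value, and the nesting of both settings under "config".
    """
    return {
        "baseAnalyzerId": "prebuilt-videoAnalyzer",
        "config": {
            "segmentationMode": "custom",  # assumed value
            "segmentationDefinition": segmentation_definition,
        },
    }


definition = build_analyzer_definition(
    "news broadcasts divided by individual stories"
)

# json.dumps always emits valid JSON, which avoids hand-editing mistakes.
payload = json.dumps(definition, indent=2)
print(payload)
```

Per the note in this diff, setting segmentation triggers field extraction even if no fields are defined, so a definition like this one may produce field output despite having no field schema.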