Commit e4fde46

add video overview
1 parent 8ada4a3 commit e4fde46

File tree

1 file changed

+49
-54
lines changed
  • articles/ai-services/content-understanding/video

articles/ai-services/content-understanding/video/overview.md

Lines changed: 49 additions & 54 deletions
@@ -31,7 +31,7 @@ The **pre-built video analyzer** outputs RAG-ready Markdown that includes:
3131

3232
This format can drop straight into a vector store to enable agent or RAG workflows, with no post-processing required.
3333

34-
From there you can **customize the analyzer** for more fine-grained control of the output. You can define custom fields, segments, or enable face identification. Customization allows you to use the full power of generative models to extract deep insights from the visual and audio details of the video.
34+
From there you can **customize the analyzer** for more fine-grained control of the output. You can define custom fields, segments, or enable face identification. Customization allows you to use the full power of generative models to extract deep insights from the visual and audio details of the video.
3535

3636
For example, customization allows you to:
3737

@@ -52,61 +52,59 @@ Content understanding for video has broad potential uses. For example, you can c
5252

5353
With the prebuilt video analyzer (prebuilt-videoAnalyzer), you can upload a video and get an immediately usable knowledge asset. The service packages every clip into both richly formatted Markdown and JSON. This process allows your search index or chat agent to ingest without custom glue code.
5454

55-
For example, creating the base prebuilt-videoAnalyzer like this:
55+
* For example, creating the base `prebuilt-videoAnalyzer` as follows:
5656

57-
```
58-
{
59-
"config": {},
60-
"BaseAnalyzerId": "prebuilt-videoAnalyzer",
61-
}
62-
```
57+
```jsonc
58+
{
59+
"config": {},
60+
"BaseAnalyzerId": "prebuilt-videoAnalyzer",
61+
}
62+
```
6363

64-
Then analyzing a 30-second advertising video, would result in the following output:
64+
* Next, analyzing a 30-second advertising video would result in the following output:
6565

66-
````markdown
67-
# Video: 00:00.000 => 00:30.000
68-
Width: 1280
69-
Height: 720
66+
```markdown
67+
# Video: 00:00.000 => 00:30.000
68+
Width: 1280
69+
Height: 720
7070

71-
## Segment 1: 00:00.000 => 00:06.000
72-
A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.
71+
## Segment 1: 00:00.000 => 00:06.000
72+
A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.
7373

74-
Transcript
75-
```
76-
WEBVTT
74+
Transcript
7775

78-
00:03.600 --> 00:06.000
79-
<Speaker 1 Speaker>Get new years ready.
80-
```
76+
WEBVTT
8177

82-
Key Frames
83-
- 00:00.600 ![](keyFrame.600.jpg)
84-
- 00:01.200 ![](keyFrame.1200.jpg)
78+
00:03.600 --> 00:06.000
79+
<Speaker 1 Speaker>Get new years ready.
8580

86-
## Segment 2: 00:06.000 => 00:10.080
87-
The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.
81+
Key Frames
82+
- 00:00.600 ![](keyFrame.600.jpg)
83+
- 00:01.200 ![](keyFrame.1200.jpg)
8884

89-
Transcript
90-
```
91-
WEBVTT
85+
## Segment 2: 00:06.000 => 00:10.080
86+
The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.
9287

93-
00:03.600 --> 00:06.000
94-
<Speaker 1 Speaker>Go team!
95-
```
88+
Transcript
9689

97-
Key Frames
98-
- 00:06.200 ![](keyFrame.6200.jpg)
99-
- 00:07.080 ![](keyFrame.7080.jpg)
90+
WEBVTT
10091

101-
*…additional data omitted for brevity…*
102-
````
92+
00:03.600 --> 00:06.000
93+
<Speaker 1 Speaker>Go team!
94+
95+
Key Frames
96+
- 00:06.200 ![](keyFrame.6200.jpg)
97+
- 00:07.080 ![](keyFrame.7080.jpg)
98+
99+
*…additional data omitted for brevity…*
100+
```
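A consumer of this Markdown will often want the segment boundaries as numbers rather than text. The following is a minimal Python sketch, not part of the service or its SDK, that pulls the `## Segment N: start => end` headings out of the output shown above. The function names `parse_segments` and `to_ms` are hypothetical, and the sketch assumes timestamps always use the `MM:SS.mmm` form seen in the sample; a longer video may use a different layout.

```python
import re

# Matches headings such as "## Segment 1: 00:00.000 => 00:06.000".
# Assumption: timestamps are MM:SS.mmm, as in the sample output above.
SEGMENT_RE = re.compile(
    r"^\s*## Segment (\d+): (\d+):(\d+)\.(\d+) => (\d+):(\d+)\.(\d+)\s*$",
    re.MULTILINE,
)


def to_ms(minutes, seconds, millis):
    """Convert the three captured timestamp parts to milliseconds."""
    return (int(minutes) * 60 + int(seconds)) * 1000 + int(millis)


def parse_segments(markdown):
    """Return one dict per segment heading found in the Markdown text."""
    segments = []
    for match in SEGMENT_RE.finditer(markdown):
        segments.append(
            {
                "index": int(match.group(1)),
                "start_ms": to_ms(*match.group(2, 3, 4)),
                "end_ms": to_ms(*match.group(5, 6, 7)),
            }
        )
    return segments
```

The resulting start and end values can be stored as metadata next to each chunk in a vector store, so that a retrieved passage can be traced back to a position in the video.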
103101

104102
## Walk-through
105103

106104
We recently published a walk-through for RAG on Video using Content Understanding.
107105
[https://www.youtube.com/watch?v=fafneWnT2kw\&lc=Ugy2XXFsSlm7PgIsWQt4AaABAg](https://www.youtube.com/watch?v=fafneWnT2kw&lc=Ugy2XXFsSlm7PgIsWQt4AaABAg)
108106

109-
# Capabilities
107+
## Capabilities
110108

111109
1. [Content extraction](#content-extraction-capabilities)
112110
1. [Field extraction](#field-extraction-and-segmentation)
@@ -131,8 +129,8 @@ The first pass is all about extracting a first set of details—who's speaking,
131129
> When Multilingual transcription is used, any files with unsupported locales produce a result based on the closest supported locale, which is likely incorrect. This result is a known
132130
> behavior. Avoid transcription quality issues by ensuring that you configure locales when not using a multilingual transcription supported locale!
133131
134-
* **Key frame extraction:** Extracts key frames from videos to represent each shot completely, ensuring each shot has enough key frames to enable field extraction to work effectively.
135-
* **Shot detection:** Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and repackaging of content with breaks exactly existing edits. This is output as a list of timestamps in milliseconds in `cameraShotTimesMs`. It is only output when `"returnDetails": true` is set.
132+
* **Key frame extraction:** Extracts key frames from videos to represent each shot completely, ensuring each shot has enough key frames to enable field extraction to work effectively.
133+
* **Shot detection:** Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and repackaging of content with breaks that align exactly with existing edits. The output is a list of timestamps in milliseconds in `cameraShotTimesMs`. The output is only returned when `"returnDetails": true` is set.
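Because `cameraShotTimesMs` is a flat list of millisecond timestamps, a caller usually has to turn it into start/end pairs before cutting or repackaging. The Python sketch below shows one way to do that. It assumes each timestamp marks a boundary between two shots and that the caller already knows the start and end of the video; the function name `shot_intervals` is hypothetical, and the location of the list inside the response is not shown here.

```python
def shot_intervals(camera_shot_times_ms, start_ms, end_ms):
    """Turn shot-boundary timestamps into (start, end) pairs covering the video.

    Assumption: every value in camera_shot_times_ms is a boundary between
    two shots. Values outside (start_ms, end_ms) and duplicates are ignored.
    """
    boundaries = sorted(
        t for t in set(camera_shot_times_ms) if start_ms < t < end_ms
    )
    edges = [start_ms, *boundaries, end_ms]
    # Pair each edge with the next one: [a, b, c] -> [(a, b), (b, c)]
    return list(zip(edges, edges[1:]))
```

For a 30-second clip, `shot_intervals([6000, 10080], 0, 30000)` yields three intervals, one per shot. The input values in that call are illustrative only.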
136134

137135
## Field extraction and segmentation
138136

@@ -157,6 +155,7 @@ Shape the output to match your business vocabulary. Use a `fieldSchema` object w
157155
**Example:**
158156

159157
```jsonc
158+
160159
"fieldSchema": {
161160
"description": "Extract brand presence and sentiment per scene",
162161
"fields": {
@@ -182,6 +181,7 @@ Shape the output to match your business vocabulary. Use a `fieldSchema` object w
182181

183182

184183
### Segmentation mode
184+
185185
> [!NOTE]
186186
>
187187
> Setting segmentation triggers field extraction even if no fields are defined.
@@ -208,13 +208,14 @@ Content Understanding offers three ways to slice a video, letting you get the ou
208208

209209
**Example:**
210210
* Break a news broadcast up into stories.
211-
```jsonc
212-
{
213-
"segmentationMode": "custom",
214-
"segmentationDefinition": "news broadcasts divided by individual stories"
215-
}
216-
```
217-
211+
212+
```jsonc
213+
{
214+
"segmentationMode": "custom",
215+
"segmentationDefinition": "news broadcasts divided by individual stories"
216+
}
217+
```
218+
218219
## Face identification description add-on
219220

220221
> [!NOTE]
@@ -228,7 +229,7 @@ Face identification description is an add-on that provides context to content ex
228229
The face add-on enables grouping and identification as output from the content extraction section. To enable face capabilities set `"enableFace":true` in the analyzer configuration.
229230

230231
* **Grouping:** Groups faces appearing in a video to extract one representative face image for each person and provides the segments where each person is present. The grouped face data is available as metadata and can be used to generate customized metadata fields when `returnDetails: true` is set for the analyzer.
231-
* **Identification:** Labels individuals in the video with names based on a Face API person directory. Customers can enable this feature by supplying a name for a Face API directory in the current resource in the `personDirectoryId` property of the analyzer. To use this capibility, first you must create a personDirectory then reference it in the analyzer. For details on how to do that, check out [How to build a person directory](../../content-understanding/tutorial/build-person-directory.md)
232+
* **Identification:** Labels individuals in the video with names based on a Face API person directory. Customers can enable this feature by supplying a name for a Face API directory in the current resource in the `personDirectoryId` property of the analyzer. To use this capability, first you must create a personDirectory then reference it in the analyzer. For details on how to do that, check out [How to build a person directory](../../content-understanding/tutorial/build-person-directory.md)
232233

233234
### Field Extraction – Face description
234235

@@ -262,14 +263,10 @@ Specific limitations of video processing to keep in mind:
262263

263264
For supported formats, see [Service quotas and limits](../service-limits.md).
264265

265-
266-
267266
## Supported languages and regions
268267

269268
See [Language and region support](../language-region-support.md).
270269

271-
272-
273270
## Data privacy and security
274271

275272
As with all Azure AI services, review Microsoft's [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) documentation.
@@ -278,8 +275,6 @@ As with all Azure AI services, review Microsoft's [Data, protection, and privacy
278275
>
279276
> If you process **Biometric Data** (for example, enable **Face Grouping** or **Face Identification**), you must meet all notice, consent, and deletion requirements under GDPR or other applicable laws. See [Data and Privacy for Face](/legal/cognitive-services/face/data-privacy-security).
280277
281-
282-
283278
## Next steps
284279
285280
* Process videos in the [Azure AI Foundry portal](https://aka.ms/cu-landing).
