Commit b55fade

add video overview
1 parent e4fde46 commit b55fade

1 file changed

+26
-28
lines changed

articles/ai-services/content-understanding/video/overview.md

Lines changed: 26 additions & 28 deletions
@@ -63,41 +63,41 @@ With the prebuilt video analyzer (prebuilt-videoAnalyzer), you can upload a vide
6363

6464
* Next, analyzing a 30-second advertising video would result in the following output:
6565

66-
```markdown
67-
# Video: 00:00.000 => 00:30.000
68-
Width: 1280
69-
Height: 720
66+
```markdown
67+
# Video: 00:00.000 => 00:30.000
68+
Width: 1280
69+
Height: 720
7070

71-
## Segment 1: 00:00.000 => 00:06.000
72-
A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.
71+
## Segment 1: 00:00.000 => 00:06.000
72+
A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.
7373

74-
Transcript
74+
Transcript
7575

76-
WEBVTT
76+
WEBVTT
7777

78-
00:03.600 --> 00:06.000
79-
<Speaker 1 Speaker>Get new years ready.
78+
00:03.600 --> 00:06.000
79+
<Speaker 1 Speaker>Get new years ready.
8080

81-
Key Frames
82-
- 00:00.600 ![](keyFrame.600.jpg)
83-
- 00:01.200 ![](keyFrame.1200.jpg)
81+
Key Frames
82+
- 00:00.600 ![](keyFrame.600.jpg)
83+
- 00:01.200 ![](keyFrame.1200.jpg)
8484

85-
## Segment 2: 00:06.000 => 00:10.080
86-
The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.
85+
## Segment 2: 00:06.000 => 00:10.080
86+
The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.
8787

88-
Transcript
88+
Transcript
8989

90-
WEBVTT
90+
WEBVTT
9191

92-
00:03.600 --> 00:06.000
93-
<Speaker 1 Speaker>Go team!
92+
00:03.600 --> 00:06.000
93+
<Speaker 1 Speaker>Go team!
9494

95-
Key Frames
96-
- 00:06.200 ![](keyFrame.6200.jpg)
97-
- 00:07.080 ![](keyFrame.7080.jpg)
98-
99-
*…additional data omitted for brevity…*
100-
```
95+
Key Frames
96+
- 00:06.200 ![](keyFrame.6200.jpg)
97+
- 00:07.080 ![](keyFrame.7080.jpg)
98+
99+
*…additional data omitted for brevity…*
100+
```
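
The sample output above follows a regular layout: a `## Segment N: start => end` heading per segment, and WebVTT cues of the form `start --> end` followed by one line of text. As a minimal sketch, assuming the Markdown always looks like this sample (the function name `parse_segments` and the returned dictionary keys are hypothetical, not part of the service), the segments and cues could be pulled out like this:

```python
import re

# Timestamp shape as it appears in the sample output: MM:SS.mmm
_TS = r"(\d{2}):(\d{2})\.(\d{3})"
_SEGMENT = re.compile(rf"^## Segment (\d+): {_TS} => {_TS}$")
_CUE = re.compile(rf"^{_TS} --> {_TS}$")


def _ms(minutes: str, seconds: str, millis: str) -> int:
    """Convert MM, SS, mmm string parts to integer milliseconds."""
    return (int(minutes) * 60 + int(seconds)) * 1000 + int(millis)


def parse_segments(markdown: str) -> list[dict]:
    """Hypothetical helper: collect segments and their WebVTT cues.

    Assumes each cue's text sits on the single line after its timing line,
    as in the sample output. Other layouts are not handled.
    """
    segments: list[dict] = []
    lines = [line.strip() for line in markdown.splitlines()]
    for i, line in enumerate(lines):
        seg = _SEGMENT.match(line)
        if seg:
            parts = seg.groups()
            segments.append({
                "id": int(parts[0]),
                "start_ms": _ms(*parts[1:4]),
                "end_ms": _ms(*parts[4:7]),
                "cues": [],
            })
            continue
        cue = _CUE.match(line)
        # Only attach cues that appear after a segment heading.
        if cue and segments and i + 1 < len(lines):
            parts = cue.groups()
            segments[-1]["cues"].append({
                "start_ms": _ms(*parts[0:3]),
                "end_ms": _ms(*parts[3:6]),
                "text": lines[i + 1],
            })
    return segments
```

Timestamps are kept as integer milliseconds so that segments and cues can be compared without floating-point rounding. Speaker tags such as `<Speaker 1 Speaker>` are left in the cue text untouched.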
101101

102102
## Walk-through
103103

@@ -120,7 +120,7 @@ The service operates in two stages. The first stage, content extraction, involve
120120

121121
The first pass is all about extracting a first set of details: who's speaking, where the cuts are, and which faces recur. It creates a solid metadata backbone that later steps can reason over.
122122

123-
* **Transcription:** Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Sentence-level timestamps are available if `"returnDetails": true` is set. Content Understanding supports the full set of Azure AI Speech speech-to-text languages. For more information on supported languages, *see* [Language and region support](../language-region-support.md#language-support). The following transcription details are important to consider:
123+
* **Transcription:** Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Sentence-level timestamps are available if `"returnDetails": true` is set. Content Understanding supports the full set of Azure AI Speech speech-to-text languages. Language support for video is the same as for audio; *see* [Audio Language Handling](../audio/overview.md#language-handling) for details. The following transcription details are important to consider:
124124

125125
* **Diarization:** Distinguishes between speakers in a conversation in the output, attributing parts of the transcript to specific speakers.
126126
* **Multilingual transcription:** Generates multilingual transcripts. Language/locale is applied per phrase in the transcript. Phrases are output when `"returnDetails": true` is set. Unlike language detection, this feature is enabled when no language/locale is specified or the language is set to `auto`.
@@ -178,8 +178,6 @@ Shape the output to match your business vocabulary. Use a `fieldSchema` object w
178178
}
179179
```
180180

181-
182-
183181
### Segmentation mode
184182

185183
> [!NOTE]
