articles/ai-services/content-understanding/video/overview.md
Lines changed: 26 additions & 28 deletions
@@ -63,41 +63,41 @@ With the prebuilt video analyzer (prebuilt-videoAnalyzer), you can upload a vide
* Next, analyzing a 30-second advertising video would result in the following output:

```markdown
# Video: 00:00.000 => 00:30.000
Width: 1280
Height: 720

## Segment 1: 00:00.000 => 00:06.000
A lively room filled with people is shown, where a group of friends is gathered around a television. They are watching a sports event, possibly a football match, as indicated by the decorations and the atmosphere.

Transcript

WEBVTT

00:03.600 --> 00:06.000
<Speaker 1 Speaker>Get new years ready.

Key Frames
- 00:00.600
- 00:01.200

## Segment 2: 00:06.000 => 00:10.080
The scene transitions to a more vibrant and energetic setting, where the group of friends is now celebrating. The room is decorated with football-themed items, and everyone is cheering and enjoying the moment.

Transcript

WEBVTT

00:03.600 --> 00:06.000
<Speaker 1 Speaker>Go team!

Key Frames
- 00:06.200
- 00:07.080

*…additional data omitted for brevity…*
```

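Because the sample output is plain Markdown with a regular header pattern, it can be post-processed without an SDK. The following Python sketch pulls each segment's start and end times and its key-frame timestamps out of text shaped like the sample above. The regular expressions are inferred from this single sample rather than from a published format specification, and `parse_segments`, `Segment`, and `SAMPLE` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Header line such as "## Segment 1: 00:00.000 => 00:06.000" (mm:ss.fff), inferred from the sample.
SEGMENT_RE = re.compile(r"^## Segment (\d+): (\d+):(\d+\.\d+) => (\d+):(\d+\.\d+)$")
# Key-frame bullet such as "- 00:00.600".
KEYFRAME_RE = re.compile(r"^- (\d+):(\d+\.\d+)$")


def to_seconds(minutes: str, seconds: str) -> float:
    """Convert the two halves of an mm:ss.fff timestamp to seconds."""
    return int(minutes) * 60 + float(seconds)


@dataclass
class Segment:
    index: int
    start: float  # seconds
    end: float    # seconds
    key_frames: list[float] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_segments(markdown: str) -> list[Segment]:
    """Collect segment boundaries and key-frame times from the Markdown output."""
    segments: list[Segment] = []
    in_key_frames = False
    for raw in markdown.splitlines():
        line = raw.strip()  # key-frame lines may carry trailing spaces
        header = SEGMENT_RE.match(line)
        if header:
            idx, m1, s1, m2, s2 = header.groups()
            segments.append(Segment(int(idx), to_seconds(m1, s1), to_seconds(m2, s2)))
            in_key_frames = False
        elif line == "Key Frames":
            in_key_frames = True
        elif line == "Transcript":
            in_key_frames = False
        elif in_key_frames and segments:
            frame = KEYFRAME_RE.match(line)
            if frame:
                segments[-1].key_frames.append(to_seconds(*frame.groups()))
    return segments


# Abbreviated copy of the sample output (segment descriptions omitted).
SAMPLE = """\
# Video: 00:00.000 => 00:30.000
Width: 1280
Height: 720

## Segment 1: 00:00.000 => 00:06.000
Transcript

WEBVTT

00:03.600 --> 00:06.000
<Speaker 1 Speaker>Get new years ready.

Key Frames
- 00:00.600
- 00:01.200

## Segment 2: 00:06.000 => 00:10.080
Key Frames
- 00:06.200
- 00:07.080
"""

if __name__ == "__main__":
    for seg in parse_segments(SAMPLE):
        print(seg.index, round(seg.duration, 3), seg.key_frames)
```

Tracking whether the parser is inside a "Key Frames" list keeps transcript timing lines from being mistaken for key frames; a JSON form of the result, if available, would be a sturdier input than this text format.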
## Walk-through
@@ -120,7 +120,7 @@ The service operates in two stages. The first stage, content extraction, involve
The first pass is all about extracting a first set of details—who's speaking, where are the cuts, and which faces recur. It creates a solid metadata backbone that later steps can reason over.

-* **Transcription:** Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Sentence-level timestamps are available if `"returnDetails": true` is set. Content Understanding supports the full set of Azure AI Speech speech-to-text languages. For more information on supported languages, *see* [Language and region support](../language-region-support.md#language-support). The following transcription details are important to consider:
+* **Transcription:** Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Sentence-level timestamps are available if `"returnDetails": true` is set. Content Understanding supports the full set of Azure AI Speech speech-to-text languages. Language support for video is the same as for audio; for details, *see* [Audio Language Handling](../audio/overview.md#language-handling). The following transcription details are important to consider:

  * **Diarization:** Distinguishes between speakers in a conversation in the output, attributing parts of the transcript to specific speakers.
  * **Multilingual transcription:** Generates multilingual transcripts. Language/locale is applied per phrase in the transcript. Phrases are output when `"returnDetails": true` is set. Unlike language detection, this feature is enabled when no language/locale is specified or the language is set to `auto`.
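The transcript for each segment is WebVTT text in which a tag in front of each cue's text names the speaker identified by diarization. The following Python sketch reads those cues back into timed, speaker-attributed records and totals talk time per speaker. It assumes the cue layout shown in the sample output (a timing line followed by one text line beginning with a tag such as `<Speaker 1 Speaker>`) and also accepts the standard WebVTT `<v Name>` voice tag; `parse_vtt`, `talk_time`, and `Cue` are hypothetical helpers, not part of any Content Understanding SDK, and multi-line cue text is not handled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Cue timing line such as "00:03.600 --> 00:06.000"; the hours field is optional.
TIMING_RE = re.compile(
    r"^(?:(\d+):)?(\d{2}):(\d{2}\.\d{3}) --> (?:(\d+):)?(\d{2}):(\d{2}\.\d{3})"
)
# Leading tag such as "<Speaker 1 Speaker>" (sample format) or "<v Alice>" (standard
# WebVTT voice span). Any leading tag is treated as the speaker label in this sketch.
VOICE_RE = re.compile(r"^<(?:v\s+)?([^>]+)>(.*)$")


@dataclass
class Cue:
    start: float          # seconds
    end: float            # seconds
    speaker: str | None   # None when the cue text carries no tag
    text: str


def _seconds(hours: str | None, minutes: str, seconds: str) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def parse_vtt(vtt: str) -> list[Cue]:
    """Read timed cues; only the first text line after each timing line is used."""
    lines = [line.strip() for line in vtt.splitlines()]
    cues: list[Cue] = []
    for i, line in enumerate(lines):
        timing = TIMING_RE.match(line)
        if timing is None or i + 1 >= len(lines):
            continue
        h1, m1, s1, h2, m2, s2 = timing.groups()
        payload = lines[i + 1]
        voice = VOICE_RE.match(payload)
        speaker, text = (voice.group(1), voice.group(2)) if voice else (None, payload)
        cues.append(Cue(_seconds(h1, m1, s1), _seconds(h2, m2, s2), speaker, text.strip()))
    return cues


def talk_time(cues: list[Cue]) -> dict[str, float]:
    """Sum cue durations per diarized speaker label."""
    totals: dict[str, float] = {}
    for cue in cues:
        label = cue.speaker or "unknown"
        totals[label] = totals.get(label, 0.0) + (cue.end - cue.start)
    return totals


# Transcript of Segment 1 from the sample output.
SAMPLE_VTT = """WEBVTT

00:03.600 --> 00:06.000
<Speaker 1 Speaker>Get new years ready.
"""

if __name__ == "__main__":
    cues = parse_vtt(SAMPLE_VTT)
    for cue in cues:
        print(cue)
    print(talk_time(cues))
```

Keeping the speaker label as an opaque string means the sketch does not depend on how the service names speakers; mapping labels such as `Speaker 1` to real people is left to the caller.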
@@ -178,8 +178,6 @@ Shape the output to match your business vocabulary. Use a `fieldSchema` object w