articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md
Lines changed: 20 additions & 18 deletions
@@ -6,7 +6,7 @@ author: eric-urban
6
6
manager: nitinme
7
7
ms.service: azure-ai-speech
8
8
ms.topic: conceptual
9
-
ms.date: 9/12/2024
9
+
ms.date: 1/3/2025
10
10
ms.author: eur
11
11
keywords: how to record video samples for custom text to speech avatar
12
12
---
@@ -60,71 +60,73 @@ The custom text to speech avatar doesn't support customization of clothes or loo
60
60
61
61
## What video clips to record
62
62
63
-
You need four types of basic video clips:
63
+
You need several types of basic video clips:
64
64
65
-
**Consent Video:**
65
+
**Consent Video (Required)**
66
+
The consent video is required for creating a custom avatar.
66
67
- The consent video must show the same avatar talent speaking the consent statement as required. Make sure the statement is correctly recorded, and each word is clearly spoken. [Get consent file from the avatar talent](custom-avatar-create.md#get-consent-file-from-the-avatar-talent). You can use any one of the supported languages.
67
68
- The avatar talent should always face the front of the camera, without large movements.
68
69
- The video should be taken in a quiet environment, and the voice should be recorded at a reasonable volume. Try to keep the signal-to-noise ratio higher than 20. For voice recording guidance, see the [Recording custom voice samples](../record-custom-voice-samples.md#recording-your-script) guide.
69
70
- Ensure that the head isn't occluded in any frame of the video.
70
71
- Make sure no other objects appear in the frame, such as filming equipment or a mobile phone.
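
The signal-to-noise guideline in the list above can be checked roughly before you submit a recording. The following is a minimal sketch, not an official tool. It assumes that the threshold of 20 is in decibels (the guidance doesn't state a unit), that the audio is already decoded into floating-point samples, and that you have manually picked one segment with speech and one with background noise only. The function names `mean_power`, `estimate_snr_db`, and `meets_snr_guideline` are hypothetical and aren't part of any Azure SDK.

```python
import math
from typing import Sequence

# Assumption: the guideline value of 20 is interpreted as decibels.
SNR_THRESHOLD_DB = 20.0


def mean_power(samples: Sequence[float]) -> float:
    """Return the mean squared amplitude of a block of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def estimate_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Roughly estimate SNR as 10 * log10(speech power / noise power).

    This is an approximation: the speech segment also contains background
    noise, so the true ratio is slightly lower than the value returned here.
    """
    noise_power = mean_power(noise)
    speech_power = mean_power(speech)
    if noise_power == 0:
        return math.inf
    if speech_power == 0:
        return -math.inf
    return 10.0 * math.log10(speech_power / noise_power)


def meets_snr_guideline(speech: Sequence[float], noise: Sequence[float]) -> bool:
    """Check the estimate against the assumed 20 dB threshold."""
    return estimate_snr_db(speech, noise) > SNR_THRESHOLD_DB
```

For example, you could pass a few seconds of the talent speaking as `speech` and a few seconds of room tone recorded before the talent starts speaking as `noise`.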
71
72
72
-
**Status 0 speaking:**
73
+
**Status 0 speaking (Required for gestures)**
74
+
The status 0 speaking video clip is required for gestures with the avatar.
73
75
- Status 0 represents the posture you can naturally maintain most of the time while speaking. For example, arms crossed in front of the body or hanging down naturally at the sides.
74
76
- Maintain a front-facing pose. The actor can move slightly to show a relaxed status, like moving the head or shoulder slightly, but don't move the body too much.
75
77
- Length: keep speaking in status 0 for 3-5 minutes.
76
78
77
-
**Samples of status 0 speaking:**
79
+
**Samples of status 0 speaking**
78
80
79
81

80
82
81
83

82
84
83
85

84
86
85
-
**Naturally speaking:**
87
+
**Naturally speaking (Required)**
88
+
The naturally speaking video clip is required for the avatar to speak naturally.
86
89
- Actor speaks in status 0 but with natural hand gestures from time to time.
87
90
- Hands should start from status 0 and return after making gestures.
88
91
- Use natural and common gestures when speaking. Avoid meaningful gestures like pointing, applause, or thumbs up.
89
92
- Length: Minimum 5 minutes, maximum 30 minutes in total. At least one piece of 5-minute continuous video recording is required. If recording multiple video clips, keep each clip under 10 minutes.
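
The length rules for the naturally speaking clips can be checked before upload. The following is a minimal sketch, assuming you already know each clip's duration in seconds (for example, from your video editor). The function name `check_natural_speaking_clips` and the constants are hypothetical and aren't part of any Azure SDK.

```python
from typing import List, Sequence

MIN_TOTAL_S = 5 * 60        # minimum total length: 5 minutes
MAX_TOTAL_S = 30 * 60       # maximum total length: 30 minutes
MIN_CONTINUOUS_S = 5 * 60   # at least one continuous clip of 5 minutes
MAX_CLIP_S = 10 * 60        # with multiple clips, each stays under 10 minutes


def check_natural_speaking_clips(durations_s: Sequence[float]) -> List[str]:
    """Return a list of problems found; an empty list means all rules pass."""
    problems: List[str] = []
    total = sum(durations_s)

    if total < MIN_TOTAL_S:
        problems.append("Total length is under 5 minutes.")
    if total > MAX_TOTAL_S:
        problems.append("Total length is over 30 minutes.")
    if not any(d >= MIN_CONTINUOUS_S for d in durations_s):
        problems.append("No clip is a continuous recording of at least 5 minutes.")

    # The per-clip limit applies only when more than one clip is recorded.
    if len(durations_s) > 1:
        for index, duration in enumerate(durations_s, start=1):
            if duration >= MAX_CLIP_S:
                problems.append(f"Clip {index} isn't under 10 minutes.")

    return problems
```

Returning a list of messages rather than a single Boolean lets you report every rule that a set of clips violates in one pass.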
90
93
91
-
**Samples of natural speaking:**
94
+
**Samples of natural speaking**
92
95
93
96

94
97
95
98

96
99
97
100

98
101
99
-
**Silent status:**
100
-
101
-
This video clip is important if you build a real-time conversation with the custom avatar. The video clip is used as the main template for both speaking and listening status for a chatbot.
102
+
**Silent status (Required)**
103
+
The silent status video clip is required. It's important if you build a real-time conversation with the custom avatar. The video clip is used as the main template for both speaking and listening status for a chatbot.
102
104
103
105
- Maintain status 0 and don't speak, but stay relaxed.
104
106
- Even while remaining in status 0, don't stay completely still; you can move slightly but not too much. Act as if you're waiting.
105
107
- Maintain a smile as if listening or waiting patiently.
106
108
- Avoid nodding frequently.
107
109
- Length: 1 minute.
108
110
109
-
**Samples of silent status:**
111
+
**Samples of silent status**
110
112
111
113

112
114
113
115

114
116
115
117

116
118
117
-
**Gestures (optional):**
119
+
**Gestures (optional)**
118
120
119
121
Gesture video clips are optional. If you need to insert specific gestures while the avatar speaks, follow these guidelines to record gesture videos. Gesture insertion is only enabled for batch mode avatar; real-time avatar doesn't support gesture insertion at this point. Each custom avatar model can support no more than 10 gestures.
120
122
121
-
**Gesture tips:**
123
+
**Gesture tips**
122
124
- Each gesture clip should be no longer than 10 seconds.
123
125
- Gestures should start from status 0 and end with status 0. It's essential that the character maintains the same position as in status 0, which is in the middle of the screen, throughout the gesture. Otherwise, the gesture clip can't be smoothly inserted into the avatar video.
124
126
- The gesture clip only captures the body gestures; the actor doesn't have to speak while making gestures.
125
127
- We recommend designing a list of gestures before recording; here are some examples of gesture video clips: