articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md

Lines changed: 23 additions & 1 deletion

@@ -113,7 +113,7 @@ Gesture video clips are optional, and customers who have the need to insert cert

**Gesture tips:**
- Each gesture clip should be no longer than 10 seconds.
- Gestures should start from status 0 and end with status 0. It's essential that the character maintains the same position as in status 0, which is in the middle of the screen, throughout the gesture. Otherwise, the gesture clip can't be smoothly inserted into the avatar video.
- The gesture clip captures only body gestures; the actor doesn't need to speak while making gestures.
- We recommend designing a list of gestures before recording; here are some examples of gesture video clips:

@@ -132,6 +132,28 @@ High-quality avatar models are built from high-quality video recordings, includi
|---------|--------------|
| - Ensure all video clips are recorded under the same conditions.<br>- During the recording process, plan the size and display area of the character so that the character is displayed appropriately on the screen.<br> - The actor should stay steady during the recording.<br> - Mind facial expressions, which should suit the avatar's use case. For example, look positive and smile if the custom text to speech avatar is used for customer service; look professional if the avatar is used for news reporting.<br> - Maintain eye gaze toward the camera, even when using a teleprompter.<br> - Return your body to status 0 whenever you pause speaking.<br> - Speak on a topic of your choice. Minor speech mistakes, such as missing a word or mispronouncing one, are acceptable. If the actor misses a word or mispronounces something, go back to status 0, pause for 3 seconds, and then continue speaking.<br> - Consciously pause between sentences and paragraphs. When pausing, go back to status 0 and close your lips.<br> - The audio should be clear and loud enough; poor audio quality degrades the training result.<br> - Keep the shooting environment quiet. | - Don't adjust the camera parameters, focal length, position, or angle of view. Don't move the camera; keep the person's position, size, and angle consistent in the frame.<br> - Characters that are too small might lead to a loss of image quality during post-processing. Characters that are too large might overflow the screen during gestures and movements.<br> - Don't make overly long gestures or too much movement in one gesture; for example, don't let the actor's hands gesture continuously without returning to status 0.<br> - The actor's movements and gestures must not block the face.<br> - Avoid small movements such as licking lips, touching hair, talking sideways, constant head shaking during speech, and not closing the mouth after speaking.<br> - Avoid background noise; staff should avoid walking and talking during video recording.<br> - Avoid recording other people's voices while the actor is speaking. |

### How to prepare an interaction video clip
Creating a high-quality interaction video clip is essential if you're building a real-time conversation with a custom avatar. The clip should follow a question-and-answer format, in which a photographer asks a question and the actor responds. Repeat question-and-answer pairs until the conversation is complete. If you're filming alone, imagine someone else asking the questions during the asking phase.

Here are some tips for each phase:

**Asking phase:**
- Hold status 0 without speaking, but stay relaxed.
- Even while holding status 0, don't stay completely still; act as if you're waiting.
- Maintain a smile as if listening or waiting patiently.
- Avoid nodding frequently.
- Length: Each asking slot should last around 3–5 seconds.

**Answering phase:**
- Speak naturally, with occasional natural hand gestures.
- Use natural, common gestures when speaking. Avoid gestures with a specific meaning, such as pointing, applauding, or giving a thumbs-up.
- Begin gestures after starting to speak, and stop them before you finish.
- Length: Each answering slot should last around 5 seconds.

**Total video length:**
- Aim for a total video length of 1–5 minutes.
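The timing targets above can be turned into a simple shot list before filming. The following Python sketch is a hypothetical planning aid, not part of any Azure AI Speech tool or API: the names `pair_range` and `plan_interaction_clip`, and the default 4-second asking slot, are assumptions. It alternates asking and answering slots and checks the total against the 1–5 minute target.

```python
import math
from dataclasses import dataclass

# Timing targets from the tips above. The helper names below are hypothetical
# planning aids, not part of any Azure AI Speech tool or API.
ASK_RANGE_S = (3.0, 5.0)       # each asking slot: about 3-5 seconds
ANSWER_S = 5.0                 # each answering slot: about 5 seconds
TOTAL_RANGE_S = (60.0, 300.0)  # whole clip: 1-5 minutes


@dataclass
class Slot:
    phase: str    # "asking" or "answering"
    start: float  # seconds from the start of the clip
    end: float


def pair_range(ask_s: float = 4.0) -> tuple[int, int]:
    """Smallest and largest number of question-answer pairs that fit in 1-5 minutes."""
    pair_s = ask_s + ANSWER_S
    return math.ceil(TOTAL_RANGE_S[0] / pair_s), math.floor(TOTAL_RANGE_S[1] / pair_s)


def plan_interaction_clip(pairs: int, ask_s: float = 4.0) -> list[Slot]:
    """Lay out alternating asking/answering slots and check the total length."""
    lo, hi = ASK_RANGE_S
    if not lo <= ask_s <= hi:
        raise ValueError(f"asking slot should be {lo}-{hi} s, got {ask_s} s")
    slots: list[Slot] = []
    t = 0.0
    for _ in range(pairs):
        slots.append(Slot("asking", t, t + ask_s))  # hold status 0 and listen
        t += ask_s
        slots.append(Slot("answering", t, t + ANSWER_S))  # speak with natural gestures
        t += ANSWER_S
    min_s, max_s = TOTAL_RANGE_S
    if not min_s <= t <= max_s:
        raise ValueError(f"total length {t:.0f} s is outside {min_s:.0f}-{max_s:.0f} s")
    return slots


if __name__ == "__main__":
    print(pair_range())  # (7, 33) with a 4-second asking slot
    for slot in plan_interaction_clip(10)[:4]:
        print(f"{slot.phase:>9}: {slot.start:5.1f}-{slot.end:5.1f} s")
```

With a 4-second asking slot, each pair takes about 9 seconds, so roughly 7 to 33 pairs fit inside the 1–5 minute window; 10 pairs give a 90-second clip.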
## Data requirements

Doing some basic processing of your video data is helpful for model training efficiency, such as: