Commit 4ec4e0b

Merge pull request #1726 from bridgeqiqi/patch-1
Update custom-avatar-record-video-samples.md
2 parents d7cfaaa + 87910b4 commit 4ec4e0b

2 files changed, with 10 additions and 1 deletion

articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ You must provide a video file with a recorded statement from your avatar talent,
 
 You can find the verbal consent statement in multiple languages on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/customavatar/verbal-statement-all-locales.txt). The language of the verbal statement must be the same as your recording. See also the disclosure for voice talent.
 
+For more information about recording the consent video, see [How to record video samples](custom-avatar-record-video-samples.md).
+
 ## Prepare training data for custom text to speech avatar
 
 You're required to provide video recordings of the avatar talent speaking in a language of your choice. The video recordings should contain high signal-to-noise ratio voice. The voice in the video recording isn't used as training data for a custom neural voice; its purpose is to train the custom text to speech avatar model.

articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md

Lines changed: 8 additions & 1 deletion
@@ -60,7 +60,14 @@ The custom text to speech avatar doesn't support customization of clothes or loo
 
 ## What video clips to record
 
-You need three types of basic video clips:
+You need four types of basic video clips:
+
+**Consent Video:**
+- The consent video must represent the same avatar talent speaking, following the requirement of the consent statement. Make sure the statement is correctly recorded, and each word is clearly spoken. [Get consent file from the avatar talent](custom-avatar-create.md#get-consent-file-from-the-avatar-talent). You can select any one of the languages supported.
+- The avatar talent should always face the front of the camera, without large movements.
+- The video should be taken in a quiet environment, and the voice should be recorded at a reasonable volume. Try to keep the signal-to-noise ratio higher than 20. For voice recording guidance, see the [Recording custom voice samples](../record-custom-voice-samples.md#recording-your-script) guide.
+- Ensure that the head part will not be occluded in each frame of the video.
+- Make sure no other objects appear in the camera, including filming equipment, mobile phone, etc.
 
 **Status 0 speaking:**
 - Status 0 represents the posture you can naturally maintain most of the time while speaking. For example, arms crossed in front of the body or hanging down naturally at the sides.
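
The consent video guidance added in this change asks for a signal-to-noise ratio higher than 20, but it does not state a unit or a measurement method. The following is a minimal sketch of one way to check a recording, assuming the threshold is in decibels and assuming you can isolate a speech segment and a noise-only segment (for example, a pause before the talent starts speaking) as lists of audio samples. The function names `snr_db` and `meets_snr_guideline` are hypothetical and are not part of any Azure tooling.

```python
import math


def snr_db(speech, noise):
    """Estimate SNR in dB from the mean power of two sample lists.

    `speech` is a segment where the talent is talking; `noise` is a
    segment of room noise only. Because the speech segment also contains
    room noise, this is strictly a (signal + noise) / noise estimate.
    """
    if not speech or not noise:
        raise ValueError("both segments must be non-empty")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        return math.inf   # perfectly silent noise floor
    if p_speech == 0:
        return -math.inf  # no signal at all
    return 10.0 * math.log10(p_speech / p_noise)


def meets_snr_guideline(speech, noise, threshold_db=20.0):
    """Return True when the estimate is above the threshold.

    The 20 dB default is an assumption: the source says only
    "higher than 20" without giving a unit.
    """
    return snr_db(speech, noise) > threshold_db


if __name__ == "__main__":
    # Synthetic samples for illustration only, not real recordings.
    speech = [1.0, -1.0] * 100
    quiet_room = [0.05, -0.05] * 100
    noisy_room = [0.5, -0.5] * 100
    print(meets_snr_guideline(speech, quiet_room))
    print(meets_snr_guideline(speech, noisy_room))
```

In practice you would read the samples from the recording's audio track with an audio library of your choice. The estimate also depends on how cleanly the noise-only segment is selected.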
