Merge pull request #1726 from bridgeqiqi/patch-1

prmerger-automator[bot] · web-flow · commit 4ec4e0be7ecc · 2024-12-05T06:33:42.000Z
Update custom-avatar-record-video-samples.md
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create.md b/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create.md
@@ -23,6 +23,8 @@ You must provide a video file with a recorded statement from your avatar talent,
 
 You can find the verbal consent statement in multiple languages on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/customavatar/verbal-statement-all-locales.txt). The language of the verbal statement must be the same as your recording. See also the disclosure for voice talent.
 
+For more information about recording the consent video, see [How to record video samples](custom-avatar-record-video-samples.md).
+
 ## Prepare training data for custom text to speech avatar
 
 You're required to provide video recordings of the avatar talent speaking in a language of your choice. The video recordings should contain high signal-to-noise ratio voice. The voice in the video recording isn't used as training data for a custom neural voice; its purpose is to train the custom text to speech avatar model.
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md b/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md
@@ -60,7 +60,14 @@ The custom text to speech avatar doesn't support customization of clothes or loo
 
 ## What video clips to record
 
-You need three types of basic video clips:
+You need four types of basic video clips:
+
+**Consent video:**
+   - The consent video must show the same avatar talent reading the consent statement as required. Make sure the statement is recorded correctly and each word is spoken clearly. For details, see [Get consent file from the avatar talent](custom-avatar-create.md#get-consent-file-from-the-avatar-talent). You can record the statement in any one of the supported languages.
+   - The avatar talent should face the camera directly at all times, without large movements.
+   - Record the video in a quiet environment, with the voice at a reasonable volume. Try to keep the signal-to-noise ratio higher than 20 dB. For voice recording guidance, see the [Recording custom voice samples](../record-custom-voice-samples.md#recording-your-script) guide.
+   - Make sure the talent's head isn't occluded in any frame of the video.
+   - Make sure no other objects, such as filming equipment or mobile phones, appear in the frame.
 
 **Status 0 speaking:**
    - Status 0 represents the posture you can naturally maintain most of the time while speaking. For example, arms crossed in front of the body or hanging down naturally at the sides.