Merge pull request #19 from sally-baolian/sally-baolian-patch-4

sally-baolian · web-flow · commit 3ec8469ec8c4 · 2023-02-02T13:19:09.000+08:00
Update speech-synthesis-markup-voice.md
diff --git a/articles/cognitive-services/Speech-Service/speech-synthesis-markup-structure.md b/articles/cognitive-services/Speech-Service/speech-synthesis-markup-structure.md
@@ -180,34 +180,6 @@ A good place to start is by trying out the slew of educational apps that are hel
 </speak>
 ```
 
-## Control audio duration
-
-Use the `mstts:audioduration` element to set the length of the output audio. The audio duration setting is applied to all input text within its enclosing `voice` element. To reset or change the duration setting again, you must use a new `voice` element with either the same voice or a different voice. 
-
-The expected audio length should be within 0.5 to 2 times the original audio. If the expected length exceeds the limit, the `mstts:audioduration` element will not control the length of the output audio as you expect. So before using this element, we recommend that you first estimate the length of the original audio based on your script text. To do this, you can refer to the [voice list](rest-text-to-speech.md#get-a-list-of-voices) and check the "WordsPerMinute" attribute of the voice being used. If the estimated length of the original audio exceeds the limit, it's best to adjust your script text to meet the limit before using this element. For example, if the estimated length of your original audio is around 10s and your expected length is 30s, you should first adjust your script to ensure that your original audio is no less than 15s and no more than 60s.
-
-Usage of the `mstts:audioduration` element's attributes are described in the following table.
-
-| Attribute | Description | Required or optional |
-| ---------- | ---------- | ---------- |
-| `value` | The duration of the output audio in seconds (such as `2s`) or milliseconds (such as `500ms`). Audio duration can be applied at the voice level. The value should be within 0.5 to 2 times the original audio.| Required |
-
-###  mstts audio duration examples
-
-The supported values for attributes of the `mstts:audioduration` element were [described previously](#control-audio-duration).
-
-In this example, the original audio is around 15s. The `mstts:audioduration` element is used to set the audio duration to 20s.
-
-```xml
-<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
-<voice name="en-US-JennyNeural">
-<mstts:audioduration value="20s"/>
-If we're home schooling, the best we can do is roll with what each day brings and try to have fun along the way.
-A good place to start is by trying out the slew of educational apps that are helping children stay happy and smash their schooling at the same time.
-</voice>
-</speak>
-```
-
 ## Specify paragraphs and sentences
 
 The `p` and `s` elements are used to denote paragraphs and sentences, respectively. In the absence of these elements, the Speech service automatically determines the structure of the SSML document.
diff --git a/articles/cognitive-services/Speech-Service/speech-synthesis-markup-voice.md b/articles/cognitive-services/Speech-Service/speech-synthesis-markup-voice.md
@@ -406,6 +406,34 @@ This SSML snippet illustrates how the `src` attribute is used to insert audio fr
 </speak>
 ```
 
+## Audio duration
+
+Use the `mstts:audioduration` element to set the duration of the output audio. Use this element to help synchronize when audio output finishes. The duration can be set to between 0.5 and 2 times the duration of the original audio, where the original audio is the audio produced without any other rate settings. The speaking rate is slowed down or sped up to match the set value.
+
+The audio duration setting is applied to all input text within its enclosing `voice` element. To reset or change the audio duration setting again, you must use a new `voice` element with either the same voice or a different voice.
+
+Usage of the `mstts:audioduration` element's attributes is described in the following table.
+
+| Attribute | Description | Required or optional |
+| ---------- | ---------- | ---------- |
+| `value` | The requested duration of the output audio in either seconds (such as `2s`) or milliseconds (such as `2000ms`).<br/><br/>This value should be between 0.5 and 2 times the duration of the original audio without any other rate settings. For example, if the requested duration of your audio is `30s`, then the original audio must otherwise have been between 15 and 60 seconds long. If you set a value outside these boundaries, the duration is set to the respective minimum or maximum multiple.<br/><br/>Given your requested output audio duration, the Speech service adjusts the speaking rate accordingly. Use the [voice list](rest-text-to-speech.md#get-a-list-of-voices) API and check the `WordsPerMinute` attribute to find the speaking rate of the neural voice that you're using. Divide the number of words in your input text by the value of the `WordsPerMinute` attribute to get the approximate duration of the original output audio in minutes. The output audio sounds most natural when the requested duration is as close as possible to this estimated duration.| Required |
+
+### mstts audio duration examples
+
+The supported values for attributes of the `mstts:audioduration` element were [described previously](#audio-duration).
+
+In this example, the original audio is around 15 seconds. The `mstts:audioduration` element is used to set the audio duration to 20 seconds (`20s`).
+
+```xml
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
+<voice name="en-US-JennyNeural">
+<mstts:audioduration value="20s"/>
+If we're home schooling, the best we can do is roll with what each day brings and try to have fun along the way.
+A good place to start is by trying out the slew of educational apps that are helping children stay happy and smash their schooling at the same time.
+</voice>
+</speak>
+```
+
 ## Background audio
 
 You can use the `mstts:backgroundaudio` element to add background audio to your SSML documents or mix an audio file with text-to-speech. With `mstts:backgroundaudio`, you can loop an audio file in the background, fade in at the beginning of text-to-speech, and fade out at the end of text-to-speech.