
Commit 361715e

Update speech-synthesis-markup-structure.md
1 parent 713da63 commit 361715e

1 file changed: +30 -0 lines changed

articles/cognitive-services/Speech-Service/speech-synthesis-markup-structure.md

Lines changed: 30 additions & 0 deletions
@@ -39,6 +39,7 @@ Here's a subset of the basic structure and syntax of an SSML document:
 <math xmlns="http://www.w3.org/1998/Math/MathML"></math>
 <mstts:express-as style="string" styledegree="value" role="string"></mstts:express-as>
 <mstts:silence type="string" value="string"/>
+<mstts:audioduration value="string"/>
 <mstts:viseme type="string"/>
 <p></p>
 <phoneme alphabet="string" ph="string"></phoneme>
@@ -61,6 +62,7 @@ Some examples of contents that are allowed in each element are described in the
 - `mstts:backgroundaudio`: This element can't contain text or any other elements.
 - `mstts:express-as`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
 - `mstts:silence`: This element can't contain text or any other elements.
+- `mstts:audioduration`: This element can't contain text or any other elements.
 - `mstts:viseme`: This element can't contain text or any other elements.
 - `p`: This element can contain text and the following elements: `audio`, `break`, `phoneme`, `prosody`, `say-as`, `sub`, `mstts:express-as`, and `s`.
 - `phoneme`: This element can only contain text and no other elements.
@@ -178,6 +180,34 @@ A good place to start is by trying out the slew of educational apps that are hel
 </speak>
 ```
 
+## Control audio duration
+
+Use the `mstts:audioduration` element to set the duration of the output audio. Before you use this element, estimate the duration of the original audio from your script text. To do so, refer to the [voice list](rest-text-to-speech.md#get-a-list-of-voices) and check the `WordsPerMinute` attribute of the voice that you're using. The expected duration should be within 0.5 to 2 times the duration of the original audio. If the expected duration falls outside this range, the `mstts:audioduration` element doesn't control the duration of the output audio as you expect. In that case, adjust your script text to meet the limit before you use this element. For example, if your original audio is about 10 seconds long and your expected duration is 30 seconds, first lengthen your script so that the original audio is at least 15 seconds long.
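
The range check described in this paragraph can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the Speech service or its SDK: the function names are hypothetical, the word count is a plain whitespace split, and the rate of 150 words per minute is an assumed value that stands in for the `WordsPerMinute` attribute of your voice.

```python
def estimate_original_seconds(script: str, words_per_minute: float) -> float:
    """Roughly estimate speech duration: word count divided by speaking rate."""
    word_count = len(script.split())  # assumption: words are whitespace-separated
    return word_count / words_per_minute * 60.0


def is_target_in_range(original_seconds: float, target_seconds: float) -> bool:
    """Return True if the target is within 0.5 to 2 times the original duration."""
    return 0.5 * original_seconds <= target_seconds <= 2.0 * original_seconds


# Hypothetical inputs: a 25-word script and an assumed rate of 150 words per minute.
script = " ".join(["word"] * 25)
original = estimate_original_seconds(script, words_per_minute=150)  # about 10 seconds

# A 30-second target is more than twice a 10-second original, so it is out of range.
print(is_target_in_range(10.0, 30.0))
# Once the script is lengthened so the original is 15 seconds, 30 seconds is in range.
print(is_target_in_range(15.0, 30.0))
```

The estimate is only as good as the word count and the rate you pass in, so treat it as a first check before you synthesize the audio.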
+
+The audio duration setting applies to all input text within its enclosing `voice` element. To reset or change the duration setting, you must use a new `voice` element with either the same voice or a different voice.
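
As a rough illustration of this scoping rule, the following Python sketch assembles an SSML string in which each text segment gets its own `voice` element and its own `mstts:audioduration` setting. The `build_ssml` helper is hypothetical, and the segment texts and duration values are placeholders that would still need to satisfy the 0.5 to 2 times limit.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(segments):
    """Build SSML from (voice_name, duration_value, text) tuples.

    Each tuple becomes a separate voice element, because a duration setting
    covers all text inside its enclosing voice element.
    """
    voices = []
    for voice_name, duration_value, text in segments:
        voices.append(
            f"<voice name={quoteattr(voice_name)}>"
            f"<mstts:audioduration value={quoteattr(duration_value)}/>"
            f"{escape(text)}"
            "</voice>"
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">'
        + "".join(voices)
        + "</speak>"
    )


# Hypothetical segments: the same voice twice, each with its own duration setting.
ssml = build_ssml([
    ("en-US-JennyNeural", "30s", "First part of the script."),
    ("en-US-JennyNeural", "20s", "Second part of the script."),
])
print(ssml)
```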
+
+The usage of the `mstts:audioduration` element's attributes is described in the following table.
+
+| Attribute | Description | Required or optional |
+| ---------- | ---------- | ---------- |
+| `value` | The duration of the output audio in seconds (such as `2s`) or milliseconds (such as `500ms`). Audio duration can be applied at the voice level. The value should be within 0.5 to 2 times the duration of the original audio. | Required |
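
To compare a `value` string against an estimated duration, it helps to convert it to seconds first. The sketch below is a hypothetical helper, not a service API. It assumes that the value is a number followed by `s` or `ms`, as in the table's examples, and accepting decimal numbers is an assumption of this sketch rather than documented behavior.

```python
import re

# Assumed format: digits (optionally with a decimal part) followed by "ms" or "s".
_DURATION_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")


def duration_to_seconds(value: str) -> float:
    """Convert a duration string such as '2s' or '500ms' to seconds."""
    match = _DURATION_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"Unsupported duration value: {value!r}")
    number, unit = match.groups()
    amount = float(number)
    return amount / 1000.0 if unit == "ms" else amount


print(duration_to_seconds("2s"))     # seconds are returned unchanged
print(duration_to_seconds("500ms"))  # milliseconds are divided by 1000
```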
+
+### mstts audio duration examples
+
+The supported values for attributes of the `mstts:audioduration` element were [described previously](#control-audio-duration).
+
+In this example, `mstts:audioduration` sets the audio duration to 30 seconds.
+
+```xml
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
+    <voice name="en-US-JennyNeural">
+        <mstts:audioduration value="30s"/>
+        If we're home schooling, the best we can do is roll with what each day brings and try to have fun along the way.
+        A good place to start is by trying out the slew of educational apps that are helping children stay happy and smash their schooling at the same time.
+    </voice>
+</speak>
+```
+
 ## Specify paragraphs and sentences
 
 The `p` and `s` elements are used to denote paragraphs and sentences, respectively. In the absence of these elements, the Speech service automatically determines the structure of the SSML document.
