Commit 86decc6

Update speech-synthesis-markup.md
1 parent 7d917a3 commit 86decc6

1 file changed: +8 -8 lines changed
articles/cognitive-services/Speech-Service/speech-synthesis-markup.md

Lines changed: 8 additions & 8 deletions
@@ -269,7 +269,7 @@ Use the `break` element to insert pauses (or breaks) between words, or prevent p
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
-<voice name="en-US-AriaRUS">
+<voice name="en-US-AriaNeural">
 Welcome to Microsoft Cognitive Services <break time="100ms" /> Text-to-Speech API.
 </voice>
 </speak>
@@ -464,21 +464,21 @@ Because prosodic attribute values can vary over a wide range, the speech recogni
 | Attribute | Description | Required / Optional |
 |-----------|-------------|---------------------|
 | `pitch` | Indicates the baseline pitch for the text. You may express the pitch as:<ul><li>An absolute value, expressed as a number followed by "Hz" (Hertz). For example, 600 Hz.</li><li>A relative value, expressed as a number preceded by "+" or "-" and followed by "Hz" or "st", that specifies an amount to change the pitch. For example: +80 Hz or -2st. The "st" indicates the change unit is semitone, which is half of a tone (a half step) on the standard diatonic scale.</li><li>A constant value:<ul><li>x-low</li><li>low</li><li>medium</li><li>high</li><li>x-high</li><li>default</li></ul></li></ul>. | Optional |
-| `contour` | Contour isn't supported for neural voices. Contour represents changes in pitch. These changes are represented as an array of targets at specified time positions in the speech output. Each target is defined by sets of parameter pairs. For example: <br/><br/>`<prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">`<br/><br/>The first value in each set of parameters specifies the location of the pitch change as a percentage of the duration of the text. The second value specifies the amount to raise or lower the pitch, using a relative value or an enumeration value for pitch (see `pitch`). | Optional |
+| `contour` |Contour now supports both neural and standard voices. Contour represents changes in pitch. These changes are represented as an array of targets at specified time positions in the speech output. Each target is defined by sets of parameter pairs. For example: <br/><br/>`<prosody contour="(0%,+20Hz) (10%,-2st) (40%,+10Hz)">`<br/><br/>The first value in each set of parameters specifies the location of the pitch change as a percentage of the duration of the text. The second value specifies the amount to raise or lower the pitch, using a relative value or an enumeration value for pitch (see `pitch`). | Optional |
 | `range` | A value that represents the range of pitch for the text. You may express `range` using the same absolute values, relative values, or enumeration values used to describe `pitch`. | Optional |
 | `rate` | Indicates the speaking rate of the text. You may express `rate` as:<ul><li>A relative value, expressed as a number that acts as a multiplier of the default. For example, a value of *1* results in no change in the rate. A value of *0.5* results in a halving of the rate. A value of *3* results in a tripling of the rate.</li><li>A constant value:<ul><li>x-slow</li><li>slow</li><li>medium</li><li>fast</li><li>x-fast</li><li>default</li></ul></li></ul> | Optional |
 | `duration` | The period of time that should elapse while the speech synthesis (TTS) service reads the text, in seconds or milliseconds. For example, *2s* or *1800ms*. | Optional |
 | `volume` | Indicates the volume level of the speaking voice. You may express the volume as:<ul><li>An absolute value, expressed as a number in the range of 0.0 to 100.0, from *quietest* to *loudest*. For example, 75. The default is 100.0.</li><li>A relative value, expressed as a number preceded by "+" or "-" that specifies an amount to change the volume. For example, +10 or -5.5.</li><li>A constant value:<ul><li>silent</li><li>x-soft</li><li>soft</li><li>medium</li><li>loud</li><li>x-loud</li><li>default</li></ul></li></ul> | Optional |
 
 ### Change speaking rate
 
-Speaking rate can be applied to standard voices at the word or sentence-level. Whereas speaking rate can only be applied to neural voices at the sentence level.
+Speaking rate can be applied to Neural voices and standard voices at the word or sentence-level.
 
 **Example**
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
-<voice name="en-US-Guy24kRUS">
+<voice name="en-US-GuyNeural">
 <prosody rate="+30.00%">
 Welcome to Microsoft Cognitive Services Text-to-Speech API.
 </prosody>
@@ -519,15 +519,15 @@ Pitch changes can be applied to standard voices at the word or sentence-level. W
 ### Change pitch contour
 
 > [!IMPORTANT]
-> Pitch contour changes aren't supported with neural voices.
+> Pitch contour changes are now supported with neural voices.
 
 **Example**
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
-<voice name="en-US-AriaRUS">
-<prosody contour="(80%,+20%) (90%,+30%)" >
-Good morning.
+<voice name="en-US-AriaNeural">
+<prosody contour="(60%,-60%) (100%,+80%)" >
+Were you the only person in the room?
 </prosody>
 </voice>
 </speak>
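
The updated samples in this commit all share one shape: a `speak` root, a `voice` element naming a neural voice, and a `prosody` element carrying `rate` or `contour`. As a rough sketch only, the Python below assembles those documents as strings and parses them back to confirm they are well-formed XML. The helper name `build_ssml` is hypothetical and not part of any Speech SDK, and the script never calls the service, so it says nothing about how a given voice actually renders the attributes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace taken from the <speak> element in the samples above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice, **prosody):
    """Hypothetical helper: wrap text in speak/voice/prosody elements.

    Keyword arguments become prosody attributes (for example rate or contour).
    """
    attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in prosody.items())
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody {attrs}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


def prosody_attributes(ssml):
    """Parse the SSML and return the attributes of its prosody element."""
    root = ET.fromstring(ssml)  # raises ParseError if the document is malformed
    element = root.find(f"{{{SSML_NS}}}voice/{{{SSML_NS}}}prosody")
    return dict(element.attrib)


# Values copied from the samples changed in this commit.
contour_ssml = build_ssml(
    "Were you the only person in the room?",
    "en-US-AriaNeural",
    contour="(60%,-60%) (100%,+80%)",
)
rate_ssml = build_ssml(
    "Welcome to Microsoft Cognitive Services Text-to-Speech API.",
    "en-US-GuyNeural",
    rate="+30.00%",
)
```

Parsing the generated string before sending it anywhere catches unescaped characters and unbalanced tags early, which is the main reason to prefer a small helper over hand-edited SSML.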
