Commit 80a3899

Merge pull request #108436 from Yueying-Liu/patch-18
Add more styles in the document - live on 3/24
2 parents 50357b7 + 9efc790 commit 80a3899

1 file changed: +30 -28 lines changed

articles/cognitive-services/Speech-Service/speech-synthesis-markup.md

Lines changed: 30 additions & 28 deletions
@@ -45,7 +45,7 @@ Each SSML document is created with SSML elements (or tags). These elements are u
 **Syntax**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="string"></speak>
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="string"></speak>
 ```
 
 **Attributes**
@@ -54,7 +54,7 @@ Each SSML document is created with SSML elements (or tags). These elements are u
 |-----------|-------------|---------------------|
 | `version` | Indicates the version of the SSML specification used to interpret the document markup. The current version is 1.0. | Required |
 | `xml:lang` | Specifies the language of the root document. The value may contain a lowercase, two-letter language code (for example, `en`), or the language code and uppercase country/region (for example, `en-US`). | Required |
-| `xmlns` | Specifies the URI to the document that defines the markup vocabulary (the element types and attribute names) of the SSML document. The current URI is https://www.w3.org/2001/10/synthesis. | Required |
+| `xmlns` | Specifies the URI to the document that defines the markup vocabulary (the element types and attribute names) of the SSML document. The current URI is http://www.w3.org/2001/10/synthesis. | Required |
 
 ## Choose a voice for text-to-speech
 
@@ -80,7 +80,7 @@ The `voice` element is required. It is used to specify the voice that is used fo
 > This example uses the `en-US-AriaRUS` voice. For a complete list of supported voices, see [Language support](language-support.md#text-to-speech).
 
 ```XML
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 This is the text that is spoken.
 </voice>
@@ -171,7 +171,7 @@ speechConfig!.setPropertyTo(
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 Good morning!
 </voice>
@@ -190,45 +190,47 @@ By default, the text-to-speech service synthesizes text using a neutral speaking
 
 Currently, speaking style adjustments are supported for these neural voices:
 * `en-US-AriaNeural`
-* `pt-BR-FranciscaNeural`
 * `zh-CN-XiaoxiaoNeural`
+* `pt-BR-FranciscaNeural`
 
 Changes are applied at the sentence level, and style vary by voice. If a style isn't supported, the service will return speech in the default neutral speaking style.
 
 **Syntax**
 
 ```xml
-<mstts:express-as type="string"></mstts:express-as>
+<mstts:express-as style="string"></mstts:express-as>
 ```
 
 **Attributes**
 
 | Attribute | Description | Required / Optional |
 |-----------|-------------|---------------------|
-| `type` | Specifies the speaking style. Currently, speaking styles are voice-specific. | Required if adjusting the speaking style for a neural voice. If using `mstts:express-as`, then type must be provided. If an invalid value is provided, this element will be ignored. |
+| `style` | Specifies the speaking style. Currently, speaking styles are voice-specific. | Required if adjusting the speaking style for a neural voice. If using `mstts:express-as`, then style must be provided. If an invalid value is provided, this element will be ignored. |
 
 Use this table to determine which speaking styles are supported for each neural voice.
 
-| Voice | Type | Description |
+| Voice | Style | Description |
 |-------|------|-------------|
-| `en-US-AriaNeural` | `type="cheerful"` | Expresses an emotion that is positive and happy |
-| | `type="empathy"` | Expresses a sense of caring and understanding |
-| | `type="chat"` | Speak in a casual, relaxed tone |
-| | `type="newscast"` | Expresses a formal tone, similar to news broadcasts |
-| | `type="customerservice"` | Speak in a friendly and patient way as customer service |
-| `pt-BR-FranciscaNeural` | `type="cheerful"` | Expresses an emotion that is positive and happy |
-| `zh-CN-XiaoxiaoNeural` | `type="newscast"` | Expresses a formal tone, similar to news broadcasts |
-| | `type="sentiment"` | Conveys a touching message or a story |
+| `en-US-AriaNeural` | `style="newscast"` | Expresses a formal and professional tone for narrating news |
+| | `style="customerservice"` | Expresses a friendly and helpful tone for customer support |
+| | `style="chat"` | Expresses a casual and relaxed tone |
+| | `style="cheerful"` | Expresses a positive and happy tone |
+| | `style="empathetic"` | Expresses a sense of caring and understanding |
+| `zh-CN-XiaoxiaoNeural` | `style="newscast"` | Expresses a formal and professional tone for narrating news |
+| | `style="customerservice"` | Expresses a friendly and helpful tone for customer support |
+| | `style="assistant"` | Expresses a warm and relaxed tone for digital assistants |
+| | `style="lyrical"` | Expresses emotions in a melodic and sentimental way |
+| `pt-BR-FranciscaNeural` | `style="cheerful"` | Expresses a positive and happy tone |
 
 **Example**
 
 This SSML snippet illustrates how the `<mstts:express-as>` element is used to change the speaking style to `cheerful`.
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis"
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
 xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
 <voice name="en-US-AriaNeural">
-<mstts:express-as type="cheerful">
+<mstts:express-as style="cheerful">
 That'd be just amazing!
 </mstts:express-as>
 </voice>
@@ -269,7 +271,7 @@ Use the `break` element to insert pauses (or breaks) between words, or prevent p
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 Welcome to Microsoft Cognitive Services <break time="100ms" /> Text-to-Speech API.
 </voice>
@@ -294,7 +296,7 @@ The `s` element may contain text and the following elements: `audio`, `break`, `
 **Example**
 
 ```XML
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <p>
 <s>Introducing the sentence element.</s>
@@ -330,15 +332,15 @@ Phonetic alphabets are composed of phones, which are made up of letters, numbers
 **Examples**
 
 ```XML
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <s>His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme></s>
 </voice>
 </speak>
 ```
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <phoneme alphabet="ipa" ph="t&#x259;mei&#x325;&#x27E;ou&#x325;"> tomato </phoneme>
 </voice>
@@ -470,7 +472,7 @@ Speaking rate can be applied to standard voices at the word or sentence-level. W
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-Guy24kRUS">
 <prosody rate="+30.00%">
 Welcome to Microsoft Cognitive Services Text-to-Speech API.
@@ -486,7 +488,7 @@ Volume changes can be applied to standard voices at the word or sentence-level.
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <prosody volume="+20.00%">
 Welcome to Microsoft Cognitive Services Text-to-Speech API.
@@ -502,7 +504,7 @@ Pitch changes can be applied to standard voices at the word or sentence-level. W
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-Guy24kRUS">
 Welcome to <prosody pitch="high">Microsoft Cognitive Services Text-to-Speech API.</prosody>
 </voice>
@@ -517,7 +519,7 @@ Pitch changes can be applied to standard voices at the word or sentence-level. W
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <prosody contour="(80%,+20%) (90%,+30%)" >
 Good morning.
@@ -568,7 +570,7 @@ The `say-as` element may contain only text.
 The speech synthesis engine speaks the following example as "Your first request was for one room on October nineteenth twenty ten with early arrival at twelve thirty five PM."
 
 ```XML
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <p>
 Your <say-as interpret-as="ordinal"> 1st </say-as> request was for <say-as interpret-as="cardinal"> 1 </say-as> room
@@ -606,7 +608,7 @@ Any audio included in the SSML document must meet these requirements:
 **Example**
 
 ```xml
-<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
 <p>
 <audio src="https://contoso.com/opinionprompt.wav"/>
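
The two mechanical changes this diff makes to the SSML samples are the default `xmlns` URI (`https` to `http`) and the `express-as` attribute name (`type` to `style`). For SSML strings written against the earlier samples, those changes could be applied with a small helper like the one below. This is only a sketch: `update_ssml`, `OLD_NS`, and `NEW_NS` are hypothetical names, not part of any SDK, and the regex assumes the attribute is written as plain `type=` inside the opening tag. It does not touch style values, so differences such as `empathy` versus `empathetic` in the tables above would still need manual review.

```python
import re

# Namespace URIs as they appear in the removed (-) and added (+) lines.
OLD_NS = "https://www.w3.org/2001/10/synthesis"
NEW_NS = "http://www.w3.org/2001/10/synthesis"

# Opening <mstts:express-as ...> tag, up to a `type` attribute name.
_EXPRESS_AS_TYPE = re.compile(r"(<mstts:express-as\b[^>]*?\s)type(\s*=)")


def update_ssml(ssml: str) -> str:
    """Apply the two textual changes shown in this diff to an SSML string."""
    # 1. Default namespace only; xmlns:mstts is left untouched.
    ssml = ssml.replace(f'xmlns="{OLD_NS}"', f'xmlns="{NEW_NS}"')
    # 2. Rename the express-as attribute from `type` to `style`.
    return _EXPRESS_AS_TYPE.sub(r"\1style\2", ssml)


if __name__ == "__main__":
    old = (
        '<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        '<voice name="en-US-AriaNeural">'
        '<mstts:express-as type="cheerful">That\'d be just amazing!</mstts:express-as>'
        "</voice></speak>"
    )
    print(update_ssml(old))
```

Plain string replacement is used instead of an XML parser so that whitespace, attribute order, and namespace prefixes in the input are preserved exactly.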
