
Commit adbfe79

Merge pull request #222377 from sally-baolian/patch-85
SSML audio effect
2 parents 491d999 + 6165ac1 commit adbfe79


2 files changed (+16, -2 lines)


articles/cognitive-services/Speech-Service/speech-synthesis-markup-structure.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -29,7 +29,7 @@ Here's a subset of the basic structure and syntax of an SSML document:
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="string">
 <mstts:backgroundaudio src="string" volume="string" fadein="string" fadeout="string"/>
-<voice name="string">
+<voice name="string" effect="string">
 <audio src="string"/></audio>
 <bookmark mark="string"/>
 <break strength="string" time="string" />
@@ -51,7 +51,7 @@ Here's a subset of the basic structure and syntax of an SSML document:
 </speak>
 ```
 
-Some examples of contents that are allowed in each element are described in the following list:
+Some examples of contents that are allowed in each element are described in the following list:
 - `audio`: The body of the `audio` element can contain plain text or SSML markup that's spoken if the audio file is unavailable or unplayable. The `audio` element can also contain text and the following elements: `audio`, `break`, `p`, `s`, `phoneme`, `prosody`, `say-as`, and `sub`.
 - `bookmark`: This element can't contain text or any other elements.
 - `break`: This element can't contain text or any other elements.
````

articles/cognitive-services/Speech-Service/speech-synthesis-markup-voice.md

Lines changed: 14 additions & 0 deletions
````diff
@@ -24,11 +24,13 @@ At least one `voice` element must be specified within each SSML [speak](speech-s
 
 You can include multiple `voice` elements in a single SSML document. Each `voice` element can specify a different voice. You can also use the same voice multiple times with different settings, such as when you [change the silence duration](speech-synthesis-markup-structure.md#add-silence) between sentences.
 
+
 Usage of the `voice` element's attributes are described in the following table.
 
 | Attribute | Description | Required or optional |
 | ---------- | ---------- | ---------- |
 | `name` | The voice used for text-to-speech output. For a complete list of supported prebuilt voices, see [Language support](language-support.md?tabs=tts).| Required|
+| `effect` |The audio effect processor that's used to optimize the quality of the synthesized speech output for specific scenarios on devices. <br/><br/>For some scenarios in production environments, the auditory experience may be degraded due to the playback distortion on certain devices. For example, the synthesized speech from a car speaker may sound dull and muffled due to environmental factors such as speaker response, room reverberation, and background noise. The passenger might have to turn up the volume to hear more clearly. To avoid manual operations in such a scenario, the audio effect processor can make the sound clearer by compensating the distortion of playback.<br/><br/>The following values are supported:<br/><ul><li>`eq_car` – Optimize the auditory experience when providing high-fidelity speech in cars, buses, and other enclosed automobiles.</li><li>`eq_telecomhp8k` – Optimize the auditory experience for narrowband speech in telecom or telephone scenarios. We recommend a sampling rate of 8 kHz. If the sample rate isn't 8 kHz, the auditory quality of the output speech won't be optimized.</li></ul><br/>If the value is missing or invalid, this attribute will be ignored and no effect will be applied.| Optional |
 
 ### Voice examples
 
@@ -77,6 +79,18 @@ This example uses a custom voice named "my-custom-voice".
 </speak>
 ```
 
+#### Audio effect example
+
+You use the `effect` attribute to optimize the auditory experience for scenarios such as cars and telecommunications. The following SSML example uses the `effect` attribute with the configuration in car scenarios.
+
+```xml
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<voice name="en-US-JennyNeural" effect="eq_car">
+This is the text that is spoken.
+</voice>
+</speak>
+```
+
 ## Speaking styles and roles
 
 By default, neural voices have a neutral speaking style. You can adjust the speaking style, style degree, and role at the sentence level.
````
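The new `effect` row also documents `eq_telecomhp8k`, which the added "Audio effect example" doesn't cover. A minimal SSML sketch for that value might look like the following. It reuses the voice and text from the `eq_car` example and only swaps the effect value. It assumes that the synthesized output is requested at an 8 kHz sample rate, as the attribute description recommends. The sample rate is set outside the SSML (for example, through the output format configured in the client), and that setting isn't shown here.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Assumption: the client requests 8 kHz output; otherwise the telecom effect isn't optimized. -->
    <voice name="en-US-JennyNeural" effect="eq_telecomhp8k">
        This is the text that is spoken.
    </voice>
</speak>
```

If the `effect` value were misspelled, the attribute description says it would be ignored and no effect applied, so the output would fall back to the unprocessed voice rather than fail.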
