Commit 6f9f054

Update speech-synthesis-markup-voice.md
1 parent 1760cbc commit 6f9f054

1 file changed: +15 -27 lines changed

articles/cognitive-services/Speech-Service/speech-synthesis-markup-voice.md

Lines changed: 15 additions & 27 deletions
@@ -24,11 +24,14 @@ At least one `voice` element must be specified within each SSML [speak](speech-s
2424

2525
You can include multiple `voice` elements in a single SSML document. Each `voice` element can specify a different voice. You can also use the same voice multiple times with different settings, such as when you [change the silence duration](speech-synthesis-markup-structure.md#add-silence) between sentences.
2626

27+
The `effect` attribute of the `voice` element is an audio effect processor that enhances the perceived quality of synthesized speech on specific playback devices. In practice, playback distortion on different devices and in different scenarios can degrade the listening experience. For example, synthesized speech from a car speaker may sound dull and muffled because of environmental factors such as speaker response, room reverberation, and background noise, and the driver usually has to turn up the volume to hear clearly. In such a case, the `effect` processor can make the sound clearer by compensating for the playback distortion, with no manual adjustment.
28+
2729
Usage of the `voice` element's attributes is described in the following table.
2830

2931
| Attribute | Description | Required or optional |
3032
| ---------- | ---------- | ---------- |
3133
| `name` | The voice used for text-to-speech output. For a complete list of supported prebuilt voices, see [Language support](language-support.md?tabs=stt-tts).| Required|
34+
| `effect` | A voice-specific audio effect processor. Choose the value that matches your scenario. The following values are supported:<br/><ul><li>`eq_car` – Optimizes the auditory experience when providing high-fidelity speech in car scenarios, such as small cars, buses, and other small or medium enclosed vehicles.</li><li>`eq_telecomhp8k` – Optimizes the auditory experience in telecom or telephone scenarios. This value is designed only for narrowband speech (8 kHz sampling rate). If the sample rate of the output speech isn't 8 kHz, the auditory quality isn't guaranteed even with this value. For telecom scenarios, we recommend converting the output speech to an 8 kHz sample rate.</li></ul><br/>If the value is missing or invalid, the `effect` attribute is ignored and the service uses the default neutral speech.| Optional |
3235

3336
### Voice examples
3437

@@ -77,6 +80,18 @@ This example uses a custom voice named "my-custom-voice".
7780
</speak>
7881
```
7982

83+
#### Audio effect example
84+
85+
Use the `effect` attribute to optimize the auditory experience for a specific playback scenario. The following SSML example uses the `effect` attribute with the `eq_car` value for car scenarios.
86+
87+
```xml
88+
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
89+
<voice name="en-US-JennyNeural" effect="eq_car">
90+
This is the text that is spoken.
91+
</voice>
92+
</speak>
93+
```
94+
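As a rough illustration of how a client might generate the SSML above, here is a minimal Python sketch that builds a `speak` document with an optional `effect` attribute on the `voice` element. The helper name `build_ssml` and the `SUPPORTED_EFFECTS` set are hypothetical and not part of any SDK. Dropping unsupported values on the client is an assumption that mirrors the documented service behavior of ignoring a missing or invalid `effect`.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Values listed in the attribute table; this set is a local assumption,
# not something queried from the service.
SUPPORTED_EFFECTS = {"eq_car", "eq_telecomhp8k"}


def build_ssml(text, voice_name, effect=None, lang="en-US"):
    """Return an SSML string with one voice element (hypothetical helper)."""
    # Serialize the SSML namespace as the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    # Only emit the attribute for a known value; anything else is omitted,
    # which leaves the voice with its default neutral output.
    if effect in SUPPORTED_EFFECTS:
        voice.set("effect", effect)
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("This is the text that is spoken.",
                     "en-US-JennyNeural", effect="eq_car"))
```

When using `eq_telecomhp8k`, the table's recommendation still applies: the output audio should be at an 8 kHz sample rate, which this sketch does not configure.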
8095
## Speaking styles and roles
8196

8297
By default, neural voices have a neutral speaking style. You can adjust the speaking style, style degree, and role at the sentence level.
@@ -441,33 +456,6 @@ The supported values for attributes of the `mstts:backgroundaudio` element were
441456
</speak>
442457
```
443458

444-
## Audio effect
445-
446-
The `effect` element is an effect processor that is used to enhance the auditory quality of the synthesized speech output from various devices. In a practical environment, the audience's auditory experience may be degraded due to the distortion of playback from various devices in different scenarios. For example, the synthesized speech from a car speaker may sound dull and muffled due to environmental factors such as speaker response, room reverberation, and background noise. The driver usually has to turn up the volume to hear more clearly. In such a case, the `effect` processor can make the sound clearer by compensating for the distortion of playback without any manual operation.
447-
448-
You can configure the `effect` element's attributes within the `voice` element to optimize the auditory experience of synthesized speech. For example, if you are in a car scenario, you can use the value `eq_car` to make the synthesized speech from the car speaker clearer.
449-
450-
451-
Usage of the `effect` element's attributes is described in the following table.
452-
453-
| Attribute | Description | Required or optional |
454-
| ---------- | ---------- | ---------- |
455-
| `effect` |A voice-specific effect processor. You can choose a specific value according to the corresponding scenarios. The following values are supported:<br/><ul><li>`eq_car` – Optimize the auditory experience when providing high-fidelity speech in the car scenarios, such as small cars, buses, and other enclosed small/medium vehicles.</li><li>`eq_telecomhp8k` – Optimize the auditory experience in telecom or telephone scenarios. This feature is only designed for narrowband speech (sampling rate = 8kHz). If the sample rate of the output speech is not 8kHz, the auditory quality of the output speech isn't guaranteed even with this attribute. We recommend that you convert the sample rate of the output speech to 8kHz to get a better result with this attribute in telecom scenarios. </li></ul><br/>If the value is missing or invalid, the `effect` element will be ignored and the service will use the default neutral speech.| Required |
456-
457-
### Audio effect examples
458-
459-
The supported values for attributes of the `effect` element were [described previously](#audio-effect).
460-
461-
You use the `effect` element within the `voice` element to optimize the auditory experience for different voices. The following SSML example uses the `effect` element with the configuration in car scenarios.
462-
463-
```xml
464-
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
465-
<voice name="en-US-JennyNeural" effect="eq_car">
466-
This is the text that is spoken.
467-
</voice>
468-
</speak>
469-
```
470-
471459
## Next steps
472460

473461
- [SSML overview](speech-synthesis-markup.md)
