articles/ai-services/speech-service/how-to-speech-synthesis.md (2 additions, 2 deletions)
@@ -1,15 +1,15 @@
 ---
 title: "How to synthesize speech from text - Speech service"
 titleSuffix: Azure AI services
-description: Learn how to convert text to speech. Learn about object construction and design patterns, supported audio output formats, and custom configuration options for speech synthesis.
+description: Learn how to convert text to speech, including object construction and design patterns, supported audio output formats, and custom configuration options.
articles/ai-services/speech-service/includes/how-to/speech-synthesis/cli.md (14 additions, 13 deletions)
@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: cognitive-services
 ms.topic: include
-ms.date: 08/11/2020
+ms.date: 08/30/2023
 ms.author: eur
 ---

@@ -18,27 +18,28 @@ ms.author: eur
 ## Synthesize speech to a speaker

-Now you're ready to run the Speech CLI to synthesize speech from text. From the command line, change to the directory that contains the Speech CLI binary file. Then run the following command:
+Now you're ready to run the Speech CLI to synthesize speech from text.

-```bash
-spx synthesize --text "I'm excited to try text to speech"
-```
+- In a console window, change to the directory that contains the Speech CLI binary file. Then run the following command:

-The Speech CLI will produce natural language in English through the computer speaker.
+```console
+spx synthesize --text "I'm excited to try text to speech"
+```
+
+The Speech CLI produces natural language in English through the computer speaker.

 ## Synthesize speech to a file

-Run the following command to change the output from your speaker to a .wav file:
+- Run the following command to change the output from your speaker to a *.wav* file:

-```bash
-spx synthesize --text "I'm excited to try text to speech" --audio output greetings.wav
-```
+```console
+spx synthesize --text "I'm excited to try text to speech" --audio output greetings.wav
+```

-The Speech CLI will produce natural language in English in the `greetings.wav` audio file.
+The Speech CLI produces natural language in English to the *greetings.wav* audio file.

 ## Run and use a container

 Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

-For more information about containers, see the [speech containers](../../../speech-container-howto.md#host-urls) how-to guide.
-
+For more information about containers, see [Install and run Speech containers with Docker](../../../speech-container-howto.md).
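
The container section above notes that only the initialization changes: you pass a host URL instead of a key and region. As a minimal sketch of that change, assuming the Speech SDK for Python and a container listening at `ws://localhost:5000` (an illustrative placeholder, not a value from this diff):

```python
import azure.cognitiveservices.speech as speechsdk

# Point the SDK at a self-hosted Speech container instead of the public
# service by passing a host URL rather than a subscription key and region.
# "ws://localhost:5000" is an illustrative placeholder for your container.
speech_config = speechsdk.SpeechConfig(host="ws://localhost:5000")
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

result = synthesizer.speak_text_async("I'm excited to try text to speech").get()
print(result.reason)  # ResultReason.SynthesizingAudioCompleted on success
```

Everything after config construction is identical for the public service and a container, which is why the initialization method is the only documented difference.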
articles/ai-services/speech-service/includes/how-to/speech-synthesis/events.md (10 additions, 10 deletions)
@@ -3,20 +3,20 @@ author: eric-urban
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: include
-ms.date: 11/14/2022
+ms.date: 08/30/2023
 ms.author: eur
 ---


 | Event | Description | Use case |
-|--- |--- |--- |
-|`BookmarkReached`|Signals that a bookmark was reached. To trigger a bookmark reached event, a `bookmark` element is required in the [SSML](../../../speech-synthesis-markup-structure.md#bookmark-element). This event reports the output audio's elapsed time between the beginning of synthesis and the `bookmark` element. The event's `Text` property is the string value that you set in the bookmark's `mark` attribute. The `bookmark` elements won't be spoken.|You can use the `bookmark` element to insert custom markers in SSML to get the offset of each marker in the audio stream. The `bookmark` element can be used to reference a specific location in the text or tag sequence.|
-|`SynthesisCanceled`|Signals that the speech synthesis was canceled.|You can confirm when synthesis has been canceled.|
-|`SynthesisCompleted`|Signals that speech synthesis has completed.|You can confirm when synthesis has completed.|
-|`SynthesisStarted`|Signals that speech synthesis has started.|You can confirm when synthesis has started.|
-|`Synthesizing`|Signals that speech synthesis is ongoing. This event fires each time the SDK receives an audio chunk from the Speech service.|You can confirm when synthesis is in progress.|
-|`VisemeReceived`|Signals that a viseme event was received.|[Visemes](../../../how-to-speech-synthesis-viseme.md) are often used to represent the key poses in observed speech. Key poses include the position of the lips, jaw, and tongue in producing a particular phoneme. You can use visemes to animate the face of a character as speech audio plays.|
-|`WordBoundary`|Signals that a word boundary was received. This event is raised at the beginning of each new spoken word, punctuation, and sentence. The event reports the current word's time offset (in ticks) from the beginning of the output audio. This event also reports the character position in the input text (or [SSML](../../../speech-synthesis-markup.md)) immediately before the word that's about to be spoken.|This event is commonly used to get relative positions of the text and corresponding audio. You might want to know about a new word, and then take action based on the timing. For example, you can get information that can help you decide when and for how long to highlight words as they're spoken.|
+|:--- |:--- |:--- |
+|`BookmarkReached`|Signals that a bookmark was reached. To trigger a bookmark reached event, a `bookmark` element is required in the [SSML](../../../speech-synthesis-markup-structure.md#bookmark-element). This event reports the output audio's elapsed time between the beginning of synthesis and the `bookmark` element. The event's `Text` property is the string value that you set in the bookmark's `mark` attribute. The `bookmark` elements aren't spoken.|You can use the `bookmark` element to insert custom markers in SSML to get the offset of each marker in the audio stream. The `bookmark` element can be used to reference a specific location in the text or tag sequence.|
+|`SynthesisCanceled`|Signals that the speech synthesis was canceled.|You can confirm when synthesis has been canceled.|
+|`SynthesisCompleted`|Signals that speech synthesis has completed.|You can confirm when synthesis has completed.|
+|`SynthesisStarted`|Signals that speech synthesis has started.|You can confirm when synthesis has started.|
+|`Synthesizing`|Signals that speech synthesis is ongoing. This event fires each time the SDK receives an audio chunk from the Speech service.|You can confirm when synthesis is in progress.|
+|`VisemeReceived`|Signals that a viseme event was received.|[Visemes](../../../how-to-speech-synthesis-viseme.md) are often used to represent the key poses in observed speech. Key poses include the position of the lips, jaw, and tongue in producing a particular phoneme. You can use visemes to animate the face of a character as speech audio plays.|
+|`WordBoundary`|Signals that a word boundary was received. This event is raised at the beginning of each new spoken word, punctuation, and sentence. The event reports the current word's time offset, in ticks, from the beginning of the output audio. This event also reports the character position in the input text or [SSML](../../../speech-synthesis-markup.md) immediately before the word that's about to be spoken.|This event is commonly used to get relative positions of the text and corresponding audio. You might want to know about a new word, and then take action based on the timing. For example, you can get information that can help you decide when and for how long to highlight words as they're spoken.|

 > [!NOTE]
-> Events are raised as the output audio data becomes available, which will be faster than playback to an output device. The caller must appropriately synchronize streaming and real-time.
+> Events are raised as the output audio data becomes available, which is faster than playback to an output device. The caller must appropriately synchronize streaming and real-time.
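
To make the event table concrete, here's a minimal sketch that subscribes to a few of these events using the Speech SDK for Python; the key and region are placeholders, and the other SDK languages expose the same events under analogous names:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; substitute your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YourSpeechKey", region="YourRegion")
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# WordBoundary fires per word and reports the offset into the output audio
# in 100-nanosecond ticks (10,000 ticks per millisecond).
def on_word_boundary(evt):
    print(f"WordBoundary: '{evt.text}' at {evt.audio_offset / 10000:.0f} ms")

synthesizer.synthesis_word_boundary.connect(on_word_boundary)
synthesizer.synthesis_started.connect(lambda evt: print("SynthesisStarted"))
synthesizer.synthesis_completed.connect(lambda evt: print("SynthesisCompleted"))

synthesizer.speak_text_async("I'm excited to try text to speech").get()
```

As the note above says, these callbacks fire as audio data becomes available, ahead of real-time playback, so anything driven by them (word highlighting, for example) should be scheduled against the reported offsets rather than run immediately.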