
Commit 108ebee

Trevor Bye committed: adding feedback items
1 parent c04f0c4 · commit 108ebee

File tree: 1 file changed, +87 −81 lines

articles/cognitive-services/Speech-Service/includes/how-to/text-to-speech-basics/text-to-speech-basics-csharp.md

Lines changed: 87 additions & 81 deletions
````diff
@@ -26,13 +26,12 @@ To run the examples in this article, include the following `using` statements at
 
 ```csharp
 using System;
-using Microsoft.CognitiveServices.Speech;
-using Microsoft.CognitiveServices.Speech.Audio;
-using System.Threading.Tasks;
-using System.Net;
 using System.IO;
 using System.Text;
+using System.Threading.Tasks;
 using System.Xml.Linq;
+using Microsoft.CognitiveServices.Speech;
+using Microsoft.CognitiveServices.Speech.Audio;
 ```
 
 ## Create a speech configuration
````
````diff
@@ -54,14 +53,14 @@ In this example, you create a [`SpeechConfig`](https://docs.microsoft.com/dotnet
 ```csharp
 public class Program
 {
-    public static async Task SynthesizeAudioAsync()
+    static async Task Main()
     {
-        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
+        await SynthesizeAudioAsync();
     }
 
-    static void Main(string[] args)
+    static async Task SynthesizeAudioAsync()
     {
-        SynthesizeAudioAsync().Wait();
+        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
     }
 }
 ```
````
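Two language-version assumptions ride along with this change: `async Task Main` requires C# 7.1 or later, and the `using` declarations introduced later in this commit require C# 8.0. If the project pins an older language version, it can be raised in the `.csproj`; a minimal sketch using the standard MSBuild property (not part of this commit):

```xml
<PropertyGroup>
  <!-- Allow async Main (C# 7.1+) and using declarations (C# 8.0+). -->
  <LangVersion>latest</LangVersion>
</PropertyGroup>
```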
@@ -70,36 +69,43 @@ public class Program
7069

7170
Next, you create a [`SpeechSynthesizer`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesizer?view=azure-dotnet) object, which executes text-to-speech conversions and outputs to speakers, files, or other output streams. The [`SpeechSynthesizer`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesizer?view=azure-dotnet) accepts as params the [`SpeechConfig`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechconfig?view=azure-dotnet) object created in the previous step, and an [`AudioConfig`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.audio.audioconfig?view=azure-dotnet) object that specifies how output results should be handled.
7271

73-
To start, create an `AudioConfig` to automatically write the output to a `.wav` file, using the `FromWavFileOutput()` function, and wrap it in a `using` block.
72+
To start, create an `AudioConfig` to automatically write the output to a `.wav` file, using the `FromWavFileOutput()` function, and instantiate it with a `using` statement. A `using` statement in this context automatically disposes of unmanaged resources and causes the object to go out of scope after disposal.
7473

7574
```csharp
76-
public static async Task SynthesizeAudioAsync()
75+
static async Task SynthesizeAudioAsync()
7776
{
7877
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
79-
using (var fileOutput = AudioConfig.FromWavFileOutput("path/to/write/file.wav"))
80-
{
81-
}
78+
using var audioConfig = AudioConfig.FromWavFileOutput("path/to/write/file.wav");
8279
}
8380
```
8481

85-
Next, inside the `using` block you just created, create a nested `using` block and initialize the `SpeechSynthesizer`. Pass your `config` object and the `fileOutput` object as params. Then, executing speech synthesis and writing to a file is as simple as running `SpeakTextAsync()` with a string of text.
82+
Next, instantiate a `SpeechSynthesizer` with another `using` statement. Pass your `config` object and the `audioConfig` object as params. Then, executing speech synthesis and writing to a file is as simple as running `SpeakTextAsync()` with a string of text.
8683

8784
```csharp
88-
public static async Task SynthesizeAudioAsync()
85+
static async Task SynthesizeAudioAsync()
8986
{
9087
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
91-
using (var audioConfig = AudioConfig.FromWavFileOutput("path/to/write/file.wav"))
92-
{
93-
using (var synthesizer = new SpeechSynthesizer(config, audioConfig))
94-
{
95-
await synthesizer.SpeakTextAsync("A simple test to write to a file.");
96-
}
97-
}
88+
using var audioConfig = AudioConfig.FromWavFileOutput("path/to/write/file.wav");
89+
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
90+
await synthesizer.SpeakTextAsync("A simple test to write to a file.");
9891
}
9992
```
10093

10194
Run the program, and a synthesized `.wav` file is written to the location you specified. This is a good example of the most basic usage, but next you look at customizing output and handling the output response as an in-memory stream for working with custom scenarios.
10295

96+
### Synthesize to speaker output
97+
98+
In some cases, you may want to directly output synthesized speech directly to a speaker. To do this, simply omit the `AudioConfig` param when creating the `SpeechSynthesizer` in the example above. This outputs to the current active output device.
99+
100+
```csharp
101+
static async Task SynthesizeAudioAsync()
102+
{
103+
var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
104+
using var synthesizer = new SpeechSynthesizer(config);
105+
await synthesizer.SpeakTextAsync("A simple test to write to a file.");
106+
}
107+
```
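The examples above assume synthesis succeeds. Before relying on the output, the result can be inspected for service errors such as an invalid key or region; the following is a sketch, assuming the Speech SDK's `ResultReason` and `SpeechSynthesisCancellationDetails` types (not shown in this commit):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public class Program
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        using var synthesizer = new SpeechSynthesizer(config);
        var result = await synthesizer.SpeakTextAsync("Checking the synthesis result.");

        if (result.Reason == ResultReason.SynthesizingAudioCompleted)
        {
            Console.WriteLine("Synthesis completed.");
        }
        else if (result.Reason == ResultReason.Canceled)
        {
            // Cancellation details surface service errors, e.g. a bad key or region.
            var details = SpeechSynthesisCancellationDetails.FromResult(result);
            Console.WriteLine($"Canceled: {details.Reason}, {details.ErrorDetails}");
        }
    }
}
```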
## Get result as an in-memory stream

For many scenarios in speech application development, you likely need the resulting audio data as an in-memory stream rather than writing directly to a file. This allows you to build custom behavior, including:
````diff
@@ -108,25 +114,26 @@ For many scenarios in speech application development, you likely need the result
 * Integrate the result with other APIs or services.
 * Modify the audio data, write custom `.wav` headers, etc.
 
 It's simple to make this change from the previous example. First, remove the `AudioConfig`, as you will manage the output behavior manually from this point onward for increased control. Then pass `null` for the `AudioConfig` in the `SpeechSynthesizer` constructor.
+
+> [!NOTE]
+> Passing `null` for the `AudioConfig`, rather than omitting it as in the speaker output example
+> above, does not play the audio by default on the current active output device.
 
 This time, you save the result to a [`SpeechSynthesisResult`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisresult?view=azure-dotnet) variable. The `AudioData` property contains a `byte []` of the output data. Simply grab the `byte []` and write it to a new `MemoryStream`. From here you can implement any custom behavior using the resulting output, but in this example you write to a file manually.
 
 ```csharp
-public static async Task SynthesizeAudioAsync()
+static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
-    using (var synthesizer = new SpeechSynthesizer(config, null))
-    {
-        var result = await synthesizer.SpeakTextAsync("Getting the response as a memory stream.");
-        MemoryStream stream = new MemoryStream();
-        stream.Write(result.AudioData);
-
-        FileStream fs = File.Create("path/to/write/file.wav");
-        stream.WriteTo(fs);
-        fs.Close();
-        stream.Close();
-    }
+    using var synthesizer = new SpeechSynthesizer(config, null);
+
+    var result = await synthesizer.SpeakTextAsync("Getting the response as a memory stream.");
+    using var stream = new MemoryStream();
+    stream.Write(result.AudioData);
+
+    using FileStream fs = File.Create("path/to/write/file.wav");
+    stream.WriteTo(fs);
 }
 ```
````
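Since `AudioData` is already a complete `byte[]`, the `MemoryStream` is only one way to consume it. A sketch of handing the raw bytes straight to a caller, for example an HTTP response or an audio pipeline (the helper name `SynthesizeToBytesAsync` is this example's own, not from the article):

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

public class Program
{
    // Hypothetical helper: synthesize text and return the raw audio bytes.
    static async Task<byte[]> SynthesizeToBytesAsync(string text)
    {
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        // Passing null for AudioConfig keeps the output in memory only.
        using var synthesizer = new SpeechSynthesizer(config, null);
        var result = await synthesizer.SpeakTextAsync(text);
        return result.AudioData;
    }

    static async Task Main()
    {
        byte[] audio = await SynthesizeToBytesAsync("Returning audio as bytes.");
        await File.WriteAllBytesAsync("path/to/write/file.wav", audio);
    }
}
```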

````diff
@@ -145,49 +152,51 @@ There are various options for different file types depending on your requirement
 First, create a function `WriteWavHeader()` to write the necessary audio metadata to the front of your `MemoryStream`. Since `Raw24Khz16BitMonoPcm` is a raw audio format, you need to write standardized audio file headers so that other software knows information like the number of channels, sample rate, and bit depth when your file is played.
 
 ```csharp
-private static void WriteWavHeader(MemoryStream stream, bool isFloatingPoint, ushort channelCount, ushort bitDepth, int sampleRate, int totalSampleCount)
-{
-    stream.Position = 0;
-    stream.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4);
-    stream.Write(BitConverter.GetBytes(((bitDepth / 8) * totalSampleCount) + 36), 0, 4);
-    stream.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4);
-    stream.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4);
-    stream.Write(BitConverter.GetBytes(16), 0, 4);
-
-    // audio format (floating point (3) or PCM (1)). Any other format indicates compression.
-    stream.Write(BitConverter.GetBytes((ushort)(isFloatingPoint ? 3 : 1)), 0, 2);
-
-    stream.Write(BitConverter.GetBytes(channelCount), 0, 2);
-    stream.Write(BitConverter.GetBytes(sampleRate), 0, 4);
-    stream.Write(BitConverter.GetBytes(sampleRate * channelCount * (bitDepth / 8)), 0, 4);
-    stream.Write(BitConverter.GetBytes((ushort)channelCount * (bitDepth / 8)), 0, 2);
-    stream.Write(BitConverter.GetBytes(bitDepth), 0, 2);
-    stream.Write(Encoding.ASCII.GetBytes("data"), 0, 4);
-    stream.Write(BitConverter.GetBytes((bitDepth / 8) * totalSampleCount), 0, 4);
-}
+static void WriteWavHeader(
+    MemoryStream stream,
+    bool isFloatingPoint,
+    ushort channelCount,
+    ushort bitDepth,
+    int sampleRate,
+    int totalSampleCount)
+{
+    stream.Position = 0;
+    stream.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4);
+    stream.Write(BitConverter.GetBytes(((bitDepth / 8) * totalSampleCount) + 36), 0, 4);
+    stream.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4);
+    stream.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4);
+    stream.Write(BitConverter.GetBytes(16), 0, 4);
+
+    // Audio format: floating point (3) or PCM (1). Any other format indicates compression.
+    stream.Write(BitConverter.GetBytes((ushort)(isFloatingPoint ? 3 : 1)), 0, 2);
+    stream.Write(BitConverter.GetBytes(channelCount), 0, 2);
+    stream.Write(BitConverter.GetBytes(sampleRate), 0, 4);
+    stream.Write(BitConverter.GetBytes(sampleRate * channelCount * (bitDepth / 8)), 0, 4);
+    stream.Write(BitConverter.GetBytes((ushort)(channelCount * (bitDepth / 8))), 0, 2);
+    stream.Write(BitConverter.GetBytes(bitDepth), 0, 2);
+    stream.Write(Encoding.ASCII.GetBytes("data"), 0, 4);
+    stream.Write(BitConverter.GetBytes((bitDepth / 8) * totalSampleCount), 0, 4);
+}
 ```
 
 Next, set the `SpeechSynthesisOutputFormat` on the `SpeechConfig` object. Similar to the example in the previous section, you write the `byte []` from the result to a `MemoryStream`, but first you must write the custom `.wav` headers for the chosen file type. Use the function you created above, passing the memory stream by reference. For the other params, the number of **channels** is 1 (mono), the **bit depth** is 16, the **sample rate** is 24,000 (24 kHz), and the **total samples** is the number of 16-bit samples in the raw `byte []` from the `SpeechSynthesisResult`, which is its length in bytes divided by 2.
 
 ```csharp
-public static async Task SynthesizeAudioAsync()
+static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
     config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm);
 
-    using (var synthesizer = new SpeechSynthesizer(config, null))
-    {
-        var result = await synthesizer.SpeakTextAsync("Customizing audio output.");
-        MemoryStream stream = new MemoryStream();
-        // first write the headers to the front of the stream
-        WriteWavHeader(stream, false, 1, 16, 24000, result.AudioData.Length);
-        stream.Write(result.AudioData);
-
-        FileStream fs = File.Create("path/to/write/file.wav");
-        stream.WriteTo(fs);
-        fs.Close();
-        stream.Close();
-    }
+    using var synthesizer = new SpeechSynthesizer(config, null);
+    var result = await synthesizer.SpeakTextAsync("Customizing audio output.");
+    using var stream = new MemoryStream();
+
+    // First, write the headers to the front of the stream. AudioData.Length is a
+    // byte count; divide by 2 to get the number of 16-bit samples.
+    WriteWavHeader(stream, false, 1, 16, 24000, result.AudioData.Length / 2);
+    stream.Write(result.AudioData);
+
+    using FileStream fs = File.Create("path/to/write/file.wav");
+    stream.WriteTo(fs);
 }
 ```
````
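The derived header fields can be sanity-checked by hand for `Raw24Khz16BitMonoPcm`. The following standalone sketch (illustrative, not from the commit) computes them; the key point is that `WriteWavHeader()` multiplies `totalSampleCount` by `bitDepth / 8`, so it expects a sample count, while `result.AudioData.Length` is a byte count:

```csharp
using System;

public class WavHeaderMath
{
    public static void Main()
    {
        // Parameters used in the article: mono, 16-bit, 24 kHz.
        ushort channelCount = 1;
        ushort bitDepth = 16;
        int sampleRate = 24000;

        int bytesPerSample = bitDepth / 8;              // 2 bytes per 16-bit sample
        int blockAlign = channelCount * bytesPerSample; // bytes per audio frame
        int byteRate = sampleRate * blockAlign;         // bytes per second of audio
        Console.WriteLine($"blockAlign={blockAlign}, byteRate={byteRate}");

        // AudioData.Length is a byte count; divide by bytesPerSample to get the
        // sample count the header math expects, so the data chunk size round-trips.
        int audioByteCount = 96000;                             // e.g. 2 seconds at 24 kHz
        int totalSampleCount = audioByteCount / bytesPerSample; // 48000 samples
        int dataChunkSize = bytesPerSample * totalSampleCount;  // back to 96000 bytes
        Console.WriteLine($"dataChunkSize={dataChunkSize}");
    }
}
```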

````diff
@@ -198,7 +207,7 @@ Running your program again will write a custom-formatted `.wav` file to the spec
 Speech Synthesis Markup Language (SSML) allows you to fine-tune the pitch, pronunciation, speaking rate, volume, and more of the text-to-speech output by submitting your requests from an XML schema. This section shows a few practical usage examples, but for a more detailed guide, see the [SSML how-to article](../../../speech-synthesis-markup.md).
 
 To start using SSML for customization, you make a simple change that switches the voice.
-First, create a new XML file for the SSML config in your root project directory, in this example `ssml.xml`. The root element is always `<speak>`, and wrapping the text in a `<voice>` element allows you to change the voice using the `name` param. This example changes the voice to a male English (UK) voice. See the [full list](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#standard-voices) of supported standard voices for additional options.
+First, create a new XML file for the SSML config in your root project directory, in this example `ssml.xml`. The root element is always `<speak>`, and wrapping the text in a `<voice>` element allows you to change the voice using the `name` param. This example changes the voice to a male English (UK) voice. Note that this voice is a **standard** voice, which has different pricing and availability than **neural** voices. See the [full list](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#standard-voices) of supported **standard** voices.
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
````
````diff
@@ -218,19 +227,16 @@ Next, you need to change the speech synthesis request to reference your XML file
 public static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
-    using (var synthesizer = new SpeechSynthesizer(config, null))
-    {
-        string ssml = XDocument.Load(@"./ssml.xml").ToString();
-        var result = await synthesizer.SpeakSsmlAsync(ssml);
+    using var synthesizer = new SpeechSynthesizer(config, null);
+
+    var ssml = XDocument.Load(@"./ssml.xml").ToString();
+    var result = await synthesizer.SpeakSsmlAsync(ssml);
 
-        MemoryStream stream = new MemoryStream();
-        stream.Write(result.AudioData);
+    using var stream = new MemoryStream();
+    stream.Write(result.AudioData);
 
-        FileStream fs = File.Create("path/to/write/file.wav");
-        stream.WriteTo(fs);
-        fs.Close();
-        stream.Close();
-    }
+    using FileStream fs = File.Create("path/to/write/file.wav");
+    stream.WriteTo(fs);
 }
 ```
````
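Instead of maintaining a separate `ssml.xml` file, the same document can be built in code with `System.Xml.Linq`, which is already in the article's `using` list. A sketch; the voice name here is illustrative only, so substitute one from the supported voices list:

```csharp
using System;
using System.Xml.Linq;

public class SsmlBuilder
{
    public static void Main()
    {
        // SSML documents use the W3C synthesis namespace on the <speak> root.
        XNamespace ns = "http://www.w3.org/2001/10/synthesis";
        var doc = new XDocument(
            new XElement(ns + "speak",
                new XAttribute("version", "1.0"),
                new XAttribute(XNamespace.Xml + "lang", "en-US"),
                new XElement(ns + "voice",
                    // Illustrative voice name; see the supported voices list.
                    new XAttribute("name", "en-GB-George-Apollo"),
                    "Building SSML in code.")));

        string ssml = doc.ToString();
        Console.WriteLine(ssml);
        // Pass `ssml` to SpeakSsmlAsync(ssml) exactly as with the file-based version.
    }
}
```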
