Commit 86ff8bf
Author: Trevor Bye
Message: feedback changes
Parent: 9f0a1e3

File tree: 1 file changed (+18, −60 lines changed)


articles/cognitive-services/Speech-Service/includes/how-to/text-to-speech-basics/text-to-speech-basics-csharp.md

Lines changed: 18 additions & 60 deletions
@@ -29,7 +29,6 @@ using System;
 using System.IO;
 using System.Text;
 using System.Threading.Tasks;
-using System.Xml.Linq;
 using Microsoft.CognitiveServices.Speech;
 using Microsoft.CognitiveServices.Speech.Audio;
 ```
@@ -102,7 +101,7 @@ static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
     using var synthesizer = new SpeechSynthesizer(config);
-    await synthesizer.SpeakTextAsync("A simple test to write to a file.");
+    await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");
 }
 ```
 
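The hunk above shows only the changed method body. A self-contained sketch of the same default-speaker flow might look like the following; the `Program` wrapper and result check are illustrative additions, not part of this commit, and the code assumes the Microsoft.CognitiveServices.Speech NuGet package plus a real key and region:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        // Key and region are placeholders, as in the article.
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

        // No AudioConfig argument: the SDK plays through the default speaker.
        using var synthesizer = new SpeechSynthesizer(config);
        var result = await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");

        // Not in the diff, but a common sanity check on the synthesis result.
        if (result.Reason == ResultReason.Canceled)
        {
            var details = SpeechSynthesisCancellationDetails.FromResult(result);
            Console.WriteLine($"Synthesis canceled: {details.Reason}");
        }
    }
}
```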
@@ -120,23 +119,21 @@ It's simple to make this change from the previous example. First, remove the `Au
 > Passing `null` for the `AudioConfig`, rather than omitting it like in the speaker output example
 > above, will not play the audio by default on the current active output device.
 
-This time, you save the result to a [`SpeechSynthesisResult`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisresult?view=azure-dotnet) variable. The `AudioData` property contains a `byte []` of the output data. Simply grab the `byte []` and write it to a new `MemoryStream`. From here you can implement any custom behavior using the resulting output, but in this example you write to a file manually.
+This time, you save the result to a [`SpeechSynthesisResult`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisresult?view=azure-dotnet) variable. The `AudioData` property contains a `byte []` of the output data. You can work with this `byte []` manually, or you can use the [`AudioDataStream`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.audiodatastream?view=azure-dotnet) class to manage the in-memory stream. In this example you use the `AudioDataStream.FromResult()` static function to get a stream from the result.
 
 ```csharp
 static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
     using var synthesizer = new SpeechSynthesizer(config, null);
 
-    var result = await synthesizer.SpeakTextAsync("Getting the response as a memory stream.");
-    using var stream = new MemoryStream();
-    stream.Write(result.AudioData);
-
-    using FileStream fs = File.Create("path/to/write/file.wav");
-    stream.WriteTo(fs);
+    var result = await synthesizer.SpeakTextAsync("Getting the response as an in-memory stream.");
+    using var stream = AudioDataStream.FromResult(result);
 }
 ```
 
+From here you can implement any custom behavior using the resulting `stream` object.
+
 ## Customize audio format
 
 The following section shows how to customize audio output attributes including:
@@ -147,60 +144,25 @@ The following section shows how to customize audio output attributes including:
 
 To change the audio format, you use the `SetSpeechSynthesisOutputFormat()` function on the `SpeechConfig` object. This function expects an `enum` of type [`SpeechSynthesisOutputFormat`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-dotnet), which you use to select the output format. See the reference docs for a [list of audio formats](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-dotnet) that are available.
 
-There are various options for different file types depending on your requirements. In this example, you specify a high-fidelity raw format `Raw24Khz16BitMonoPcm`, which requires writing the `.wav` headers manually. If you choose a format such as `Audio24Khz96KBitRateMonoMp3`, you would not need to write headers.
+There are various options for different file types depending on your requirements. Note that by definition, raw formats like `Raw24Khz16BitMonoPcm` do not include audio headers. Use raw formats only when you know your downstream implementation can decode a raw bitstream, or if you plan on manually building headers based on bit-depth, sample-rate, number of channels, etc.
 
-First, create a function `WriteWavHeader()` to write the necessary audio metadata to the front of your `MemoryStream`. Since `Raw24Khz16BitMonoPcm` is a raw audio format, you need to write standardized audio file headers so that other software knows information like the number of channels, sample rate, and bit depth when your file is played.
-
-```csharp
-static void WriteWavHeader(
-    MemoryStream stream,
-    bool isFloatingPoint,
-    ushort channelCount,
-    ushort bitDepth,
-    int sampleRate,
-    int totalSampleCount)
-{
-    stream.Position = 0;
-    stream.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4);
-    stream.Write(BitConverter.GetBytes(((bitDepth / 8) * totalSampleCount) + 36), 0, 4);
-    stream.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4);
-    stream.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4);
-    stream.Write(BitConverter.GetBytes(16), 0, 4);
-
-    // audio format (floating point (3) or PCM (1)). Any other format indicates compression.
-    stream.Write(BitConverter.GetBytes((ushort)(isFloatingPoint ? 3 : 1)), 0, 2);
-    stream.Write(BitConverter.GetBytes(channelCount), 0, 2);
-    stream.Write(BitConverter.GetBytes(sampleRate), 0, 4);
-    stream.Write(BitConverter.GetBytes(sampleRate * channelCount * (bitDepth / 8)), 0, 4);
-    stream.Write(BitConverter.GetBytes((ushort)channelCount * (bitDepth / 8)), 0, 2);
-    stream.Write(BitConverter.GetBytes(bitDepth), 0, 2);
-    stream.Write(Encoding.ASCII.GetBytes("data"), 0, 4);
-    stream.Write(BitConverter.GetBytes((bitDepth / 8) * totalSampleCount), 0, 4);
-}
-```
-
-Next, set the `SpeechSynthesisOutputFormat` on the `SpeechConfig` object. Similar to the example in the previous section, you write the `byte []` from the result to a `MemoryStream`, but first you must write the custom `.wav` headers for the chosen file type. Use the function you created above, passing the memory stream by reference. For the other params, the number of **channels** is 1 (mono), the **bit-depth** is 16, the **sample-rate** is 24,000 (24Khz), and the **total samples** is the length of the raw `byte []` from the `SpeechSynthesisResult`.
+In this example, you specify a high-fidelity RIFF format `Riff24Khz16BitMonoPcm` by setting the `SpeechSynthesisOutputFormat` on the `SpeechConfig` object. Similar to the example in the previous section, you use [`AudioDataStream`](https://docs.microsoft.com/dotnet/api/microsoft.cognitiveservices.speech.audiodatastream?view=azure-dotnet) to get an in-memory stream of the result, and then write it to a file.
 
 ```csharp
 static async Task SynthesizeAudioAsync()
 {
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
-    config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm);
+    config.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
 
     using var synthesizer = new SpeechSynthesizer(config, null);
-    var result = await synthesizer.SpeakTextAsync("Customizing audio output.");
-    using var stream = new MemoryStream();
-
-    // first write the headers to the front of the stream
-    WriteWavHeader(stream, false, 1, 16, 24000, result.AudioData.Length);
-    stream.Write(result.AudioData);
+    var result = await synthesizer.SpeakTextAsync("Customizing audio output format.");
 
-    using FileStream fs = File.Create("path/to/write/file.wav");
-    stream.WriteTo(fs);
+    using var stream = AudioDataStream.FromResult(result);
+    await stream.SaveToWaveFileAsync("path/to/write/file.wav");
 }
 ```
 
-Running your program again will write a custom-formatted `.wav` file to the specified path.
+Running your program again will write a `.wav` file to the specified path.
 
 ## Use SSML to customize speech characteristics
 
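This commit deletes the manual `WriteWavHeader` helper in favor of `AudioDataStream.SaveToWaveFileAsync()`. For readers who still need a raw format such as `Raw24Khz16BitMonoPcm`, the deleted logic can be sketched as a standalone, pure-stdlib helper. This is a minimal sketch of the standard 44-byte RIFF/WAVE header, not SDK code; note that the deleted version multiplied `result.AudioData.Length` by the sample size in its size fields, which overstates the data size for a byte array, so this sketch takes the data byte count directly:

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeader
{
    // Builds the standard 44-byte RIFF/WAVE header for raw little-endian PCM data.
    // dataByteCount is the length in bytes of the raw audio payload.
    public static byte[] Build(ushort channelCount, ushort bitDepth, int sampleRate, int dataByteCount)
    {
        using var stream = new MemoryStream();
        int bytesPerSample = bitDepth / 8;

        stream.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4);
        stream.Write(BitConverter.GetBytes(dataByteCount + 36), 0, 4);  // total file size minus 8
        stream.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4);
        stream.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4);
        stream.Write(BitConverter.GetBytes(16), 0, 4);                  // fmt chunk size for PCM
        stream.Write(BitConverter.GetBytes((ushort)1), 0, 2);           // audio format 1 = PCM
        stream.Write(BitConverter.GetBytes(channelCount), 0, 2);
        stream.Write(BitConverter.GetBytes(sampleRate), 0, 4);
        stream.Write(BitConverter.GetBytes(sampleRate * channelCount * bytesPerSample), 0, 4); // byte rate
        stream.Write(BitConverter.GetBytes((ushort)(channelCount * bytesPerSample)), 0, 2);    // block align
        stream.Write(BitConverter.GetBytes(bitDepth), 0, 2);
        stream.Write(Encoding.ASCII.GetBytes("data"), 0, 4);
        stream.Write(BitConverter.GetBytes(dataByteCount), 0, 4);
        return stream.ToArray();
    }
}
```

Prepending `WavHeader.Build(1, 16, 24000, result.AudioData.Length)` to the raw bytes yields a playable `.wav` file for 24 kHz, 16-bit mono PCM.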
@@ -217,7 +179,7 @@ First, create a new XML file for the SSML config in your root project directory,
 </speak>
 ```
 
-Next, you need to change the speech synthesis request to reference your XML file. The request is mostly the same, but instead of using the `SpeakTextAsync()` function, you use `SpeakSsmlAsync()`. This function expects an XML string, so you first load your SSML config as a string using `XDocument.Load()`. From here, the result object is exactly the same as previous examples.
+Next, you need to change the speech synthesis request to reference your XML file. The request is mostly the same, but instead of using the `SpeakTextAsync()` function, you use `SpeakSsmlAsync()`. This function expects an XML string, so you first load your SSML config as a string using `File.ReadAllText()`. From here, the result object is exactly the same as previous examples.
 
 > [!NOTE]
 > If you're using Visual Studio, your build config likely will not find your XML file by default. To fix this, right click the XML file and
@@ -229,14 +191,11 @@ public static async Task SynthesizeAudioAsync()
     var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
     using var synthesizer = new SpeechSynthesizer(config, null);
 
-    var ssml = XDocument.Load(@"./ssml.xml").ToString();
+    var ssml = File.ReadAllText("./ssml.xml");
     var result = await synthesizer.SpeakSsmlAsync(ssml);
 
-    using var stream = new MemoryStream();
-    stream.Write(result.AudioData);
-
-    using FileStream fs = File.Create("path/to/write/file.wav");
-    stream.WriteTo(fs);
+    using var stream = AudioDataStream.FromResult(result);
+    await stream.SaveToWaveFileAsync("path/to/write/file.wav");
 }
 ```
 
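The code above reads `./ssml.xml`, but this diff never shows the full file. A minimal hypothetical `ssml.xml` consistent with the closing tags visible in the final hunk might look like the following; the voice name, style, and spoken text are placeholders, not part of this commit:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <mstts:express-as style="cheerful">
      This text is spoken with an expressive neural voice.
    </mstts:express-as>
  </voice>
</speak>
```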
@@ -269,5 +228,4 @@ To switch to a neural voice, change the `name` to one of the [neural voice optio
     </mstts:express-as>
   </voice>
 </speak>
-```
-
+```
