
Commit b7ccb8a

author
Trevor Bye
committed
adding java and toc entry
1 parent dda5ab6 commit b7ccb8a

File tree: 3 files changed (+240 −1)

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
---
author: trevorbye
ms.service: cognitive-services
ms.topic: include
ms.date: 03/25/2020
ms.author: trbye
---

## Prerequisites

This article assumes that you have an Azure account and a Speech service subscription. If you don't have an account and subscription, [try the Speech service for free](../../../get-started.md).

## Install the Speech SDK

Before you can do anything, you need to install the Speech SDK. Depending on your platform, use the following instructions:

* <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstarts/setup-platform?tabs=jre&pivots=programming-language-java" target="_blank">Java Runtime <span class="docon docon-navigate-external x-hidden-focus"></span></a>
* <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstarts/setup-platform?tabs=android&pivots=programming-language-java" target="_blank">Android <span class="docon docon-navigate-external x-hidden-focus"></span></a>

## Import dependencies

To run the examples in this article, include the following import statements at the top of your class file.

```java
import com.microsoft.cognitiveservices.speech.AudioDataStream;
import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechSynthesizer;
import com.microsoft.cognitiveservices.speech.SpeechSynthesisOutputFormat;
import com.microsoft.cognitiveservices.speech.SpeechSynthesisResult;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

import java.io.*;
import java.util.Scanner;
```

## Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable). This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

> [!NOTE]
> Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable):

* With a subscription: pass in a key and the associated region.
* With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
* With a host: pass in a host address. A key or authorization token is optional.
* With an authorization token: pass in an authorization token and the associated region.
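
The four initialization styles above can be sketched as follows. The endpoint and host values (`"YourEndpointUrl"`, `"YourHostUrl"`) and the token value are placeholders for illustration only; substitute the values for your own resource.

```java
import com.microsoft.cognitiveservices.speech.SpeechConfig;
import java.net.URI;

public class ConfigExamples {
    public static void main(String[] args) throws Exception {
        // With a subscription: key and associated region.
        SpeechConfig fromSubscription = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

        // With an endpoint: a Speech service endpoint URI, plus an optional key.
        SpeechConfig fromEndpoint = SpeechConfig.fromEndpoint(new URI("YourEndpointUrl"), "YourSubscriptionKey");

        // With a host: a host address, plus an optional key.
        SpeechConfig fromHost = SpeechConfig.fromHost(new URI("YourHostUrl"), "YourSubscriptionKey");

        // With an authorization token and the associated region.
        SpeechConfig fromToken = SpeechConfig.fromAuthorizationToken("YourAuthToken", "YourServiceRegion");
    }
}
```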

In this example, you create a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable) using a subscription key and region. See the [region support](https://docs.microsoft.com/azure/cognitive-services/speech-service/regions#speech-sdk) page to find your region identifier. You also create some basic boilerplate code to use for the rest of this article, which you modify for different customizations.

```java
public class Program
{
    public static void main(String[] args) {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    }
}
```

## Synthesize speech to a file

Next, you create a [`SpeechSynthesizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechsynthesizer?view=azure-java-stable) object, which executes text-to-speech conversions and outputs to speakers, files, or other output streams. The [`SpeechSynthesizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechsynthesizer?view=azure-java-stable) accepts as params the [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable) object created in the previous step, and an [`AudioConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.audio.audioconfig?view=azure-java-stable) object that specifies how output results should be handled.

To start, create an `AudioConfig` to automatically write the output to a `.wav` file using the `fromWavFileOutput()` static function.

```java
public static void main(String[] args) {
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    AudioConfig audioConfig = AudioConfig.fromWavFileOutput("path/to/write/file.wav");
}
```

Next, instantiate a `SpeechSynthesizer`, passing your `speechConfig` object and the `audioConfig` object as params. Then, executing speech synthesis and writing to a file is as simple as running `SpeakText()` with a string of text.

```java
public static void main(String[] args) {
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    AudioConfig audioConfig = AudioConfig.fromWavFileOutput("path/to/write/file.wav");

    SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
    synthesizer.SpeakText("A simple test to write to a file.");
}
```

Run the program, and a synthesized `.wav` file is written to the location you specified. This is a good example of the most basic usage. Next, you look at customizing output and handling the output response as an in-memory stream for custom scenarios.
### Synthesize to speaker output
89+
90+
In some cases, you may want to directly output synthesized speech directly to a speaker. To do this, instantiate the `AudioConfig` using the `fromDefaultSpeakerOutput()` static function. This outputs to the current active output device.
91+
92+
```java
93+
public static void main(String[] args) {
94+
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
95+
AudioConfig audioConfig = AudioConfig.fromDefaultSpeakerOutput();
96+
97+
SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);
98+
synthesizer.SpeakText("Synthesizing directly to speaker output.");
99+
}
100+
```

## Get result as an in-memory stream

For many scenarios in speech application development, you likely need the resulting audio data as an in-memory stream rather than directly writing to a file. This allows you to build custom behavior, including:

* Abstract the resulting byte array as a seek-able stream for custom downstream services.
* Integrate the result with other APIs or services.
* Modify the audio data, write custom `.wav` headers, etc.

It's simple to make this change from the previous example. First, remove the `AudioConfig` block, as you will manage the output behavior manually from this point onward for increased control. Then pass `null` for the `AudioConfig` in the `SpeechSynthesizer` constructor.

> [!NOTE]
> Passing `null` for the `AudioConfig`, rather than omitting it as in the speaker output example
> above, will not play the audio by default on the current active output device.

This time, you save the result to a [`SpeechSynthesisResult`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechsynthesisresult?view=azure-java-stable) variable. The `SpeechSynthesisResult.getAudioData()` function returns a `byte []` of the output data. You can work with this `byte []` manually, or you can use the [`AudioDataStream`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.audiodatastream?view=azure-java-stable) class to manage the in-memory stream. In this example, you use the `AudioDataStream.fromResult()` static function to get a stream from the result.

```java
public static void main(String[] args) {
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, null);

    SpeechSynthesisResult result = synthesizer.SpeakText("Getting the response as an in-memory stream.");
    AudioDataStream stream = AudioDataStream.fromResult(result);
    System.out.print(stream.getStatus());
}
```

From here you can implement any custom behavior using the resulting `stream` object.
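
For example, one way to consume the stream is to drain it into a byte array in fixed-size chunks with the stream's `readData()` function. This is a minimal sketch, not part of the original example; the `StreamConsumer` class name and the 4096-byte buffer size are illustrative choices.

```java
import com.microsoft.cognitiveservices.speech.AudioDataStream;
import java.io.ByteArrayOutputStream;

public class StreamConsumer {
    // Reads an AudioDataStream into a byte array, 4 KB at a time.
    public static byte[] drain(AudioDataStream stream) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        long bytesRead;
        // readData() fills the buffer and returns the number of bytes written;
        // it returns 0 when the stream is exhausted.
        while ((bytesRead = stream.readData(buffer)) > 0) {
            out.write(buffer, 0, (int) bytesRead);
        }
        return out.toByteArray();
    }
}
```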

## Customize audio format

The following section shows how to customize audio output attributes, including:

* Audio file type
* Sample-rate
* Bit-depth

To change the audio format, you use the `setSpeechSynthesisOutputFormat()` function on the `SpeechConfig` object. This function expects an `enum` of type [`SpeechSynthesisOutputFormat`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-java-stable), which you use to select the output format. See the reference docs for a [list of audio formats](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechsynthesisoutputformat?view=azure-java-stable) that are available.

There are various options for different file types, depending on your requirements. Note that by definition, raw formats like `Raw24Khz16BitMonoPcm` do not include audio headers. Use raw formats only when you know your downstream implementation can decode a raw bitstream, or if you plan on manually building headers based on bit-depth, sample-rate, number of channels, etc.
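
If you do build a header for raw PCM yourself, the canonical 44-byte RIFF/WAV header can be assembled from those same parameters. This sketch uses only the JDK (no Speech SDK dependency); the `WavHeader` class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavHeader {
    // Builds a standard 44-byte RIFF/WAV header for raw PCM audio data.
    public static byte[] build(int sampleRate, short bitsPerSample, short channels, int dataLength) {
        int byteRate = sampleRate * channels * bitsPerSample / 8;   // bytes per second
        short blockAlign = (short) (channels * bitsPerSample / 8);  // bytes per sample frame
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataLength);           // RIFF chunk size: file size minus 8 bytes
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                        // fmt sub-chunk size for PCM
        b.putShort((short) 1);               // audio format: 1 = uncompressed PCM
        b.putShort(channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort(blockAlign);
        b.putShort(bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataLength);                // size of the raw PCM payload that follows
        return b.array();
    }
}
```

Prepend the returned header to the raw `byte []` from `getAudioData()` to produce a playable `.wav` file.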

In this example, you specify a high-fidelity RIFF format, `Riff24Khz16BitMonoPcm`, by setting the `SpeechSynthesisOutputFormat` on the `SpeechConfig` object. Similar to the example in the previous section, you use [`AudioDataStream`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.audiodatastream?view=azure-java-stable) to get an in-memory stream of the result, and then write it to a file.

```java
public static void main(String[] args) {
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Set the output format.
    speechConfig.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);

    SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, null);
    SpeechSynthesisResult result = synthesizer.SpeakText("Customizing audio output format.");
    AudioDataStream stream = AudioDataStream.fromResult(result);
    stream.saveToWavFile("path/to/write/file.wav");
}
```

Running your program again will write a `.wav` file to the specified path.

## Use SSML to customize speech characteristics

Speech Synthesis Markup Language (SSML) allows you to fine-tune the pitch, pronunciation, speaking rate, volume, and more of the text-to-speech output by submitting your requests from an XML schema. This section shows a few practical usage examples, but for a more detailed guide, see the [SSML how-to article](../../../speech-synthesis-markup.md).

To start using SSML for customization, you make a simple change that switches the voice. First, create a new XML file for the SSML config in your root project directory, in this example `ssml.xml`. The root element is always `<speak>`, and wrapping the text in a `<voice>` element allows you to change the voice using the `name` param. This example changes the voice to a male English (UK) voice. Note that this voice is a **standard** voice, which has different pricing and availability than **neural** voices. See the [full list](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#standard-voices) of supported **standard** voices.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-GB-George-Apollo">
    When you're on the motorway, it's a good idea to use a sat-nav.
  </voice>
</speak>
```

Next, you need to change the speech synthesis request to reference your XML file. The request is mostly the same, but instead of using the `SpeakText()` function, you use `SpeakSsml()`. This function expects an XML string, so first you create a function to load an XML file and return it as a string.

```java
private static String xmlToString(String filePath) {
    File file = new File(filePath);
    StringBuilder fileContents = new StringBuilder((int) file.length());

    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            fileContents.append(scanner.nextLine() + System.lineSeparator());
        }
        return fileContents.toString().trim();
    } catch (FileNotFoundException ex) {
        return "File not found.";
    }
}
```

From here, the result object is exactly the same as in previous examples.

```java
public static void main(String[] args) {
    SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, null);

    String ssml = xmlToString("ssml.xml");
    SpeechSynthesisResult result = synthesizer.SpeakSsml(ssml);
    AudioDataStream stream = AudioDataStream.fromResult(result);
    stream.saveToWavFile("path/to/write/file.wav");
}
```

The output works, but there are a few simple additional changes you can make to help it sound more natural. The overall speaking speed is a little too fast, so we'll add a `<prosody>` tag and reduce the speed to **90%** of the default rate. Additionally, the pause after the comma in the sentence is a little too short and unnatural sounding. To fix this issue, add a `<break>` tag to delay the speech, and set the time param to **200ms**. Re-run the synthesis to see how these customizations affect the output.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-GB-George-Apollo">
    <prosody rate="0.9">
      When you're on the motorway,<break time="200ms"/> it's a good idea to use a sat-nav.
    </prosody>
  </voice>
</speak>
```

### Neural voices

Neural voices are speech synthesis algorithms powered by deep neural networks. When you use a neural voice, synthesized speech is nearly indistinguishable from human recordings. With human-like natural prosody and clear articulation of words, neural voices significantly reduce listening fatigue when users interact with AI systems.

To switch to a neural voice, change the `name` to one of the [neural voice options](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#neural-voices). Then, add an XML namespace for `mstts`, and wrap your text in the `<mstts:express-as>` tag. Use the `style` param to customize the speaking style. This example uses `cheerful`, but try setting it to `customerservice` or `chat` to see the difference in speaking style.

> [!IMPORTANT]
> Neural voices are **only** supported for Speech resources created in the *East US*, *Southeast Asia*, and *West Europe* regions.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <mstts:express-as style="cheerful">
      This is awesome!
    </mstts:express-as>
  </voice>
</speak>
```

articles/cognitive-services/Speech-Service/text-to-speech-basics.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ In this article, you learn common design patterns for doing text-to-speech synth
 ::: zone-end

 ::: zone pivot="programming-language-java"
-[!INCLUDE [Java Basics include]()]
+[!INCLUDE [Java Basics include](includes/how-to/text-to-speech-basics/text-to-speech-basics-java.md)]
 ::: zone-end

 ::: zone pivot="programming-language-python"

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -113,6 +113,8 @@
   items:
   - name: What is text-to-speech?
     href: text-to-speech.md
+  - name: Text-to-speech basics
+    href: text-to-speech-basics.md
   - name: Quickstart
     items:
     - name: Synthesize speech to a speaker
