Commit 771723e

Added Java bits to basics
1 parent f62dd3e commit 771723e

2 files changed: +245 -6 lines changed

articles/cognitive-services/Speech-Service/includes/how-to/speech-to-text-basics/speech-to-text-basics-cpp.md

Lines changed: 9 additions & 5 deletions
@@ -12,7 +12,11 @@ This article assumes that you have an Azure account and Speech service subscription

 ## Install the Speech SDK

-Before you can do anything, you'll need to install the Speech SDK. Run `Install-Package Microsoft.CognitiveServices.Speech` in the NuGet console from Visual Studio.
+Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:
+
+* <a href="../../../quickstarts/setup-platform.md?tabs=linux&pivots=programming-language-cpp" target="_blank">Linux <span class="docon docon-navigate-external x-hidden-focus"></span></a>
+* <a href="../../../quickstarts/setup-platform.md?tabs=macos&pivots=programming-language-cpp" target="_blank">macOS <span class="docon docon-navigate-external x-hidden-focus"></span></a>
+* <a href="../../../quickstarts/setup-platform.md?tabs=windows&pivots=programming-language-cpp" target="_blank">Windows <span class="docon docon-navigate-external x-hidden-focus"></span></a>

 ## Create a speech configuration

@@ -65,7 +69,7 @@ auto recognizer = SpeechRecognizer::FromConfig(config, audioConfig);
 If you don't want to use a microphone, and want to provide an audio file, you'll still need to provide an `audioConfig`. However, when you create an [`AudioConfig`](https://docs.microsoft.com/cpp/cognitive-services/speech/audio-audioconfig), instead of calling `FromDefaultMicrophoneInput`, you'll call `FromWavFileInput` and pass the `filename` parameter.

 ```cpp
-auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
+auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
 auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);
 ```

@@ -74,7 +78,7 @@ auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);
 The [Recognizer class](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer) for the Speech SDK for C++ exposes a few methods that you can use for speech recognition.

 * Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. This will recognize a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
-* Continuous recognition (async) - Asynchronously initiates continuous recognition operation. User has to connect to EventSignal to receive recognition results. To stop asynchronous continuous recognition, call [StopContinuousRecognitionAsync](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync).
+* Continuous recognition (async) - Asynchronously initiates continuous recognition. Connect to the recognizer's event signals to receive recognition results. To stop asynchronous continuous recognition, call [`StopContinuousRecognitionAsync`](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync).

 > [!NOTE]
 > Learn more about how to [choose a speech recognition mode](../../../how-to-choose-recognition-mode.md).

@@ -126,7 +130,7 @@ Continuous recognition is a bit more involved than single-shot recognition. It r
 Let's start by defining the input and initializing a [`SpeechRecognizer`](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer):

 ```cpp
-auto audioInput = AudioConfig::FromWavFileInput("whatstheweatherlike.wav");
+auto audioInput = AudioConfig::FromWavFileInput("YourAudioFile.wav");
 auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);
 ```

@@ -182,7 +186,7 @@ recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
 });
 ```

-With everything set up, we can call [StopContinuousRecognitionAsync](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer#startcontinuousrecognitionasync).
+With everything set up, we can call [`StartContinuousRecognitionAsync`](https://docs.microsoft.com/cpp/cognitive-services/speech/speechrecognizer#startcontinuousrecognitionasync).

 ```cpp
 // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.

articles/cognitive-services/Speech-Service/includes/how-to/speech-to-text-basics/speech-to-text-basics-java.md

Lines changed: 236 additions & 1 deletion
@@ -2,7 +2,242 @@
 author: IEvangelist
 ms.service: cognitive-services
 ms.topic: include
-ms.date: 03/02/2020
+ms.date: 03/06/2020
 ms.author: dapine
 ---

## Prerequisites

This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, [try the Speech service for free](../../../get-started.md).

## Install the Speech SDK

Before you can do anything, you'll need to install the Speech SDK. Depending on your platform, use the following instructions:

* <a href="../../../quickstarts/setup-platform.md?tabs=jre&pivots=programming-language-java" target="_blank">Java Runtime <span class="docon docon-navigate-external x-hidden-focus"></span></a>
* <a href="../../../quickstarts/setup-platform.md?tabs=android&pivots=programming-language-java" target="_blank">Android <span class="docon docon-navigate-external x-hidden-focus"></span></a>

## Create a speech configuration

To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable). This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

> [!NOTE]
> Regardless of whether you're performing speech recognition, speech synthesis, translation, or intent recognition, you'll always create a configuration.

There are a few ways that you can initialize a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable):

* With a subscription: pass in a key and the associated region.
* With an endpoint: pass in a Speech service endpoint. A key or authorization token is optional.
* With a host: pass in a host address. A key or authorization token is optional.
* With an authorization token: pass in an authorization token and the associated region.

Let's take a look at how a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable) is created using a key and region.

```java
SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
```
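
The other initialization options from the list above follow the same pattern. Here's a minimal sketch; the endpoint, host, and token strings below are placeholders of our own, not values from this article:

```java
import java.net.URI;
import com.microsoft.cognitiveservices.speech.*;

// Note: the URI constructor throws the checked URISyntaxException; handle or declare it in real code.

// With an endpoint: pass in a Speech service endpoint ("YourEndpointUrl" is a placeholder).
SpeechConfig fromEndpoint = SpeechConfig.fromEndpoint(new URI("YourEndpointUrl"), "YourSubscriptionKey");

// With a host: pass in a host address ("YourHostUrl" is likewise a placeholder).
SpeechConfig fromHost = SpeechConfig.fromHost(new URI("YourHostUrl"));

// With an authorization token: pass in a token and the associated region.
SpeechConfig fromToken = SpeechConfig.fromAuthorizationToken("YourAuthorizationToken", "YourServiceRegion");
```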
## Initialize a recognizer

After you've created a [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable), the next step is to initialize a [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable). When you initialize a [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable), you'll need to pass it your `config`. This ensures that the credentials that the service requires to validate your request are provided.

If you're recognizing speech using your device's default microphone, here's what the [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable) should look like:

```java
SpeechRecognizer recognizer = new SpeechRecognizer(config);
```

If you want to specify the audio input device, then you'll need to create an [`AudioConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.audio.audioconfig?view=azure-java-stable) and provide the `audioConfig` parameter when initializing your [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable).

> [!TIP]
> [Learn how to get the device ID for your audio input device](../../../how-to-select-audio-input-devices.md).

First, add the following `import` statements.

```java
import java.util.concurrent.Future;
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;
```

Next, you'll be able to reference the `AudioConfig` object as follows:

```java
AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);
```
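
If you'd rather bind the recognizer to a specific microphone than the default device, a minimal sketch looks like this; `"YourDeviceId"` is a placeholder for a real device ID obtained as described in the tip above:

```java
// Assumption: "YourDeviceId" stands in for an actual audio input device ID.
AudioConfig audioConfig = AudioConfig.fromMicrophoneInput("YourDeviceId");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);
```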
If you don't want to use a microphone, and want to provide an audio file, you'll still need to provide an `audioConfig`. However, when you create an [`AudioConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.audio.audioconfig?view=azure-java-stable), instead of calling `fromDefaultMicrophoneInput`, you'll call `fromWavFileInput` and pass the `filename` parameter.

```java
AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);
```

## Recognize speech

The [Recognizer class](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable) for the Speech SDK for Java exposes a few methods that you can use for speech recognition.

* Single-shot recognition (async) - Performs recognition in a non-blocking (asynchronous) mode. This will recognize a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
* Continuous recognition (async) - Asynchronously initiates continuous recognition. You subscribe to events to receive recognition results. To stop asynchronous continuous recognition, call [`stopContinuousRecognitionAsync`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable#stopcontinuousrecognitionasync).

> [!NOTE]
> Learn more about how to [choose a speech recognition mode](../../../how-to-choose-recognition-mode.md).

### Single-shot recognition

Here's an example of asynchronous single-shot recognition using [`recognizeOnceAsync`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer.recognizeonceasync?view=azure-java-stable):

```java
Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
SpeechRecognitionResult result = task.get();
```
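
`Future.get()` blocks the calling thread and throws checked exceptions, which the snippet above omits for brevity. Here's a minimal sketch of how you might wrap the call; the 30-second timeout is an assumption of ours, not a value from this article:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import com.microsoft.cognitiveservices.speech.*;

try {
    Future<SpeechRecognitionResult> task = recognizer.recognizeOnceAsync();
    // get() blocks until a result arrives; the 30-second bound is an arbitrary choice.
    SpeechRecognitionResult result = task.get(30, TimeUnit.SECONDS);
    System.out.println("We recognized: " + result.getText());
} catch (InterruptedException | ExecutionException | TimeoutException ex) {
    System.err.println("Recognition failed: " + ex.getMessage());
}
```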
You'll need to write some code to handle the result. This sample does a few things with [`result.getReason()`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.resultreason?view=azure-java-stable):

* Prints the recognition result: `ResultReason.RecognizedSpeech`
* If there is no recognition match, informs the user: `ResultReason.NoMatch`
* If an error is encountered, prints the error message: `ResultReason.Canceled`

```java
switch (result.getReason()) {
    // In a Java switch on an enum, the case labels must be unqualified.
    case RecognizedSpeech:
        System.out.println("We recognized: " + result.getText());
        exitCode = 0; // exitCode is assumed to be declared elsewhere in your application.
        break;
    case NoMatch:
        System.out.println("NOMATCH: Speech could not be recognized.");
        break;
    case Canceled: {
        CancellationDetails cancellation = CancellationDetails.fromResult(result);
        System.out.println("CANCELED: Reason=" + cancellation.getReason());

        if (cancellation.getReason() == CancellationReason.Error) {
            System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
            System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
            System.out.println("CANCELED: Did you update the subscription info?");
        }
    }
    break;
}
```
### Continuous recognition

Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the `recognizing`, `recognized`, and `canceled` events to get the recognition results. To stop recognition, you must call [`stopContinuousRecognitionAsync`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable#stopcontinuousrecognitionasync). Here's an example of how continuous recognition is performed on an audio input file.

Let's start by defining the input and initializing a [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable):

```java
AudioConfig audioConfig = AudioConfig.fromWavFileInput("YourAudioFile.wav");
SpeechRecognizer recognizer = new SpeechRecognizer(config, audioConfig);
```

Next, let's create a variable to manage the state of speech recognition. To start, we'll declare a `Semaphore` at the class scope (this requires `import java.util.concurrent.Semaphore;`).

```java
private static Semaphore stopTranslationWithFileSemaphore;
```

Next, we'll subscribe to the events sent from the [`SpeechRecognizer`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable).

* [`recognizing`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer.recognizing?view=azure-java-stable): Signal for events containing intermediate recognition results.
* [`recognized`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer.recognized?view=azure-java-stable): Signal for events containing final recognition results (indicating a successful recognition attempt).
* [`sessionStopped`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.recognizer.sessionstopped?view=azure-java-stable): Signal for events indicating the end of a recognition session (operation).
* [`canceled`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer.canceled?view=azure-java-stable): Signal for events containing canceled recognition results (indicating a recognition attempt that was canceled as a result of a direct cancellation request or, alternatively, a transport or protocol failure).

```java
// First initialize the semaphore.
stopTranslationWithFileSemaphore = new Semaphore(0);

recognizer.recognizing.addEventListener((s, e) -> {
    System.out.println("RECOGNIZING: Text=" + e.getResult().getText());
});

recognizer.recognized.addEventListener((s, e) -> {
    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
    }
    else if (e.getResult().getReason() == ResultReason.NoMatch) {
        System.out.println("NOMATCH: Speech could not be recognized.");
    }
});

recognizer.canceled.addEventListener((s, e) -> {
    System.out.println("CANCELED: Reason=" + e.getReason());

    if (e.getReason() == CancellationReason.Error) {
        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }

    stopTranslationWithFileSemaphore.release();
});

recognizer.sessionStopped.addEventListener((s, e) -> {
    System.out.println("\n Session stopped event.");
    stopTranslationWithFileSemaphore.release();
});
```

With everything set up, we can call [`startContinuousRecognitionAsync`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer?view=azure-java-stable#startcontinuousrecognitionasync).

```java
// Starts continuous recognition. Uses stopContinuousRecognitionAsync() to stop recognition.
recognizer.startContinuousRecognitionAsync().get();

// Waits for completion.
stopTranslationWithFileSemaphore.acquire();

// Stops recognition.
recognizer.stopContinuousRecognitionAsync().get();
```
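
Although this article doesn't cover cleanup, the recognizer holds native resources. A minimal sketch of releasing them when you're finished, assuming the SDK's `close` method on the recognizer:

```java
// Release native resources held by the recognizer once recognition is finished.
recognizer.close();
```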
### Dictation mode

When using continuous recognition, you can enable dictation processing by using the corresponding "enable dictation" function. This mode causes the speech configuration to interpret spoken descriptions of sentence structure, such as punctuation. For example, the utterance "Do you live in town question mark" would be interpreted as the text "Do you live in town?".

To enable dictation mode, use the [`enableDictation`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig.enabledictation?view=azure-java-stable) method on your [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable).

```java
config.enableDictation();
```
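
Since the recognizer reads its configuration when it's constructed, a sensible ordering (our assumption, not a requirement stated in this article) is to enable dictation before creating the recognizer:

```java
SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
config.enableDictation(); // Enable dictation before the recognizer is created.
SpeechRecognizer recognizer = new SpeechRecognizer(config);
```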
## Change source language

A common task for speech recognition is specifying the input (or source) language. Let's take a look at how you would change the input language to French. In your code, find your [`SpeechConfig`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig?view=azure-java-stable), then add this line directly below it.

```java
config.setSpeechRecognitionLanguage("fr-FR");
```

[`setSpeechRecognitionLanguage`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.speechconfig.setspeechrecognitionlanguage?view=azure-java-stable) is a method that takes a string as an argument. You can provide any value in the list of supported [locales/languages](../../../language-support.md).

## Improve recognition accuracy

There are a few ways to improve recognition accuracy with the Speech SDK. Let's take a look at Phrase Lists. Phrase Lists are used to identify known phrases in audio data, like a person's name or a specific location. Single words or complete phrases can be added to a Phrase List. During recognition, an entry in a phrase list is used if an exact match for the entire phrase is included in the audio. If an exact match to the phrase is not found, recognition is not assisted.

> [!IMPORTANT]
> The Phrase List feature is only available in English.

To use a phrase list, first create a [`PhraseListGrammar`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.phraselistgrammar?view=azure-java-stable) object, then add specific words and phrases with [`addPhrase`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.phraselistgrammar.addphrase?view=azure-java-stable#com_microsoft_cognitiveservices_speech_PhraseListGrammar_addPhrase_String_).

Any changes to [`PhraseListGrammar`](https://docs.microsoft.com/java/api/com.microsoft.cognitiveservices.speech.phraselistgrammar?view=azure-java-stable) take effect on the next recognition or after a reconnection to the Speech service.

```java
PhraseListGrammar phraseList = PhraseListGrammar.fromRecognizer(recognizer);
phraseList.addPhrase("Supercalifragilisticexpialidocious");
```

If you need to clear your phrase list:

```java
phraseList.clear();
```

### Other options to improve recognition accuracy

Phrase lists are only one option to improve recognition accuracy. You can also:

* [Improve accuracy with Custom Speech](../../../how-to-custom-speech.md)
* [Improve accuracy with tenant models](../../../tutorial-tenant-model.md)
