|
---
title: 'Quickstart: Custom voice-first virtual assistant (Preview), Java (Android) - Speech Services'
titleSuffix: Azure Cognitive Services
description: Learn how to create a voice-first virtual assistant application in Java on Android using the Speech SDK
services: cognitive-services
author: trrwilson
manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: quickstart
ms.date: 05/24/2019
ms.author: travisw
---

# Quickstart: Create a voice-first virtual assistant in Java on Android by using the Speech SDK

A quickstart is also available for [speech-to-text](quickstart-java-android.md).

In this article, you'll build a voice-first virtual assistant with Java for Android using the [Speech SDK](speech-sdk.md). This application will connect to a bot that you've already authored and configured with the [Direct Line Speech channel](https://docs.microsoft.com/azure/bot-service/bot-service-channel-connect-directlinespeech). It will then send a voice request to the bot and present a voice-enabled response activity.

This application is built with the Speech SDK Maven package and Android Studio 3.3. The Speech SDK is currently compatible with Android devices that have 32/64-bit ARM or Intel x86/x64 processors.

> [!NOTE]
> For the Speech Devices SDK and the Roobo device, see [Speech Devices SDK](speech-devices-sdk.md).

## Prerequisites

* An Azure subscription key for Speech Services in the **westus2** region. Create this subscription in the [Azure portal](https://portal.azure.com).
* A previously created bot configured with the [Direct Line Speech channel](https://docs.microsoft.com/azure/bot-service/bot-service-channel-connect-directlinespeech).
* [Android Studio](https://developer.android.com/studio/) v3.3 or later.

> [!NOTE]
> Direct Line Speech (Preview) is currently only available in the **westus2** region.

> [!NOTE]
> The 30-day trial for the standard pricing tier described in [Try Speech Services for free](get-started.md) is restricted to **westus** (not **westus2**) and is thus not compatible with Direct Line Speech. Free and standard tier **westus2** subscriptions are compatible.

## Create and configure a project

[!INCLUDE [](../../../includes/cognitive-services-speech-service-quickstart-java-android-create-proj.md)]

## Create user interface

In this section, we'll create a basic user interface (UI) for the application. Let's start by opening the main activity's layout, `activity_main.xml`. The basic template includes a title bar with the application's name and a `TextView` with the message "Hello world!".

Next, replace the contents of `activity_main.xml` with the following code:

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    tools:context=".MainActivity">

    <Button
        android:id="@+id/button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:onClick="onBotButtonClicked"
        android:text="Talk to your bot" />

    <TextView
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Recognition Data"
        android:textSize="18dp"
        android:textStyle="bold" />

    <TextView
        android:id="@+id/recoText"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text=" \n(Recognition goes here)\n" />

    <TextView
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:text="Activity Data"
        android:textSize="18dp"
        android:textStyle="bold" />

    <TextView
        android:id="@+id/activityText"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        android:scrollbars="vertical"
        android:text=" \n(Activities go here)\n" />

</LinearLayout>
```

This XML defines a simple UI to interact with your bot.

* The `button` element initiates an interaction and invokes the `onBotButtonClicked` method when clicked.
* The `recoText` element will display the speech-to-text results as you talk to your bot.
* The `activityText` element will display the JSON payload for the latest Bot Framework activity from your bot.

With these elements in place, your UI consists of a button to start an interaction, a text area for recognition results, and a second text area for activity data.

## Add sample code

1. Open `MainActivity.java`, and replace the contents with the following code:

    ```java
    package samples.speech.cognitiveservices.microsoft.com;

    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;
    import android.support.v4.app.ActivityCompat;
    import android.support.v7.app.AppCompatActivity;
    import android.os.Bundle;
    import android.text.method.ScrollingMovementMethod;
    import android.view.View;
    import android.widget.TextView;

    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
    import com.microsoft.cognitiveservices.speech.dialog.BotConnectorConfig;
    import com.microsoft.cognitiveservices.speech.dialog.SpeechBotConnector;

    import org.json.JSONException;
    import org.json.JSONObject;

    import static android.Manifest.permission.*;

    public class MainActivity extends AppCompatActivity {
        // Replace below with your bot's own Direct Line Speech channel secret
        private static String channelSecret = "YourChannelSecret";
        // Replace below with your own speech subscription key
        private static String speechSubscriptionKey = "YourSpeechSubscriptionKey";
        // Replace below with your own speech service region (note: only 'westus2' is currently supported)
        private static String serviceRegion = "YourSpeechServiceRegion";

        private SpeechBotConnector botConnector;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);

            TextView recoText = (TextView) this.findViewById(R.id.recoText);
            TextView activityText = (TextView) this.findViewById(R.id.activityText);
            recoText.setMovementMethod(new ScrollingMovementMethod());
            activityText.setMovementMethod(new ScrollingMovementMethod());

            // Note: we need to request permissions for audio input and network access
            int requestCode = 5; // unique code for the permission request
            ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
        }
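
        // Optional addition (not part of the original quickstart): a minimal sketch
        // of a permission-result callback. onCreate requests RECORD_AUDIO and
        // INTERNET but never inspects the outcome; if microphone access is denied,
        // listenOnceAsync() will have no audio to send to your bot.
        @Override
        public void onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults) {
            super.onRequestPermissionsResult(requestCode, permissions, grantResults);
            for (int result : grantResults) {
                if (result != android.content.pm.PackageManager.PERMISSION_GRANTED) {
                    TextView recoText = (TextView) this.findViewById(R.id.recoText);
                    recoText.setText("Required permissions were not granted.");
                }
            }
        }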

        public void onBotButtonClicked(View v) {
            // Recreate the SpeechBotConnector on each button press, ensuring that the existing one is closed
            if (botConnector != null) {
                botConnector.close();
                botConnector = null;
            }

            // Create the SpeechBotConnector from the channel and speech subscription information
            BotConnectorConfig config = BotConnectorConfig.fromSecretKey(channelSecret, speechSubscriptionKey, serviceRegion);
            botConnector = new SpeechBotConnector(config, AudioConfig.fromDefaultMicrophoneInput());

            // Optional step: preemptively connect to reduce first interaction latency
            botConnector.connectAsync();

            // Register the SpeechBotConnector's event listeners
            registerEventListeners();

            // Begin sending audio to your bot
            botConnector.listenOnceAsync();
        }

        private void registerEventListeners() {
            TextView recoText = (TextView) this.findViewById(R.id.recoText); // 'recoText' is the ID of your text view
            TextView activityText = (TextView) this.findViewById(R.id.activityText); // 'activityText' is the ID of your text view

            // Recognizing will provide the intermediate recognized text while an audio stream is being processed
            botConnector.recognizing.addEventListener((o, recoArgs) -> {
                recoText.setText(" Recognizing: " + recoArgs.getResult().getText());
            });

            // Recognized will provide the final recognized text once audio capture is completed
            botConnector.recognized.addEventListener((o, recoArgs) -> {
                recoText.setText(" Recognized: " + recoArgs.getResult().getText());
            });

            // SessionStarted will notify when audio begins flowing to the service for a turn
            botConnector.sessionStarted.addEventListener((o, sessionArgs) -> {
                recoText.setText("Listening...");
            });

            // SessionStopped will notify when a turn is complete and it's safe to begin listening again
            botConnector.sessionStopped.addEventListener((o, sessionArgs) -> {
            });

            // Canceled will be signaled when a turn is aborted or experiences an error condition
            botConnector.canceled.addEventListener((o, canceledArgs) -> {
                recoText.setText("Canceled (" + canceledArgs.getReason().toString() + "); error details: " + canceledArgs.getErrorDetails());
                botConnector.disconnectAsync();
            });

            // ActivityReceived is the main way your bot communicates with the client, using Bot Framework activities
            botConnector.activityReceived.addEventListener((o, activityArgs) -> {
                try {
                    // Here we use JSONObject only to "pretty print" the condensed activity JSON
                    String rawActivity = activityArgs.getActivity().serialize();
                    String formattedActivity = new JSONObject(rawActivity).toString(2);
                    activityText.setText(formattedActivity);
                } catch (JSONException e) {
                    activityText.setText("Couldn't format activity text: " + e.getMessage());
                }

                if (activityArgs.hasAudio()) {
                    // Text-to-speech audio associated with the activity is 16 kHz 16-bit mono PCM data
                    final int sampleRate = 16000;
                    int bufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

                    AudioTrack track = new AudioTrack(
                            AudioManager.STREAM_MUSIC,
                            sampleRate,
                            AudioFormat.CHANNEL_OUT_MONO,
                            AudioFormat.ENCODING_PCM_16BIT,
                            bufferSize,
                            AudioTrack.MODE_STREAM);

                    track.play();

                    PullAudioOutputStream stream = activityArgs.getAudio();

                    // Audio is streamed as it becomes available. Play it as it arrives.
                    byte[] buffer = new byte[bufferSize];
                    long bytesRead = 0;

                    do {
                        bytesRead = stream.read(buffer);
                        track.write(buffer, 0, (int) bytesRead);
                    } while (bytesRead == bufferSize);

                    track.release();
                }
            });
        }
    }
    ```

    * The `onCreate` method includes code that requests microphone and internet permissions.

    * The method `onBotButtonClicked` is, as noted earlier, the button click handler. A button press triggers a single interaction ("turn") with your bot.

    * The `registerEventListeners` method demonstrates the events used by the `SpeechBotConnector` and basic handling of incoming activities; a thread-safety sketch follows this list.

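    The connector's event listeners may be invoked on a background thread, and Android views should only be touched from the UI thread. As a defensive variant (a sketch, not part of the original sample), you can marshal each view update through `runOnUiThread`:

    ```java
    // Hypothetical variant of the 'recognizing' listener: capture the text first,
    // then post the view update to the UI thread.
    botConnector.recognizing.addEventListener((o, recoArgs) -> {
        final String text = recoArgs.getResult().getText();
        runOnUiThread(() -> recoText.setText(" Recognizing: " + text));
    });
    ```
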
1. In the same file, replace the configuration strings to match your resources:

    * Replace `YourChannelSecret` with the Direct Line Speech channel secret for your bot.

    * Replace `YourSpeechSubscriptionKey` with your subscription key.

    * Replace `YourSpeechServiceRegion` with the [region](regions.md) associated with your subscription (note: only **westus2** is currently supported).
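
    For example, after substitution the fields might look like this (the secret and key values here are illustrative placeholders, not working credentials):

    ```java
    private static String channelSecret = "aBcdEfGhIjK.lMnOpQrStUvWxYz";              // illustrative placeholder
    private static String speechSubscriptionKey = "0123456789abcdef0123456789abcdef"; // illustrative placeholder
    private static String serviceRegion = "westus2"; // the only region Direct Line Speech currently supports
    ```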

## Build and run the app

1. Connect your Android device to your development PC. Make sure you have enabled [development mode and USB debugging](https://developer.android.com/studio/debug/dev-options) on the device.

1. To build the application, press Ctrl+F9, or choose **Build** > **Make Project** from the menu bar.

1. To launch the application, press Shift+F10, or choose **Run** > **Run 'app'**.

1. In the deployment target window that appears, choose your Android device.

    

Once the application and its activity have launched, click the button to begin talking to your bot. Transcribed text will appear as you speak, and the latest activity you've received from your bot will appear as it arrives. If your bot is configured to provide spoken responses, the text-to-speech audio will play automatically.


## Next steps

> [!div class="nextstepaction"]
> [Explore Java samples on GitHub](https://aka.ms/csspeech/samples)
> [Connect Direct Line Speech to your bot](https://docs.microsoft.com/azure/bot-service/bot-service-channel-connect-directlinespeech)

## See also

- [About voice-first virtual assistants](voice-first-virtual-assistants.md)
- [Custom wake words](speech-devices-sdk-create-kws.md)