| 1 | +--- |
| 2 | +title: "SPX basics - Speech service" |
| 3 | +titleSuffix: Azure Cognitive Services |
| 4 | +description: Learn how to use the SPX command line tool to work with the Speech SDK with no code and minimal setup. |
| 5 | +services: cognitive-services |
| 6 | +author: trevorbye |
| 7 | +manager: nitinme |
| 8 | +ms.service: cognitive-services |
| 9 | +ms.subservice: speech-service |
| 10 | +ms.topic: quickstart |
| 11 | +ms.date: 04/04/2020 |
| 12 | +ms.author: trbye |
| 13 | +--- |
| 14 | + |
| 15 | +# Learn the basics of SPX |
| 16 | + |
| 17 | +In this article, you learn the basic usage patterns of SPX, a command-line tool for using the Speech service without writing code. You can quickly test the main features of the Speech service, without setting up a development environment, to see whether it meets your use cases. SPX is also production ready, and you can use it to automate simple Speech service workflows with `.bat` or shell scripts. |
| 18 | + |
| 19 | +## Prerequisites |
| 20 | + |
| 21 | +The only prerequisite is an Azure Speech subscription. See the [guide](get-started.md#new-resource) on creating a new subscription if you don't already have one. |
| 22 | + |
| 23 | +## Download and install |
| 24 | + |
| 25 | +SPX is available on Windows and Linux. Start by downloading the [zip archive](https://aka.ms/speech/spx-zips.zip), then extract it. SPX requires either the .NET Core or .NET Framework runtime, and the following versions are supported by platform: |
| 26 | + |
| 27 | +* Windows: [.NET Framework 4.7.1](https://dotnet.microsoft.com/download/dotnet-framework/net471), [.NET Core 2.2](https://dotnet.microsoft.com/download/dotnet-core/2.2) |
| 28 | +* Linux: [.NET Core 2.2](https://dotnet.microsoft.com/download/dotnet-core/2.2) |
| 29 | + |
| 30 | +After you've installed a runtime, go to the root directory `spx-zips` that you extracted from the download, and extract the subdirectory that you need (`spx-net471`, for example). In a command prompt, change directory to this location, and then run `spx` to start the application. |
| 31 | + |
| 32 | +## Create subscription config |
| 33 | + |
| 34 | +To start using SPX, you first need to enter your Speech subscription key and region information. See the [region support](https://docs.microsoft.com/azure/cognitive-services/speech-service/regions#speech-sdk) page to find your region identifier. Once you have your subscription key and region identifier (for example, `eastus` or `westus`), run the following commands. |
| 35 | + |
| 36 | +```shell |
| 37 | +spx config @key --set YOUR-SUBSCRIPTION-KEY |
| 38 | +spx config @region --set YOUR-REGION-ID |
| 39 | +``` |
| 40 | + |
| 41 | +Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run `spx config @region --clear` or `spx config @key --clear`. |
| 42 | + |
| 43 | +## Basic usage |
| 44 | + |
| 45 | +This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by performing some speech recognition using your default microphone by running the following command. |
| 46 | + |
| 47 | +```shell |
| 48 | +spx recognize --microphone |
| 49 | +``` |
| 50 | + |
| 51 | +After entering the command, SPX will begin listening for audio on the current active input device, and stop after you press `ENTER`. The recorded speech is then recognized and converted to text in the console output. Text-to-speech synthesis is also easy to do using SPX. |
| 52 | + |
| 53 | +Running the following command will take the entered text as input, and output the synthesized speech to the current active output device. |
| 54 | + |
| 55 | +```shell |
| 56 | +spx synthesize --text "Testing synthesis using SPX" --speakers |
| 57 | +``` |
| 58 | + |
| 59 | +In addition to speech recognition and synthesis, you can also do speech translation with SPX. Similar to the speech recognition command above, run the following command to capture audio from your default microphone, and perform translation to text in the target language. |
| 60 | + |
| 61 | +```shell |
| 62 | +spx translate --microphone --source en-US --target ru-RU --output file C:\some\file\path\russian_translation.txt |
| 63 | +``` |
| 64 | + |
| 65 | +In this command, you specify both the source (language to translate **from**), and the target (language to translate **to**) languages. Using the `--microphone` argument will listen to audio on the current active input device, and stop after you press `ENTER`. The output is a text translation to the target language, written to a text file. |
| 66 | + |
| 67 | +> [!NOTE] |
| 68 | +> See the [language and locale article](language-support.md) for a list of all supported languages with their corresponding locale codes. |
| 69 | + |
| 70 | +## Batch operations |
| 71 | + |
| 72 | +The commands in the previous section are great for quickly seeing how the Speech service works. However, when assessing whether your use cases can be met, you likely need to perform batch operations against a range of input you already have, to see how the service handles a variety of scenarios. This section shows how to: |
| 73 | + |
| 74 | +* Run batch speech recognition on a directory of audio files |
| 75 | +* Iterate through a `.tsv` file and run batch text-to-speech synthesis |
| 76 | + |
| 77 | +## Batch speech recognition |
| 78 | + |
| 79 | +If you have a directory of audio files, you can quickly run batch speech recognition with SPX. Run the following command, pointing to your directory with the `--files` argument. In this example, you append `\*.wav` to the directory to recognize all `.wav` files present in the directory. Additionally, specify the `--threads` argument to run the recognition on 10 parallel threads. |
| 80 | + |
| 81 | +> [!NOTE] |
| 82 | +> The `--threads` argument can also be used in the next section for `spx synthesize` commands. The number of available threads depends on the CPU and its current load percentage. |
| 83 | + |
| 84 | +```shell |
| 85 | +spx recognize --files C:\your_wav_file_dir\*.wav --output file C:\output_dir\speech_output.tsv --threads 10 |
| 86 | +``` |
| 87 | + |
| 88 | +The recognized speech output is written to `speech_output.tsv` using the `--output file` argument. The following is an example of the output file structure. |
| 89 | + |
| 90 | + audio.input.id recognizer.session.started.sessionid recognizer.recognized.result.text |
| 91 | + sample_1 07baa2f8d9fd4fbcb9faea451ce05475 A sample wave file. |
| 92 | + sample_2 8f9b378f6d0b42f99522f1173492f013 Sample text synthesized. |
| 93 | + |
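Once the output file exists, a short shell script can pull out just the columns you need. The following is a minimal sketch and not part of SPX itself. It assumes the file is tab-separated with the three columns in the order shown above, and it recreates that sample data under the hypothetical name `speech_output.tsv` in the current directory so that it runs on its own.

```shell
#!/bin/sh
# Hypothetical file name; in practice this is the file passed to `--output file`.
tsv="speech_output.tsv"

# Recreate the sample output shown above so this sketch is self-contained.
# Assumption: real SPX output separates columns with tab characters.
printf 'audio.input.id\trecognizer.session.started.sessionid\trecognizer.recognized.result.text\n' > "$tsv"
printf 'sample_1\t07baa2f8d9fd4fbcb9faea451ce05475\tA sample wave file.\n' >> "$tsv"
printf 'sample_2\t8f9b378f6d0b42f99522f1173492f013\tSample text synthesized.\n' >> "$tsv"

# Skip the header row (NR > 1), then print the first and third
# tab-separated columns: the audio ID and the recognized text.
awk -F '\t' 'NR > 1 { print $1 ": " $3 }' "$tsv"
```

If your output file has different columns, adjust the field numbers (`$1`, `$3`) to match its header row.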
| 94 | +## Batch text-to-speech synthesis |
| 95 | + |
| 96 | +The easiest way to run batch text-to-speech is to create a new `.tsv` (tab-separated value) file, and use the `--foreach` argument in SPX. Consider the following file `text_synthesis.tsv`: |
| 97 | + |
| 98 | + audio.output text |
| 99 | + C:\batch_wav_output\wav_1.wav Sample text to synthesize. |
| 100 | + C:\batch_wav_output\wav_2.wav Using SPX to run batch-synthesis. |
| 101 | + C:\batch_wav_output\wav_3.wav Some more text to test capabilities. |
| 102 | + |
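Text editors sometimes replace tabs with spaces, so it can help to generate the file from a script. The following is a minimal sketch, assuming a POSIX shell; it writes the hypothetical file `text_synthesis.tsv` to the current directory, reusing the sample paths and text from the file above, and then checks that every row has exactly two tab-separated fields.

```shell
#!/bin/sh
# Hypothetical output location; adjust to your environment.
tsv="text_synthesis.tsv"

# Header row: the column names mirror the --audio output and --text arguments.
printf 'audio.output\ttext\n' > "$tsv"

# One record per line. The \t in the format string emits a literal tab;
# backslashes inside the %s arguments (the Windows paths) are left untouched.
printf '%s\t%s\n' 'C:\batch_wav_output\wav_1.wav' 'Sample text to synthesize.' >> "$tsv"
printf '%s\t%s\n' 'C:\batch_wav_output\wav_2.wav' 'Using SPX to run batch-synthesis.' >> "$tsv"
printf '%s\t%s\n' 'C:\batch_wav_output\wav_3.wav' 'Some more text to test capabilities.' >> "$tsv"

# Sanity check: fail if any row does not have exactly two tab-separated fields.
awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$tsv" \
  && echo "all rows have 2 tab-separated fields"
```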
| 103 | +Next, you run a command to point to `text_synthesis.tsv`, perform synthesis on each `text` field, and write the result to the corresponding `audio.output` path as a `.wav` file. |
| 104 | + |
| 105 | +```shell |
| 106 | +spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv |
| 107 | +``` |
| 108 | + |
| 109 | +This command is the equivalent of running a command like `spx synthesize --text "Sample text to synthesize." --audio output C:\batch_wav_output\wav_1.wav` once **for each** record in the `.tsv` file, using that record's values. A couple of things to note: |
| 110 | + |
| 111 | +* The column headers, `audio.output` and `text`, correspond to the command line arguments `--audio output` and `--text`, respectively. Multi-part command line arguments like `--audio output` should be formatted in the file with no spaces, no leading dashes, and periods separating strings, e.g. `audio.output`. Any other existing command line arguments can be added to the file as additional columns using this pattern. |
| 112 | +* When the file is formatted in this way, no additional arguments are required to be passed to `--foreach`. |
| 113 | +* Make sure to separate each value in the `.tsv` with a **tab**. |
| 114 | + |
| 115 | +However, if you have a `.tsv` file like the following example, with column headers that **do not match** command line arguments: |
| 116 | + |
| 117 | + wav_path str_text |
| 118 | + C:\batch_wav_output\wav_1.wav Sample text to synthesize. |
| 119 | + C:\batch_wav_output\wav_2.wav Using SPX to run batch-synthesis. |
| 120 | + C:\batch_wav_output\wav_3.wav Some more text to test capabilities. |
| 121 | + |
| 122 | +You can map these field names to the correct arguments using the following syntax in the `--foreach` call. The result is the same as the call above. |
| 123 | + |
| 124 | +```shell |
| 125 | +spx synthesize --foreach audio.output;text in @C:\your\path\to\text_synthesis.tsv |
| 126 | +``` |
| 127 | + |
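If you would rather fix the file than pass the field names on every call, you can rewrite the header row once. The following is a minimal sketch, assuming a POSIX shell and `awk`; the file names `custom_headers.tsv` and `text_synthesis_renamed.tsv` are hypothetical, and the script creates a one-record input so that it runs on its own.

```shell
#!/bin/sh
# Hypothetical file names, used only for this sketch.
src="custom_headers.tsv"
dst="text_synthesis_renamed.tsv"

# Recreate a small input whose headers (wav_path, str_text) do not match
# the SPX argument names.
printf 'wav_path\tstr_text\n' > "$src"
printf '%s\t%s\n' 'C:\batch_wav_output\wav_1.wav' 'Sample text to synthesize.' >> "$src"

# Replace only the header row with the argument-style names;
# every data row is copied through unchanged.
awk 'NR == 1 { print "audio.output\ttext"; next } { print }' "$src" > "$dst"

cat "$dst"
```

The rewritten file can then be used with the plain `--foreach in @...` form shown earlier.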
| 128 | +## Next steps |
| 129 | + |
| 130 | +* Complete the [speech recognition](./quickstarts/speech-to-text-from-microphone.md) or [speech synthesis](./quickstarts/text-to-speech.md) quickstarts using the SDK. |