Commit c9b2993

author
Trevor Bye
committed
adding spx to master
1 parent a70296e commit c9b2993

7 files changed

+207
-0
lines changed

articles/cognitive-services/Speech-Service/index-speech-to-text.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,8 @@ landingContent:
2424
url: batch-transcription.md
2525
- text: Speech recognition basics
2626
url: speech-to-text-basics.md
27+
- text: Use SPX for speech-to-text with no code
28+
url: spx-overview.md
2729
- linkListType: quickstart
2830
links:
2931
- text: Recognize speech with microphone input

articles/cognitive-services/Speech-Service/index-speech-translation.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@ landingContent:
2222
url: speech-translation.md
2323
- text: Speech translation basics
2424
url: speech-translation-basics.md
25+
- text: Use SPX to translate speech with no code
26+
url: spx-overview.md
2527
- linkListType: quickstart
2628
links:
2729
- text: Translate speech-to-text

articles/cognitive-services/Speech-Service/index-text-to-speech.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@ landingContent:
2222
url: text-to-speech.md
2323
- text: Speech synthesis basics
2424
url: text-to-speech-basics.md
25+
- text: Use SPX for text-to-speech with no code
26+
url: spx-overview.md
2527
- linkListType: quickstart
2628
links:
2729
- text: Synthesize speech to a speaker

articles/cognitive-services/Speech-Service/index.yml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,17 @@ conceptualContent:
118118
footerLink:
119119
text: See more
120120
url: index-voice-assistants.yml
121+
- title: Tools
122+
links:
123+
- itemType: overview
124+
text: About SPX - use the Speech service with no code
125+
url: spx-overview.md
126+
- itemType: how-to-guide
127+
text: SPX basics
128+
url: spx-basics.md
129+
- itemType: overview
130+
text: About Speech Studio - no-code Speech service customization
131+
url: https://speech.microsoft.com
121132
- title: Hosting
122133
links:
123134
- itemType: how-to-guide
Lines changed: 130 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,130 @@
1+
---
2+
title: "SPX basics - Speech service"
3+
titleSuffix: Azure Cognitive Services
4+
description: Learn how to use the SPX command line tool to work with the Speech SDK with no code and minimal setup.
5+
services: cognitive-services
6+
author: trevorbye
7+
manager: nitinme
8+
ms.service: cognitive-services
9+
ms.subservice: speech-service
10+
ms.topic: quickstart
11+
ms.date: 04/04/2020
12+
ms.author: trbye
13+
---
14+
15+
# Learn the basics of SPX
16+
17+
In this article, you learn the basic usage patterns of SPX, a command line tool to use the Speech service without writing code. You can quickly test out the main features of the Speech service, without creating development environments or writing any code, to see if your use-cases can be adequately met. Additionally, SPX is production ready and can be used to automate simple workflows in the Speech service, using `.bat` or shell scripts.
18+
19+
## Prerequisites
20+
21+
The only prerequisite is an Azure Speech subscription. See the [guide](get-started.md#new-resource) on creating a new subscription if you don't already have one.
22+
23+
## Download and install
24+
25+
SPX is available on Windows and Linux. Start by downloading the [zip archive](https://aka.ms/speech/spx-zips.zip), then extract it. SPX requires either the .NET Core or .NET Framework runtime, and the following versions are supported by platform:
26+
27+
* Windows: [.NET Framework 4.7.1](https://dotnet.microsoft.com/download/dotnet-framework/net471), [.NET Core 2.2](https://dotnet.microsoft.com/download/dotnet-core/2.2)
28+
* Linux: [.NET Core 2.2](https://dotnet.microsoft.com/download/dotnet-core/2.2)
29+
30+
After you've installed a runtime, go to the root directory `spx-zips` that you extracted from the download, and extract the subdirectory that you need (`spx-net471`, for example). In a command prompt, change directory to this location, and then run `spx` to start the application.
31+
32+
## Create subscription config
33+
34+
To start using SPX, you first need to enter your Speech subscription key and region information. See the [region support](https://docs.microsoft.com/azure/cognitive-services/speech-service/regions#speech-sdk) page to find your region identifier. Once you have your subscription key and region identifier (for example, `eastus` or `westus`), run the following commands.
35+
36+
```shell
37+
spx config @key --set YOUR-SUBSCRIPTION-KEY
38+
spx config @region --set YOUR-REGION-ID
39+
```
40+
41+
Your subscription authentication is now stored for future SPX requests. If you need to remove either of these stored values, run `spx config @region --clear` or `spx config @key --clear`.
42+
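For scripted setups, the two `spx config` commands above can be wrapped in a small shell function. The following is a minimal sketch, not part of SPX itself: the function name `configure_spx` and the environment variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions chosen for illustration. It only uses the `spx config @key --set` and `spx config @region --set` forms shown above.

```shell
# Hypothetical helper: stores the key and region using the commands shown above.
# SPEECH_KEY and SPEECH_REGION are illustrative variable names, not SPX conventions.
configure_spx() {
  if [ -z "${SPEECH_KEY:-}" ] || [ -z "${SPEECH_REGION:-}" ]; then
    echo "Set SPEECH_KEY and SPEECH_REGION before calling configure_spx" >&2
    return 1
  fi
  spx config @key --set "$SPEECH_KEY"
  spx config @region --set "$SPEECH_REGION"
}
```

After exporting both variables, calling `configure_spx` runs the same two commands as above. Reading the key from the environment keeps it out of the script file.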
43+
## Basic usage
44+
45+
This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by performing some speech recognition using your default microphone by running the following command.
46+
47+
```shell
48+
spx recognize --microphone
49+
```
50+
51+
After entering the command, SPX will begin listening for audio on the current active input device, and stop after you press `ENTER`. The recorded speech is then recognized and converted to text in the console output. Text-to-speech synthesis is also easy to do using SPX.
52+
53+
Running the following command will take the entered text as input, and output the synthesized speech to the current active output device.
54+
55+
```shell
56+
spx synthesize --text "Testing synthesis using SPX" --speakers
57+
```
58+
59+
In addition to speech recognition and synthesis, you can also do speech translation with SPX. Similar to the speech recognition command above, run the following command to capture audio from your default microphone, and perform translation to text in the target language.
60+
61+
```shell
62+
spx translate --microphone --source en-US --target ru-RU --output file C:\some\file\path\russian_translation.txt
63+
```
64+
65+
In this command, you specify both the source (language to translate **from**), and the target (language to translate **to**) languages. Using the `--microphone` argument will listen to audio on the current active input device, and stop after you press `ENTER`. The output is a text translation to the target language, written to a text file.
66+
67+
> [!NOTE]
68+
> See the [language and locale article](language-support.md) for a list of all supported languages with their corresponding locale codes.
69+
70+
## Batch operations
71+
72+
The commands in the previous section are great for quickly seeing how the Speech service works. However, when assessing whether or not your use-cases can be met, you likely need to perform batch operations against a range of input you already have, to see how the service handles a variety of scenarios. This section shows how to:
73+
74+
* Run batch speech recognition on a directory of audio files
75+
* Iterate through a `.tsv` file and run batch text-to-speech synthesis
76+
77+
## Batch speech recognition
78+
79+
If you have a directory of audio files, you can use SPX to run batch speech recognition on all of them. Run the following command, pointing to your directory with the `--files` argument. In this example, you append `\*.wav` to the directory path to recognize all `.wav` files in the directory. Additionally, specify the `--threads` argument to run the recognition on 10 parallel threads.
80+
81+
> [!NOTE]
82+
> The `--threads` argument can also be used in the next section for `spx synthesize` commands. The number of threads available depends on the CPU and its current load.
83+
84+
```shell
85+
spx recognize --files C:\your_wav_file_dir\*.wav --output file C:\output_dir\speech_output.tsv --threads 10
86+
```
87+
88+
The recognized speech output is written to `speech_output.tsv` using the `--output file` argument. The following is an example of the output file structure.
89+
90+
audio.input.id recognizer.session.started.sessionid recognizer.recognized.result.text
91+
sample_1 07baa2f8d9fd4fbcb9faea451ce05475 A sample wave file.
92+
sample_2 8f9b378f6d0b42f99522f1173492f013 Sample text synthesized.
93+
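Because the output is a tab-separated file, standard text tools can post-process it. The sketch below assumes the three columns appear in the order shown above and are separated by tabs. It recreates the two sample rows in a temporary file so that it runs without SPX, then prints each input id next to its recognized text.

```shell
# Recreate the sample rows from above in a temporary tab-separated file.
out_file=$(mktemp)
printf '%s\t%s\t%s\n' \
  'audio.input.id' 'recognizer.session.started.sessionid' 'recognizer.recognized.result.text' \
  'sample_1' '07baa2f8d9fd4fbcb9faea451ce05475' 'A sample wave file.' \
  'sample_2' '8f9b378f6d0b42f99522f1173492f013' 'Sample text synthesized.' > "$out_file"

# Skip the header row, then print the input id (column 1) and the recognized text (column 3).
summary=$(awk -F'\t' 'NR > 1 { print $1 ": " $3 }' "$out_file")
echo "$summary"
```

To use this on real output, point `awk` at the path you passed to `--output file` instead of the temporary file.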
94+
## Batch text-to-speech synthesis
95+
96+
The easiest way to run batch text-to-speech is to create a new `.tsv` (tab-separated values) file and use the `--foreach` argument in SPX. Consider the following file `text_synthesis.tsv`:
97+
98+
audio.output text
99+
C:\batch_wav_output\wav_1.wav Sample text to synthesize.
100+
C:\batch_wav_output\wav_2.wav Using SPX to run batch-synthesis.
101+
C:\batch_wav_output\wav_3.wav Some more text to test capabilities.
102+
103+
Next, you run a command to point to `text_synthesis.tsv`, perform synthesis on each `text` field, and write the result to the corresponding `audio.output` path as a `.wav` file.
104+
105+
```shell
106+
spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv
107+
```
108+
109+
This command is the equivalent of running `spx synthesize --text "Sample text to synthesize." --audio output C:\batch_wav_output\wav_1.wav` **for each** record in the `.tsv` file, with that record's values substituted. A few things to note:
110+
111+
* The column headers, `audio.output` and `text`, correspond to the command line arguments `--audio output` and `--text`, respectively. Multi-part command line arguments like `--audio output` should be formatted in the file with no spaces, no leading dashes, and periods separating strings, e.g. `audio.output`. Any other existing command line arguments can be added to the file as additional columns using this pattern.
112+
* When the file is formatted in this way, no additional arguments are required to be passed to `--foreach`.
113+
* Make sure to separate each value in the `.tsv` with a **tab**.
114+
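Editors often replace tabs with spaces, so it can help to generate the file from a script. The following sketch writes a file with the same columns as `text_synthesis.tsv` using `printf`, then checks that every line splits into exactly two tab-separated fields. The temporary directory and the relative `.wav` file names are placeholders for illustration; substitute your own output paths.

```shell
# Placeholder location for the illustration; any writable directory works.
workdir=$(mktemp -d)
tsv="$workdir/text_synthesis.tsv"

# printf expands \t to a real tab character, so each row has exactly two fields.
printf 'audio.output\ttext\n' > "$tsv"
printf '%s\t%s\n' \
  'wav_1.wav' 'Sample text to synthesize.' \
  'wav_2.wav' 'Using SPX to run batch-synthesis.' \
  'wav_3.wav' 'Some more text to test capabilities.' >> "$tsv"

# Collect the line numbers of any rows that do not have exactly two tab-separated fields.
bad_rows=$(awk -F'\t' 'NF != 2 { print NR }' "$tsv")
if [ -z "$bad_rows" ]; then
  echo "All rows have two tab-separated fields"
else
  echo "Rows with the wrong number of fields: $bad_rows" >&2
fi
```

The resulting file can then be passed to `spx synthesize --foreach in @...` as shown above.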
115+
However, if you have a `.tsv` file like the following example, with column headers that **do not match** command line arguments:
116+
117+
wav_path str_text
118+
C:\batch_wav_output\wav_1.wav Sample text to synthesize.
119+
C:\batch_wav_output\wav_2.wav Using SPX to run batch-synthesis.
120+
C:\batch_wav_output\wav_3.wav Some more text to test capabilities.
121+
122+
You can map these field names to the correct arguments using the following syntax in the `--foreach` call. The result is the same as the call above.
123+
124+
```shell
125+
spx synthesize --foreach audio.output;text in @C:\your\path\to\text_synthesis.tsv
126+
```
127+
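As an alternative to overriding the field names on the command line, you could rewrite the header row so that it matches the argument names. This is a sketch under the assumption that the file has exactly the two columns shown above, in the same order; the file name `custom_headers.tsv` and the relative `.wav` names are placeholders.

```shell
# Placeholder files for the illustration.
workdir=$(mktemp -d)
src="$workdir/custom_headers.tsv"
dst="$workdir/text_synthesis.tsv"

# A small input file with headers that do not match the SPX argument names.
printf 'wav_path\tstr_text\n' > "$src"
printf '%s\t%s\n' \
  'wav_1.wav' 'Sample text to synthesize.' \
  'wav_2.wav' 'Using SPX to run batch-synthesis.' >> "$src"

# Write the argument-style header, then copy every data row unchanged.
printf 'audio.output\ttext\n' > "$dst"
tail -n +2 "$src" >> "$dst"

# Show the new header row.
head -n 1 "$dst"
```

The rewritten file no longer needs the `audio.output;text` override in the `--foreach` call.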
128+
## Next steps
129+
130+
* Complete the [speech recognition](./quickstarts/speech-to-text-from-microphone.md) or [speech synthesis](./quickstarts/text-to-speech.md) quickstarts using the SDK.
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
title: SPX - Speech service
3+
titleSuffix: Azure Cognitive Services
4+
description: SPX is a command line tool for using the Speech service without writing any code. SPX requires minimal setup, and it's easy to immediately start experimenting with key features of the Speech service to see if your use-cases can be met.
5+
services: cognitive-services
6+
author: trevorbye
7+
manager: nitinme
8+
ms.service: cognitive-services
9+
ms.subservice: speech-service
10+
ms.topic: conceptual
11+
ms.date: 04/14/2020
12+
ms.author: trbye
13+
---
14+
15+
# What is SPX?
16+
17+
SPX is a command line tool for using the Speech service without writing any code. SPX requires minimal setup, and it's easy to immediately start experimenting with key features of the Speech service to see if your use-cases can be met. Within minutes, you can run simple test workflows like batch speech-recognition from a directory of files, or text-to-speech on a collection of strings from a file. Beyond simple workflows, SPX is production-ready and can be scaled up to run larger processes using automated `.bat` or shell scripts.
18+
19+
The majority of the primary features in the Speech SDK are available in SPX, but some advanced features and customizations are simplified in SPX. Consider the following guidance to decide when to use SPX or the SDK.
20+
21+
Use SPX when:
22+
* You want to experiment with Speech service features with minimal setup and no code
23+
* You have relatively simple requirements for a production application using the Speech service
24+
25+
Use the SDK when:
26+
* You want to integrate Speech service functionality within a specific language or platform (e.g. C#, Python, C++)
27+
* You have complex requirements that may require advanced service requests, or developing custom behavior including response streaming
28+
29+
## Core features
30+
31+
* Speech recognition - Convert speech-to-text either from audio files or directly from a microphone, or transcribe a recorded conversation.
32+
33+
* Speech synthesis - Convert text-to-speech using either input from text files, or input directly from the command line. Customize speech output characteristics using [SSML configurations](speech-synthesis-markup.md), and either [standard or neural voices](speech-synthesis-markup.md#standard-neural-and-custom-voices).
34+
35+
* Speech translation - Translate audio in a source language to text in a target language.
36+
37+
* Run on Azure compute resources - Send SPX commands to run on an Azure remote compute resource using `spx webjob`.
38+
39+
## Get started
40+
41+
To get started with SPX, see the [basics article](spx-basics.md). This article shows you how to run some basic commands in SPX, and also shows slightly more advanced commands for running batch operations for speech-to-text and text-to-speech. After reading the basics article, you should have enough of an understanding of the SPX syntax to start writing some custom commands, or automating simple Speech operations.
42+
43+
## Next steps
44+
45+
- [SPX basics](spx-basics.md)
46+
- If your use-case is more complex, [get the Speech SDK](speech-sdk.md)

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -539,6 +539,20 @@
539539
items:
540540
- name: Speech devices SDK release notes
541541
href: devices-sdk-release-notes.md
542+
- name: Tools
543+
items:
544+
- name: SPX
545+
items:
546+
- name: What is SPX?
547+
href: spx-overview.md
548+
- name: SPX basics
549+
href: spx-basics.md
550+
- name: Speech Studio
551+
items:
552+
- name: What is Speech Studio?
553+
href: https://speech.microsoft.com
554+
- name: Create a Custom Commands app with Speech Studio
555+
href: quickstart-custom-speech-commands-create-new.md
542556
- name: Migration
543557
items:
544558
- name: From Bing Speech
