Commit eea5c6e

#1674741 updating docs to reflect audio format support.
1 parent 57ae3c0 commit eea5c6e

15 files changed, +385 -268 lines changed

.openpublishing.redirection.json

Lines changed: 10 additions & 0 deletions
@@ -49200,6 +49200,16 @@
 "source_path": "articles/cognitive-services/Speech-Service/sapi-phoneset-usage.md",
 "redirect_url": "/azure/cognitive-services/speech-service/speech-ssml-phonetic-sets",
 "redirect_document_id": false
+},
+{
+"source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-android.md",
+"redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-java",
+"redirect_document_id": false
+},
+{
+"source_path": "articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-ios.md",
+"redirect_url": "/azure/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams?pivots=programming-language-objectivec",
+"redirect_document_id": false
 }
 ]
 }

articles/cognitive-services/Speech-Service/how-to-custom-speech-test-data.md

Lines changed: 6 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 12/17/2019
+ms.date: 03/09/2020
 ms.author: erhopf
 ---

@@ -60,6 +60,8 @@ Use this table to ensure that your audio files are formatted correctly for use w
 | Archive format | .zip |
 | Maximum archive size | 2 GB |

+[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]
+
 > [!TIP]
 > When uploading training and testing data, the .zip file size cannot exceed 2 GB. If you require more data for training, divide it into several .zip files and upload them separately. Later, you can choose to train from *multiple* datasets. However, you can only test from a *single* dataset.

@@ -84,8 +86,10 @@ To measure the accuracy of Microsoft's speech-to-text accuracy when processing y
 | Archive format | .zip |
 | Maximum zip size | 2 GB |

+[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]
+
 > [!NOTE]
-> When uploading training and testing data, the .zip file size cannot exceed 2 GB. Uou can only test from a *single* dataset, be sure to keep it within the appropriate file size.
+> When uploading training and testing data, the .zip file size cannot exceed 2 GB. You can only test from a *single* dataset, be sure to keep it within the appropriate file size. Additionally, each training file cannot exceed 60 seconds otherwise it will error out.

 To address issues like word deletion or substitution, a significant amount of data is required to improve recognition. Generally, it's recommended to provide word-by-word transcriptions for roughly 10 to 1,000 hours of audio. The transcriptions for all WAV files should be contained in a single plain-text file. Each line of the transcription file should contain the name of one of the audio files, followed by the corresponding transcription. The file name and transcription should be separated by a tab (\t).
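
Reviewer note, not part of this commit: the unchanged context paragraph at the end of this hunk describes the transcription file as one line per audio file, with the file name and the transcription separated by a tab. A minimal Python sketch of that layout might look like the following. The helper names `write_transcriptions` and `read_transcriptions`, the sample file names and sentences, and the choice of UTF-8 encoding are all assumptions made for illustration; they do not come from the documentation being changed.

```python
from pathlib import Path


def write_transcriptions(entries, path):
    """Write one 'file name<TAB>transcription' line per audio file.

    `entries` is an iterable of (audio_file_name, transcription) pairs.
    Hypothetical helper for illustration only.
    """
    lines = []
    for name, text in entries:
        # A tab or newline inside either field would break the
        # one-entry-per-line, tab-separated layout, so reject it.
        for field in (name, text):
            if "\t" in field or "\n" in field:
                raise ValueError(f"tab or newline not allowed in field: {field!r}")
        lines.append(f"{name}\t{text}")
    # UTF-8 is an assumption here, not something the diff specifies.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_transcriptions(path):
    """Parse the file back into a {audio_file_name: transcription} dict."""
    result = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only: the file name comes first,
        # everything after the tab is the transcription.
        name, text = line.split("\t", 1)
        result[name] = text
    return result


if __name__ == "__main__":
    import os
    import tempfile

    # Hypothetical sample data; real projects would list their own WAV files.
    sample = [
        ("speech01.wav", "speech recognition is awesome"),
        ("speech02.wav", "the quick brown fox jumped all over the place"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "trans.txt")
        write_transcriptions(sample, target)
        print(read_transcriptions(target))
```

Rejecting embedded tabs up front is a design choice for the sketch: it keeps every line splittable on the first tab, which is the only structure the paragraph describes.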

articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-android.md

Lines changed: 0 additions & 159 deletions
This file was deleted.

articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams-ios.md

Lines changed: 0 additions & 66 deletions
This file was deleted.

articles/cognitive-services/Speech-Service/how-to-use-codec-compressed-audio-input-streams.md

Lines changed: 32 additions & 41 deletions
@@ -8,67 +8,58 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 09/20/2019
+ms.date: 03/09/2020
 ms.author: amishu
+zone_pivot_groups: programming-languages-set-twelve
 ---

-# Using codec compressed audio input with the Speech SDK
+# Use codec compressed audio input with the Speech SDK

-The Speech SDK's **Compressed Audio Input Stream** API provides a way to stream compressed audio to the Speech service using PullStream or PushStream.
+The Speech service SDK **Compressed Audio Input Stream** API provides a way to stream compressed audio to the Speech service using either a `PullStream` or `PushStream`.

 > [!IMPORTANT]
-> Streaming compressed input audio is currently supported for C++, C#, and Java on Linux (Ubuntu 16.04, Ubuntu 18.04, Debian 9, RHEL 8, CentOS 8). It is also supported for [Java in Android](how-to-use-codec-compressed-audio-input-streams-android.md) and [Objective-C in iOS](how-to-use-codec-compressed-audio-input-streams-ios.md) platform.
+> Streaming compressed input audio is currently supported for C#, C++, Java on Linux (Ubuntu 16.04, Ubuntu 18.04, Debian 9, RHEL 8, CentOS 8). It is also supported for Java in Android and Objective-C in iOS platform.
 > Speech SDK version 1.7.0 or higher is required (version 1.10.0 or higher for RHEL 8, CentOS 8).

-For wav/PCM see the mainline speech documentation. Outside of wav/PCM, the following codec compressed input formats are supported:
-
-- MP3
-- OPUS/OGG
-- FLAC
-- ALAW in wav container
-- MULAW in wav container
+[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]

 ## Prerequisites

-Handling compressed audio is implemented using [GStreamer](https://gstreamer.freedesktop.org). For licensing reason Gstreamer binaries are not compiled and linked with speech SDK. So application developer needs to install the following on 18.04, 16.04 and Debian 9 to use compressed input audio.
-
-```sh
-sudo apt install libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly
-```
+::: zone pivot="programming-language-csharp"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/csharp/prerequisites.md)]
+::: zone-end

-On RHEL/CentOS 8:
+::: zone pivot="programming-language-cpp"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/cpp/prerequisites.md)]
+::: zone-end

-```sh
-sudo yum install gstreamer1 gstreamer1-plugins-base gstreamer1-plugins-good gstreamer1-plugins-bad-free gstreamer1-plugins-ugly-free
-```
+::: zone pivot="programming-language-java"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/java/prerequisites.md)]
+::: zone-end

-> [!NOTE]
-> On RHEL/CentOS 8, follow the instructions on [how to configure OpenSSL for Linux](~/articles/cognitive-services/speech-service/how-to-configure-openssl-linux.md).
+::: zone pivot="programming-language-objectivec"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/objectivec/prerequisites.md)]
+::: zone-end

 ## Example code using codec compressed audio input

-To stream in a compressed audio format to the Speech service, create `PullAudioInputStream` or `PushAudioInputStream`. Then, create an `AudioConfig` from an instance of your stream class, specifying the compression format of the stream.
-
-Let's assume that you have an input stream class called `myPushStream` and are using OPUS/OGG. Your code may look like this:
-
-```csharp
-using Microsoft.CognitiveServices.Speech;
-using Microsoft.CognitiveServices.Speech.Audio;
-
-var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
-
-// Create an audio config specifying the compressed audio format and the instance of your input stream class.
-var audioFormat = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);
-var audioConfig = AudioConfig.FromStreamInput(myPushStream, audioFormat);
+::: zone pivot="programming-language-csharp"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/csharp/examples.md)]
+::: zone-end

-var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
+::: zone pivot="programming-language-cpp"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/cpp/examples.md)]
+::: zone-end

-var result = await recognizer.RecognizeOnceAsync();
+::: zone pivot="programming-language-java"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/java/examples.md)]
+::: zone-end

-var text = result.GetText();
-```
+::: zone pivot="programming-language-objectivec"
+[!INCLUDE [prerequisites](includes/how-tos/compressed-audio-input/objectivec/examples.md)]
+::: zone-end

 ## Next steps

-- [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
-* [See how to recognize speech in Java](~/articles/cognitive-services/Speech-Service/quickstarts/speech-to-text-from-microphone.md?pivots=programming-language-java)
+> [!div class="nextstepaction"]
+> [Learn how to recognize speech](quickstarts/speech-to-text-from-microphone.md)
