Commit 5bfb4b4

Merge pull request #196093 from eric-urban/eur/compressed-audio
compressed-audio
2 parents b6f950c + 9f8d3db commit 5bfb4b4

33 files changed, with 374 additions and 207 deletions

articles/cognitive-services/Speech-Service/captioning-concepts.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -38,7 +38,7 @@ Captioning can accompany real time or pre-recorded speech. Whether you're showin
3838

3939
For real time captioning, use a microphone or audio input stream instead of file input. For examples of how to recognize speech from a microphone, see the [Speech to text quickstart](get-started-speech-to-text.md) and [How to recognize speech](how-to-recognize-speech.md) documentation. For more information about streaming, see [How to use the audio input stream](how-to-use-audio-input-streams.md).
4040

41-
For captioning of a prerecording, send file input to the Speech service. For more information, see [How to use compressed audio files](how-to-use-codec-compressed-audio-input-streams.md).
41+
For captioning of a prerecording, send file input to the Speech service. For more information, see [How to use compressed input audio](how-to-use-codec-compressed-audio-input-streams.md).
4242

4343
## Caption and speech synchronization
4444

Lines changed: 25 additions & 87 deletions
@@ -1,125 +1,63 @@
11
---
2-
title: How to use compressed audio files with the Speech SDK - Speech service
2+
title: How to use compressed input audio - Speech service
33
titleSuffix: Azure Cognitive Services
4-
description: Learn how to use compressed audio files to the Speech service with the Speech SDK.
4+
description: Learn how to use compressed input audio with the Speech SDK and CLI.
55
services: cognitive-services
66
author: eric-urban
77
ms.author: eur
88
manager: nitinme
99
ms.service: cognitive-services
1010
ms.subservice: speech-service
1111
ms.topic: how-to
12-
ms.date: 01/13/2022
12+
ms.date: 04/25/2022
1313
ms.devlang: cpp, csharp, golang, java, python
1414
ms.custom: devx-track-csharp
15-
zone_pivot_groups: programming-languages-set-twenty-eight
15+
zone_pivot_groups: programming-languages-speech-services
1616
---
1717

18-
# How to use compressed audio files
19-
20-
The Speech SDK and Speech CLI use GStreamer to support different kinds of input audio formats. GStreamer decompresses the audio before it's sent over the wire to the Speech service as raw PCM.
21-
22-
[!INCLUDE [supported-audio-formats](includes/supported-audio-formats.md)]
23-
24-
## Install GStreamer
25-
26-
Choose a platform for installation instructions.
27-
28-
Platform | Languages | Supported GStreamer version
29-
| :--- | ---: | :---:
30-
Android | Java | [1.18.3](https://gstreamer.freedesktop.org/data/pkg/android/1.18.3/)
31-
Linux | C++, C#, Java, Python, Go | [Supported Linux distributions and target architectures](~/articles/cognitive-services/speech-service/speech-sdk.md)
32-
Windows (excluding UWP) | C++, C#, Java, Python | [1.18.3](https://gstreamer.freedesktop.org/data/pkg/windows/1.18.3/msvc/gstreamer-1.0-msvc-x86_64-1.18.3.msi)
33-
34-
### [Android](#tab/android)
35-
36-
For more information about building libgstreamer_android.so, see [GStreamer configuration by programming language](#gstreamer-configuration).
37-
38-
For more information, see [Android installation instructions](https://gstreamer.freedesktop.org/documentation/installing/for-android-development.html?gi-language=c).
39-
40-
### [Linux](#tab/linux)
41-
42-
For more information, see [Linux installation instructions](https://gstreamer.freedesktop.org/documentation/installing/on-linux.html?gi-language=c).
43-
44-
```sh
45-
sudo apt install libgstreamer1.0-0 \
46-
gstreamer1.0-plugins-base \
47-
gstreamer1.0-plugins-good \
48-
gstreamer1.0-plugins-bad \
49-
gstreamer1.0-plugins-ugly
50-
```
51-
### [Windows](#tab/windows)
52-
53-
Make sure that packages of the same platform (x64 or x86) are installed. For example, if you installed the x64 package for Python, you need to install the x64 GStreamer package. The following instructions are for the x64 packages.
54-
55-
1. Create the folder c:\gstreamer.
56-
1. Download the [installer](https://gstreamer.freedesktop.org/data/pkg/windows/1.18.3/msvc/gstreamer-1.0-msvc-x86_64-1.18.3.msi).
57-
1. Copy the installer to c:\gstreamer.
58-
1. Open PowerShell as an administrator.
59-
1. Run the following command in PowerShell:
60-
61-
```powershell
62-
cd c:\gstreamer
63-
msiexec /passive INSTALLLEVEL=1000 INSTALLDIR=C:\gstreamer /i gstreamer-1.0-msvc-x86_64-1.18.3.msi
64-
```
65-
66-
1. Add the system variables GST_PLUGIN_PATH with the value C:\gstreamer\1.0\msvc_x86_64\lib\gstreamer-1.0.
67-
1. Add the system variables GSTREAMER_ROOT_X86_64 with the value C:\gstreamer\1.0\msvc_x86_64.
68-
1. Add another entry in the path variable as C:\gstreamer\1.0\msvc_x86_64\bin.
69-
1. Reboot the machine.
70-
71-
For more information about GStreamer, see [Windows installation instructions](https://gstreamer.freedesktop.org/documentation/installing/on-windows.html?gi-language=c).
72-
73-
***
74-
75-
## GStreamer configuration
76-
77-
> [!NOTE]
78-
> GStreamer configuration requirements vary by programming language. For more information, choose your programming language at the top of this page. The contents of this section will be updated.
18+
# How to use compressed input audio
7919

8020
::: zone pivot="programming-language-csharp"
81-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/csharp/prerequisites.md)]
21+
[!INCLUDE [C# include](includes/how-to/compressed-audio-input/csharp.md)]
8222
::: zone-end
8323

8424
::: zone pivot="programming-language-cpp"
85-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/cpp/prerequisites.md)]
25+
[!INCLUDE [C++ include](includes/how-to/compressed-audio-input/cpp.md)]
8626
::: zone-end
8727

88-
::: zone pivot="programming-language-java"
89-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/java/prerequisites.md)]
28+
::: zone pivot="programming-language-go"
29+
[!INCLUDE [Go include](includes/how-to/compressed-audio-input/go.md)]
9030
::: zone-end
9131

92-
::: zone pivot="programming-language-python"
93-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/python/prerequisites.md)]
32+
::: zone pivot="programming-language-java"
33+
[!INCLUDE [Java include](includes/how-to/compressed-audio-input/java.md)]
9434
::: zone-end
9535

96-
::: zone pivot="programming-language-go"
97-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/go/prerequisites.md)]
36+
::: zone pivot="programming-language-javascript"
37+
[!INCLUDE [JavaScript include](includes/how-to/compressed-audio-input/javascript.md)]
9838
::: zone-end
9939

100-
## Example
101-
102-
::: zone pivot="programming-language-csharp"
103-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/csharp/examples.md)]
40+
::: zone pivot="programming-language-objectivec"
41+
[!INCLUDE [ObjectiveC include](includes/how-to/compressed-audio-input/objectivec.md)]
10442
::: zone-end
10543

106-
::: zone pivot="programming-language-cpp"
107-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/cpp/examples.md)]
44+
::: zone pivot="programming-language-swift"
45+
[!INCLUDE [Swift include](includes/how-to/compressed-audio-input/swift.md)]
10846
::: zone-end
10947

110-
::: zone pivot="programming-language-java"
111-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/java/examples.md)]
48+
::: zone pivot="programming-language-python"
49+
[!INCLUDE [Python include](./includes/how-to/compressed-audio-input/python.md)]
11250
::: zone-end
11351

114-
::: zone pivot="programming-language-python"
115-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/python/examples.md)]
52+
::: zone pivot="programming-language-rest"
53+
[!INCLUDE [REST include](includes/how-to/compressed-audio-input/rest.md)]
11654
::: zone-end
11755

118-
::: zone pivot="programming-language-go"
119-
[!INCLUDE [prerequisites](includes/how-to/compressed-audio-input/go/examples.md)]
56+
::: zone pivot="programming-language-cli"
57+
[!INCLUDE [CLI include](includes/how-to/compressed-audio-input/cli.md)]
12058
::: zone-end
12159

12260
## Next steps
12361

124-
> [!div class="nextstepaction"]
125-
> [Learn how to recognize speech](./get-started-speech-to-text.md)
62+
* [Try the speech to text quickstart](get-started-speech-to-text.md)
63+
* [Improve recognition accuracy with custom speech](custom-speech-overview.md)
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
---
2+
author: eric-urban
3+
ms.service: cognitive-services
4+
ms.topic: include
5+
ms.date: 09/08/2020
6+
ms.author: eur
7+
---
8+
9+
[!INCLUDE [Introduction](intro.md)]
10+
11+
## GStreamer configuration
12+
13+
The Speech CLI can use [GStreamer](https://gstreamer.freedesktop.org) to handle compressed audio. For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech CLI. You need to install some dependencies and plug-ins.
14+
15+
GStreamer binaries must be in the system path so that they can be loaded by the Speech CLI at runtime. For example, on Windows, if the Speech CLI finds `libgstreamer-1.0-0.dll` or `gstreamer-1.0-0.dll` (for the latest GStreamer) during runtime, it means the GStreamer binaries are in the system path.
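
A quick way to sanity-check the requirement above is to look for the two DLL names in each directory on `PATH`. The following Python snippet is a minimal sketch, not part of the Speech CLI: the function name `find_gstreamer_dll` is hypothetical, and it only inspects `PATH`, which is an approximation of how Windows actually resolves DLLs.

```python
import os
from pathlib import Path

# DLL names quoted in the paragraph above; the search logic is an assumption.
GSTREAMER_DLLS = ("libgstreamer-1.0-0.dll", "gstreamer-1.0-0.dll")


def find_gstreamer_dll(path_value=None, names=GSTREAMER_DLLS):
    """Return the first matching file found in the PATH directories, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    found = find_gstreamer_dll()
    print(found if found else "GStreamer DLL not found on PATH")
```

If the function returns `None` on Windows, the install directory's `bin` folder is probably missing from `PATH`.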
16+
17+
Choose a platform for installation instructions.
18+
19+
### [Linux](#tab/linux)
20+
21+
[!INCLUDE [Linux](gstreamer-linux.md)]
22+
23+
### [Windows](#tab/windows)
24+
25+
[!INCLUDE [Windows](gstreamer-windows.md)]
26+
27+
***
28+
29+
## Example
30+
31+
The `--format` option specifies the container format for the audio file being recognized. For an mp4 file, set the format to `any` as shown in the following command:
32+
33+
# [Terminal](#tab/terminal)
34+
35+
```console
36+
spx recognize --file YourAudioFile.mp4 --format any
37+
```
38+
39+
# [PowerShell](#tab/powershell)
40+
41+
```powershell
42+
spx --% recognize --file YourAudioFile.mp4 --format any
43+
```
44+
45+
***
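
When the command above needs to run from a script, it can be wrapped in a small helper. This Python sketch is an illustration only: the helper names `build_recognize_command` and `recognize` are hypothetical, and it assumes `spx` is installed and on the system path. Only the `--file` and `--format` options shown above are used.

```python
import subprocess


def build_recognize_command(audio_file, container_format="any"):
    """Build the argument list for `spx recognize` with the options shown above."""
    return ["spx", "recognize", "--file", audio_file, "--format", container_format]


def recognize(audio_file, container_format="any"):
    """Run the Speech CLI (assumed to be on PATH) and return its standard output."""
    completed = subprocess.run(
        build_recognize_command(audio_file, container_format),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if spx exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without spx.
    print(build_recognize_command("YourAudioFile.mp4"))
```

Because the arguments are passed as a list and no shell is involved, the PowerShell stop-parsing token `--%` isn't needed here.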
46+
47+
To get a list of supported audio formats, run the following command:
48+
49+
# [Terminal](#tab/terminal)
50+
51+
```console
52+
spx help recognize format
53+
```
54+
55+
# [PowerShell](#tab/powershell)
56+
57+
```powershell
58+
spx help recognize format
59+
```
60+
61+
***
62+
Lines changed: 25 additions & 1 deletion
@@ -2,10 +2,34 @@
22
author: eric-urban
33
ms.service: cognitive-services
44
ms.topic: include
5-
ms.date: 03/09/2020
5+
ms.date: 03/06/2020
66
ms.author: eur
77
---
88

9+
[!INCLUDE [Header](../../common/cpp.md)]
10+
11+
[!INCLUDE [Introduction](intro.md)]
12+
13+
## GStreamer configuration
14+
15+
The Speech SDK can use [GStreamer](https://gstreamer.freedesktop.org) to handle compressed audio. For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech SDK. You need to install some dependencies and plug-ins.
16+
17+
GStreamer binaries must be in the system path so that they can be loaded by the Speech SDK at runtime. For example, on Windows, if the Speech SDK finds `libgstreamer-1.0-0.dll` or `gstreamer-1.0-0.dll` (for the latest GStreamer) during runtime, it means the GStreamer binaries are in the system path.
18+
19+
Choose a platform for installation instructions.
20+
21+
### [Linux](#tab/linux)
22+
23+
[!INCLUDE [Linux](gstreamer-linux.md)]
24+
25+
### [Windows](#tab/windows)
26+
27+
[!INCLUDE [Windows](gstreamer-windows.md)]
28+
29+
***
30+
31+
## Example
32+
933
To configure the Speech SDK to accept compressed audio input, create `PullAudioInputStream` or `PushAudioInputStream`. Then, create an `AudioConfig` from an instance of your stream class that specifies the compression format of the stream. Find related sample code in [Speech SDK samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speaker_recognition_samples.cpp).
1034

1135
Let's assume that you have an input stream class called `pushStream` and are using OPUS/OGG. Your code might look like this:
Lines changed: 26 additions & 2 deletions
@@ -2,12 +2,36 @@
22
author: eric-urban
33
ms.service: cognitive-services
44
ms.topic: include
5-
ms.date: 03/09/2020
5+
ms.date: 03/11/2020
66
ms.author: eur
77
ms.custom: devx-track-csharp
88
---
99

10-
To configure the Speech SDK to accept compressed audio input, create `PullAudioInputStream` or `PushAudioInputStream`. Then, create an `AudioConfig` from an instance of your stream class that specifies the compression format of the stream. Find related sample code snippets in [About the Speech SDK audio input stream API](../../../../how-to-use-audio-input-streams.md).
10+
[!INCLUDE [Header](../../common/csharp.md)]
11+
12+
[!INCLUDE [Introduction](intro.md)]
13+
14+
## GStreamer configuration
15+
16+
The Speech SDK can use [GStreamer](https://gstreamer.freedesktop.org) to handle compressed audio. For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech SDK. You need to install some dependencies and plug-ins.
17+
18+
GStreamer binaries must be in the system path so that they can be loaded by the Speech SDK at runtime. For example, on Windows, if the Speech SDK finds `libgstreamer-1.0-0.dll` or `gstreamer-1.0-0.dll` (for the latest GStreamer) during runtime, it means the GStreamer binaries are in the system path.
19+
20+
Choose a platform for installation instructions.
21+
22+
### [Linux](#tab/linux)
23+
24+
[!INCLUDE [Linux](gstreamer-linux.md)]
25+
26+
### [Windows](#tab/windows)
27+
28+
[!INCLUDE [Windows](gstreamer-windows.md)]
29+
30+
***
31+
32+
## Example
33+
34+
To configure the Speech SDK to accept compressed audio input, create `PullAudioInputStream` or `PushAudioInputStream`. Then, create an `AudioConfig` from an instance of your stream class that specifies the compression format of the stream. Find related sample code snippets in [About the Speech SDK audio input stream API](../../../how-to-use-audio-input-streams.md).
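
For readers who want to see the pull model in isolation, the following Python sketch shows the reader side of a pull stream: an object that fills a caller-supplied buffer with compressed bytes and returns the number of bytes written, with `0` meaning end of stream. The class name `FilePullCallback` is hypothetical. The sketch assumes that, with the Speech SDK for Python, such a class would subclass `speechsdk.audio.PullAudioInputStreamCallback`; it's kept standalone here so that it runs without the SDK.

```python
import io


class FilePullCallback:
    """Reader with the read(buffer)/close() shape that a pull stream calls.

    Assumption: in real use this would subclass the SDK's
    PullAudioInputStreamCallback instead of being a plain class.
    """

    def __init__(self, source):
        self._source = source  # any binary file-like object (file, BytesIO, socket file)

    def read(self, buffer: memoryview) -> int:
        # Read at most as many bytes as the caller's buffer can hold.
        data = self._source.read(buffer.nbytes)
        buffer[: len(data)] = data
        return len(data)  # 0 signals that no more audio is available

    def close(self) -> None:
        self._source.close()


if __name__ == "__main__":
    callback = FilePullCallback(io.BytesIO(b"OggS-compressed-bytes"))
    chunk = memoryview(bytearray(8))
    print(callback.read(chunk), bytes(chunk))
    callback.close()
```

The bytes are passed through untouched. Decoding stays with GStreamer, which is why the stream format has to name the container.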
1135

1236
Let's assume that you have an input stream class called `pullStream` and are using OPUS/OGG. Your code might look like this:
1337

articles/cognitive-services/Speech-Service/includes/how-to/compressed-audio-input/csharp/prerequisites.md

Lines changed: 0 additions & 12 deletions
This file was deleted.
Lines changed: 15 additions & 3 deletions
@@ -1,11 +1,23 @@
11
---
2-
author: amitkumarshukla
2+
author: eric-urban
33
ms.service: cognitive-services
44
ms.topic: include
5-
ms.date: 09/17/2021
6-
ms.author: amishu
5+
ms.date: 09/15/2020
6+
ms.author: eur
77
---
88

9+
[!INCLUDE [Header](../../common/go.md)]
10+
11+
[!INCLUDE [Introduction](intro.md)]
12+
13+
## GStreamer configuration
14+
15+
The Speech SDK can use [GStreamer](https://gstreamer.freedesktop.org) to handle compressed audio. For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech SDK. You need to install some dependencies and plug-ins.
16+
17+
[!INCLUDE [Linux](gstreamer-linux.md)]
18+
19+
## Example
20+
921
To configure the Speech SDK to accept compressed audio input, create a `PullAudioInputStream` or `PushAudioInputStream`. Then, create an `AudioConfig` from an instance of your stream class that specifies the compression format of the stream.
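
The push model described above comes down to a copy loop: read compressed bytes in chunks, write each chunk to the push stream, and close the stream when the source is exhausted. The Python sketch below illustrates that loop under stated assumptions: `pump_compressed_audio` is a hypothetical helper, and `push_stream` is any object with `write(bytes)` and `close()` methods, which is the shape that the Python SDK's `PushAudioInputStream` exposes.

```python
import io


def pump_compressed_audio(source, push_stream, chunk_size=4096):
    """Copy compressed bytes from source into push_stream, then close the stream.

    source: binary file-like object holding compressed audio (for example OGG/OPUS).
    push_stream: object with write(bytes) and close() (assumed interface).
    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of the compressed input
        push_stream.write(chunk)
        total += len(chunk)
    push_stream.close()  # tells the consumer that no more audio will arrive
    return total


if __name__ == "__main__":
    class PrintingStream:
        """Stand-in for a push stream that only reports what it receives."""

        def write(self, data):
            print(f"wrote {len(data)} bytes")

        def close(self):
            print("closed")

    pump_compressed_audio(io.BytesIO(b"x" * 10000), PrintingStream())
```

Closing the stream matters: without it, a recognizer that reads from the push stream has no way to know that the file has ended.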
1022

1123
In the following example, let's assume that your use case is to use `PushStream` for a compressed file.

articles/cognitive-services/Speech-Service/includes/how-to/compressed-audio-input/go/prerequisites.md

Lines changed: 0 additions & 10 deletions
This file was deleted.
Lines changed: 3 additions & 1 deletion
@@ -1,11 +1,13 @@
11
---
22
author: eric-urban
33
ms.service: cognitive-services
4+
ms.subservice: speech-service
45
ms.topic: include
5-
ms.date: 03/09/2020
6+
ms.date: 04/25/2022
67
ms.author: eur
78
---
89

10+
911
Handling compressed audio is implemented by using [GStreamer](https://gstreamer.freedesktop.org). For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech SDK. Instead, you need to use the prebuilt binaries for Android. To download the prebuilt libraries, see [Installing for Android development](https://gstreamer.freedesktop.org/documentation/installing/for-android-development.html?gi-language=c).
1012

1113
The `libgstreamer_android.so` object is required. Make sure that all the GStreamer plug-ins (from the Android.mk file that follows) are linked in `libgstreamer_android.so`. When you use the Speech SDK with GStreamer version 1.18.3, `libc++_shared.so` from the Android NDK is also required.
Lines changed: 5 additions & 3 deletions
@@ -1,12 +1,13 @@
11
---
22
author: eric-urban
33
ms.service: cognitive-services
4+
ms.subservice: speech-service
45
ms.topic: include
5-
ms.date: 03/09/2020
6+
ms.date: 04/25/2022
67
ms.author: eur
78
---
89

9-
Handling compressed audio is implemented by using [GStreamer](https://gstreamer.freedesktop.org). For licensing reasons, GStreamer binaries aren't compiled and linked with the Speech SDK. You need to install several dependencies and plug-ins.
10+
You need to install several dependencies and plug-ins.
1011

1112
# [Ubuntu/Debian](#tab/debian)
1213

@@ -31,5 +32,6 @@ gstreamer1-plugins-ugly-free
3132
> [!NOTE]
3233
> On RHEL/CentOS 7 and RHEL/CentOS 8, if you use the "ANY" compressed format, you need to install more GStreamer plug-ins when the plug-in for the stream's media format isn't among the plug-ins installed in the preceding step.
3334
34-
3535
---
36+
37+
For more information, see [Linux installation instructions](https://gstreamer.freedesktop.org/documentation/installing/on-linux.html?gi-language=c) and [supported Linux distributions and target architectures](~/articles/cognitive-services/speech-service/speech-sdk.md).
