---
description: An overview of the capabilities of the Speech SDK audio input stream API.
services: cognitive-services
author: eric-urban
manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: how-to
ms.date: 04/12/2023
ms.author: eur
ms.devlang: csharp
ms.custom: devx-track-csharp
---
The Speech SDK provides a way to stream audio into the recognizer as an alternative to microphone or file input.

This guide describes how to use audio input streams. It also describes some of the requirements and limitations of the audio input stream.

See more examples of speech-to-text recognition with audio input stream on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs).

## Identify the format of the audio stream

Identify the format of the audio stream. The format must be supported by the Speech SDK and the Azure Cognitive Services Speech service.

Audio samples are:

- PCM format (int-16)
- One channel
- 16 bits per sample, 8,000 or 16,000 samples per second (16,000 bytes or 32,000 bytes per second)
- Two-block aligned (16 bit including padding for a sample)

The corresponding code in the SDK to create the audio format looks like this example:
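The snippet below is a minimal sketch: it assumes 16-bit, 16-kHz, mono PCM (8 kHz is the other supported rate) and uses the SDK's `AudioStreamFormat.GetWaveFormatPCM` helper; the variable names are illustrative.

```csharp
using Microsoft.CognitiveServices.Speech.Audio;

// 16-kHz (or 8-kHz), 16-bit, mono PCM -- the configuration described above.
byte channels = 1;
byte bitsPerSample = 16;
uint samplesPerSecond = 16000; // or 8000
var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
```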
Make sure that your code provides the RAW audio data according to these specifications. Also, make sure that 16-bit samples arrive in little-endian format. Signed samples are also supported. If your audio source data doesn't match the supported formats, the audio must be transcoded into the required format.

You can create your own audio input stream class derived from `PullAudioInputStreamCallback`. Implement the `Read()` and `Close()` members. The exact function signature is language-dependent, but the code looks similar to this code sample:
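The sketch below is illustrative rather than a complete implementation: the class name `ContosoAudioStream` and its `Stream`-backed audio source are placeholders, and only the members that the Speech SDK calls are shown.

```csharp
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

public class ContosoAudioStream : PullAudioInputStreamCallback
{
    private readonly Stream source; // placeholder: any source of RAW PCM bytes in the supported format

    public ContosoAudioStream(Stream source)
    {
        this.source = source;
    }

    // Called by the Speech SDK whenever it needs more audio.
    // Fill 'dataBuffer' with up to 'size' bytes and return the number of bytes written.
    // Return 0 to signal the end of the stream.
    public override int Read(byte[] dataBuffer, uint size)
    {
        return source.Read(dataBuffer, 0, (int)size);
    }

    // Called when the recognizer is done with the stream; release resources here.
    public override void Close()
    {
        source.Dispose();
    }
}
```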
Create an audio configuration based on your audio format and input stream. Pass in both your regular speech configuration and the audio input configuration when you create your recognizer. For example:
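The following sketch assumes the `audioFormat` and `ContosoAudioStream` definitions from the earlier snippets; the subscription key, region, and `pcmStream` variable are placeholders.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// 'pcmStream' is a placeholder for whatever supplies your RAW PCM audio.
var callback = new ContosoAudioStream(pcmStream);
using var audioConfig = AudioConfig.FromStreamInput(callback, audioFormat);

var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Recognize a single utterance from the custom stream.
var result = await recognizer.RecognizeOnceAsync();
Console.WriteLine(result.Text);
```

Because the recognizer pulls audio through `Read()` only as it needs it, the same configuration also works with continuous recognition (`StartContinuousRecognitionAsync`) for long-running streams.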