Commit 122b936

Merge pull request #234248 from eric-urban/eur/input-stream
audio input stream
2 parents 19360d5 + 1a9c3f0

File tree: 1 file changed (+51 −44)


articles/cognitive-services/Speech-Service/how-to-use-audio-input-streams.md

Lines changed: 51 additions & 44 deletions
@@ -3,13 +3,13 @@ title: Speech SDK audio input stream concepts
 titleSuffix: Azure Cognitive Services
 description: An overview of the capabilities of the Speech SDK audio input stream API.
 services: cognitive-services
-author: fmegen
+author: eric-urban
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: how-to
-ms.date: 06/13/2022
-ms.author: fmegen
+ms.date: 04/12/2023
+ms.author: eur
 ms.devlang: csharp
 ms.custom: devx-track-csharp
 ---
@@ -18,64 +18,71 @@ ms.custom: devx-track-csharp

 The Speech SDK provides a way to stream audio into the recognizer as an alternative to microphone or file input.

-The following steps are required when you use audio input streams:
+This guide describes how to use audio input streams. It also describes some of the requirements and limitations of the audio input stream.

-- Identify the format of the audio stream. The format must be supported by the Speech SDK and the Azure Cognitive Services Speech service. Currently, only the following configuration is supported:
+See more examples of speech-to-text recognition with audio input stream on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs).

-Audio samples are:
+## Identify the format of the audio stream

-- PCM format (int-16)
-- One channel
-- 16 bits per sample, 8,000 or 16,000 samples per second (16,000 bytes or 32,000 bytes per second)
-- Two-block aligned (16 bit including padding for a sample)
+Identify the format of the audio stream. The format must be supported by the Speech SDK and the Azure Cognitive Services Speech service.

-The corresponding code in the SDK to create the audio format looks like this example:
+Supported audio samples are:

-```csharp
-byte channels = 1;
-byte bitsPerSample = 16;
-int samplesPerSecond = 16000; // or 8000
-var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
-```
+- PCM format (int-16)
+- One channel
+- 16 bits per sample, 8,000 or 16,000 samples per second (16,000 bytes or 32,000 bytes per second)
+- Two-block aligned (16 bit including padding for a sample)

-- Make sure that your code provides the RAW audio data according to these specifications. Also, make sure that 16-bit samples arrive in little-endian format. Signed samples are also supported. If your audio source data doesn't match the supported formats, the audio must be transcoded into the required format.
+The corresponding code in the SDK to create the audio format looks like this example:

-- Create your own audio input stream class derived from `PullAudioInputStreamCallback`. Implement the `Read()` and `Close()` members. The exact function signature is language-dependent, but the code looks similar to this code sample:
+```csharp
+byte channels = 1;
+byte bitsPerSample = 16;
+int samplesPerSecond = 16000; // or 8000
+var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
+```

-```csharp
-public class ContosoAudioStream : PullAudioInputStreamCallback {
-    ContosoConfig config;
+Make sure that your code provides the RAW audio data according to these specifications. Also, make sure that 16-bit samples arrive in little-endian format. Signed samples are also supported. If your audio source data doesn't match the supported formats, the audio must be transcoded into the required format.

-    public ContosoAudioStream(const ContosoConfig& config) {
-        this.config = config;
-    }
+## Create your own audio input stream class

-    public int Read(byte[] buffer, uint size) {
-        // Returns audio data to the caller.
-        // E.g., return read(config.YYY, buffer, size);
-    }
+You can create your own audio input stream class derived from `PullAudioInputStreamCallback`. Implement the `Read()` and `Close()` members. The exact function signature is language-dependent, but the code looks similar to this code sample:

-    public void Close() {
-        // Close and clean up resources.
-    }
-};
-```
+```csharp
+public class ContosoAudioStream : PullAudioInputStreamCallback {
+    ContosoConfig config;

-- Create an audio configuration based on your audio format and input stream. Pass in both your regular speech configuration and the audio input configuration when you create your recognizer. For example:
+    public ContosoAudioStream(const ContosoConfig& config) {
+        this.config = config;
+    }

-```csharp
-var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(config), audioFormat);
+    public int Read(byte[] buffer, uint size) {
+        // Returns audio data to the caller.
+        // E.g., return read(config.YYY, buffer, size);
+    }

-var speechConfig = SpeechConfig.FromSubscription(...);
-var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
+    public void Close() {
+        // Close and clean up resources.
+    }
+};
+```

-// Run stream through recognizer.
-var result = await recognizer.RecognizeOnceAsync();
+Create an audio configuration based on your audio format and input stream. Pass in both your regular speech configuration and the audio input configuration when you create your recognizer. For example:
+
+```csharp
+var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(config), audioFormat);
+
+var speechConfig = SpeechConfig.FromSubscription(...);
+var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
+
+// Run stream through recognizer.
+var result = await recognizer.RecognizeOnceAsync();
+
+var text = result.GetText();
+```

-var text = result.GetText();
-```

 ## Next steps

 - [Create a free Azure account](https://azure.microsoft.com/free/cognitive-services/)
-- [See how to recognize speech in C#](./get-started-speech-to-text.md?pivots=programming-language-csharp&tabs=dotnet)
+- [See how to recognize speech in C#](./get-started-speech-to-text.md?pivots=programming-language-csharp&tabs=dotnet)
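For reference, here is how the pieces added in this change fit together end to end. This is a minimal, illustrative sketch rather than part of the committed article: the `ContosoAudioStream` below pulls raw 16 kHz, 16-bit, mono PCM from a hypothetical file, and the file name, subscription key, and region are placeholders to replace with your own values. In C#, the recognized text is read from `result.Text`.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Pull-stream callback that feeds raw 16 kHz, 16-bit, mono PCM to the recognizer.
// Reading from a file here stands in for whatever source the article's ContosoConfig abstracts.
public class ContosoAudioStream : PullAudioInputStreamCallback
{
    private readonly Stream source;

    public ContosoAudioStream(string rawPcmFilePath)
    {
        source = File.OpenRead(rawPcmFilePath);
    }

    // Fills 'buffer' with up to 'size' bytes of audio; returning 0 signals the end of the stream.
    public override int Read(byte[] buffer, uint size)
    {
        return source.Read(buffer, 0, (int)size);
    }

    public override void Close()
    {
        // Close and clean up resources.
        source.Dispose();
    }
}

public static class Program
{
    public static async Task Main()
    {
        // One channel, 16 bits per sample, 16,000 samples per second, per the format requirements above.
        var audioFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);

        // Placeholder Speech resource key and region.
        var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

        using var callback = new ContosoAudioStream("hello-world.raw"); // hypothetical raw PCM file
        using var audioConfig = AudioConfig.FromStreamInput(callback, audioFormat);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Run the stream through the recognizer and print the first recognized utterance.
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: {result.Text}");
    }
}
```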
