
Commit 54951e1

Merge pull request #291054 from valindrae/bidirectional-streaming-preview
Bidirectional streaming preview
2 parents 8d0e556 + 0a704fa commit 54951e1

File tree

3 files changed: +163 −121 lines changed


articles/communication-services/concepts/call-automation/audio-streaming-concept.md

Lines changed: 23 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@ description: Conceptual information about using Audio Streaming APIs with Call A
55
author: Alvin
66
ms.service: azure-communication-services
77
ms.topic: overview
8-
ms.date: 07/17/2024
8+
ms.date: 11/24/2024
99
ms.author: alvinhan
1010
ms.custom: public_prview
1111
---
@@ -14,43 +14,48 @@ ms.custom: public_prview
1414

1515
[!INCLUDE [Public Preview Disclaimer](../../includes/public-preview-include-document.md)]
1616

17-
Azure Communication Services provides developers with Audio Streaming capabilities to get real-time access to audio streams to capture, analyze, and process audio content during active calls. In today's world consumption of live audio and video is prevalent, this content could be in the forms of online meetings, online conferences, customer support, etc. With audio streaming access, developers can now build server applications to capture and analyze audio streams for each of the participants on the call in real-time. Developers can also combine audio streaming with other call automation actions or use their own AI models to analyze audio streams. Use cases include NLP for conversation analysis or providing real-time insights and suggestions to agents while they are in an active interaction with end users.
17+
Azure Communication Services provides bidirectional audio streaming capabilities, offering developers powerful tools to capture, analyze, and process audio content during active calls. This development paves the way for new possibilities in real-time communication for developers and businesses alike.
1818

19-
This public preview supports the ability for developers to get access to real-time audio streams over a WebSocket to analyze the call's audio in mixed and unmixed formats.
19+
By integrating bidirectional audio streaming with services like Azure OpenAI and other real-time voice APIs, businesses can achieve seamless, low-latency communication. This significantly enhances the development and deployment of conversational AI solutions, allowing for more engaging and efficient interactions.
2020

21-
## Common use cases
22-
Audio streams can be used in many ways. Some examples of how developers may wish to use the audio streams in their applications include:
21+
With bidirectional streaming, businesses can now elevate their voice solutions to low-latency, human-like, interactive conversational AI agents. Our bidirectional streaming APIs enable developers to stream audio from an ongoing Azure Communication Services call to their web servers in real time, and to stream audio back into the call. While the initial focus of these features is to help businesses create conversational AI agents, other use cases include Natural Language Processing for conversation analysis or providing real-time insights and suggestions to agents while they're actively interacting with end users.
22+
23+
This public preview supports the ability for developers to access real-time audio streams over a WebSocket from Azure Communication Services and stream audio back into the call.
2324

2425
### Real-time call assistance
2526

26-
**Improved AI powered suggestions** - Use real-time audio streams of active interactions between agents and customers to gauge the intent of the call and how your agents can provide a better experience to their customer through active suggestions using your own AI model to analyze the call.
27+
- **Conversational AI solutions:** Develop sophisticated customer support virtual agents that can interact with customers in real time, providing immediate responses and solutions.
28+
29+
- **Personalized customer experiences:** By harnessing real-time data, businesses can offer more personalized and dynamic customer interactions, leading to increased satisfaction and loyalty.
30+
31+
- **Reduce wait times for customers:** Using bidirectional audio streams with Large Language Models (LLMs), you can create virtual agents that serve as the first point of contact for customers, reducing their wait time for a human agent.
2732

2833
### Authentication
2934

30-
**Biometric authentication** – Use the audio streams to carry out voice authentication, by running the audio from the call through your voice recognition/matching engine/tool.
35+
- **Biometric authentication:** Use the audio streams to carry out voice authentication by running the call audio through your voice recognition or voice-matching engine.
3136

32-
## Sample architecture for subscribing to audio streams from an ongoing call - live agent scenario
37+
## Sample architecture showing how bidirectional audio streaming can be used for conversational AI agents
3338

34-
[![Screenshot of architecture diagram for audio streaming.](./media/audio-streaming-diagram.png)](./media/audio-streaming-diagram.png#lightbox)
39+
[![Screenshot of architecture diagram for audio streaming.](./media/bidirectional-streaming.png)](./media/bidirectional-streaming.png#lightbox)
3540

3641
## Supported formats
3742

38-
### Mixed format
39-
Contains mixed audio of all participants on the call. All audio is flattened into one stream.
43+
### Mixed
44+
Contains mixed audio of all participants on the call. All audio is flattened into one stream.
4045

4146
### Unmixed
42-
Contains audio per participant per channel, with support for up to four channels for the four most dominant speakers at any point in a call. You'll also get a participantRawID that you can use to determine the speaker.
47+
Contains audio per participant per channel, with support for up to four channels for the four most dominant speakers at any point in a call. You also get a participantRawID that you can use to determine the speaker.
4348

4449
## Additional information
45-
The table below describes information that will help developers convert the audio packets into audible content that can be used by their applications.
50+
Developers can use the following information about audio sent from Azure Communication Services to convert the audio packets into audible content for their applications.
4651
- Framerate: 50 frames per second
47-
- Packet stream rate: 20 ms rate
48-
- Data packet: 64 Kbytes
49-
- Audio metric: 16-bit PCM mono at 16000 hz
50-
- Public string data is a base64 string that should be converted into a byte array to create raw PCM file.\
52+
- Packet stream rate: one packet every 20 ms
53+
- Data packet size: 640 bytes at 16,000 Hz and 960 bytes at 24,000 Hz
54+
- Audio format: 16-bit PCM mono at 16,000 Hz or 24,000 Hz
55+
- The audio payload in each packet (`data`) is a base64-encoded string that should be converted into a byte array to produce raw PCM audio, as in the sketch after this list.
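
A minimal sketch of that conversion, assuming `base64Audio` holds the base64 `data` value from one received audio packet and that you're accumulating the decoded samples in a local file:

``` C#
// Decode one packet's base64 payload into raw 16-bit mono PCM samples.
byte[] pcmChunk = Convert.FromBase64String(base64Audio);

// Append the chunk to a raw PCM file (16,000 Hz or 24,000 Hz, depending on the configured format).
using (var output = new FileStream("call-audio.pcm", FileMode.Append, FileAccess.Write))
{
    output.Write(pcmChunk, 0, pcmChunk.Length);
}
```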
5156

5257
## Billing
53-
See the [Azure Communication Services pricing page](https://azure.microsoft.com/pricing/details/communication-services/?msockid=3b3359f3828f6cfe30994a9483c76d50) for information on how audio streaming is billed. Prices can be found in the calling category under audio streaming.
58+
See the [Azure Communication Services pricing page](https://azure.microsoft.com/pricing/details/communication-services/?msockid=3b3359f3828f6cfe30994a9483c76d50) for information on how audio streaming is billed. Prices can be found in the calling category under audio streaming.
5459

5560
## Next Steps
5661
Check out the [audio streaming quickstart](../../how-tos/call-automation/audio-streaming-quickstart.md) to learn more.
144 KB
Lines changed: 140 additions & 103 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,11 @@
11
---
22
title: Include file - C#
3-
description: C# Audio Streaming quickstart
3+
description: C# Bidirectional audio streaming how-to
44
services: azure-communication-services
55
author: Alvin
66
ms.service: azure-communication-services
77
ms.subservice: call-automation
8-
ms.date: 07/15/2024
8+
ms.date: 11/24/2024
99
ms.topic: include
1010
ms.topic: Include file
1111
ms.author: alvinhan
@@ -16,124 +16,161 @@ ms.author: alvinhan
1616
- An Azure Communication Services resource. See [Create an Azure Communication Services resource](../../../quickstarts/create-communication-resource.md?tabs=windows&pivots=platform-azp).
1717
- A new web service application created using the [Call Automation SDK](../../../quickstarts/call-automation/callflows-for-customer-interactions.md).
1818
- The latest [.NET library](https://dotnet.microsoft.com/download/dotnet-core) for your operating system.
19-
- A websocket server that can receive media streams.
19+
- A WebSocket server that can send and receive media streams.
2020

2121
## Set up a websocket server
2222
Azure Communication Services requires your server application to set up a WebSocket server to stream audio in real-time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection.
23-
You can optionally use Azure services Azure WebApps that allows you to create an application to receive audio streams over a websocket connection. Follow this [quickstart](https://azure.microsoft.com/blog/introduction-to-websockets-on-windows-azure-web-sites/).
2423

25-
## Establish a call
26-
Establish a call and provide streaming details
24+
To learn more about WebSockets and how to use them, see [this introduction to WebSockets on Azure](https://azure.microsoft.com/blog/introduction-to-websockets-on-windows-azure-web-sites/).
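
As a minimal sketch of such a server (assuming an ASP.NET Core app; the `/ws` path is illustrative), the following accepts the WebSocket connection that Azure Communication Services opens to your endpoint and hands it to the receive loop shown later in this article:

``` C#
using System.Net.WebSockets;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Enable WebSocket support in the request pipeline.
app.UseWebSockets();

app.Map("/ws", async (HttpContext context) =>
{
    if (context.WebSockets.IsWebSocketRequest)
    {
        // Accept the connection that Azure Communication Services establishes to this URL.
        using WebSocket webSocket = await context.WebSockets.AcceptWebSocketAsync();

        // Hand the socket to your receive loop, for example the
        // StartReceivingFromAcsMediaWebSocket method shown later in this article.
        await StartReceivingFromAcsMediaWebSocket(webSocket);
    }
    else
    {
        context.Response.StatusCode = StatusCodes.Status400BadRequest;
    }
});

app.Run();
```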
25+
26+
## Receiving and sending audio streaming data
27+
There are multiple ways to start receiving audio streams, which can be configured using the `startMediaStreaming` flag in the `mediaStreamingOptions` setup. You can also specify the desired sample rate for receiving or sending audio data using the `audioFormat` parameter. The currently supported formats are PCM 24K mono and PCM 16K mono, with PCM 16K mono as the default.
28+
29+
To enable bidirectional audio streaming, where you send audio data back into the call, set the `EnableBidirectional` flag. For more details, refer to the [API specifications](https://learn.microsoft.com/rest/api/communication/callautomation/answer-call/answer-call?view=rest-communication-callautomation-2024-06-15-preview&tabs=HTTP#mediastreamingoptions).
30+
31+
### Start streaming audio to your web server when answering the call
32+
Enable automatic audio streaming when the call is established by setting the flag `startMediaStreaming: true`.
33+
34+
This setting ensures that audio streaming starts automatically as soon as the call is connected.
2735

2836
``` C#
29-
MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(
30-
new Uri("<WEBSOCKET URL>"),
31-
MediaStreamingContent.Audio,
32-
MediaStreamingAudioChannel.Mixed,
33-
MediaStreamingTransport.Websocket,
34-
false);
35-
36-
var createCallOptions = new CreateCallOptions(callInvite, callbackUri)
37-
{
38-
CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri(cognitiveServiceEndpoint) },
39-
MediaStreamingOptions = mediaStreamingOptions,
40-
};
41-
42-
CreateCallResult createCallResult = await callAutomationClient.CreateCallAsync(createCallOptions);
37+
var mediaStreamingOptions = new MediaStreamingOptions(
38+
new Uri("wss://YOUR_WEBSOCKET_URL"),
39+
MediaStreamingContent.Audio,
40+
MediaStreamingAudioChannel.Mixed,
41+
startMediaStreaming: true) {
42+
EnableBidirectional = true,
43+
AudioFormat = AudioFormat.Pcm24KMono
44+
};
45+
var options = new AnswerCallOptions(incomingCallContext, callbackUri) {
46+
MediaStreamingOptions = mediaStreamingOptions,
47+
};
48+
49+
AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);
4350
```
4451

45-
## Start audio streaming
46-
How to start audio streaming:
52+
When Azure Communication Services receives the URL for your WebSocket server, it establishes a connection to it. Once the connection is successfully made, streaming is initiated.
53+
54+
55+
### Start streaming audio to your web server while a call is in progress
56+
To start media streaming later during the call, set the `startMediaStreaming` parameter to `false` (the default) when answering the call, and then use the start media streaming API at the point in the call where you want streaming to begin.
57+
4758
``` C#
48-
StartMediaStreamingOptions options = new StartMediaStreamingOptions()
49-
{
50-
OperationCallbackUri = new Uri(callbackUriHost),
51-
OperationContext = "startMediaStreamingContext"
52-
};
53-
await callMedia.StartMediaStreamingAsync(options);
54-
```
55-
When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services successfully connects to your WebSocket server and streaming is started, it will send through the first data packet, which contains metadata about the incoming media packets.
56-
57-
The metadata packet will look like this:
58-
``` code
59-
{
60-
"kind": <string> // What kind of data this is, e.g. AudioMetadata, AudioData.
61-
"audioMetadata": {
62-
"subscriptionId": <string>, // unique identifier for a subscription request
63-
"encoding":<string>, // PCM only supported
64-
"sampleRate": <int>, // 16000 default
65-
"channels": <int>, // 1 default
66-
"length": <int> // 640 default
67-
}
68-
}
59+
var mediaStreamingOptions = new MediaStreamingOptions(
60+
new Uri("wss://<YOUR_WEBSOCKET_URL"),
61+
MediaStreamingContent.Audio,
62+
MediaStreamingAudioChannel.Mixed,
63+
startMediaStreaming: false) {
64+
EnableBidirectional = true,
65+
AudioFormat = AudioFormat.Pcm24KMono
66+
};
67+
var options = new AnswerCallOptions(incomingCallContext, callbackUri) {
68+
MediaStreamingOptions = mediaStreamingOptions,
69+
};
70+
71+
AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);
72+
73+
// Later in the call, start media streaming via the API
74+
StartMediaStreamingOptions startOptions = new StartMediaStreamingOptions() {
75+
OperationContext = "startMediaStreamingContext"
76+
};
77+
78+
await callMedia.StartMediaStreamingAsync(startOptions);
6979
```
7080

7181

7282
## Stop audio streaming
73-
How to stop audio streaming
83+
To stop receiving audio streams during a call, use the **Stop streaming API**. You can stop audio streaming at any point in the call, in one of two ways:
84+
- **Triggering the Stop streaming API:** Use the API to stop receiving audio streaming data while the call is still active.
85+
- **Automatic stop on call disconnect:** Audio streaming automatically stops when the call is disconnected.
86+
7487
``` C#
75-
StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions()
76-
{
77-
OperationCallbackUri = new Uri(callbackUriHost)
78-
};
79-
await callMedia.StopMediaStreamingAsync(stopOptions);
88+
StopMediaStreamingOptions options = new StopMediaStreamingOptions() {
89+
OperationContext = "stopMediaStreamingContext"
90+
};
91+
92+
await callMedia.StopMediaStreamingAsync(options);
8093
```
8194

8295
## Handling audio streams in your websocket server
83-
The sample below demonstrates how to listen to audio streams using your websocket server.
96+
This sample demonstrates how to listen to audio streams using your websocket server.
97+
98+
``` C#
99+
private async Task StartReceivingFromAcsMediaWebSocket(WebSocket webSocket) {
100+
101+
while (webSocket.State == WebSocketState.Open || webSocket.State == WebSocketState.CloseSent) {
102+
byte[] receiveBuffer = new byte[2048];
103+
WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(
104+
new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
105+
106+
if (receiveResult.MessageType != WebSocketMessageType.Close) {
107+
string data = Encoding.UTF8.GetString(receiveBuffer).TrimEnd('\0');
108+
var input = StreamingData.Parse(data);
109+
if (input is AudioData audioData) {
110+
// Add your code here to process the received audio chunk
111+
}
112+
}
113+
}
114+
}
115+
```
116+
117+
The first packet you receive contains metadata about the stream, including audio settings such as encoding, sample rate, and other configuration details.
118+
119+
``` json
120+
{
121+
"kind": "AudioMetadata",
122+
"audioMetadata": {
123+
"subscriptionId": "89e8cb59-b991-48b0-b154-1db84f16a077",
124+
"encoding": "PCM",
125+
"sampleRate": 16000,
126+
"channels": 1,
127+
"length": 640
128+
}
129+
}
130+
```
131+
132+
After sending the metadata packet, Azure Communication Services (ACS) will begin streaming audio media to your WebSocket server.
133+
134+
``` json
135+
{
136+
"kind": "AudioData",
137+
"audioData": {
138+
"timestamp": "2024-11-15T19:16:12.925Z",
139+
"participantRawID": "8:acs:3d20e1de-0f28-41c5…",
140+
"data": "5ADwAOMA6AD0A…",
141+
"silent": false
142+
}
143+
}
144+
```
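
If you want to inspect the raw JSON yourself instead of relying on the SDK's `StreamingData.Parse`, a minimal sketch of branching on the `kind` field could look like the following; the `HandleAudioPayload` helper is a hypothetical stand-in for your own processing logic:

``` C#
using System;
using System.Text.Json;

// "data" is the UTF-8 JSON string read from the WebSocket in the receive loop above.
using JsonDocument packet = JsonDocument.Parse(data);
string kind = packet.RootElement.GetProperty("kind").GetString();

if (kind == "AudioMetadata")
{
    // First packet: note the negotiated audio settings.
    int sampleRate = packet.RootElement.GetProperty("audioMetadata").GetProperty("sampleRate").GetInt32();
}
else if (kind == "AudioData")
{
    // Subsequent packets: decode the base64 payload into raw PCM bytes.
    string base64Audio = packet.RootElement.GetProperty("audioData").GetProperty("data").GetString();
    byte[] pcmChunk = Convert.FromBase64String(base64Audio);
    HandleAudioPayload(pcmChunk); // hypothetical: forward the chunk to your AI service
}
```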
145+
146+
## Sending audio streaming data to Azure Communication Services
147+
If bidirectional streaming is enabled using the `EnableBidirectional` flag in the `MediaStreamingOptions`, you can stream audio data back to Azure Communication Services, which plays the audio into the call.
148+
149+
Once Azure Communication Services begins streaming audio to your WebSocket server, you can relay the audio to your AI services. After your AI service processes the audio content, you can stream the audio back to the ongoing call in Azure Communication Services.
150+
151+
The following example demonstrates how audio that another service, such as Azure OpenAI or another voice-based Large Language Model, has processed can be transmitted back into the call.
84152

85153
``` C#
86-
HttpListener httpListener = new HttpListener();
87-
httpListener.Prefixes.Add("http://localhost:80/");
88-
httpListener.Start();
89-
90-
while (true)
91-
{
92-
HttpListenerContext httpListenerContext = await httpListener.GetContextAsync();
93-
if (httpListenerContext.Request.IsWebSocketRequest)
94-
{
95-
WebSocketContext websocketContext;
96-
try
97-
{
98-
websocketContext = await httpListenerContext.AcceptWebSocketAsync(subProtocol: null);
99-
}
100-
catch (Exception ex)
101-
{
102-
return;
103-
}
104-
WebSocket webSocket = websocketContext.WebSocket;
105-
try
106-
{
107-
while (webSocket.State == WebSocketState.Open || webSocket.State == WebSocketState.CloseSent)
108-
{
109-
byte[] receiveBuffer = new byte[2048];
110-
var cancellationToken = new CancellationTokenSource(TimeSpan.FromSeconds(60)).Token;
111-
WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), cancellationToken);
112-
if (receiveResult.MessageType != WebSocketMessageType.Close)
113-
{
114-
var data = Encoding.UTF8.GetString(receiveBuffer).TrimEnd('\0');
115-
try
116-
{
117-
var eventData = JsonConvert.DeserializeObject<AudioBaseClass>(data);
118-
if (eventData != null)
119-
{
120-
if(eventData.kind == "AudioMetadata")
121-
{
122-
//Process audio metadata
123-
}
124-
else if(eventData.kind == "AudioData")
125-
{
126-
//Process audio data
127-
var byteArray = eventData.audioData.data;
128-
//use audio byteArray as you want
129-
}
130-
}
131-
}
132-
catch { }
133-
}
134-
}
135-
}
136-
catch (Exception ex) { }
137-
}
138-
}
154+
// audioData here is the PCM byte array produced by your AI service
var jsonData = OutStreamingData.GetAudioDataForOutbound(audioData);
155+
byte[] jsonBytes = Encoding.UTF8.GetBytes(jsonData);
156+
157+
// Write your logic to send the PCM audio chunk over the WebSocket
158+
// Example of how to send audio data over the WebSocket
159+
await m_webSocket.SendAsync(new ArraySegment<byte>(jsonBytes), WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
139160
```
161+
162+
You can also control the playback of audio in the call when streaming back to Azure Communication Services, based on your logic or business flow. For example, when voice activity is detected and you want to stop the queued up audio, you can send a stop message via the WebSocket to stop the audio from playing in the call.
163+
164+
``` C#
165+
var stopData = OutStreamingData.GetStopAudioForOutbound();
166+
byte[] jsonBytes = Encoding.UTF8.GetBytes(stopData);
167+
168+
// Write your logic to send stop data to ACS over the WebSocket
169+
// Example of how to send stop data over the WebSocket
170+
await m_webSocket.SendAsync(new ArraySegment<byte>(jsonBytes), WebSocketMessageType.Text, endOfMessage: true, CancellationToken.None);
171+
```
172+
173+
174+
175+
176+
