
Commit 8bb1ae6

Merge pull request #300802 from valindrae/streaming-ga
Streaming ga updates
2 parents 48b74a5 + b34d30f commit 8bb1ae6

17 files changed: +419 -79 lines

articles/communication-services/concepts/call-automation/audio-streaming-concept.md

Lines changed: 12 additions & 4 deletions
@@ -12,11 +12,9 @@ ms.custom: public_prview

# Audio streaming overview - audio subscription

-[!INCLUDE [Public Preview Disclaimer](../../includes/public-preview-include-document.md)]
-
Azure Communication Services provides bidirectional audio streaming capabilities, offering developers powerful tools to capture, analyze, and process audio content during active calls. This development paves the way for new possibilities in real-time communication for developers and businesses alike.

-By integrating bidirectional audio streaming with services like Azure OpenAI and other real-time voice APIs, businesses can achieve seamless, low-latency communication. This significantly enhances the development and deployment of conversational AI solutions, allowing for more engaging and efficient interactions.
+By integrating bidirectional audio streaming with services like Azure OpenAI and other real-time voice APIs, businesses can achieve seamless, low-latency communication. This additional capability significantly enhances the development and deployment of conversational AI solutions, allowing for more engaging and efficient interactions.

With bidirectional streaming, businesses can now elevate their voice solutions to low-latency, human-like, interactive conversational AI agents. Our bidirectional streaming APIs enable developers to stream audio from an ongoing call on Azure Communication Services to their web servers in real-time, and stream audio back into the call. While the initial focus of these features is to help businesses create conversational AI agents, other use cases include Natural Language Processing for conversation analysis or providing real-time insights and suggestions to agents while they are in active interaction with end users.

@@ -57,5 +55,15 @@ Developers can use the following information about audio sent from Azure Communi
## Billing
See the [Azure Communication Services pricing page](https://azure.microsoft.com/pricing/details/communication-services/?msockid=3b3359f3828f6cfe30994a9483c76d50) for information on how audio streaming is billed. Prices can be found in the calling category under audio streaming.

+## Known Limitations
+- Stopping media streaming using a new operationContext doesn't correctly reflect the updated context.
+- If you create or answer a call with operationContext set to "ABC" and enable media streaming, you receive the MediaStreamingStarted event with operationContext: "ABC".
+- If you call the StopStreaming API with a different operationContext, say "XYZ", you would expect to receive the MediaStreamingStopped event with operationContext: "XYZ". However, due to a known issue, the MediaStreamingStopped event still contains operationContext: "ABC".
+- When stopping media streaming using a new callback URI, events continue to be sent to the default callback URI used during call creation or answer.
+- If you create or answer a call with a default callback URI "https://ABC.com" and enable media streaming, the MediaStreamingStarted event is sent to "https://ABC.com".
+- If you then stop streaming using the StopStreaming API and specify a new callback URI "https://XYZ.com", you would expect the MediaStreamingStopped event to be sent to "https://XYZ.com". However, due to a known issue, the event is still sent to the original callback URI "https://ABC.com".
+
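For illustration only (not part of this commit), the C# sketch below reproduces the first limitation using the Call Automation calls shown later in this diff. `client`, `incomingCallContext`, `callbackUri`, and `websocketUri` are placeholders from the quickstart, and setting `OperationContext` on `AnswerCallOptions` is assumed here rather than taken from this change.

``` C#
// Answer the call with operationContext "ABC" and media streaming enabled.
var mediaStreamingOptions = new MediaStreamingOptions(MediaStreamingAudioChannel.Unmixed)
{
    TransportUri = new Uri(websocketUri),
    EnableBidirectional = true,
    AudioFormat = AudioFormat.Pcm24KMono
};

var answerOptions = new AnswerCallOptions(incomingCallContext, callbackUri)
{
    MediaStreamingOptions = mediaStreamingOptions,
    OperationContext = "ABC" // assumed property; context set when the call is answered
};
AnswerCallResult answerResult = await client.AnswerCallAsync(answerOptions);

// Later, stop streaming and pass a different operationContext.
CallMedia callMedia = answerResult.CallConnection.GetCallMedia();
await callMedia.StopMediaStreamingAsync(new StopMediaStreamingOptions
{
    OperationContext = "XYZ"
});

// Expected: MediaStreamingStopped arrives with operationContext "XYZ".
// Known issue: the event still reports the original operationContext "ABC".
```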
## Next Steps
-Check out the [audio streaming quickstart](../../how-tos/call-automation/audio-streaming-quickstart.md) to learn more.
+To learn more, check out the [audio streaming quickstart](../../how-tos/call-automation/audio-streaming-quickstart.md).

articles/communication-services/concepts/call-automation/real-time-transcription.md

Lines changed: 9 additions & 5 deletions
@@ -12,11 +12,10 @@ services: azure-communication-services
---

# Generating real-time transcripts
-[!INCLUDE [Public Preview Disclaimer](../../includes/public-preview-include-document.md)]

Real-time transcriptions are a crucial component in any major business for driving improved customer service experience. Powered by Azure Communication Services and Azure AI Services integration, developers can now use real-time transcriptions through Call Automation SDKs.

-Using the Azure Communication Services real-time transcription, you can easily integrate your Azure AI Services resource with Azure Communication Services to generate transcripts directly during the call. This eliminates the need for developers to extract audio content and deal with the overhead of converting audio into text on your side. You can store the contents of this transcript to use later on for creating a history of the call, summarizing the call to save an agent's time, and even feeding it into your training/learning modules to help improve your contact center agents' customer interactions.
+Using the Azure Communication Services real-time transcription, you can easily integrate your Azure AI Services resource with Azure Communication Services to generate transcripts directly during the call. This capability eliminates the need for developers to extract audio content and deal with the overhead of converting audio into text on your side. You can store the contents of this transcript to use later on for creating a history of the call, summarizing the call to save an agent's time, and even feeding it into your training/learning modules to help improve your contact center agents' customer interactions.

Out of the box Microsoft utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. This model is pretrained with dialects and phonetics representing various common domains. For more information about supported languages, see [Languages and voice support for the Speech service](/azure/ai-services/speech-service/language-support).

@@ -29,16 +28,21 @@ Assist agents better understand customer needs and respond more quickly and accu
Help agents focus on the conversation rather than note-taking, allowing them to handle more calls and improve productivity

### Context for agents
-Provide context to an agent before the agent picks up the call, this way the agent knows the information that the caller has provided avoiding any need for the caller to repeat their issue.
+Provide context to an agent before the agent picks up the call. This way, the agent knows the information the caller already gave and the caller does not need to repeat their issue.

### Derive insights
-Using the transcript generated throughout the call, you can leverage other AI tools to gain live, real-time insights that will help agents and supervisors improve their interactions with customers.
+Using the transcript generated throughout the call, you can use other AI tools to gain live, real-time insights that help agents and supervisors improve their interactions with customers.

## Sample flow of real-time transcription using Call Automation
![Diagram of real-time transcription flow.](./media/transcription.png)

## Billing
-See the [Azure Communication Services pricing page](https://azure.microsoft.com/pricing/details/communication-services/?msockid=3b3359f3828f6cfe30994a9483c76d50) for information on how real-time transcription is billed. Prices can be found in the calling category under audio streaming -> unmixed audio insights streaming.
+See the [Azure Communication Services pricing page](https://azure.microsoft.com/pricing/details/communication-services/?msockid=3b3359f3828f6cfe30994a9483c76d50) for information on how real-time transcription is billed. Prices can be found in the calling category under audio streaming -> unmixed audio insights streaming.
+
+## Known limitations
+- Updating transcription with a new operationContext doesn't correctly reflect the updated context.
+- When you create or answer a call with operationContext: "ABC" and enable transcription, you receive the TranscriptionStarted event with operationContext: "ABC".
+- If you call the UpdateTranscription API with a new operationContext: "XYZ", you would expect the TranscriptionUpdated event to include operationContext: "XYZ". However, due to a known issue, the TranscriptionUpdated event still returns operationContext: "ABC".
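For illustration only (not part of this commit), a callback endpoint can log the operationContext that actually arrives on Call Automation events. This minimal ASP.NET Core sketch assumes events are delivered to the callback URI as CloudEvents and uses `CallAutomationEventParser` from the .NET SDK; the route and logging are placeholders.

``` C#
using Azure.Messaging;
using Azure.Communication.CallAutomation;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Callback endpoint registered as callbackUri when creating or answering the call.
app.MapPost("/api/callbacks", (CloudEvent[] cloudEvents, ILogger<Program> logger) =>
{
    foreach (var cloudEvent in cloudEvents)
    {
        CallAutomationEventBase callEvent = CallAutomationEventParser.Parse(cloudEvent);

        // Because of the known issue above, TranscriptionUpdated (and MediaStreamingStopped)
        // may still report the operationContext from call creation or answer rather than the
        // value passed to the update or stop request.
        logger.LogInformation("{EventType}: operationContext = {OperationContext}",
            callEvent.GetType().Name, callEvent.OperationContext);
    }
    return Results.Ok();
});

app.Run();
```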

## Next Steps
- Check out our how-to guide to learn [how-to use our Real-time Transcription](../../how-tos/call-automation/real-time-transcription-tutorial.md) to users.

articles/communication-services/concepts/voice-video-calling/network-requirements.md

Lines changed: 2 additions & 0 deletions
@@ -61,6 +61,8 @@ Communication Services connections require internet connectivity to specific por
| :-- | :-- | :-- |
| Media traffic | Range of Azure public cloud IP addresses 20.202.0.0/16 The range provided above is the range of IP addresses on either Media processor or Azure Communication Services TURN service. | UDP 3478 through 3481, TCP ports 443 |
| Signaling, telemetry, registration | *.skype.com, *.microsoft.com, *.azure.net, *.azure.com, *.office.com | TCP 443, 80 |
+| Call Automation Media | 52.112.0.0/14, 52.122.0.0/15, 2603:1063::/38 | UDP: 3478, 3479, 3480, 3481 |
+| Call Automation callback URLs | *.lync.com, *.teams.cloud.microsoft, *.teams.microsoft.com, teams.cloud.microsoft, teams.microsoft.com, 52.112.0.0/14, 52.122.0.0/15, 2603:1027::/48, 2603:1037::/48, 2603:1047::/48, 2603:1057::/48, 2603:1063::/38, 2620:1ec:6::/48, 2620:1ec:40::/42 | TCP: 443, 80 UDP: 443 |

The endpoints below should be reachable for U.S. Government GCC High customers only.

articles/communication-services/how-tos/call-automation/audio-streaming-quickstart.md

Lines changed: 0 additions & 2 deletions
@@ -14,8 +14,6 @@ zone_pivot_groups: acs-js-csharp-java-python

# Quickstart: Server-side Audio Streaming

-[!INCLUDE [Public Preview Disclaimer](../../includes/public-preview-include-document.md)]
-
Get started with using audio streams through Azure Communication Services Audio Streaming API. This quickstart assumes you're already familiar with Call Automation APIs to build an automated call routing solution.

Functionality described in this quickstart is currently in public preview.

articles/communication-services/how-tos/call-automation/includes/audio-streaming-quickstart-csharp.md

Lines changed: 21 additions & 20 deletions
@@ -34,14 +34,12 @@ Enable automatic audio streaming when the call is established by setting the fla
This setting ensures that audio streaming starts automatically as soon as the call is connected.

``` C#
-var mediaStreamingOptions = new MediaStreamingOptions(
-    new Uri("wss://YOUR_WEBSOCKET_URL"),
-    MediaStreamingContent.Audio,
-    MediaStreamingAudioChannel.Mixed,
-    startMediaStreaming: true) {
-        EnableBidirectional = true,
-        AudioFormat = AudioFormat.Pcm24KMono
-}
+MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(MediaStreamingAudioChannel.Unmixed);
+mediaStreamingOptions.TransportUri = new Uri(websocketUri);
+mediaStreamingOptions.EnableBidirectional = true;
+mediaStreamingOptions.AudioFormat = AudioFormat.Pcm24KMono;
+mediaStreamingOptions.EnableDtmfTones = true;
+
var options = new AnswerCallOptions(incomingCallContext, callbackUri) {
    MediaStreamingOptions = mediaStreamingOptions,
};

@@ -56,14 +54,12 @@ When Azure Communication Services receives the URL for your WebSocket server, it
To start media streaming during the call, you can use the API. To do so, set the `startMediaStreaming` parameter to `false` (which is the default), and later in the call, you can use the start API to enable media streaming.

``` C#
-var mediaStreamingOptions = new MediaStreamingOptions(
-    new Uri("wss://<YOUR_WEBSOCKET_URL"),
-    MediaStreamingContent.Audio,
-    MediaStreamingAudioChannel.Mixed,
-    startMediaStreaming: false) {
-        EnableBidirectional = true,
-        AudioFormat = AudioFormat.Pcm24KMono
-}
+MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(MediaStreamingAudioChannel.Unmixed);
+mediaStreamingOptions.TransportUri = new Uri(websocketUri);
+mediaStreamingOptions.EnableBidirectional = true;
+mediaStreamingOptions.AudioFormat = AudioFormat.Pcm24KMono;
+mediaStreamingOptions.EnableDtmfTones = true;
+
var options = new AnswerCallOptions(incomingCallContext, callbackUri) {
    MediaStreamingOptions = mediaStreamingOptions,
};
@@ -72,10 +68,11 @@ AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);

Start media streaming via API call
StartMediaStreamingOptions options = new StartMediaStreamingOptions() {
-    OperationContext = "startMediaStreamingContext"
+    OperationContext = "startMediaStreamingContext",
+    OperationCallbackUri = eventCallbackUri
};

-await callMedia.StartMediaStreamingAsync();
+await callMedia.StartMediaStreamingAsync(options);
```
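For completeness (not part of this commit), the same `mediaStreamingOptions` object can also be attached when placing an outbound call. This is only a sketch: it assumes `CreateCallOptions` exposes a `MediaStreamingOptions` property mirroring `AnswerCallOptions`, and `targetUserId` is a placeholder.

``` C#
// Sketch only: attach the streaming options to an outbound call instead of an answered one.
// Assumes CreateCallOptions exposes MediaStreamingOptions; targetUserId and callbackUri
// are placeholder values.
CallInvite callInvite = new CallInvite(new CommunicationUserIdentifier(targetUserId));
var createCallOptions = new CreateCallOptions(callInvite, callbackUri)
{
    MediaStreamingOptions = mediaStreamingOptions
};
CreateCallResult createCallResult = await client.CreateCallAsync(createCallOptions);
```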
@@ -86,10 +83,11 @@ To stop receiving audio streams during a call, you can use the **Stop streaming

``` C#
StopMediaStreamingOptions options = new StopMediaStreamingOptions() {
-    OperationContext = "stopMediaStreamingContext"
+    OperationContext = "stopMediaStreamingContext",
+    OperationCallbackUri = eventCallbackUri
};

-await callMedia.StopMediaStreamingAsync();
+await callMedia.StopMediaStreamingAsync(options);
```

## Handling audio streams in your websocket server
@@ -116,6 +114,9 @@ private async Task StartReceivingFromAcsMediaWebSocket(Websocket websocket) {

The first packet you receive contains metadata about the stream, including audio settings such as encoding, sample rate, and other configuration details.

+### Additional Headers
+The correlation ID and call connection ID are now included in the WebSocket headers as `x-ms-call-correlation-id` and `x-ms-call-connection-id` for improved traceability. Azure Communication Services sends these headers when it connects to your endpoint.
+
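For illustration only (not part of this commit), here is a minimal ASP.NET Core sketch of reading these headers when accepting the WebSocket connection; the `/ws` path and logging are placeholders, and `app` is the `WebApplication` from your startup code.

``` C#
app.UseWebSockets();

app.Use(async (context, next) =>
{
    if (context.Request.Path == "/ws" && context.WebSockets.IsWebSocketRequest)
    {
        // Header names as documented above; both values identify the call for tracing.
        string correlationId = context.Request.Headers["x-ms-call-correlation-id"].ToString();
        string callConnectionId = context.Request.Headers["x-ms-call-connection-id"].ToString();
        Console.WriteLine($"correlationId={correlationId}, callConnectionId={callConnectionId}");

        using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
        // ... continue with the receive loop shown earlier in this quickstart ...
    }
    else
    {
        await next(context);
    }
});
```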
``` json
{
    "kind": "AudioMetadata",

articles/communication-services/how-tos/call-automation/includes/audio-streaming-quickstart-java.md

Lines changed: 20 additions & 9 deletions
@@ -46,20 +46,25 @@ When Azure Communication Services receives the URL for your WebSocket server, it
To start media streaming during the call, you can use the API. To do so, set the `startMediaStreaming` parameter to `false` (which is the default), and later in the call, you can use the start API to enable media streaming.

``` Java
-MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(appConfig.getTransportUrl(), MediaStreamingTransport.WEBSOCKET, MediaStreamingContent.AUDIO, MediaStreamingAudioChannel.MIXED, false)
-    .setEnableBidirectional(true)
-    .setAudioFormat(AudioFormat.PCM_24K_MONO);
+MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(MediaStreamingAudioChannel.UNMIXED);
+mediaStreamingOptions.setTransportUrl(appConfig.getTransportUrl());
+mediaStreamingOptions.setStartMediaStreaming(true);
+mediaStreamingOptions.setEnableDtmfTones(true); // Allow receiving DTMF tones
+mediaStreamingOptions.setEnableBidirectional(true);
+mediaStreamingOptions.setAudioFormat(AudioFormat.PCM_24K_MONO);

options = new AnswerCallOptions(data.getString(INCOMING_CALL_CONTEXT), callbackUri)
    .setCallIntelligenceOptions(callIntelligenceOptions)
    .setMediaStreamingOptions(mediaStreamingOptions);

Response answerCallResponse = client.answerCallWithResponse(options, Context.NONE);

-StartMediaStreamingOptions startMediaStreamingOptions = new StartMediaStreamingOptions()
-    .setOperationContext("startMediaStreamingContext");
+StartMediaStreamingOptions startMediaStreamingOptions = new StartMediaStreamingOptions();
+startMediaStreamingOptions.setOperationContext("StartMediaStreamingContext");

-callConnection.getCallMedia().startMediaStreamingWithResponse(startMediaStreamingOptions, Context.NONE);
+client.getCallConnection(callConnectionId)
+    .getCallMedia()
+    .startMediaStreamingWithResponse(startMediaStreamingOptions, Context.NONE);
```

## Stop audio streaming
@@ -68,9 +73,12 @@ To stop receiving audio streams during a call, you can use the **Stop streaming
- **Automatic stop on call disconnect:** Audio streaming automatically stops when the call is disconnected.

``` Java
-StopMediaStreamingOptions stopMediaStreamingOptions = new StopMediaStreamingOptions()
-    .setOperationContext("stopMediaStreamingContext");
-callConnection.getCallMedia().stopMediaStreamingWithResponse(stopMediaStreamingOptions, Context.NONE);
+StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions();
+stopOptions.setOperationContext("StopMediaStreamingContext");
+
+client.getCallConnection(callConnectionId)
+    .getCallMedia()
+    .stopMediaStreamingWithResponse(stopOptions, Context.NONE);
```

## Handling audio streams in your websocket server
@@ -90,6 +98,9 @@ public void onMessage(String message, Session session) {

The first packet you receive contains metadata about the stream, including audio settings such as encoding, sample rate, and other configuration details.

+### Additional Headers
+The correlation ID and call connection ID are now included in the WebSocket headers as `x-ms-call-correlation-id` and `x-ms-call-connection-id` for improved traceability. Azure Communication Services sends these headers when it connects to your endpoint.
+
``` json
{
    "kind": "AudioMetadata",

articles/communication-services/how-tos/call-automation/includes/audio-streaming-quickstart-js.md

Lines changed: 23 additions & 14 deletions
@@ -34,21 +34,26 @@ Enable automatic audio streaming when the call is established by setting the fla
This setting ensures that audio streaming starts automatically as soon as the call is connected.

``` JS
-var mediaStreamingOptions = new MediaStreamingOptions(
-    new Uri("wss://YOUR_WEBSOCKET_URL"),
-    MediaStreamingContent.Audio,
-    MediaStreamingAudioChannel.Mixed,
-    startMediaStreaming: true)
-{
-    EnableBidirectional = true,
-    AudioFormat = AudioFormat.Pcm24KMono
-}
-var options = new AnswerCallOptions(incomingCallContext, callbackUri)
-{
-    MediaStreamingOptions = mediaStreamingOptions,
+const mediaStreamingOptions = {
+    transportUrl: "wss://YOUR_WEBSOCKET_URL",
+    transportType: "websocket",
+    contentType: "audio",
+    audioChannelType: "mixed",
+    startMediaStreaming: true,
+    enableDtmfTones: true,
+    enableBidirectional: true,
+    audioFormat: "Pcm24KMono"
+};
+
+const answerCallOptions = {
+    mediaStreamingOptions: mediaStreamingOptions
};

-AnswerCallResult answerCallResult = await client.AnswerCallAsync(options);
+answerCallResult = await acsClient.answerCall(
+    incomingCallContext,
+    callbackUri,
+    answerCallOptions
+);
```

When Azure Communication Services receives the URL for your WebSocket server, it establishes a connection to it. Once the connection is successfully made, streaming is initiated.
@@ -58,11 +63,12 @@ To start media streaming during the call, you can use the API. To do so, set the

``` JS
const mediaStreamingOptions: MediaStreamingOptions = {
-    transportUrl: transportUrl,
+    transportUrl: "wss://YOUR_WEBSOCKET_URL",
    transportType: "websocket",
    contentType: "audio",
    audioChannelType: "unmixed",
    startMediaStreaming: false,
+    enableDtmfTones: true,
    enableBidirectional: true,
    audioFormat: "Pcm24KMono"
}
@@ -134,6 +140,9 @@ async function processWebsocketMessageAsync(receivedBuffer: ArrayBuffer) {

The first packet you receive contains metadata about the stream, including audio settings such as encoding, sample rate, and other configuration details.

+### Additional Headers
+The correlation ID and call connection ID are now included in the WebSocket headers as `x-ms-call-correlation-id` and `x-ms-call-connection-id` for improved traceability. Azure Communication Services sends these headers when it connects to your endpoint.
+
``` json
{
    "kind": "AudioMetadata",
