Commit a9c37d4

AI Speech avatar samples
1 parent e9794a4 commit a9c37d4

File tree: 2 files changed (+26, −14 lines)

articles/ai-services/speech-service/includes/release-notes/release-notes-sdk.md

Lines changed: 6 additions & 0 deletions
````diff
@@ -7,8 +7,14 @@ ms.author: eur
 ---
 ### 2024-November release
 
+#### Azure AI Speech Toolkit extension for Visual Studio Code
+
 Azure AI Speech Toolkit extension is now available for Visual Studio Code users. It contains a list of speech quick-starts and scenario samples that can be easily built and run with simple clicks. For more information, see [Azure AI Speech Toolkit in Visual Studio Code Marketplace](https://aka.ms/speech-toolkit-vscode).
 
+#### Text to speech avatar code samples
+
+We added text to speech avatar code samples for [Android](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/java/android/avatar) and [iOS](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/swift/ios/avatar). These samples demonstrate how to use real-time text to speech avatars in your mobile applications.
+
 ### Speech SDK 1.41.1: 2024-October release
 
 #### New Features
````

articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md

Lines changed: 20 additions & 14 deletions
````diff
@@ -5,7 +5,7 @@ description: Learn how to use text to speech avatar with real-time synthesis.
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 9/12/2024
+ms.date: 11/19/2024
 ms.reviewer: v-baolianzou
 ms.author: eur
 author: eric-urban
````
````diff
@@ -36,15 +36,15 @@ Here's the compatibility of real-time avatar on different platforms and browsers
 | iOS | Y | Y | Y | Y | Y |
 | macOS | Y | Y | Y | Y<sup>1</sup> | Y |
 
-<sup>1</sup> It dosen't work with ICE server by Communication Service but works with Coturn.
+<sup>1</sup> It doesn't work with the ICE server provided by Communication Services, but works with Coturn.
 
 <sup>2</sup> Background transparency doesn't work.
 
 ## Select text to speech language and voice
 
 The text to speech feature in the Speech service supports a broad portfolio of [languages and voices](../language-support.md?tabs=tts). You can get the full list or try them in the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery).
 
-Specify the language or voice of `SpeechConfig` to match your input text and use the specified voice. The following code snippet shows how this technique works:
+To match your input text and use the specified voice, you can set the `SpeechSynthesisLanguage` or `SpeechSynthesisVoiceName` properties in the `SpeechConfig` object. The following code snippet shows how this technique works:
 
 ```JavaScript
 const speechConfig = SpeechSDK.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");
````
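The snippet in the diff is truncated before the voice is actually set. As an illustrative sketch of how that configuration step typically continues: in the JavaScript Speech SDK these are camelCase setter properties, and the voice name below is an assumption for illustration, so substitute one from the Voice Gallery.

```javascript
// Sketch: apply a synthesis language and voice to a SpeechConfig-like object.
// Property names follow the JavaScript Speech SDK's camelCase setters; the
// voice name used in the usage note is illustrative only.
function configureVoice(speechConfig, language, voiceName) {
  speechConfig.speechSynthesisLanguage = language;
  speechConfig.speechSynthesisVoiceName = voiceName;
  return speechConfig;
}

// Assumed browser flow with the Speech SDK:
// const speechConfig = SpeechSDK.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");
// configureVoice(speechConfig, "en-US", "en-US-AvaMultilingualNeural");
```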
````diff
@@ -91,7 +91,7 @@ Host: westus2.tts.speech.microsoft.com
 Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY
 ```
 
-The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the payload of above HTTP request.
+The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the payload of the previous HTTP request.
 
 ```JavaScript
 // Create WebRTC peer connection
````
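As a sketch of the step that snippet begins, the relay-token payload can be mapped onto an `RTCPeerConnection` configuration. The field names (`Urls`, `Username`, `Password`) are assumptions about the token response, so verify them against the payload you actually receive.

```javascript
// Sketch: build an RTCPeerConnection configuration from the relay token
// returned by the ICE-token HTTP request. Field names (Urls, Username,
// Password) are assumed for illustration; check the actual payload.
function toRtcConfiguration(relayToken) {
  return {
    iceServers: [{
      urls: relayToken.Urls,          // array of TURN/STUN URLs from the token
      username: relayToken.Username,
      credential: relayToken.Password,
    }],
  };
}

// In the browser you would then create the peer connection:
// const peerConnection = new RTCPeerConnection(toRtcConfiguration(tokenJson));
```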
````diff
@@ -148,11 +148,11 @@ avatarSynthesizer.startAvatarAsync(peerConnection).then(
 );
 ```
 
-Our real-time API disconnects after 5 minutes of avatar's idle state. Even if the avatar isn't idle and functioning normally, the real-time API will disconnect after a 10-minute connection. To ensure continuous operation of the real-time avatar for more than 10 minutes, you can enable auto-reconnect. For information about how to set up auto-reconnect, refer to this [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
+Our real-time API disconnects after the avatar is idle for 5 minutes. Even if the avatar isn't idle and is functioning normally, the real-time API disconnects after a connection lasts 10 minutes. To ensure continuous operation of the real-time avatar for more than 10 minutes, you can enable automatic reconnect. For information about how to set up automatic reconnect, refer to this [JavaScript sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
````
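The linked sample implements the real reconnect logic; as a purely illustrative sketch (not the sample's actual code), a client can retry with a capped exponential backoff whenever the connection drops, where `reconnect` is assumed to wrap the relay-token fetch and `startAvatarAsync` sequence shown earlier.

```javascript
// Sketch: capped exponential backoff for re-establishing the avatar
// connection after a service-side disconnect. Delay values are illustrative.
function nextRetryDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  // attempt 0 -> baseMs, then doubles each attempt up to capMs
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Schedule a reconnect attempt; `reconnect` is assumed to wrap the
// relay-token fetch and avatarSynthesizer.startAvatarAsync(...) sequence.
function scheduleReconnect(attempt, reconnect, timer = setTimeout) {
  return timer(reconnect, nextRetryDelayMs(attempt));
}
```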
````diff
 
 ## Synthesize talking avatar video from text input
 
-After the above steps, you should see the avatar video being played in the web browser. The avatar is active, with eye blink and slight body movement, but not speaking yet. The avatar is waiting for text input to start speaking.
+After the previous steps, you should see the avatar video being played in the web browser. The avatar is active, with eye blinks and slight body movement, but it isn't speaking yet. The avatar is waiting for text input to start speaking.
 
 The following code snippet shows how to send text to the avatar synthesizer and let the avatar speak:
````

````diff
@@ -178,22 +178,18 @@ avatarSynthesizer.speakTextAsync(spokenText).then(
 });
 ```
 
-You can find end-to-end working samples on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/js/browser/avatar).
-
 ## Close the real-time avatar connection
 
-To avoid unnecessary costs after you finish using the real-time avatar, it’s important to close the connection. There are several ways to do this:
+To avoid unnecessary costs after you finish using the real-time avatar, it’s important to close the connection. There are several ways to close the connection:
 
-- When the browser web page is closed, the WebRTC client side peer connection object will be released, and the avatar connection will be automatically closed after a few seconds.
-- If the avatar remains idle for 5 minutes, the connection will be automatically closed by the avatar service.
+- When the browser web page is closed, the WebRTC client-side peer connection object is released. The avatar connection is then automatically closed after a few seconds.
+- The connection is automatically closed if the avatar remains idle for 5 minutes.
 - You can proactively close the avatar connection by running the following code:
 
 ```javascript
 avatarSynthesizer.close()
 ```
 
-You can find end-to-end working samples on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/js/browser/avatar).
-
 ## Edit background
 
 The avatar real-time synthesis API currently doesn't support setting a background image/video and only supports setting a solid-color background, without transparent background support. However, there's an alternative way to implement background customization on the client side, following these guidelines:
````
````diff
@@ -203,10 +199,20 @@ The avatar real-time synthesis API currently doesn't support setting a backgroun
 - Capture each frame of the avatar video and apply a pixel-by-pixel calculation to set the green pixel to transparent, and draw the recalculated frame to the canvas.
 - Hide the original video.
 
-With this approach, you can get an animated canvas that plays like a video, which has a transparent background. Here's the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/js/basic.js#L108) to demonstrate such an approach.
+With this approach, you get an animated canvas that plays like a video with a transparent background. Here's the [JavaScript sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/js/basic.js#L108) that demonstrates this approach.
 
 After you have a transparent-background avatar, you can set the background to any image or video by placing the image or video behind the canvas.
 
+## Code samples
+
+You can find text to speech avatar code samples in the Speech SDK repository on GitHub. The samples demonstrate how to use real-time text to speech avatars in your applications:
+
+- [JavaScript](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/js)
+- [Android](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/java/android/avatar)
+- [iOS](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/swift/ios/avatar)
+
+The JavaScript sample targets web applications; the Android and iOS samples demonstrate the same real-time avatar scenario in mobile applications.
+
 ## Next steps
 
 * [What is text to speech avatar](what-is-text-to-speech-avatar.md)
````
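The client-side green-screen steps listed under "Edit background" in the diff above can be sketched as a per-frame pixel pass over the RGBA buffer. This is an illustrative sketch, not the linked sample's exact code; the greenness heuristic and threshold are assumptions to tune against your avatar video.

```javascript
// Sketch: make green-screen pixels transparent in one video frame's RGBA
// buffer (e.g. canvasContext.getImageData(...).data). The greenness
// heuristic and the threshold value are illustrative assumptions.
function applyChromaKey(pixels, threshold = 100) {
  for (let i = 0; i < pixels.length; i += 4) {
    const r = pixels[i], g = pixels[i + 1], b = pixels[i + 2];
    // Treat a pixel as "green screen" when green clearly dominates red/blue.
    if (g > threshold && g > r * 1.5 && g > b * 1.5) {
      pixels[i + 3] = 0; // zero the alpha channel: fully transparent
    }
  }
  return pixels;
}

// Per frame (browser): draw the <video> element to a canvas, run
// applyChromaKey on the ImageData, putImageData back, and hide the original
// video element, as the guidelines above describe.
```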
