Commit 4406d1c

Merge pull request #274682 from sally-baolian/patch-240
Update avatar docs
2 parents ca2f369 + a3a5493 commit 4406d1c

3 files changed

+33
-5
lines changed

articles/ai-services/speech-service/speech-services-quotas-and-limits.md

Lines changed: 18 additions & 1 deletion
@@ -115,11 +115,17 @@ The limits in this table apply per Speech resource when you create a personal vo
115115
| REST API limit (not including speech synthesis) | Not available for F0 | 50 requests per 10 seconds |
116116
| Max number of transactions per second (TPS) for speech synthesis|Not available for F0 |200 transactions per second (TPS) (default value) |
117117

118+
#### Batch text to speech avatar
119+
120+
| Quota | Free (F0)| Standard (S0) |
121+
|-----|-----|-----|
122+
| REST API limit | Not available for F0 | 2 requests per 1 minute |
123+
118124
#### Real-time text to speech avatar
119125

120126
| Quota | Free (F0)| Standard (S0) |
121127
|-----|-----|-----|
122-
| New connections per minute | Not available for F0 | Two new connections per minute |
128+
| New connections per minute | Not available for F0 | 2 new connections per minute |
123129

124130
#### Audio Content Creation tool
125131

@@ -297,3 +303,14 @@ Initiate the increase of the limit for concurrent requests for your resource, or
297303
- Any other required information.
298304
1. On the **Review + create** tab, select **Create**.
299305
1. Note the support request number in Azure portal notifications. You're contacted shortly about your request.
306+
307+
### Text to speech avatar: increase new connections limit
308+
309+
To increase the limit of new connections per minute for text to speech avatar, contact your sales representative to create a [ticket](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/overview) with the following information:
310+
311+
- Speech resource URI
312+
- Requested new limitation to increase to
313+
- Justification for the increase
314+
- Starting date for the increase
315+
- Ending date for the increase
316+
- Prebuilt avatar or custom avatar

articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md

Lines changed: 2 additions & 2 deletions
@@ -130,9 +130,9 @@ High-quality avatar models are built from high-quality video recordings, includi
130130

131131
## Data requirements
132132

133-
Doing some basic processing of your video data is helpful for model training efficiency, such as:
133+
Doing some basic processing of your video data is helpful for model training efficiency, such as:
134134

135-
- Make sure that the character is in the middle of the screen, the size and position are consistent during the video processing. Each video processing parameter such as brightness, contrast remains the same and doesn't change.
135+
- Make sure that the character is in the middle of the screen, the size and position are consistent during the video processing. Each video processing parameter such as brightness, contrast remains the same and doesn't change. The output avatar's size, position, brightness, contrast will directly reflect those present in the training data. We don't apply any alterations during processing or model building.
136136
- The start and end of the clip should be kept in state 0; the actors should close their mouths and smile, and look ahead. The video should be continuous, not abrupt.
137137

138138
**Avatar training video recording file format:** .mp4 or .mov.

articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md

Lines changed: 13 additions & 2 deletions
@@ -82,9 +82,18 @@ const avatarConfig = new SpeechSDK.AvatarConfig(
8282

8383
Real-time avatar uses WebRTC protocol to output the avatar video stream. You need to set up the connection with the avatar service through WebRTC peer connection.
8484

85-
First, you need to create a WebRTC peer connection object. WebRTC is a P2P protocol, which relies on ICE server for network relay. Azure provides [Communication Services](../../../communication-services/overview.md), which can provide network relay function. Therefore, we recommend you fetch the ICE server from the Azure Communication resource, which is mentioned in the [prerequisites section](#prerequisites). You can also choose to use your own ICE server.
85+
First, you need to create a WebRTC peer connection object. WebRTC is a P2P protocol, which relies on ICE server for network relay. Speech service provides network relay function and exposes a REST API to issue the ICE server information. Therefore, we recommend you fetch the ICE server from the speech service. You can also choose to use your own ICE server.
8686

87-
The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the network relay token you prepared in the [prerequisites section](#prerequisites) or from the configuration of your own ICE server.
87+
Here is a sample request to fetch ICE information from the speech service endpoint:
88+
89+
```HTTP
90+
GET /cognitiveservices/avatar/relay/token/v1 HTTP/1.1
91+
92+
Host: westus2.tts.speech.microsoft.com
93+
Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY
94+
```
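
As a rough illustration of how the response to this request could feed the peer connection, the sketch below fetches the relay token and maps it to a standard `RTCPeerConnection` configuration. The response shape is not shown in this diff, so the payload field names (`Urls`, `Username`, `Password`) are assumptions, and the helper names `fetchRelayToken` and `toRtcConfiguration` are hypothetical; verify the fields against a real response before relying on them.

```javascript
// Sketch only: the payload field names (Urls, Username, Password) are assumed,
// and both helper names are hypothetical.

// Fetch ICE relay information from the Speech service endpoint shown above.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchRelayToken(region, resourceKey, fetchImpl = fetch) {
  const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/avatar/relay/token/v1`;
  const response = await fetchImpl(url, {
    method: "GET",
    headers: { "Ocp-Apim-Subscription-Key": resourceKey },
  });
  if (!response.ok) {
    throw new Error(`Relay token request failed with status ${response.status}`);
  }
  return response.json();
}

// Map the (assumed) payload shape to the standard RTCConfiguration dictionary.
function toRtcConfiguration(relayToken) {
  return {
    iceServers: [
      {
        urls: relayToken.Urls,
        username: relayToken.Username,
        credential: relayToken.Password,
      },
    ],
  };
}
```

In a browser, the result could be passed straight to the constructor, for example `new RTCPeerConnection(toRtcConfiguration(await fetchRelayToken("westus2", "YOUR_RESOURCE_KEY")))`.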
95+
96+
The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the payload of above HTTP request.
8897

8998
```JavaScript
9099
// Create WebRTC peer connection
@@ -141,6 +150,8 @@ avatarSynthesizer.startAvatarAsync(peerConnection).then(
141150
);
142151
```
143152

153+
Our real-time API disconnects after 5 minutes of avatar's idle state. Even if the avatar is not idle and functioning normally, the real-time API will disconnect after a 10-minute connection. To ensure continuous operation of the real-time avatar for more than 10 minutes, you can enable auto-reconnect. For how to set up auto-reconnect, refer to this [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
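
One possible shape for such an auto-reconnect hook is sketched below, using only the standard WebRTC `iceConnectionState` property and `oniceconnectionstatechange` handler. This is not taken from the linked sample: `attachAutoReconnect`, the `reconnect` callback, and `isUserClosed` are hypothetical names, and the policy (reconnect once per drop, skip when the user ended the session) is an assumption.

```javascript
// Sketch only: helper names and reconnect policy are assumptions,
// not the linked sample's actual implementation.

// Invoke `reconnect` when the peer connection drops, unless the user ended the session.
// `reconnect` is a hypothetical callback that would re-create the peer connection
// and call startAvatarAsync again.
function attachAutoReconnect(peerConnection, reconnect, isUserClosed = () => false) {
  let reconnecting = false;
  peerConnection.oniceconnectionstatechange = () => {
    const state = peerConnection.iceConnectionState;
    const dropped = state === "disconnected" || state === "failed";
    if (dropped && !reconnecting && !isUserClosed()) {
      reconnecting = true; // guard against duplicate reconnect attempts
      reconnect();
    }
  };
}
```

Because the guard flag lives in the closure, each new peer connection created by `reconnect` would need its own `attachAutoReconnect` call.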
154+
144155
## Synthesize talking avatar video from text input
145156

146157
After the above steps, you should see the avatar video being played in the web browser. The avatar is active, with eye blink and slight body movement, but not speaking yet. The avatar is waiting for text input to start speaking.
