Merge pull request #274682 from sally-baolian/patch-240

prmerger-automator[bot] · web-flow · commit 4406d1cc7faa · 2024-05-09T14:20:35.000Z
Update avatar docs
diff --git a/articles/ai-services/speech-service/speech-services-quotas-and-limits.md b/articles/ai-services/speech-service/speech-services-quotas-and-limits.md
@@ -115,11 +115,17 @@ The limits in this table apply per Speech resource when you create a personal vo
 | REST API limit (not including speech synthesis) | Not available for F0 | 50 requests per 10 seconds |
 | Max number of transactions per second (TPS) for speech synthesis|Not available for F0  |200 transactions per second (TPS) (default value)  |
 
+#### Batch text to speech avatar 
+
+| Quota | Free (F0)| Standard (S0) |
+|-----|-----|-----|
+| REST API limit  | Not available for F0 | 2 requests per minute  |
+
 #### Real-time text to speech avatar
 
 | Quota | Free (F0)| Standard (S0) |
 |-----|-----|-----|
-| New connections per minute | Not available for F0 | Two new connections per minute |
+| New connections per minute | Not available for F0 | 2 new connections per minute |
 
 #### Audio Content Creation tool
 
@@ -297,3 +303,14 @@ Initiate the increase of the limit for concurrent requests for your resource, or
    - Any other required information.
 1. On the **Review + create** tab, select **Create**.
 1. Note the support request number in Azure portal notifications. You're contacted shortly about your request.
+
+### Text to speech avatar: increase new connections limit
+
+To increase the limit of new connections per minute for text to speech avatar, contact your sales representative to create a [ticket](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/overview) with the following information:
+
+- Speech resource URI
+- Requested new limit
+- Justification for the increase
+- Starting date for the increase
+- Ending date for the increase
+- Prebuilt avatar or custom avatar
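
Until a higher limit is granted, a client can stay under the default of 2 requests (or 2 new connections) per minute by throttling itself. The following is a minimal sketch, not part of the Speech SDK: `createRateLimiter` and `tryAcquire` are hypothetical names, and the sketch assumes the service counts requests over a sliding 60-second window, which the quota table doesn't state.

```javascript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// ASSUMPTION: a sliding window; the service may count requests differently.
// `now` is injectable so the limiter can be exercised with a fake clock.
function createRateLimiter(limit, windowMs, now = Date.now) {
  const timestamps = []; // send times of calls still inside the window

  return {
    // Returns 0 if a call may go out now (and records it); otherwise
    // returns the number of milliseconds to wait before trying again.
    tryAcquire() {
      const t = now();
      // Drop calls that have aged out of the window.
      while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
        timestamps.shift();
      }
      if (timestamps.length < limit) {
        timestamps.push(t);
        return 0;
      }
      return timestamps[0] + windowMs - t;
    },
  };
}
```

A caller would create one limiter per Speech resource, for example `createRateLimiter(2, 60000)`, and delay the request or connection attempt whenever `tryAcquire()` returns a nonzero wait time.
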
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md b/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-record-video-samples.md
@@ -130,9 +130,9 @@ High-quality avatar models are built from high-quality video recordings, includi
 
 ## Data requirements
 
-Doing some basic processing of your video data is helpful for model training efficiency, such as:
+Doing some basic processing of your video data is helpful for model training efficiency, such as: 
 
-- Make sure that the character is in the middle of the screen, the size and position are consistent during the video processing. Each video processing parameter such as brightness, contrast remains the same and doesn't change.
+- Make sure that the character is in the middle of the screen and that its size and position stay consistent throughout the video. Keep each video processing parameter, such as brightness and contrast, the same for the whole video. The output avatar's size, position, brightness, and contrast directly reflect those in the training data. We don't apply any alterations during processing or model building.
 - The start and end of the clip should be kept in state 0; the actors should close their mouths and smile, and look ahead. The video should be continuous, not abrupt.
 
 **Avatar training video recording file format:** .mp4 or .mov.
diff --git a/articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md b/articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md
@@ -82,9 +82,18 @@ const avatarConfig = new SpeechSDK.AvatarConfig(
 
 Real-time avatar uses WebRTC protocol to output the avatar video stream. You need to set up the connection with the avatar service through WebRTC peer connection.
 
-First, you need to create a WebRTC peer connection object. WebRTC is a P2P protocol, which relies on ICE server for network relay. Azure provides [Communication Services](../../../communication-services/overview.md), which can provide network relay function. Therefore, we recommend you fetch the ICE server from the Azure Communication resource, which is mentioned in the [prerequisites section](#prerequisites). You can also choose to use your own ICE server.
+First, you need to create a WebRTC peer connection object. WebRTC is a P2P protocol, which relies on an ICE server for network relay. The Speech service provides a network relay function and exposes a REST API that issues the ICE server information. Therefore, we recommend that you fetch the ICE server information from the Speech service. You can also choose to use your own ICE server.
 
-The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the network relay token you prepared in the [prerequisites section](#prerequisites) or from the configuration of your own ICE server.
+Here is a sample request to fetch ICE information from the speech service endpoint:
+
+```HTTP
+GET /cognitiveservices/avatar/relay/token/v1 HTTP/1.1
+
+Host: westus2.tts.speech.microsoft.com
+Ocp-Apim-Subscription-Key: YOUR_RESOURCE_KEY
+```
+
+The following code snippet shows how to create the WebRTC peer connection. The ICE server URL, ICE server username, and ICE server credential can all be fetched from the response payload of the preceding HTTP request.
 
 ```JavaScript
 // Create WebRTC peer connection
@@ -141,6 +150,8 @@ avatarSynthesizer.startAvatarAsync(peerConnection).then(
 );
 ```
 
+The real-time API disconnects after the avatar has been idle for 5 minutes. Even when the avatar isn't idle and is functioning normally, the real-time API disconnects after the connection has been open for 10 minutes. To keep the real-time avatar running for more than 10 minutes, you can enable auto-reconnect. For how to set up auto-reconnect, refer to this [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search for "auto reconnect").
+
 ## Synthesize talking avatar video from text input
 
 After the above steps, you should see the avatar video being played in the web browser. The avatar is active, with eye blink and slight body movement, but not speaking yet. The avatar is waiting for text input to start speaking.
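
The relay token request added in this change can be wired into the peer connection setup roughly as follows. This is a minimal sketch under stated assumptions, not the documented implementation: the helper names (`relayTokenUrl`, `toRtcConfiguration`, `fetchIceConfiguration`) are hypothetical, and the response payload is assumed to carry the fields `Urls`, `Username`, and `Password`, which the diff itself doesn't show. It also assumes a runtime with a global `fetch`.

```javascript
// Build the relay token URL from the sample request in the docs.
// The region (for example "westus2") is passed in by the caller.
function relayTokenUrl(region) {
  return `https://${region}.tts.speech.microsoft.com/cognitiveservices/avatar/relay/token/v1`;
}

// Map the relay token payload to an RTCPeerConnection configuration.
// ASSUMPTION: the payload has the fields Urls (array), Username, Password.
function toRtcConfiguration(token) {
  if (!token || !Array.isArray(token.Urls) || token.Urls.length === 0) {
    throw new Error("Relay token payload has no ICE server URLs");
  }
  return {
    iceServers: [
      {
        urls: token.Urls,
        username: token.Username,
        credential: token.Password,
      },
    ],
  };
}

// Fetch the relay token and return a configuration object that can be
// passed to `new RTCPeerConnection(...)`.
async function fetchIceConfiguration(region, resourceKey) {
  const response = await fetch(relayTokenUrl(region), {
    headers: { "Ocp-Apim-Subscription-Key": resourceKey },
  });
  if (!response.ok) {
    throw new Error(`Relay token request failed with status ${response.status}`);
  }
  return toRtcConfiguration(await response.json());
}
```

In a browser, the result would be used as `new RTCPeerConnection(await fetchIceConfiguration("westus2", "YOUR_RESOURCE_KEY"))`. Keeping the payload mapping in its own function means only `toRtcConfiguration` needs to change if the actual field names differ from the assumed ones.
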