Commit bdbd313

Merge pull request #4998 from eric-urban/eur/bring-avatar-to-cnv-branch
bring avatar to cnv branch
2 parents 52452df + 3f003b0 commit bdbd313

23 files changed, with 391 additions and 250 deletions.

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -269,6 +269,11 @@
       "source_path_from_root": "/articles/ai-services/custom-vision-service/logo-detector-mobile.md",
       "redirect_url": "/azure/ai-services/custom-vision-service",
       "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-services/speech-service/text-to-speech-avatar/custom-avatar-endpoint.md",
+      "redirect_url": "/azure/ai-services/speech-service/custom-avatar-create",
+      "redirect_document_id": false
     }
   ]
 }

articles/ai-services/speech-service/batch-synthesis.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ For code samples, see [GitHub](https://github.com/Azure-Samples/cognitive-servic
 To submit a batch synthesis request, construct the HTTP PUT request path and body according to the following instructions:
 
 - Set the required `inputKind` property.
-- If the `inputKind` property is set to "PlainText", then you must also set the `voice` property in the `synthesisConfig`. In the example below, the `inputKind` is set to "SSML", so the `synthesisConfig` isn't set.
+- If the `inputKind` property is set to "PlainText", then you must also set the `voice` property in the `synthesisConfig`. In the following example, the `inputKind` is set to "SSML", so the `synthesisConfig` isn't set.
 - Optionally you can set the `description`, `timeToLiveInHours`, and other properties. For more information, see [batch synthesis properties](batch-synthesis-properties.md).
 
 > [!NOTE]
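
The `inputKind` rule in this hunk can be illustrated with a short Python sketch that builds a request body as a plain dictionary and sends nothing. Only `inputKind`, `synthesisConfig`, `voice`, `description`, and `timeToLiveInHours` come from the text above. The `inputs` and `content` field names, the function name, and the example voice are assumptions made for illustration, so check them against the batch synthesis properties reference before relying on them.

```python
import json


def build_batch_synthesis_body(input_kind, contents, voice=None, **optional):
    """Build a batch synthesis request body as a plain dict (hypothetical helper).

    The "inputs"/"content" shape is an assumption of this sketch.
    """
    if input_kind not in ("SSML", "PlainText"):
        raise ValueError("inputKind must be 'SSML' or 'PlainText'")

    body = {
        "inputKind": input_kind,
        "inputs": [{"content": text} for text in contents],  # assumed shape
    }

    if input_kind == "PlainText":
        # Plain text carries no voice information, so a voice is required.
        if not voice:
            raise ValueError("PlainText input requires a voice in synthesisConfig")
        body["synthesisConfig"] = {"voice": voice}
    # For SSML, synthesisConfig isn't set: the voice is named inside the SSML.

    # Optional properties such as description or timeToLiveInHours.
    body.update(optional)
    return body


if __name__ == "__main__":
    ssml = (
        '<speak version="1.0" xml:lang="en-US">'
        '<voice name="en-US-JennyNeural">Hello</voice></speak>'
    )
    print(json.dumps(build_batch_synthesis_body("SSML", [ssml], description="demo"), indent=2))
```

Raising an error for plain text without a voice mirrors the "must also set" wording, so the mistake surfaces before any request is made.
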
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
---
author: eric-urban
ms.author: eur
ms.service: azure-ai-speech
ms.topic: include
ms.date: 5/19/2025
---

Getting started with a custom text to speech avatar is a straightforward process. All it takes is a few video clips of your actor. If you'd like to train a [custom voice](../../../../custom-neural-voice.md) for the same actor, you can do so separately.

> [!NOTE]
> Custom avatar access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).

## Prerequisites

You need an Azure AI Foundry resource in one of the [regions that support custom avatar training](../../../../text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md#available-locations). Custom avatar only supports standard (S0) AI Foundry or Speech resources.

You need a video recording of the talent reading a consent statement acknowledging the use of their image and voice. You upload this video when you set up the avatar talent. For more information, see [Add avatar talent consent](#step-2-add-avatar-talent-consent).

You need video recordings of your avatar talent as training data. You upload these videos when you prepare training data. For more information, see [Add training data](#step-3-add-training-data).

## Step 1: Start fine-tuning

> [!TIP]
> Don't mix data for different avatars in one fine-tuning workspace. Each avatar must have its own fine-tuning workspace.

To fine-tune a custom avatar, follow these steps:

1. Go to your Azure AI Foundry project in the [Azure AI Foundry portal](https://ai.azure.com). If you need to create a project, see [Create an Azure AI Foundry project](/azure/ai-foundry/how-to/create-projects).
1. Select **Fine-tuning** from the left pane.
1. Select **AI Service fine-tuning** > **+ Fine-tune**.

    :::image type="content" source="../../../../media/custom-voice/professional-voice/fine-tune-azure-ai-services.png" alt-text="Screenshot of the page to select fine-tuning of Azure AI Services models." lightbox="../../../../media/custom-voice/professional-voice/fine-tune-azure-ai-services.png":::

1. In the wizard, select **Custom avatar (text to speech avatar fine-tuning)**.
1. Select **Next**.
1. Follow the instructions provided by the wizard to create your fine-tuning workspace.

## Step 2: Add avatar talent consent

An avatar talent is an individual or target actor who is recorded on video while speaking, and whose recordings are used to create neural avatar models. You must obtain sufficient consent under all relevant laws and regulations from the avatar talent to use their video to create the custom text to speech avatar.

You must provide a video file with a recorded statement from your avatar talent, acknowledging the use of their image and voice. Microsoft verifies that the content in the recording matches the predefined script provided by Microsoft. Microsoft also compares the face of the avatar talent in the recorded statement with randomized videos from the training datasets to ensure that the person in the training videos and the person in the statement video are the same.

- If you want to create a voice sync for avatar during avatar training, a custom voice resembling your avatar is created alongside the custom avatar. The voice is used exclusively with the specified avatar. Your consent statement must include both the custom avatar and the voice sync for avatar.
- If you don't create a voice sync for avatar, only the custom avatar is trained, and your consent statement must reflect this scope.

You can find the verbal consent statement in multiple languages via the [Azure-Samples/cognitive-services-speech-sdk](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/customavatar/verbal-statement-all-locales.txt) GitHub repository. The language of the verbal statement must be the same as the language of your recording. See also the [Disclosure for avatar talent](/legal/cognitive-services/speech-service/disclosure-voice-talent?context=/azure/ai-services/speech-service/context/context).

For more information about recording the consent video, see [How to record video samples](../../../../text-to-speech-avatar/custom-avatar-record-video-samples.md).

To add an avatar talent profile and upload their consent statement in your project, follow these steps:

1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com).
1. Select **Fine-tuning** from the left pane and then select **AI Service fine-tuning**.
1. Select the custom avatar fine-tuning task (by model name) that you [started as described in the previous section](#step-1-start-fine-tuning).
1. Select **Set up avatar talent** > **Upload consent video**.

1. On the **Upload consent video** page, follow the instructions to upload the avatar talent consent video you recorded beforehand.
    - Select the avatar type to build. You can build a voice sync for avatar, which sounds like your avatar talent, together with the avatar model, or build the avatar without a voice sync for avatar. The option to build a voice sync for avatar is only available in the Southeast Asia, West Europe, and West US 2 regions.
    - Select the speaking language of the verbal consent statement recorded by the avatar talent.
    - Enter the avatar talent name and your company name in the same language as the recorded statement.
        - The avatar talent name must be the name of the person who recorded the consent statement.
        - The company name must match the company name that was spoken in the recorded statement.
    - You can choose to upload your data from local files or from shared storage in Azure Blob Storage.

1. Select local files from your computer or enter the Azure Blob storage URL where your data is stored.
1. Select **Next**.
1. Review the upload details, and select **Upload**.

After the avatar talent consent is uploaded successfully, you can proceed to train your custom avatar model.

## Step 3: Add training data

The Speech service uses your training data to create a unique avatar tuned to match the look of the person in the recordings. After you train the avatar model, you can start synthesizing avatar videos or use it for live chats in your applications.

All data you upload must meet the requirements for the data type that you choose. To ensure that the Speech service accurately processes your data, it's important to correctly format your data before upload. To confirm that your data is correctly formatted, see [Data requirements](../../../../text-to-speech-avatar/custom-avatar-record-video-samples.md#data-requirements).

### Upload your data

When you're ready to upload your data, go to the **Prepare training data** tab to add your data.

To upload training data, follow these steps:
1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com).
1. Select **Fine-tuning** from the left pane and then select **AI Service fine-tuning**.
1. Select the custom avatar fine-tuning task (by model name) that you [started as described in the previous section](#step-1-start-fine-tuning).
1. Select **Prepare training data** > **Upload data**.
1. In the **Upload data** wizard, choose a data type and then select **Next**. For more information about the data types (including **Naturally Speaking**, **Silent Status**, **Gesture**, and **Status 0 speaking**), see [what video clips to record](../../../../text-to-speech-avatar/custom-avatar-record-video-samples.md#what-video-clips-to-record).
1. Select local files from your computer or enter the Azure Blob storage URL where your data is stored.
1. Select **Next**.
1. Review the upload details, and select **Upload**.

Data files are automatically validated when you select **Upload**. Data validation includes a series of checks on the video files to verify their file format, size, and total volume. If there are any errors, fix them and submit again.

After you upload the data, you can check the data overview, which indicates whether you provided enough data to start training.

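Because validation checks file format, size, and total volume, you might catch obvious problems locally before you upload. The following Python sketch is one possible pre-upload check, not the service's own validation. The function name is hypothetical, and the sketch deliberately has no built-in limits: you supply the allowed extensions and size limits yourself from the data requirements.

```python
from pathlib import Path


def precheck_videos(paths, allowed_extensions, max_file_bytes, max_total_bytes):
    """Return a list of problems found in local video files before upload.

    Every limit is supplied by the caller; this sketch doesn't know the
    service's real limits.
    """
    problems = []
    total = 0
    for path in map(Path, paths):
        if not path.is_file():
            problems.append(f"{path}: file not found")
            continue
        size = path.stat().st_size
        total += size
        # File format check, approximated here by the file extension.
        if path.suffix.lower() not in allowed_extensions:
            problems.append(f"{path}: unsupported extension {path.suffix!r}")
        # Per-file size check.
        if size > max_file_bytes:
            problems.append(f"{path}: {size} bytes exceeds the per-file limit")
    # Total volume check across all files that exist.
    if total > max_total_bytes:
        problems.append(f"total size {total} bytes exceeds the overall limit")
    return problems
```

An empty list means the files passed these local checks only. The portal still runs its own validation when you select **Upload**.
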
## Step 4: Train your avatar model

> [!IMPORTANT]
> All the training data in the project is included in the training. The model quality is highly dependent on the data you provided, and you're responsible for the video quality. Make sure you record the training videos according to the [how to record video samples guide](../../../../text-to-speech-avatar/custom-avatar-record-video-samples.md).

To create a custom avatar in the Azure AI Foundry portal, follow these steps:
1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com).
1. Select **Fine-tuning** from the left pane and then select **AI Service fine-tuning**.
1. Select the custom avatar fine-tuning task (by model name) that you [started as described in the previous section](#step-1-start-fine-tuning).
1. Select **Train model** > **+ Train model**.
1. Enter a name to help you identify the model. Choose the name carefully, because it's used as the avatar name in synthesis requests made through the SDK and SSML input. Only letters, numbers, hyphens, and underscores are allowed. Use a unique name for each model.

    > [!IMPORTANT]
    > The avatar model name must be unique within the same Speech or AI Services resource.

1. Select **Train** to start training the model.

Training duration varies depending on how much data you use. Training a custom avatar typically takes 20 to 40 compute hours. Check the [pricing note](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services) on how training is charged.

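You can check the naming rule from the training steps before you start a training run. The following Python sketch is illustrative only. The function name is hypothetical, limiting "letters" to ASCII is an assumption, and the uniqueness check uses an exact, case-sensitive comparison against names you pass in, which might differ from how the service compares names.

```python
import re

# Letters, digits, hyphens, and underscores only.
# Restricting letters to ASCII is an assumption of this sketch.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_avatar_model_name(name, existing_names=()):
    """Return True if the name uses only allowed characters and isn't taken.

    existing_names is whatever list of model names you already track for
    the resource; this sketch doesn't query the service.
    """
    if not _NAME_PATTERN.fullmatch(name):
        return False
    return name not in existing_names


if __name__ == "__main__":
    for candidate in ("my-avatar_01", "my avatar", ""):
        print(repr(candidate), is_valid_avatar_model_name(candidate))
```
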
### Copy your custom avatar model to another project (optional)

Custom avatar training is currently only available in some regions. After your avatar model is trained in a supported region, you can copy it to an AI Services resource for Speech in another region as needed. For more information, see footnotes in the [regions table](../../../../regions.md).

> [!NOTE]
> You can only copy the voice sync for avatar model to the regions that support the voice sync for avatar feature, which are the same regions that support personal voice.

To copy your custom avatar model to another project:
1. On the **Train model** tab, select an avatar model that you want to copy, and then select **Copy to project**.
1. Select the subscription, region, AI Services resource for Speech, and project where you want to copy the model. You must have an AI Services resource for Speech and a project in the target region; otherwise, create them first.
1. Select **Submit** to copy the model.

Once the model is copied, you see a notification in the Azure AI Foundry portal.

To deploy the model copy, go to the project where you copied the model.

## Step 5: Deploy and use your avatar model

After you successfully create and train your avatar model, deploy it to your endpoint.

To deploy your avatar:
1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com).
1. Select **Fine-tuning** from the left pane and then select **AI Service fine-tuning**.
1. Select the custom avatar fine-tuning task (by model name) that you [started as described in the previous section](#step-1-start-fine-tuning).
1. Select **Deploy model** > **Deploy model**.
1. Select a model that you want to deploy.
1. Select **Deploy** to start the deployment.

> [!IMPORTANT]
> When a model is deployed, you pay for continuous uptime of the endpoint regardless of your interaction with that endpoint. Check the pricing note on how model deployment is charged. You can delete a deployment when the model isn't in use to reduce spending and conserve resources.

After you deploy your custom avatar, it's available to use in the Azure AI Foundry portal or via API:
- The avatar appears in the avatar list of [text to speech avatar on Azure AI Foundry portal](https://speech.microsoft.com/portal/talkingavatar).
- The avatar appears in the avatar list of [live chat avatars via Azure AI Foundry portal](https://speech.microsoft.com/portal/livechat).
- You can call the avatar from the SDK and SSML input by specifying the avatar model name. For more information, see the [avatar properties](../../../../text-to-speech-avatar/batch-synthesis-avatar-properties.md#avatar-properties).

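To show what specifying the avatar model name might look like in a request, the following Python sketch builds only the avatar portion of a batch synthesis payload. The property names `talkingAvatarCharacter`, `talkingAvatarStyle`, and `customized` are assumptions based on the linked avatar properties reference and aren't stated in this article, so confirm them there. The function name and the example model name are hypothetical.

```python
def build_custom_avatar_config(model_name, style=None):
    """Build an avatarConfig fragment for a custom avatar (illustrative only).

    Property names are assumed from the avatar properties reference;
    verify them before use.
    """
    if not model_name:
        raise ValueError("model_name is required")

    config = {
        # The name you entered when you trained the model.
        "talkingAvatarCharacter": model_name,
        # Assumed flag that marks the avatar as custom rather than standard.
        "customized": True,
    }
    if style is not None:
        config["talkingAvatarStyle"] = style
    return config


if __name__ == "__main__":
    print(build_custom_avatar_config("my-avatar_01"))
```

The sketch builds a dictionary and sends nothing. A real request also needs the synthesis input and authentication described in the batch synthesis documentation.
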
### Remove a deployment

To remove your deployment, follow these steps:
1. Sign in to the [Azure AI Foundry portal](https://ai.azure.com).
1. Select **Fine-tuning** from the left pane and then select **AI Service fine-tuning**.
1. Select the custom avatar fine-tuning task (by model name) that you [started as described in the previous section](#step-1-start-fine-tuning).
1. Select the deployment on the **Deploy model** page. The model is actively hosted if the status is "Succeeded".
1. Select **Delete deployment** and confirm the deletion to remove the hosting.

> [!TIP]
> Once a deployment is removed, you no longer pay for its hosting. Deleting a deployment doesn't delete your model. If you want to use the model again, create a new deployment.
