Commit 5820c61

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-ai-docs-pr into stp_final_prompt

2 parents: 09a76d5 + d7c37bc

File tree: 6 files changed (+54 −47 lines changed)

articles/ai-foundry/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 3 additions & 3 deletions

@@ -196,18 +196,18 @@ Discounts on top of the hourly usage price can be obtained by purchasing an Azur

  * If the size of provisioned deployments within the scope of a reservation exceeds the amount of the reservation, the excess is charged at the hourly rate. For example, if deployments amounting to 250 PTUs exist within the scope of a 200 PTU reservation, 50 PTUs will be charged on an hourly basis until the deployment sizes are reduced to 200 PTUs, or a new reservation is created to cover the remaining 50.

- * Reservations guarantee a discounted price for the selected term. They don't reserve capacity on the service or guarantee that it will be available when a deployment is created. It's highly recommended that customers create deployments prior to purchasing a reservation to prevent from over-purchasing a reservation.
+ * Reservations guarantee a discounted price for the selected term. They don't reserve capacity on the service or guarantee that it will be available when a deployment is created. It's highly recommended that customers create deployments prior to purchasing a reservation to protect against over-purchasing a reservation.

  > [!IMPORTANT]
- > * Capacity availability for model deployments is dynamic and changes frequently across regions and models. To prevent you from purchasing a reservation for more PTUs than you can use, create deployments first, and then purchase the Azure Reservation to cover the PTUs you have deployed. This best practice will ensure that you can take full advantage of the reservation discount and prevent you from purchasing a term commitment that you cannot use.
+ > * Capacity availability for model deployments is dynamic and changes frequently across regions and models. To protect against purchasing a reservation for more PTUs than you can use, create deployments first, and then purchase the Azure Reservation to cover the PTUs you have deployed. This best practice will ensure that you can take full advantage of the reservation discount, and protects you from committing to a reservation that you cannot use.
  >
  > * The Azure role and tenant policy requirements to purchase a reservation are different than those required to create a deployment or Azure AI Foundry resource. Verify authorization to purchase reservations in advance of needing to do so. See [Azure AI Foundry Provisioned Throughput Reservation](https://aka.ms/oai/docs/ptum-reservations) for more details.

  ## Important: sizing Azure AI Foundry Provisioned Throughput Reservation

  The PTU amounts in reservation purchases are independent of PTUs allocated in quota or used in deployments. It's possible to purchase a reservation for more PTUs than you have in quota, or can deploy for the desired region, model, or version. Credits for over-purchasing a reservation are limited, and customers must take steps to ensure they maintain their reservation sizes in line with their deployed PTUs.

- The best practice is to always purchase a reservation after deployments have been created. This prevents purchasing a reservation and then finding out that the required capacity isn't available for the desired region or model.
+ The best practice is to always purchase a reservation after deployments have been created. This protects against purchasing a reservation and then finding out that the required capacity isn't available for the desired region or model.

  Reservations for Global, Data Zone, and Regional deployments aren't interchangeable. You need to purchase a separate reservation for each deployment type.
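As an editorial aside, the overage rule in this hunk is simple arithmetic; here's a minimal Python sketch of it (an illustration, not from the docs, and the rate is a hypothetical placeholder):

```python
def hourly_billed_ptus(deployed_ptus: int, reserved_ptus: int) -> int:
    """PTUs deployed beyond the reservation are billed at the hourly rate."""
    return max(0, deployed_ptus - reserved_ptus)

# Example from the text: 250 PTUs deployed against a 200 PTU reservation
# leaves 50 PTUs billed hourly until deployments shrink or a new
# reservation covers them.
print(hourly_billed_ptus(deployed_ptus=250, reserved_ptus=200))  # 50
```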

articles/ai-foundry/openai/how-to/realtime-audio-websockets.md

Lines changed: 6 additions & 8 deletions

@@ -33,8 +33,6 @@ The GPT real-time models are available for global deployments in [East US 2 and

  - `gpt-realtime` (`2025-08-28`)
  - `gpt-realtime-mini` (`2025-10-06`)

- You should use API version `2025-04-01-preview` in the URL for the Realtime API.

  For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).

  ## Prerequisites

@@ -56,25 +54,25 @@ You can construct a full request URI by concatenating:

  - The secure WebSocket (`wss://`) protocol.
  - Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
  - The `openai/realtime` API path.
- - An `api-version` query string parameter for a supported API version such as `2024-12-17`
  - A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, or `gpt-realtime` model deployment.
+ - **(Preview version only)** An `api-version` query string parameter for a supported API version such as `2025-04-01-preview`

  The following example is a well-constructed `/realtime` request URI:

- #### [preview version](#tab/preview)
+ #### [GA version](#tab/ga)

  ```http
- wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2025-04-01-preview&deployment=gpt-4o-mini-realtime-preview-deployment-name
+ wss://my-eastus2-openai-resource.openai.azure.com/openai/v1/realtime?model=gpt-realtime-deployment-name
  ```

- #### [GA version](#tab/ga)
+ #### [Preview version](#tab/preview)

  ```http
- wss://my-eastus2-openai-resource.openai.azure.com/openai/v1/realtime?model=gpt-realtime-deployment-name
+ wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2025-04-01-preview&deployment=gpt-4o-mini-realtime-preview-deployment-name
  ```

  ---
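As an editorial aside, here's a minimal Python sketch of how the components listed above combine into the two URI shapes shown in this hunk; the resource host and deployment names are placeholders:

```python
resource_host = "my-eastus2-openai-resource.openai.azure.com"

# GA version: the /openai/v1 path identifies the deployment via `model`.
ga_uri = f"wss://{resource_host}/openai/v1/realtime?model=gpt-realtime-deployment-name"

# Preview version: the /openai path takes `api-version` and `deployment`.
preview_uri = (
    f"wss://{resource_host}/openai/realtime"
    "?api-version=2025-04-01-preview"
    "&deployment=gpt-4o-mini-realtime-preview-deployment-name"
)
print(ga_uri)
print(preview_uri)
```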

  To authenticate:
  - **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
  - **API key**: An `api-key` can be provided in one of two ways:
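A hedged Python sketch of the two options above, using the azure-identity package; the token scope shown is the standard Azure AI services scope, and the key is a placeholder. The diff is truncated before listing the two `api-key` mechanisms, so only the header form is illustrated:

```python
from azure.identity import DefaultAzureCredential

# Microsoft Entra (recommended): fetch a token and send it as a Bearer
# token in the Authorization header.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")
entra_headers = {"Authorization": f"Bearer {token.token}"}

# API key: one documented mechanism is an api-key header
# (placeholder value shown).
api_key_headers = {"api-key": "<your-api-key>"}
```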

articles/ai-services/speech-service/faq-stt.yml

Lines changed: 37 additions & 26 deletions

@@ -12,7 +12,7 @@ metadata:
  title: Speech to text FAQ
  summary: |
    This article answers commonly asked questions about the speech to text capability. If you can't find answers to your questions here, check out [other support options](../cognitive-services-support-options.md?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext%253fcontext%253d%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).

sections:
  - name: General

@@ -26,14 +26,14 @@
        Where do I start if I want to use a base model?
      answer: |
        First, get an API key and region in the [Azure portal](https://portal.azure.com). If you want to make REST calls to a predeployed base model, see the [REST APIs](rest-speech-to-text.md) documentation. If you want to use WebSockets, [download the Speech SDK](speech-sdk.md).

    - question: |
        Do I always need to build a custom speech model?
      answer: |
        No. If your application uses generic, day-to-day language, you don't need to customize a model. If your application is used in an environment where there's little or no background noise, you don't need to customize a model.

        You can deploy baseline and customized models in the portal and then run accuracy tests against them. You can use this feature to measure the accuracy of a base model versus a custom model.

    - question: |
        How do I know when the processing for my dataset or model is complete?
      answer: |
@@ -53,62 +53,62 @@
        I get several results for each phrase with the detailed output format. Which one should I use?
      answer: |
        Always take the first result, even if another result ("N-Best") might have a higher confidence value. Speech service considers the first result to be the best. The result can also be an empty string if no speech was recognized.

        The other results are likely worse and might not have full capitalization and punctuation applied. These results are most useful in special scenarios, such as giving users the option to pick corrections from a list or handling incorrectly recognized commands.

    - question: |
        Why are there multiple base models?
      answer: |
        You can choose from more than one base model in Speech service. Each model name contains the date when it was added. When you start training a custom model, use the most recent model to get the best accuracy. Older base models are still available for some time after a new model is made available. You can continue using the model that you worked with until it's retired (see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md)). We still recommend that you switch to the latest base model for better accuracy.

    - question: |
        Can I update my existing model (model stacking)?
      answer: |
        You can't update an existing model. As a solution, combine the old dataset with the new dataset and readapt.

        The old dataset and the new dataset must be combined in a single .zip file (for acoustic data) or in a .txt file (for language data). When the adaptation is finished, redeploy the new, updated model to obtain a new endpoint.

    - question: |
        When a new version of a base model is available, is my deployment automatically updated?
      answer: |
        Deployments are *not* automatically updated.

        If you adapted and deployed a model, the existing deployment remains as is. You can decommission the deployed model, readapt it by using the newer version of the base model, and redeploy it for better accuracy.

        Both base models and custom models are retired after some time (see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md)).

    - question: |
        Can I download my model and run it locally?
      answer: |
        You can run a custom model locally in a [Docker container](speech-container-howto.md?tabs=cstt).

    - question: |
        Can I copy or move my datasets, models, and deployments to another region or subscription?
      answer: |
        You can use the [Models_Copy REST API](/rest/api/speechtotext/models/copy) to copy a custom model to another region or subscription. Datasets and deployments can't be copied. You can import a dataset again in another subscription and create endpoints there by using the model copies.

    - question: |
        Are my requests logged?
      answer: |
        By default, requests aren't logged (neither audio nor transcription). If necessary, you can select the **Log content from this endpoint** option when you [create a custom endpoint](how-to-custom-speech-deploy-model.md#add-a-deployment-endpoint). You can also enable audio logging in the [Speech SDK](how-to-use-logging.md) on a per-request basis, without having to create a custom endpoint. In both cases, audio and recognition results of requests will be stored in secure storage. For subscriptions that use Microsoft-owned storage, logs are available for 30 days.

        You can export the logged files on the deployment page in Speech Studio if you use a custom endpoint with **Log content from this endpoint** enabled. If audio logging is enabled via the SDK, call the API to access the files. You can also use the API to [delete the logs](/rest/api/speechtotext/endpoints/delete-base-model-log) anytime.

    - question: |
        Are my requests throttled?
      answer: |
        For information, see [Speech service quotas and limits](speech-services-quotas-and-limits.md).

    - question: |
        How am I charged for dual channel audio?
      answer: |
        If you submit each channel separately in its own file, you're charged for the audio duration of each file; for example, two 10-minute channels submitted as two files are billed as 20 minutes of audio. If you submit a single file with the channels multiplexed together, you're charged for the duration of that single file. For more information about pricing, see the [Azure AI services pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).

        > [!IMPORTANT]
        > If you have further privacy concerns that prevent you from using the custom speech service, contact one of the support channels.

  ## Increasing concurrency

  For information, see [Speech service quotas and limits](speech-services-quotas-and-limits.md).

  - name: Importing data

@@ -117,7 +117,7 @@
        What is the limit to the size of a dataset, and why is it the limit?
      answer: |
        The limit is because of the restriction on the size of files for HTTP upload. For the actual limit, see [Speech service quotas and limits](speech-services-quotas-and-limits.md). You can split your data into multiple datasets and select all of them to train the model.

    - question: |
        Can I zip (compress) my text files so that I can upload a larger text file?
      answer: |

@@ -154,18 +154,18 @@
        How long does it take to train a custom model with audio data?
      answer: |
        Training a model with audio data can be a lengthy process. Depending on the amount of data, it can take several days to create a custom model. If it can't be finished within one week, the service might abort the training operation and report the model as failed.

        In general, Speech service processes approximately 10 hours of audio data per day in regions that have dedicated hardware. Training with text only is faster and ordinarily finishes within minutes.

        Use one of the regions where dedicated hardware is available for training. The Speech service uses up to 100 hours of audio for training in these regions.

  - name: Accuracy testing
    questions:
    - question: |
        What is word error rate (WER), and how is it computed?
      answer: |
        WER is the evaluation metric for speech recognition. WER is calculated as the total number of errors (insertions, deletions, and substitutions), divided by the total number of words in the reference transcription. For more information, see [Test model quantitatively](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer).
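To make the formula concrete, here's a minimal, self-contained Python sketch (an editorial illustration, assuming the standard word-level Levenshtein alignment) of the WER computation described in that answer:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("brown") and one insertion ("jumps") over a 4-word
# reference gives WER = 2 / 4 = 0.5.
print(word_error_rate("the quick brown fox", "the quick fox jumps"))
```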
    - question: |
        How do I determine whether the results of an accuracy test are good?
      answer: |
@@ -188,9 +188,20 @@
      answer: |
        Uploading a list of words adds them to the vocabulary, but it doesn't teach the system how the words are ordinarily used. By providing full or partial utterances (sentences or phrases of things that users are likely to say), the language model can learn the new words and how they're used. The custom language model is good not only for adding new words to the system, but also for adjusting the likelihood of known words for your application. Providing full utterances helps the system learn better.

+   - name: Pronunciation assessment
+     questions:
+     - question: |
+         Why does the recognized text differ from the reference text?
+       answer: |
+         The recognized text is generated based on the audio input, the reference text, the `EnableMiscue` configuration, and the assessment mode.
+
+         In **scripted assessment**, there are two modes, single-shot and continuous, and the behavior differs slightly. In single-shot mode, if `EnableMiscue` is set to `false`, the system forces the recognized text to match the reference text. When `EnableMiscue` is `true`, only the words present in the reference text are considered as recognized results from the audio input. Continuous mode doesn't support the `EnableMiscue` option and behaves similarly to single-shot mode with `EnableMiscue` set to `true`. Differences between recognized and reference text might occur due to factors such as pronunciation variations, background noise, or limitations in the speech recognition model.
+
+         In **unscripted assessment**, the recognized text is generated solely from the audio input without any reference text, which can lead to discrepancies between the recognized text and the intended content. In these cases, the recognized text reflects what the system interprets from the audio and might not always align with the expected message. If you notice significant differences, review the audio quality and speaker clarity, or consider using Azure Speech-to-Text to transcribe the audio first. You can then use that transcription as the reference text for a more accurate assessment.
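For readers wiring this up, here's a hedged Python sketch of a scripted, single-shot assessment using the Speech SDK's pronunciation assessment configuration; the key, region, file name, and reference text are placeholders, and the constructor defaults should be confirmed against the SDK reference:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
audio_config = speechsdk.audio.AudioConfig(filename="speech.wav")

# Scripted, single-shot assessment. With enable_miscue=True, words that
# deviate from the reference text are flagged instead of being forced to
# match it, per the answer above.
pron_config = speechsdk.PronunciationAssessmentConfig(
    reference_text="today was a beautiful day",
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)
pron_config.apply_to(recognizer)
result = recognizer.recognize_once()
print(speechsdk.PronunciationAssessmentResult(result).accuracy_score)
```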
  additionalContent: |

    ## Next steps

    - [Speech to text quickstart](get-started-speech-to-text.md)
    - [What's new](releasenotes.md)

articles/ai-services/speech-service/how-to-pronunciation-assessment.md

Lines changed: 2 additions & 2 deletions

@@ -84,7 +84,7 @@ For how to use Pronunciation Assessment in streaming mode in your own applicatio

  ::: zone pivot="programming-language-csharp"

- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/scenarios/csharp/sharedcontent/console/speech_recognition_samples.cs) under the function `PronunciationAssessmentContinuousWithFile`.
+ If your audio file exceeds 30 seconds, use continuous mode for processing. In continuous mode, the `EnableMiscue` option is not supported. To obtain `Omission` and `Insertion` tags, you need to compare the recognized results with the reference text. You can find a sample implementation for continuous mode on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/scenarios/csharp/sharedcontent/console/speech_recognition_samples.cs) under the function `PronunciationAssessmentContinuousWithFile`.

  ::: zone-end
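Since continuous mode leaves the omission/insertion comparison to you, here's a minimal Python sketch (an editorial illustration using the standard-library difflib, not the SDK's own logic) of aligning recognized words against the reference text:

```python
import difflib

reference = "today was a beautiful day".split()
recognized = "today was beautiful day indeed".split()

matcher = difflib.SequenceMatcher(a=reference, b=recognized)
for op, a0, a1, b0, b1 in matcher.get_opcodes():
    if op in ("delete", "replace"):
        for word in reference[a0:a1]:
            print(f"Omission: {word}")    # in the reference, missing from audio
    if op in ("insert", "replace"):
        for word in recognized[b0:b1]:
            print(f"Insertion: {word}")   # spoken, but not in the reference
```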

@@ -519,7 +519,7 @@ This table lists some of the key pronunciation assessment results for the script

  This table lists some of the key pronunciation assessment results for the unscripted assessment, or speaking scenario.

  > [!NOTE]
- > Prosody assessment is only available in the [en-US](./language-support.md?tabs=pronunciation-assessment) locale.
+ > Prosody assessment is only available in the [en-US](./language-support.md?tabs=pronunciation-assessment) locale. For unscripted assessment, the speech-to-text (STT) model used is different from Azure STT. If you need assessment based on highly accurate recognized text, we recommend first calling Azure STT to obtain the reference text, and then performing scripted assessment.
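The two-step flow recommended in that note (transcribe first, then run scripted assessment against the transcription) could look like this hedged Python sketch; the key, region, and file name are placeholders, and a fresh recognizer is created per pass because recognition consumes the file stream:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")

def recognizer_for(path: str) -> speechsdk.SpeechRecognizer:
    return speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=speechsdk.audio.AudioConfig(filename=path))

# Step 1: call speech to text to obtain an accurate transcription.
reference_text = recognizer_for("speech.wav").recognize_once().text

# Step 2: run scripted assessment against that transcription.
assess = recognizer_for("speech.wav")
speechsdk.PronunciationAssessmentConfig(reference_text=reference_text).apply_to(assess)
result = speechsdk.PronunciationAssessmentResult(assess.recognize_once())
print(result.pronunciation_score)
```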
  | Response parameter | Description | Granularity |
  |:-------------------|:------------|:------------|
