Commit 65dad57

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into apicimp
2 parents 9156f16 + 1b5191d commit 65dad57

10 files changed, 56 additions and 26 deletions

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 15 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -3,7 +3,7 @@ title: Azure OpenAI Service provisioned throughput
33
description: Learn about provisioned throughput and Azure OpenAI.
44
ms.service: azure-ai-openai
55
ms.topic: conceptual
6-
ms.date: 04/29/2024
6+
ms.date: 05/02/2024
77
manager: nitinme
88
author: mrbullwinkle #ChrisHMSFT
99
ms.author: mbullwin #chrhoder
@@ -80,22 +80,30 @@ PTUs represent an amount of model processing capacity. Similar to your computer
8080

8181
A few high-level considerations:
8282
- Generations require more capacity than prompts
83-
- Larger calls are progressively more expensive to compute. For example, 100 calls of with a 1000 token prompt size will require less capacity than 1 call with 100,000 tokens in the prompt. This also means that the distribution of these call shapes is important in overall throughput. Traffic patterns with a wide distribution that includes some very large calls may experience lower throughput per PTU than a narrower distribution with the same average prompt & completion token sizes.
83+
- Larger calls are progressively more expensive to compute. For example, 100 calls of with a 1000 token prompt size will require less capacity than 1 call with 100,000 tokens in the prompt. This also means that the distribution of these call shapes is important in overall throughput. Traffic patterns with a wide distribution that includes some very large calls may experience lower throughput per PTU than a narrower distribution with the same average prompt & completion token sizes.
8484

85+
### How utilization performance works
8586

86-
### How utilization enforcement works
87-
Provisioned deployments provide you with an allocated amount of model processing capacity to run a given model. The `Provisioned-Managed Utilization` metric in Azure Monitor measures a given deployments utilization on 1-minute increments. Provisioned-Managed deployments are optimized to ensure that accepted calls are processed with a consistent model processing time (actual end-to-end latency is dependent on a call's characteristics). When the workload exceeds the allocated PTU capacity, the service returns a 429 HTTP status code until the utilization drops down below 100%.
87+
Provisioned deployments provide you with an allocated amount of model processing capacity to run a given model.
8888

89+
In Provisioned-Managed deployments, when capacity is exceeded, the API will immediately return a 429 HTTP Status Error. This enables the user to make decisions on how to manage their traffic. Users can redirect requests to a separate deployment, to a standard pay-as-you-go instance, or leverage a retry strategy to manage a given request. The service will continue to return the 429 HTTP status code until the utilization drops below 100%.
90+
91+
### How can I monitor capacity?
92+
93+
The [Provisioned-Managed Utilization V2 metric](../how-to/monitoring.md#azure-openai-metrics) in Azure Monitor measures a given deployments utilization on 1-minute increments. Provisioned-Managed deployments are optimized to ensure that accepted calls are processed with a consistent model processing time (actual end-to-end latency is dependent on a call's characteristics).
8994

9095
#### What should I do when I receive a 429 response?
9196
The 429 response isn't an error, but instead part of the design for telling users that a given deployment is fully utilized at a point in time. By providing a fast-fail response, you have control over how to handle these situations in a way that best fits your application requirements.
9297

9398
The `retry-after-ms` and `retry-after` headers in the response tell you the time to wait before the next call will be accepted. How you choose to handle this response depends on your application requirements. Here are some considerations:
94-
- You can consider redirecting the traffic to other models, deployments or experiences. This option is the lowest-latency solution because the action can be taken as soon as you receive the 429 signal.
99+
- You can consider redirecting the traffic to other models, deployments or experiences. This option is the lowest-latency solution because the action can be taken as soon as you receive the 429 signal. For ideas on how to effectively implement this pattern see this [community post](https://github.com/Azure/aoai-apim).
95100
- If you're okay with longer per-call latencies, implement client-side retry logic. This option gives you the highest amount of throughput per PTU. The Azure OpenAI client libraries include built-in capabilities for handling retries.
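
The client-side retry option above can be sketched roughly as follows. This is a minimal illustration, not the behavior of the Azure OpenAI client libraries: `send`, `wait_seconds`, and `call_with_retry` are hypothetical names, and `send` is assumed to be any callable that returns a `(status_code, headers)` pair. The sketch prefers `retry-after-ms` over `retry-after` and falls back to a fixed default when neither header parses as a number.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def wait_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return how long to wait, in seconds, based on the retry headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        if "retry-after-ms" in lowered:
            return max(0.0, float(lowered["retry-after-ms"]) / 1000.0)
        if "retry-after" in lowered:
            return max(0.0, float(lowered["retry-after"]))
    except ValueError:
        # The header was present but not numeric (for example an HTTP date).
        pass
    return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            return status, headers
        # Wait for the period the service asked for before trying again.
        sleep(wait_seconds(headers))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays. An application that prefers the redirect option would replace the `sleep` call with a call to a different deployment.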
96101

97102
#### How does the service decide when to send a 429?
98-
We use a variation of the leaky bucket algorithm to maintain utilization below 100% while allowing some burstiness in the traffic. The high-level logic is as follows:
103+
104+
In the Provisioned-Managed offering, each request is evaluated individually according to its prompt size, expected generation size, and model to determine its expected utilization. This is in contrast to pay-as-you-go deployments which have a [custom rate limiting behavior](../how-to/quota.md) based on the estimated traffic load. For pay-as-you-go deployments this can lead to HTTP 429s being generated prior to defined quota values being exceeded if traffic is not evenly distributed.
105+
106+
For Provisioned-Managed, we use a variation of the leaky bucket algorithm to maintain utilization below 100% while allowing some burstiness in the traffic. The high-level logic is as follows:
99107
1. Each customer has a set amount of capacity they can utilize on a deployment
100108
2. When a request is made:
101109
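
For readers unfamiliar with the term, a generic textbook leaky bucket can be sketched as below. The remaining steps of the service's logic fall outside this hunk and are not reproduced here; this sketch is not the service's variation of the algorithm. `LeakyBucket`, `capacity`, `leak_rate`, and `cost` are hypothetical names chosen for illustration, and the units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Generic leaky bucket; all names and units are illustrative assumptions."""

    capacity: float           # bucket level that corresponds to 100% utilization
    leak_rate: float          # units drained per second
    level: float = 0.0        # current fill level
    last_update: float = 0.0  # timestamp of the last drain, in seconds

    def _drain(self, now: float) -> None:
        # Lower the level by whatever leaked out since the last update.
        elapsed = max(0.0, now - self.last_update)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_update = now

    @property
    def utilization(self) -> float:
        return self.level / self.capacity

    def try_accept(self, cost: float, now: float) -> bool:
        """Accept the request unless the bucket is already full."""
        self._drain(now)
        if self.level >= self.capacity:
            return False  # analogous to answering with HTTP 429
        # A request admitted just under the limit may overshoot it,
        # which is what permits short bursts.
        self.level += cost
        return True
```

In this sketch a request is rejected only when the bucket is already at or above capacity, so a single large request can push the level past 100% and later requests are rejected until enough has drained.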

@@ -118,7 +126,7 @@ We use a variation of the leaky bucket algorithm to maintain utilization below 1
118126

119127
#### How many concurrent calls can I have on my deployment?
120128

121-
The number of concurrent calls you can achieve depends on each call's shape (prompt size, max_token parameter, etc). The service will continue to accept calls until the utilization reach 100%. To determine the approximate number of concurrent calls you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). If the system generates less than the number of samplings tokens like max_token, it will accept more requests.
129+
The number of concurrent calls you can achieve depends on each call's shape (prompt size, max_token parameter, etc.). The service will continue to accept calls until the utilization reach 100%. To determine the approximate number of concurrent calls you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). If the system generates less than the number of samplings tokens like max_token, it will accept more requests.
122130

123131
## Next steps
124132

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
33
description: Learn about provisioned throughput units onboarding and Azure OpenAI.
44
ms.service: azure-ai-openai
55
ms.topic: conceptual
6-
ms.date: 02/13/2024
6+
ms.date: 05/02/2024
77
manager: nitinme
88
author: mrbullwinkle
99
ms.author: mbullwin
@@ -17,6 +17,19 @@ This article walks you through the process of onboarding to [Provisioned Through
1717
> [!NOTE]
1818
> Provisioned Throughput Units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering contact your Microsoft Account Team.
1919
20+
## When to use provisioned throughput units (PTU)
21+
22+
You should consider switching from pay-as-you-go to provisioned throughput when you have well-defined, predictable throughput requirements. Typically, this occurs when the application is ready for production or has already been deployed in production and there is an understanding of the expected traffic. This will allow users to accurately forecast the required capacity and avoid unexpected billing.
23+
24+
### Typical PTU scenarios
25+
26+
- An application that is ready for production or in production.
27+
- Application has predictable capacity/usage expectations.
28+
- Application has real-time/latency sensitive requirements.
29+
30+
> [!NOTE]
31+
> In function calling and agent use cases, token usage can be variable. You should understand your expected Tokens Per Minute (TPM) usage in detail prior to migrating the workloads to PTU.
32+
2033
## Sizing and estimation: provisioned managed only
2134

2235
Determining the right amount of provisioned throughput, or PTUs, you require for your workload is an essential step to optimizing performance and cost. This section describes how to use the Azure OpenAI capacity planning tool. The tool provides you with an estimate of the required PTU to meet the needs of your workload.

articles/azure-arc/servers/plan-evaluate-on-azure-virtual-machine.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,8 @@ When Azure Arc-enabled servers is configured on the VM, you see two representati
7171
```
7272

7373
3. Block access to the Azure IMDS endpoint.
74+
> [!NOTE]
75+
> The configurations below need to be applied for 169.254.169.254 and 169.254.169.253. These are endpoints used for IMDS in Azure and Azure Stack HCI respectively.
7476
7577
While still connected to the server, run the following commands to block access to the Azure IMDS endpoint. For Windows, run the following PowerShell command:
7678

articles/cloud-shell/pricing.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,8 +12,7 @@ Cloud Shell is a free service. You only pay for the underlying Azure resources t
1212

1313
## Compute cost
1414

15-
Azure Cloud Shell runs on a machine provided for free by Azure, but requires an Azure file share to
16-
use.
15+
Azure Cloud Shell runs on a machine provided for free by Azure. If you desire file persistence, Cloud Shell requires a Microsoft Azure Files share.
1716

1817
## Storage cost
1918

articles/communication-services/tutorials/audio-quality-enhancements/add-noise-supression.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ description: Learn how to add audio effects in your calls using Azure Communicat
55
author: sloanster
66

77
ms.author: micahvivion
8-
ms.date: 04/16/2024
8+
ms.date: 05/02/2024
99
ms.topic: tutorial
1010
ms.service: azure-communication-services
1111
ms.subservice: calling

articles/communication-services/tutorials/audio-quality-enhancements/includes/web.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ author: sloanster
66
ms.author: micahvivion
77

88
services: azure-communication-services
9-
ms.date: 04/16/2024
9+
ms.date: 05/02/2024
1010
ms.topic: include
1111
ms.service: azure-communication-services
1212
ms.subservice: calling
@@ -20,14 +20,14 @@ The Azure Communication Services audio effects **noise suppression** abilities c
2020
### Install the npm package
2121
Use the `npm install` command to install the Azure Communication Services Audio Effects SDK for JavaScript.
2222
> [!IMPORTANT]
23-
> This tutorial uses the Azure Communication Services Calling SDK version of `1.24.1-beta.1` (or greater) and the Azure Communication Services Calling Audio Effects SDK version greater than or equal to `1.1.0-beta.1` (or greater).
23+
> This tutorial uses the Azure Communication Services Calling SDK version of **`1.24.2-beta.1`** (or greater) and the Azure Communication Services Calling Audio Effects SDK version greater than or equal to **`1.1.1-beta.1`** (or greater).
2424
```console
25-
npm install @azure/communication-calling-effects --save
25+
@azure/communication-calling-effects@1.1.1-beta.1
2626
```
2727
> [!NOTE]
2828
> The calling effect library cannot be used standalone and can only work when used with the Azure Communication Calling client library for WebJS (https://www.npmjs.com/package/@azure/communication-calling).
2929
30-
You can find more [details ](https://www.npmjs.com/package/@azure/communication-calling-effects) on the calling effects npm package page.
30+
You can find more [details ](https://www.npmjs.com/package/@azure/communication-calling-effects/v/1.1.1-beta.1?activeTab=readme) on the calling effects npm package page.
3131

3232
> [!NOTE]
3333
> Current browser support for adding audio noise suppression effects is only available on Chrome and Edge Desktop Browsers.

articles/cosmos-db/autoscale-per-partition-region.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -9,16 +9,16 @@ ms.service: cosmos-db
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: conceptual
12-
ms.date: 04/01/2022
12+
ms.date: 05/01/2024
1313
# CustomerIntent: As a database adminstrator, I want to fine tune autoscaler for specific regions or partitions so that I can balance an uneven workload.
1414
---
1515

1616
# Per-region and per-partition autoscale (preview)
1717

18-
By default, Azure Cosmos DB autoscale scales workloads based on the most active region and partition. For nonuniform workloads that have different workload patterns across regions and partitions, this scaling can cause unnecessary scale-ups. With this improvement to autoscale, the per region and per partition autoscale feature now allows your workloads’ regions and partitions to scale independently based on usage.
18+
By default, Azure Cosmos DB autoscale scales workloads based on the most active region and partition. For nonuniform workloads that have different workload patterns across regions and partitions, this scaling can cause unnecessary scale-ups. With this improvement to autoscale, also known as "dynamic scaling," the per region and per partition autoscale feature now allows your workloads’ regions and partitions to scale independently based on usage.
1919

2020
> [!IMPORTANT]
21-
> This feature is only available for Azure Cosmos DB accounts created after **November 15, 2023**.
21+
> By default, this feature is only available for Azure Cosmos DB accounts created after **November 15, 2023**. For customers who can significantly benefit from dynamic scaling, Azure Cosmos DB is progressively enabling the feature in stages for existing accounts and providing GA support, ahead of broader GA. Customers in this cohort will be notified by email before the enablement. This update won’t impact your account(s) performance, availability, and won't cause downtime or data movement. Please contact your Microsoft representative for questions.
2222
2323
This feature is recommended for autoscale workloads that are nonuniform across regions and partitions. This feature allows you to save costs if you often experience hot partitions and/or have multiple regions. When enabled, this feature applies to all autoscale resources in the account.
2424

@@ -65,4 +65,4 @@ Then, use `NormalizedRUConsumption' to see which partitions are scaling indpende
6565

6666
## Requirements/Limitations
6767

68-
Accounts must be created after 11/15/2023 to enable this feature. Support for multi-region write accounts is planned, but not yet supported.
68+
Accounts must be created after 11/15/2023 to enable this feature.

articles/cost-management-billing/manage/change-azure-account-profile.yml

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ metadata:
66
author: bandersmsft
77
ms.author: banders
88
ms.reviewer: judupont
9-
ms.date: 03/04/2023
9+
ms.date: 05/02/2023
1010
ms.service: cost-management-billing
1111
ms.subservice: billing
1212
ms.topic: how-to
@@ -20,16 +20,21 @@ introduction: |
2020
2121
If you want to update your Microsoft Entra user profile information, only a user administrator can make the changes. If you're not assigned the user administrator role, contact your user administrator. For more information about changing a user's profile, see [Add or update a user's profile information using Microsoft Entra ID](../../active-directory/fundamentals/active-directory-users-profile-azure-portal.md).
2222
23-
*Sold-to address* - The sold-to address is the address and the contact information of the organization or the individual, who is responsible for a billing account. It's displayed in all the invoices generated for the billing account.
23+
*Sold-to address* - The sold-to address is the address and the contact information of the organization or the individual, who is responsible for a billing account. In other words, the sold-to information describes the legal entity. It's displayed in all the invoices generated for the billing account.
2424
25-
*Bill-to address* - The bill-to address is the address and the contact information of the organization or the individual, who is responsible for the invoices generated for a billing account. For a billing account for a Microsoft Online Service Program (MOSP), there's one bill-to address, which is displayed on all the invoices generated for the account. For a billing account for a Microsoft Customer Agreement (MCA), there's a bill-to address for each billing profile and it's displayed in the invoice generated for the billing profile.
25+
*Bill-to address* - The bill-to address is the address and the contact information of the organization or the individual, who is responsible for paying the invoices generated for a billing account.
26+
- For a billing account for a Microsoft Online Service Program (MOSP), the bill-to address appears on credit card or invoice information.
27+
- For a billing account for a Microsoft Customer Agreement (MCA), there's a bill-to address for each billing profile and it's displayed in the invoice generated for the billing profile.
2628
2729
*Contact email address for service and marketing emails* - You can specify an email address that's different from the email address that you sign in with to receive important billing, service, and recommendation-related notifications about your Azure account. Service notification emails, such as urgent security issues, price changes, or breaking changes to services in use by your account are always sent to your sign-in address.
2830
procedureSection:
2931
- title: |
30-
Update an MOSP billing account address
32+
Update an MOSP billing account sold-to address
3133
summary: |
32-
Follow these steps:
34+
Use the following information to update your sold-to address.
35+
36+
>[!NOTE]
37+
>If you want to update your credit or debit card information instead, see [Edit credit card details](change-credit-card.md#edit-credit-card-details).
3338
steps:
3439
- |
3540
Sign in to the Azure portal using the email address, which has the account administrator permission on the account.
@@ -40,7 +45,9 @@ procedureSection:
4045
Select **Properties** from the left-hand side.
4146
![Screenshot that shows MOSP billing account properties.](./media/change-azure-account-profile/update-contact-information-select-properties.png)
4247
- |
43-
Select **Update billing address** to update the sold-to and the bill-to addresses. Enter the new address and then select **Save**.
48+
Select **Update sold to** to update the sold-to address.
49+
- |
50+
Enter the new address and then select **Save**.
4451
![Screenshot that shows update address for the MOSP billing account.](./media/change-azure-account-profile/update-contact-information-mosp.png)
4552
- title: |
4653
Update an MCA billing account sold-to address
@@ -132,11 +139,12 @@ procedureSection:
132139
- Best practice recommendations, based on your Azure usage
133140
134141
Enter the email address where you want to receive communications about your account. By entering an email address, you're opting in to receive communications from Microsoft.
142+
135143
![Screenshot example of the prompt to update your contact information.](./media/change-azure-account-profile/update-contact-information.png)
136144
137145
### Change your contact email address
138146
You can change your contact email address by using one of the following methods. Updating your contact email address doesn't update the email address that you sign in with.
139-
1. If you're an account administrator for an MOSP account, follow the instructions in [Update an MOSP billing account address](#update-an-mosp-billing-account-address) and select **Update contact info** in the last step. Next, enter the new email address.
147+
1. If you're an account administrator for an MOSP account, follow the instructions in [Update an MOSP billing account sold-to address](#update-an-mosp-billing-account-sold-to-address) and select **Update contact info** in the last step. Next, enter the new email address.
140148
1. Go to the [Contact information](https://portal.azure.com/#blade/HubsExtension/ContactInfoBlade) area in the Azure portal and enter the new email address.
141149
1. In the Azure portal, select the icon with your initials or picture. Then, select the context menu (**...**). Next, select **My Contact Information** from the menu and enter the new email address.
142150
![Screenshot example of updating an email address in Azure.](./media/change-azure-account-profile/azure-contact-information.png)