Commit 62437aa

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into sap-reduce-fp

2 parents: 178b3bb + 5651a07

456 files changed (+8007 −4854 lines)


.openpublishing.publish.config.json

Lines changed: 2 additions & 0 deletions
```diff
@@ -1228,6 +1228,7 @@
     "articles/ai-services/.openpublishing.redirection.applied-ai-old.json",
     "articles/ai-services/.openpublishing.redirection.applied-ai-services.json",
     "articles/ai-services/.openpublishing.redirection.cognitive-services.json",
+    "articles/ai-studio/.openpublishing.redirection.ai-studio.json",
     "articles/energy-data-services/.openpublishing.redirection.energy-data-services.json",
     "articles/azure-fluid-relay/.openpublishing.redirection.fluid-relay.json",
     "articles/azure-netapp-files/.openpublishing.redirection.azure-netapp-files.json",
@@ -1249,6 +1250,7 @@
     "articles/event-grid/.openpublishing.redirection.event-grid.json",
     "articles/event-hubs/.openpublishing.redirection.event-hubs.json",
     "articles/governance/policy/.openpublishing.redirection.policy.json",
+    "articles/governance/policy/.openpublishing.redirection.resource-graph.json",
     "articles/hdinsight/.openpublishing.redirection.hdinsight.json",
     "articles/hdinsight-aks/.openpublishing.redirection.hdinsight-aks.json",
     "articles/healthcare-apis/.openpublishing.redirection.healthcare-apis.json",
```

.openpublishing.redirection.json

Lines changed: 10 additions & 0 deletions
```diff
@@ -1,5 +1,15 @@
 {
   "redirections": [
+    {
+      "source_path": "articles/storage/common/storage-analytics-metrics.md",
+      "redirect_url": "/previous-versions/azure/storage/common/storage-analytics-metrics",
+      "redirect_document_id": false
+    },
+    {
+      "source_path": "articles/storage/common/manage-storage-analytics-metrics.md",
+      "redirect_url": "/previous-versions/azure/storage/common/manage-storage-analytics-metrics",
+      "redirect_document_id": false
+    },
     {
       "source_path": "articles/azure-arc/vmware-vsphere/switch-to-new-preview-version.md",
       "redirect_url": "/azure/azure-arc/vmware-vsphere/switch-to-new-version-vmware",
```
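Each redirection entry added above follows the same three-field shape (`source_path`, `redirect_url`, `redirect_document_id`). As an illustration only — this helper and its rules are assumptions inferred from the entries in this commit, not part of the OpenPublishing toolchain — a quick sanity check for that shape might look like:

```python
import json

# Hypothetical validator for redirection entries; the rules below are
# inferred from the entries in this commit, not taken from a spec.
def validate_redirections(raw):
    """Return a list of problems found in a redirection JSON document."""
    problems = []
    doc = json.loads(raw)
    for i, entry in enumerate(doc.get("redirections", [])):
        if not entry.get("source_path", "").endswith(".md"):
            problems.append(f"entry {i}: source_path should point to a .md file")
        if not entry.get("redirect_url", "").startswith("/"):
            problems.append(f"entry {i}: redirect_url should be root-relative")
        if not isinstance(entry.get("redirect_document_id"), bool):
            problems.append(f"entry {i}: redirect_document_id must be a boolean")
    return problems

sample = """
{
  "redirections": [
    {
      "source_path": "articles/storage/common/storage-analytics-metrics.md",
      "redirect_url": "/previous-versions/azure/storage/common/storage-analytics-metrics",
      "redirect_document_id": false
    }
  ]
}
"""
print(validate_redirections(sample))  # → []
```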

.openpublishing.redirection.sentinel.json

Lines changed: 6 additions & 6 deletions
```diff
@@ -215,11 +215,6 @@
       "redirect_url": "/azure/sentinel/data-connectors/rubrik-security-cloud-data-connector-using-azure-functions",
       "redirect_document_id": true
     },
-    {
-      "source_path": "articles/sentinel/data-connectors/cisco-asa-ftd-via-ama.md",
-      "redirect_url": "/azure/sentinel/data-connectors-reference",
-      "redirect_document_id": false
-    },
     {
       "source_path": "articles/sentinel/data-connectors/okta-single-sign-on-using-azure-function.md",
       "redirect_url": "/azure/sentinel/data-connectors/okta-single-sign-on-using-azure-functions",
@@ -484,6 +479,11 @@
       "source_path": "articles/sentinel/data-connectors/cyberpion-security-logs.md",
       "redirect_url": "/azure/sentinel/data-connectors-reference",
       "redirect_document_id": false
-    }
+    },
+    {
+      "source_path": "articles/sentinel/data-connectors/azure-active-directory-identity-protection.md",
+      "redirect_url": "/azure/sentinel/data-connectors/microsoft-entra-id-protection",
+      "redirect_document_id": true
+    }
   ]
 }
```

articles/ai-services/content-safety/includes/severity-levels.md

Lines changed: 18 additions & 18 deletions
Large diffs are not rendered by default.

articles/ai-services/openai/concepts/content-filter.md

Lines changed: 4 additions & 0 deletions
````diff
@@ -796,6 +796,10 @@ data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_
 
 data: [DONE]
 ```
+
+> [!IMPORTANT]
+> When content filtering is triggered for a prompt and a `"status": 400` is received as part of the response there may be a charge for this request as the prompt was evaluated by the service. [Charges will also occur](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) when a `"status":200` is received with `"finish_reason": "content_filter"`. In this case the prompt did not have any issues, but the completion generated by the model was detected to violate the content filtering rules which results in the completion being filtered.
+
 ## Best practices
 
 As part of your application design, consider the following best practices to deliver a positive experience with your application while minimizing potential harms:
````
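The [!IMPORTANT] note added in this diff distinguishes two billed content-filter outcomes: a filtered prompt (HTTP 400) and a filtered completion (HTTP 200 with a `finish_reason` of `content_filter`). A minimal sketch of telling them apart — the response shapes here are simplified stand-ins for the real Azure OpenAI payloads, and the error-code check is an assumption:

```python
# Sketch of classifying the two billed content-filter outcomes described in
# the note above. `status` is the HTTP status; `body` is the parsed response.
def classify_filter_outcome(status, body):
    if status == 400 and body.get("error", {}).get("code") == "content_filter":
        # The prompt itself was filtered; prompt evaluation may still be billed.
        return "prompt_filtered"
    if status == 200:
        choices = body.get("choices", [])
        if choices and choices[0].get("finish_reason") == "content_filter":
            # The prompt was fine, but the completion was filtered; also billed.
            return "completion_filtered"
    return "ok"

print(classify_filter_outcome(200, {"choices": [{"finish_reason": "content_filter"}]}))
```

Either branch can feed application-specific handling, such as messaging the user or logging the filtered request for review.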

articles/ai-services/openai/concepts/models.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -101,7 +101,7 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 **<sup>2</sup>** GPT-4 Turbo with Vision Preview = `gpt-4` (vision-preview). To deploy this model, under **Deployments** select model **gpt-4**. For **Model version** select **vision-preview**.
 
 > [!CAUTION]
-> We don't recommend using these models in production. We will upgrade all deployments of these models to a future stable version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
+> We don't recommend using preview models in production. We will upgrade all deployments of preview models to a future stable version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
 
 > [!NOTE]
 > Regions where GPT-4 (0314) & (0613) are listed as available have access to both the 8K and 32K versions of the model
@@ -110,8 +110,8 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 
 | Model Availability | gpt-4 (0314) | gpt-4 (0613) | gpt-4 (1106-preview) | gpt-4 (vision-preview) |
 |---|:---|:---|:---|:---|
-| Available to all subscriptions with Azure OpenAI access | | Australia East <br> Canada East <br> France Central <br> Sweden Central <br> Switzerland North | Australia East <br> Canada East <br> East US 2 <br> France Central <br> Norway East <br> South India <br> Sweden Central <br> UK South <br> West US | Switzerland North <br> West US |
-| Available to subscriptions with current access to the model version in the region | East US <br> France Central <br> South Central US <br> UK South | East US <br> East US 2 <br> Japan East <br> UK South | | Australia East <br>Sweden Central|
+| Available to all subscriptions with Azure OpenAI access | | Australia East <br> Canada East <br> France Central <br> Sweden Central <br> Switzerland North | Australia East <br> Canada East <br> East US 2 <br> France Central <br> Norway East <br> South India <br> Sweden Central <br> UK South <br> West US | Sweden Central <br> Switzerland North <br> West US |
+| Available to subscriptions with current access to the model version in the region | East US <br> France Central <br> South Central US <br> UK South | East US <br> East US 2 <br> Japan East <br> UK South | | Australia East |
 
 ### GPT-3.5 models
```

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 69 additions & 17 deletions
````diff
@@ -3,7 +3,7 @@ title: Azure OpenAI Service provisioned throughput
 description: Learn about provisioned throughput and Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 11/20/2023
+ms.date: 1/16/2024
 ms.custom:
 manager: nitinme
 author: mrbullwinkle #ChrisHMSFT
@@ -14,44 +14,96 @@ keywords:
 
 # What is provisioned throughput?
 
-The provisioned throughput capability allows you to specify the amount of throughput you require for your application. The service then provisions the necessary compute and ensures it is ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing an amount of throughput for your deployment. Each model-versions pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU.
+The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU.
 
 ## What does the provisioned deployment type provide?
 
-- **Predictable performance:** stable max latency and throughput for uniform workloads.
+- **Predictable performance:** stable max latency and throughput for uniform workloads.
 - **Reserved processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
-- **Cost savings:** High throughput workloads may provide cost savings vs token-based consumption.
+- **Cost savings:** High throughput workloads might provide cost savings vs token-based consumption.
 
-An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates additional features like Content Moderation ([See content moderation documentation](content-filter.md)).
+An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates more features like Content Moderation ([See content moderation documentation](content-filter.md)).
 
 > [!NOTE]
-> Provisioned throughput units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering contact your Microsoft Account Team.
+> Provisioned throughput unit (PTU) quota is different from standard quota in Azure OpenAI and is not available by default. To learn more about this offering contact your Microsoft Account Team.
 
 ## What do you get?
 
-|Topic | Provisioned|
+| Topic | Provisioned|
 |---|---|
-| What is it? | Provides guaranteed throughput at smaller increments than the existing provisioned offer. Deployments will have a consistent max latency for a given model-version |
+| What is it? | Provides guaranteed throughput at smaller increments than the existing provisioned offer. Deployments have a consistent max latency for a given model-version. |
 | Who is it for? | Customers who want guaranteed throughput with minimal latency variance. |
-| Quota | Provisioned-managed throughput Units |
-| Latency | Max latency constrained |
-| Utilization | Provisioned-managed Utilization measure provided in Azure Monitor |
-| Estimating size | Provided calculator in the studio & load test script |
+| Quota | Provisioned-managed throughput Units for a given model. |
+| Latency | Max latency constrained from the model. Overall latency is a factor of call shape. |
+| Utilization | Provisioned-managed Utilization measure provided in Azure Monitor. |
+| Estimating size | Provided calculator in the studio & benchmarking script. |
 
 ## Key concepts
 
 ### Provisioned throughput units
 
-Provisioned throughput Units (PTU) are units of model processing capacity that customers you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
+Provisioned throughput units (PTU) are units of model processing capacity that customers you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
 
 ### Deployment types
 
-We introduced a new deployment type called **ProvisionedManaged** which provides smaller increments of PTU per deployment. Both types have their own quota, and you will only see the options you have been enabled for.
+When deploying a model in Azure OpenAI, you need to set the `sku-name` to be Provisioned-Managed. The `sku-capacity` specifies the number of PTUs assigned to the deployment.
+
+```azurecli
+az cognitiveservices account deployment create \
+--name <myResourceName> \
+--resource-group <myResourceGroupName> \
+--deployment-name MyDeployment \
+--model-name GPT-4 \
+--model-version 0613 \
+--model-format OpenAI \
+--sku-capacity 100 \
+--sku-name ProvisionedManaged
+```
 
 ### Quota
 
-Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level meaning that it can be consumed by different resources within that subscription.
+Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level. All Azure OpenAI resources within the subscription share this quota.
+
+Quota is specific to a (deployment type, model, region) triplet and isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. You can raise a support request to move quota across deployment types, models, or regions but the swap isn't guaranteed.
+
+While we make every attempt to ensure that quota is deployable, quota doesn't represent a guarantee that the underlying capacity is available. The service assigns capacity during the deployment operation and if capacity is unavailable the deployment fails with an out of capacity error.
+
+### How utilization enforcement works
+Provisioned deployments provide you with an allocated amount of model processing capacity to run a given model. The `Provisioned-Managed Utilization` metric in Azure Monitor measures a given deployments utilization on 1-minute increments. Provisioned-Managed deployments are optimized to ensure that accepted calls are processed with a consistent model processing time (actual end-to-end latency is dependent on a call's characteristics). When the workload exceeds the allocated PTU capacity, the service returns a 429 HTTP status code until the utilization drops down below 100%.
+
+#### What should I do when I receive a 429 response?
+The 429 response isn't an error, but instead part of the design for telling users that a given deployment is fully utilized at a point in time. By providing a fast-fail response, you have control over how to handle these situations in a way that best fits your application requirements.
+
+The `retry-after-ms` and `retry-after` headers in the response tell you the time to wait before the next call will be accepted. How you choose to handle this response depends on your application requirements. Here are some considerations:
+- You can consider redirecting the traffic to other models, deployments or experiences. This option is the lowest-latency solution because the action can be taken as soon as you receive the 429 signal.
+- If you're okay with longer per-call latencies, implement client-side retry logic. This option gives you the highest amount of throughput per PTU. The Azure OpenAI client libraries include built-in capabilities for handling retries.
+
+#### How does the service decide when to send a 429?
+We use a variation of the leaky bucket algorithm to maintain utilization below 100% while allowing some burstiness in the traffic. The high-level logic is as follows:
+1. Each customer has a set amount of capacity they can utilize on a deployment
+2. When a request is made:
+
+   a. When the current utilization is above 100%, the service returns a 429 code with the `retry-after-ms` header set to the time until utilization is below 100%
+
+   b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining prompt tokens and the specified max_tokens in the call.
+
+3. When a request finishes, we now know the actual compute cost for the call. To ensure an accurate accounting, we correct the utilization using the following logic:
+
+   a. If the actual > estimated, then the difference is added to the deployment's utilization
+
+   b. If the actual < estimated, then the difference is subtracted.
+
+4. The overall utilization is decremented down at a continuous rate based on the number of PTUs deployed.
+
+Since calls are accepted until utilization reaches 100%, you're allowed to burst over 100% utilization when first increasing traffic. For sizeable calls and small sized deployments, you might then be over 100% utilization for up to several minutes.
+
+:::image type="content" source="../media/provisioned/utilization.jpg" alt-text="Diagram showing how subsequent calls are added to the utilization." lightbox="../media/provisioned/utilization.jpg":::
+
 
-Quota is specific to a (deployment type, model, region) triplet and isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. Customers can raise a support request to move the quota across deployment types, models, or regions but we can't guarantee that it will be possible.
+## Next steps
 
-While we make every attempt to ensure that quota is always deployable, quota does not represent a guarantee that the underlying capacity is available for the customer to use. The service assigns capacity to the customer at deployment time and if capacity is unavailable the deployment will fail with an out of capacity error.
+- [Learn about the onboarding steps for provisioned deployments](../how-to/provisioned-throughput-onboarding.md)
+- [Provisioned Throughput Units (PTU) getting started guide](../how-to//provisioned-get-started.md)
````
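The new 429 guidance in this diff suggests client-side retries driven by the `retry-after-ms` / `retry-after` headers. A minimal sketch, assuming a caller-supplied `send_request` function that returns `(status, headers, body)` — the Azure OpenAI client libraries already ship built-in retry handling, so this is only illustrative:

```python
import time

def call_with_retry(send_request, max_retries=5):
    """Retry a provisioned-deployment call while the service returns 429."""
    for _ in range(max_retries + 1):
        status, headers, body = send_request()
        if status != 429:
            return body
        # Prefer the millisecond header; fall back to the seconds-based one.
        if "retry-after-ms" in headers:
            delay = int(headers["retry-after-ms"]) / 1000.0
        else:
            delay = float(headers.get("retry-after", 1))
        time.sleep(delay)
    raise RuntimeError("deployment still saturated after retries")
```

As the doc's first bullet notes, redirecting to another model or deployment on the first 429 avoids this added per-call latency entirely.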

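The leaky-bucket steps described in the new utilization section can be sketched as a toy model. All numbers here (the capacity unit, the drain rate, the exact acceptance boundary) are illustrative assumptions; the service's real accounting is internal:

```python
# Toy model of the leaky-bucket utilization accounting described above.
class ProvisionedDeployment:
    def __init__(self, capacity_tokens_per_min):
        self.capacity = capacity_tokens_per_min  # proportional to PTUs deployed
        self.utilization = 0.0                   # percent

    def try_accept(self, prompt_tokens, max_tokens):
        # Step 2a: while over 100%, the service answers 429.
        if self.utilization > 100.0:
            return False
        # Step 2b: estimate cost from prompt tokens + the call's max_tokens.
        estimated = prompt_tokens + max_tokens
        self.utilization += 100.0 * estimated / self.capacity
        return True

    def settle(self, estimated, actual):
        # Step 3: correct the estimate once the true cost is known.
        self.utilization += 100.0 * (actual - estimated) / self.capacity

    def decay(self, minutes):
        # Step 4: utilization drains continuously; more PTUs drain faster
        # (here: the full per-minute capacity drains each minute).
        self.utilization = max(0.0, self.utilization - 100.0 * minutes)
```

Because calls are accepted until utilization passes 100%, a large call landing at exactly 100% still goes through, which reproduces the burst-over-100% behavior the doc describes.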