Commit 943b1a5

updating links
1 parent ed2a02f commit 943b1a5

4 files changed: +19 -35 lines changed


articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 4 additions & 20 deletions
@@ -3,18 +3,17 @@ title: Azure OpenAI Service provisioned throughput
 description: Learn about provisioned throughput and Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 08/07/2024
+ms.date: 03/31/2025
 manager: nitinme
-author: mrbullwinkle #ChrisHMSFT
-ms.author: mbullwin #chrhoder
+author: aahill #ChrisHMSFT
+ms.author: aahi #chrhoder
 recommendations: false
 ---

 # What is provisioned throughput?

 > [!NOTE]
-> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the Azure [OpenAI provisioned August update](./provisioned-migration.md) to learn more about these changes.
-
+> If you're looking for what's recently changed with the provisioned throughput offering, see the [update article](./provisioned-migration.md) for more information.

 The provisioned throughput offering is a model deployment type that allows you to specify the amount of throughput you require in a model deployment. The Azure OpenAI service then allocates the necessary model processing capacity and ensures it's ready for you. Provisioned throughput provides:

@@ -36,21 +35,6 @@ An Azure OpenAI deployment is a unit of management for a specific OpenAI Model.

 You should consider switching from standard deployments to provisioned managed deployments when you have well-defined, predictable throughput and latency requirements. Typically, this occurs when the application is ready for production or has already been deployed in production and there's an understanding of the expected traffic. This allows users to accurately forecast the required capacity and avoid unexpected billing. Provisioned managed deployments are also useful for applications that have real-time/latency sensitive requirements.

-<!--
-## What do you get?
-
-
-| Topic | Description |
-|---|---|
-| What is it? |Provides guaranteed throughput at smaller increments than the existing provisioned offer. Deployments have a consistent max latency for a given model-version. |
-| Who is it for? | Customers who want guaranteed throughput with minimal latency variance. |
-| Quota |Provisioned Managed Throughput Unit, Global Provisioned Managed Throughput Unit, or Data Zone Provisioned Managed Throughput Unit assigned per region. Quota can be used across any available Azure OpenAI model.|
-| Latency | Max latency constrained from the model. Overall latency is a factor of call shape. |
-| Utilization | Provisioned-managed Utilization V2 measure provided in Azure Monitor. |
-|Estimating size |Provided sizing calculator in Azure AI Foundry.|
-|Prompt caching | For supported models, we discount up to 100% of cached input tokens. |
--->
-
 ## Key concepts

 ### Provisioned Throughput Units (PTU)

articles/ai-services/openai/how-to/fine-tuning-deploy.md

Lines changed: 3 additions & 3 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-openai
 ms.custom: build-2023, build-2023-dataai, devx-track-python, references_regions
 ms.topic: how-to
-ms.date: 02/24/2025
+ms.date: 03/31/2025
 author: mrbullwinkle
 ms.author: mbullwin
 ---
@@ -390,7 +390,7 @@ Global Standard fine-tuned deployments currently support structured outputs only
 - `gpt-4o-mini-2024-07-18`
 - `gpt-4o-2024-08-06`

-[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md#what-do-the-provisioned-deployment-types-provide) for fine-tuned deployments. As part of public preview, provisioned managed deployments may be created regionally via the data-plane [REST API](../reference.md#data-plane-inference) version `2024-10-01` or newer. See below for examples.
+[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md) for fine-tuned deployments. As part of public preview, provisioned managed deployments may be created regionally via the data-plane [REST API](../reference.md#data-plane-inference) version `2024-10-01` or newer. See below for examples.

 Provisioned Managed fine-tuned deployments currently support structured outputs only on GPT-4o.

@@ -424,7 +424,7 @@ curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceG

 #### Scaling a fine-tuned model on Provisioned Managed

-To scale a fine-tuned provision managed deployment to increase or decrease PTU capacity, perform the same `PUT` REST API call as you did when [creating the deployment](#creating-a-provisioned-managed-deployment) and provide an updated `capacity` value for the `sku`. Keep in mind, provisioned deployments must scale in [minimum increments](../concepts/provisioned-throughput.md#how-much-throughput-per-ptu-you-get-for-each-model).
+To scale a fine-tuned provision managed deployment to increase or decrease PTU capacity, perform the same `PUT` REST API call as you did when [creating the deployment](#creating-a-provisioned-managed-deployment) and provide an updated `capacity` value for the `sku`. Keep in mind, provisioned deployments must scale in [minimum increments](../how-to/provisioned-throughput-onboarding.md#how-much-throughput-per-ptu-you-get-for-each-model).

For example, to scale the model deployed in the previous section from 25 to 40 PTU, make another `PUT` call and increase the capacity:
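As a rough sketch of what that scaling call might look like (illustrative only; the subscription, resource group, resource, deployment, and model placeholders, the `ProvisionedManaged` SKU name, and the `api-version` are assumptions, not values taken from this commit):

```bash
# Illustrative sketch only: scale an existing fine-tuned Provisioned Managed deployment to 40 PTU
# by re-issuing the deployments PUT with an updated sku.capacity. All <PLACEHOLDER> values,
# the api-version, and the model name/version are assumptions.
curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<DEPLOYMENT_NAME>?api-version=2024-10-01" \
  -H "Authorization: Bearer $(az account get-access-token --query accessToken --output tsv)" \
  -H "Content-Type: application/json" \
  -d '{
        "sku": { "name": "ProvisionedManaged", "capacity": 40 },
        "properties": {
          "model": {
            "format": "OpenAI",
            "name": "<FINE_TUNED_MODEL_NAME>",
            "version": "1"
          }
        }
      }'
```

Per the article, this is the same `PUT` used to create the deployment; only the `capacity` value under `sku` changes when scaling.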

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 10 additions & 10 deletions
@@ -1,12 +1,12 @@
 ---
-title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
-description: Learn about provisioned throughput units onboarding and Azure OpenAI.
+title: Understanding costs associated with provisioned throughput units (PTU)
+description: Learn about provisioned throughput costs and billing in Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 10/18/2024
+ms.date: 03/31/2025
 manager: nitinme
-author: mrbullwinkle
-ms.author: mbullwin
+author: aahill
+ms.author: aahi
 recommendations: false
 ---

@@ -21,14 +21,14 @@ Use this article to learn about calculating and understanding costs associated w

 Provisioned throughput units (PTUs) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota. Each quota is specific to a region and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.

-## Understanding the provisioned throughput purchase model
+## Understanding provisioned throughput billing

-Azure OpenAI [Provisioned](../how-to/deployment-types.md#provisioned), [Data Zone Provisioned](../how-to/deployment-types.md#data-zone-provisioned), and [Global Provisioned](../how-to/deployment-types.md#global-provisioned) are purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of [Azure Reservations](#azure-reservations-for-azure-openai-provisioned-deployments).
+Azure OpenAI [Provisioned](../how-to/deployment-types.md#provisioned), [Data Zone Provisioned](../how-to/deployment-types.md#data-zone-provisioned) (also known as regional), and [Global Provisioned](../how-to/deployment-types.md#global-provisioned) are purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of [Azure Reservations](#azure-reservations-for-azure-openai-provisioned-deployments).

 The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon.  However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned, Data Zone Provisioned, and Global Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.

 > [!NOTE]
-> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers can continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers or new models introduced after August 2024. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../concepts/provisioned-migration.md).
+> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers can continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers or [certain new models](../concepts/provisioned-migration.md#supported-models-on-commitment-payment-model) introduced after August 2024. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../concepts/provisioned-migration.md).


 ## Model independent quota
@@ -49,7 +49,7 @@ Quota for provisioned deployments shows up in Azure AI Foundry as the following


 > [!NOTE]
-> Global provisioned and data zone provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. For more information on model availability, review the [models documentation](./models.md).
+> Global provisioned and data zone provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. For more information on model availability, review the [models documentation](../concepts/models.md).

 ## Hourly usage

@@ -108,7 +108,7 @@ PTU quota is available by default in many regions. If more quota is required, cu

 ### Per-Model PTU minimums

-The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
+The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version. See the above [table](#how-much-throughput-per-ptu-you-get-for-each-model) for more information.

 ## Estimate provisioned throughput units and cost

articles/ai-services/openai/how-to/working-with-models.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about managing model deployment life cycle, updates, & retirement.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 08/29/2024
+ms.date: 03/31/2025
 ms.custom: references_regions, build-2023, build-2023-dataai, devx-track-azurepowershell
 manager: nitinme
 author: mrbullwinkle #ChrisHMSFT
@@ -289,7 +289,7 @@ Provisioned deployments support distinct model management practices. Provisioned
 ### Prerequisites
 - Validate that the target model version or model family is supported for your existing deployment type. Migrations can only occur between provisioned deployments of the same deployment type. For more information on deployment types, review the [deployment type documentation](./deployment-types.md).
 - Validate capacity availability for your target model version or model family prior to attempting a migration. For more information on determining capacity availability, review the [capacity transparency documentation](../concepts/provisioned-throughput.md#capacity-transparency).
-- For multi-deployment migrations, validate that you have sufficient quota to support multiple deployments simultaneously. For more information on how to validate quota for each provisioned deployment type, review the [provisioned quota documentation](../concepts/provisioned-throughput.md#quota).
+- For multi-deployment migrations, validate that you have sufficient quota to support multiple deployments simultaneously. For more information on how to validate quota for each provisioned deployment type, review the [provisioned throughput cost documentation](../how-to/provisioned-throughput-onboarding.md).

 ### In-place migrations for provisioned deployments
In-place migrations allow you to maintain the same provisioned deployment name and size while changing the model version or model family assigned to that deployment. With in-place migrations, Azure OpenAI Service takes care of migrating any existing traffic between model versions or model families throughout the migration over a 20-30 minute window. Throughout the migration window, your provisioned deployment will display an "updating" provisioned state. You can continue to use your provisioned deployment as you normally would. Once the in-place migration is complete, the provisioned state will be updated to "succeeded", indicating that all traffic has been migrated over to the target model version or model family.
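
As an illustrative aside, polling that provisioned state programmatically might look roughly like the following sketch (the ARM endpoint shape, `api-version`, placeholders, and `jq` filter are assumptions, not part of this commit):

```bash
# Illustrative sketch only: poll the deployment's provisioning state while an in-place migration runs.
# The endpoint shape, api-version, and jq filter are assumptions; <PLACEHOLDER> values are not real.
curl -s "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<RESOURCE_NAME>/deployments/<DEPLOYMENT_NAME>?api-version=2024-10-01" \
  -H "Authorization: Bearer $(az account get-access-token --query accessToken --output tsv)" \
  | jq -r '.properties.provisioningState'   # reports an updating state during the 20-30 minute window, then succeeded
```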
