
Commit 3c30fdc

Merge pull request #1843 from MicrosoftDocs/main
12/5/2024 AM Publish
2 parents 941b9e5 + 243c764 commit 3c30fdc

File tree

7 files changed

+102
-17
lines changed


articles/ai-services/anomaly-detector/whats-new.md

Lines changed: 1 addition & 1 deletion

@@ -84,7 +84,7 @@ We have also added links to some user-generated content. Those items will be mar

 ### April 2021

-* [IoT Edge module](https://azuremarketplace.microsoft.com/marketplace/apps/azure-cognitive-service.edge-anomaly-detector) (univariate) published.
+* IoT Edge module (univariate) published.
 * Anomaly Detector (univariate) available in Microsoft Azure operated by 21Vianet (China East 2).
 * Multivariate anomaly detector APIs preview in selected regions (West US 2, West Europe).

articles/ai-services/openai/concepts/provisioned-reservation-update.md

Lines changed: 71 additions & 0 deletions

@@ -0,0 +1,71 @@

---
title: 'Azure OpenAI Provisioned December 2024 Update'
titleSuffix: Azure OpenAI
description: Learn about new provisioned SKUs and commercial changes for provisioned offers.
manager: chrhoder
ms.service: azure-ai-openai
ms.custom:
ms.topic: how-to
ms.date: 11/25/2024
author: sydneemayers
ms.author: sydneemayers
recommendations: false
---

# Azure OpenAI provisioned December 2024 update
In early December 2024, Microsoft launched several changes to the provisioned offering. These changes include:

- A new deployment type, **data zone provisioned**.
- Updated hourly pricing for global and data zone provisioned deployment types.
- New Azure Reservations for global and data zone provisioned deployment types.

This article is intended for existing users of the provisioned throughput offering. New customers should refer to the [Azure OpenAI provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md).
## What's changing?

The following changes apply to the global provisioned, data zone provisioned, and provisioned deployment types.

> [!IMPORTANT]
> The changes in this article don't apply to the older *Provisioned Classic (PTU-C)* offering. They affect only the Provisioned (also known as Provisioned Managed) offering.

### Data zone provisioned

Data zone provisioned deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types, but they use Azure's global infrastructure to dynamically route traffic to the data center within the Microsoft-defined data zone that has the best availability for each request. Data zone provisioned deployments provide reserved model processing capacity for high, predictable throughput while keeping processing within the Microsoft-defined data zone. Data zone deployments are supported for the gpt-4o and gpt-4o-mini model families.

For more information, see the [deployment types guide](https://aka.ms/aoai/docs/deployment-types).
### New hourly pricing for global and data zone provisioned deployments

In August 2024, Microsoft announced that provisioned deployments would move to a new [hourly payment model](./provisioned-migration.md) with the option to purchase Azure Reservations for additional discounts. In December's provisioned update, differentiated hourly pricing is introduced across the global provisioned, data zone provisioned, and provisioned deployment types. For the hourly price of each provisioned deployment type, see the [Pricing details page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
### New Azure Reservations for global and data zone provisioned deployments

In addition to the updated hourly payment model, new Azure Reservations are introduced specifically for the global and data zone provisioned deployment types. With these new Azure Reservations, every provisioned deployment type has a separate Azure Reservation that can be purchased to support additional discounts. The mapping between each provisioned deployment type and its associated Azure Reservation is as follows:

| Provisioned deployment type | SKU name in code | Azure Reservation product name |
|---|---|---|
| Global provisioned | `GlobalProvisionedManaged` | Provisioned Managed Global |
| Data zone provisioned | `DataZoneProvisionedManaged` | Provisioned Managed Data Zone |
| Provisioned | `ProvisionedManaged` | Provisioned Managed Regional |

> [!IMPORTANT]
> Azure Reservations for Azure OpenAI provisioned offers aren't interchangeable across deployment types. The Azure Reservation you purchase must match the provisioned deployment type. If it doesn't, the provisioned deployment defaults to the hourly payment model until a matching Azure Reservation product is purchased. For more information, see the [Azure Reservations for Azure OpenAI Service provisioned guidance](https://aka.ms/oai/docs/ptum-reservations).
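Because a mismatched reservation silently falls back to hourly billing, deployment automation can encode the table above as a lookup and fail loudly on unknown SKUs. This is a hypothetical helper, not part of any Azure SDK; the SKU names come from the mapping table:

```python
# Hypothetical helper: maps the SKU name used in code to the Azure Reservation
# product that must be purchased for that deployment type to receive the
# reservation discount. Values are taken from the mapping table above.
RESERVATION_PRODUCT_BY_SKU = {
    "GlobalProvisionedManaged": "Provisioned Managed Global",
    "DataZoneProvisionedManaged": "Provisioned Managed Data Zone",
    "ProvisionedManaged": "Provisioned Managed Regional",
}

def required_reservation_product(sku_name: str) -> str:
    """Return the Azure Reservation product name that matches a deployment SKU.

    Raises ValueError for unknown SKUs so automation fails loudly instead of
    silently billing at the hourly rate.
    """
    try:
        return RESERVATION_PRODUCT_BY_SKU[sku_name]
    except KeyError:
        raise ValueError(f"No matching Azure Reservation for SKU {sku_name!r}")
```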
## Migrating existing deployments to global or data zone provisioned

Existing customers of provisioned deployments can choose to migrate to global or data zone provisioned deployments to benefit from the lower deployment minimums, granular scale increments, and differentiated pricing available for these deployment types. To learn more about how global and data zone provisioned deployments handle data processing across Azure geographies, see the Azure OpenAI deployment [data processing documentation](https://aka.ms/aoai/docs/data-processing-locations).

Two approaches are available to migrate from provisioned deployments to global or data zone provisioned deployments.

### Zero downtime migration

The zero downtime migration approach lets you migrate existing provisioned deployments to global or data zone provisioned deployments without interrupting the existing inference traffic. This approach minimizes workload interruptions, but requires multiple coexisting deployments while traffic shifts over. The process is as follows:

- Create a new deployment using the global or data zone provisioned deployment type in the target Azure OpenAI resource.
- Transition traffic from the existing regional provisioned deployment to the newly created global or data zone provisioned deployment until all traffic is offloaded from the existing deployment.
- Once traffic is migrated, validate that no inference requests are being processed on the previous provisioned deployment by confirming that the Azure OpenAI Requests metric shows no API calls within 5-10 minutes of migrating the traffic. For more information on this metric, see the [Monitor Azure OpenAI documentation](https://aka.ms/aoai/docs/monitor-azure-openai).
- Once you confirm that no inference calls have been made, delete the regional provisioned deployment.
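The metric check in the last two steps can be sketched as a small guard function. This is illustrative only: it assumes you have already fetched recent Azure OpenAI Requests counts for the old deployment (for example, from Azure Monitor) as `(timestamp, count)` pairs; `safe_to_delete` and the 10-minute quiet window are assumptions, not an Azure API:

```python
from datetime import datetime, timedelta, timezone

# Illustrative guard: the old deployment is safe to delete only when the
# Azure OpenAI Requests metric shows zero calls over the quiet window.
# `samples` is assumed to be (timestamp, request_count) pairs fetched from
# Azure Monitor for the old regional provisioned deployment.
def safe_to_delete(samples, quiet_minutes=10, now=None):
    now = now or datetime.now(timezone.utc)
    window_start = now - timedelta(minutes=quiet_minutes)
    recent = [count for ts, count in samples if ts >= window_start]
    # No samples inside the window also counts as quiet.
    return sum(recent) == 0
```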
### Migration with downtime

The migration with downtime approach involves migrating existing provisioned deployments to global or data zone provisioned deployments while stopping all existing inference traffic on the original deployment. This approach doesn't require multiple coexisting deployments, but does require a workload interruption. The process is as follows:

- Validate that no inference requests are being processed on the previous provisioned deployment by confirming that the Azure OpenAI Requests metric shows no API calls within the last 5-10 minutes. For more information on this metric, see the [Monitor Azure OpenAI documentation](https://aka.ms/aoai/docs/monitor-azure-openai).
- Once you confirm that no inference calls have been made, delete the regional provisioned deployment.
- Create a new deployment using the global or data zone deployment type in the target Azure OpenAI resource.
- Once the new deployment has succeeded, resume inference traffic on the new global or data zone deployment.

## How do I migrate my existing Azure Reservation to the new Azure Reservation products?

Azure Reservations for Azure OpenAI Service provisioned offers are specific to the provisioned deployment type. If the Azure Reservation purchased doesn't match the provisioned deployment type, the deployment defaults to the hourly payment model. If you choose to migrate to global or data zone provisioned deployments, you might need to purchase a new Azure Reservation for these deployments to support additional discounts. For more information on how to purchase a new Azure Reservation or make changes to an existing one, see the [Azure Reservations for Azure OpenAI Service provisioned guidance](https://aka.ms/aoai/reservation-transition).

articles/ai-services/openai/how-to/embeddings.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ print(response.model_dump_json(indent=2))
 import openai

 openai.api_type = "azure"
-openai.api_key = YOUR_API_KEY
+openai.api_key = "YOUR_API_KEY"
 openai.api_base = "https://YOUR_RESOURCE_NAME.openai.azure.com"
 openai.api_version = "2024-06-01"
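The one-character fix matters: without quotes, `YOUR_API_KEY` is an undefined name, so the script fails with a `NameError` before any request is made. A quick illustration (plain Python, no openai call needed):

```python
# Unquoted, YOUR_API_KEY is treated as a variable name and raises NameError.
try:
    api_key = YOUR_API_KEY  # noqa: F821 - intentionally undefined
except NameError:
    unquoted_fails = True

# Quoted, it's just a placeholder string you replace with a real key.
api_key = "YOUR_API_KEY"
quoted_ok = isinstance(api_key, str)
```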

articles/ai-services/openai/how-to/provisioned-get-started.md

Lines changed: 1 addition & 1 deletion
@@ -244,5 +244,5 @@ We recommend the following workflow:
 * [Python reference documentation](https://github.com/openai/openai-python?tab=readme-ov-file#retries)
 * [.NET reference documentation](/dotnet/api/overview/azure/ai.openai-readme)
 * [Java reference documentation](/java/api/com.azure.ai.openai.openaiclientbuilder?view=azure-java-preview&preserve-view=true#com-azure-ai-openai-openaiclientbuilder-retryoptions(com-azure-core-http-policy-retryoptions))
-* [JavaScript reference documentation](/javascript/api/@azure/openai/openaiclientoptions?view=azure-node-preview&preserve-view=true#@azure-openai-openaiclientoptions-retryoptions)
+* [JavaScript reference documentation](/azure/ai-services/openai/supported-languages?tabs=dotnet-secure%2Csecure%2Cpython-secure%2Ccommand&pivots=programming-language-javascript)
 * [GO reference documentation](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai#ChatCompletionsOptions)

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -91,6 +91,8 @@ items:
   displayName: PTU, provisioned, provisioned throughput units
 - name: Azure OpenAI PTU update
   href: ./concepts/provisioned-migration.md
+- name: Azure OpenAI PTU reservation update
+  href: ./concepts/provisioned-reservation-update.md
 - name: Legacy models
   href: ./concepts/legacy-models.md
 - name: How-to

articles/machine-learning/tutorial-create-secure-workspace-vnet.md

Lines changed: 3 additions & 0 deletions
@@ -402,6 +402,9 @@ Use the following steps to create an Azure Virtual Machine to use as a jump box.
 1. Once the virtual machine is created, select __Go to resource__.
 1. From the top of the page, select __Connect__ and then __Connect via Bastion__.

+    > [!TIP]
+    > Azure Bastion uses port 443 for inbound communication. If you have a firewall that restricts outbound traffic, ensure that it allows traffic on port 443 to the Azure Bastion service. For more information, see [Working with NSGs and Azure Bastion](/azure/bastion/bastion-nsg).
+
 :::image type="content" source="./media/tutorial-create-secure-workspace-vnet/virtual-machine-connect.png" alt-text="Screenshot of the 'connect' list, with 'Bastion' selected.":::

 1. Provide your authentication information for the virtual machine, and a connection is established in your browser.

articles/search/search-limits-quotas-capacity.md

Lines changed: 23 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ author: HeidiSteen
88
ms.author: heidist
99
ms.service: azure-ai-search
1010
ms.topic: conceptual
11-
ms.date: 10/28/2024
11+
ms.date: 12/05/2024
1212
ms.custom:
1313
- references_regions
1414
- build-2024
@@ -200,11 +200,11 @@ Static rate request limits for operations related to a service:
200200

201201
### Semantic Ranker Throttling limits
202202

203-
[Semantic ranker](search-get-started-semantic.md) uses a queuing system to manage concurrent requests. This sytem allows search services get the highest amount of queries per second possible. When the limit of concurrent requests is reached, additional requests are placed in a queue. If the queue is full, further requests are rejected and must be retried.
203+
[Semantic ranker](search-get-started-semantic.md) uses a queuing system to manage concurrent requests. This system allows search services get the highest number of queries per second possible. When the limit of concurrent requests is reached, additional requests are placed in a queue. If the queue is full, further requests are rejected and must be retried.
204204

205205
Total semantic ranker queries per second varies based on the following factors:
206206
+ The SKU of the search service. Both queue capacity and concurrent request limits vary by SKU.
207-
+ The number of search units in the search service. The simplest way to increase the maximum amount of concurrent semantic ranker queries is to [add additional search units to your search service](search-capacity-planning.md#how-to-change-capacity).
207+
+ The number of search units in the search service. The simplest way to increase the maximum number of concurrent semantic ranker queries is to [add additional search units to your search service](search-capacity-planning.md#how-to-change-capacity).
208208
+ The total available semantic ranker capacity in the region.
209209
+ The amount of time it takes to serve a query using semantic ranker. This varies based on how busy the search service is.
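Since requests rejected from a full queue must be retried, client code typically wraps semantic ranker queries in a retry loop with backoff. A minimal sketch, assuming a `send_query` callable that raises a hypothetical `ThrottledError` when the service rejects the request (the Azure SDKs ship their own retry policies, which you'd normally use instead):

```python
import time

class ThrottledError(Exception):
    """Raised when the service rejects a request because its queue is full."""

def query_with_retries(send_query, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Retry a throttled query with exponential backoff.

    `send_query` is any zero-argument callable that performs the search
    request; `sleep` is injectable so the loop is testable.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_query()
        except ThrottledError:
            if attempt == max_retries:
                raise
            # Exponential backoff: 0.5s, 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
```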

@@ -217,21 +217,30 @@ The following table describes the semantic ranker throttling limits by SKU. Subj

 ## API request limits

+Limits on payloads and queries exist because unbounded queries can destabilize your search service. Typically, such queries are created programmatically. If your application generates search queries programmatically, we recommend designing it in such a way that it doesn't generate queries of unbounded size. If you must exceed a supported limit, you should [test your workload](search-performance-analysis.md#develop-baseline-numbers) so that you know what to expect.
+
 Except where noted, the following API requests apply to all programmable interfaces, including the Azure SDKs.

-+ Maximum of 16 MB per indexing or query request when pushing a payload to the search service <sup>1</sup>
-+ Maximum 8-KB URL length (applies to REST APIs only)
-+ Maximum 1,000 documents per batch of index uploads, merges, or deletes
-+ Maximum 32 fields in $orderby clause
-+ Maximum 100,000 characters in a search clause
-+ The maximum number of clauses in `search` (expressions separated by AND or OR) is 1024
-+ Maximum search term size is 32,766 bytes (32 KB minus 2 bytes) of UTF-8 encoded text
-+ Maximum search term size is 1,000 characters for [prefix search](query-simple-syntax.md#prefix-queries) and [regex search](query-lucene-syntax.md#bkmk_regex)
-+ [Wildcard search](query-lucene-syntax.md#bkmk_wildcard) and [Regular expression search](query-lucene-syntax.md#bkmk_regex) are limited to a maximum of 1,000 states when processed by [Lucene](https://lucene.apache.org/core/7_0_1/core/org/apache/lucene/util/automaton/RegExp.html).
+General:
+
++ Supported maximum payload limit is 16 MB for indexing and query requests via REST API and SDKs.
++ Maximum 8-KB URL length (applies to REST APIs only).
+
+Indexing APIs:
+
++ Supported maximum 1,000 documents per batch of index uploads, merges, or deletes.
+
+Query APIs:
+
++ Maximum 32 fields in $orderby clause.
++ Maximum 100,000 characters in a search clause.
++ Maximum number of clauses in `search` is 3,000.
++ Maximum limits on [wildcard](query-lucene-syntax.md#bkmk_wildcard) and [regular expression](query-lucene-syntax.md#bkmk_regex) queries, as enforced by [Lucene](https://lucene.apache.org/core/7_0_1/core/org/apache/lucene/util/automaton/RegExp.html). Lucene caps the number of patterns, variations, or matches at 1,000 instances to avoid engine overload.

-<sup>1</sup> In Azure AI Search, the body of a request is subject to an upper limit of 16 MB, imposing a practical limit on the contents of individual fields or collections that aren't otherwise constrained by theoretical limits (see [Supported data types](/rest/api/searchservice/supported-data-types) for more information about field composition and restrictions).
+Search terms:

-Limits on query size and composition exist because unbounded queries can destabilize your search service. Typically, such queries are created programmatically. If your application generates search queries programmatically, we recommend designing it in such a way that it doesn't generate queries of unbounded size.
++ Supported maximum search term size is 32,766 bytes (32 KB minus 2 bytes) of UTF-8 encoded text. Applies to keyword search and the text property of vector search.
++ Supported maximum search term size is 1,000 characters for [prefix search](query-simple-syntax.md#prefix-queries) and [regex search](query-lucene-syntax.md#bkmk_regex).
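The indexing limits in this hunk (1,000 documents per batch, 16 MB per request) are straightforward to respect with a small batching helper. A sketch, assuming documents are JSON-serializable dicts; the byte-size estimate via `json.dumps` approximates, but doesn't exactly equal, the final request body size:

```python
import json

MAX_DOCS_PER_BATCH = 1_000            # documented limit per upload/merge/delete batch
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024  # documented 16 MB request limit

def batch_documents(docs, max_docs=MAX_DOCS_PER_BATCH, max_bytes=MAX_PAYLOAD_BYTES):
    """Yield lists of documents that stay under both batch limits.

    Payload size is estimated by JSON-encoding each document, which
    approximates the final request body size.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```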
## API response limits
