Commit 39603f7

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into WI196311-export-to-SIEM
2 parents 6e7aed6 + bd7e6d2

File tree

381 files changed: +5904 −4371 lines


.openpublishing.redirection.json

Lines changed: 61 additions & 0 deletions
```diff
@@ -12053,6 +12053,11 @@
       "redirect_url": "/azure/governance/management-groups/overview",
       "redirect_document_id": false
     },
+    {
+      "source_path_from_root": "/articles/governance/azure-management.md",
+      "redirect_url": "/azure/governance/management-groups/azure-management",
+      "redirect_document_id": false
+    },
     {
       "source_path_from_root": "/articles/resource-manager-policy.md",
       "redirect_url": "/azure/governance/policy/overview",
@@ -12833,6 +12838,17 @@
       "redirect_url": "/azure/load-balancer/load-balancer-multiple-ip-powershell",
       "redirect_document_id": false
     },
+    {
+      "source_path": "articles/load-balancer/configure-vm-scale-set-cli.md",
+      "redirect_url": "/azure/load-balancer/configure-vm-scale-set-portal",
+      "redirect_document_id": false
+    },
+    {
+      "source_path": "articles/load-balancer/configure-vm-scale-set-powershell.md",
+      "redirect_url": "/azure/load-balancer/configure-vm-scale-set-portal",
+      "redirect_document_id": true
+    },
+
     {
       "source_path_from_root": "/articles/dms/tutorial-sql-server-azure-sql-online.md",
       "redirect_url": "/azure/dms/tutorial-sql-server-to-azure-sql",
@@ -22460,6 +22476,51 @@
       "source_path_from_root": "/articles/reliability/disaster-recovery-guidance-overview.md",
       "redirect_url": "/azure/reliability/reliability-guidance-overview",
       "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-nsg-portal.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-nsg-powershell.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-publicip-portal.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-publicip-powershell.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-vnet-powershell.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/move-across-regions-vnet-portal.md",
+      "redirect_url": "/azure/resource-mover/move-region-within-resource-group",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/scripts/virtual-network-powershell-sample-multi-tier-application.md",
+      "redirect_url": "/azure/app-service/tutorial-secure-ntier-app",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/ip-services/virtual-networks-static-private-ip-classic-pportal.md",
+      "redirect_url": "/updates/azure-classic-resource-providers-will-be-retired-on-31-august-2024/",
+      "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/virtual-network/ip-services/virtual-networks-static-private-ip-classic-ps.md",
+      "redirect_url": "/updates/azure-classic-resource-providers-will-be-retired-on-31-august-2024/",
+      "redirect_document_id": false
     }
   ]
 }
```

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -52,6 +52,6 @@ We introduced a new deployment type called **ProvisionedManaged** which provides

 Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level meaning that it can be consumed by different resources within that subscription.

-Quota is specific to a (deployment type, mode, region) triplet and isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. Customers can raise a support request to move the quota across deployment types, models, or regions but we can't guarantee that it will be possible.
+Quota is specific to a (deployment type, model, region) triplet and isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. Customers can raise a support request to move the quota across deployment types, models, or regions but we can't guarantee that it will be possible.

 While we make every attempt to ensure that quota is always deployable, quota does not represent a guarantee that the underlying capacity is available for the customer to use. The service assigns capacity to the customer at deployment time and if capacity is unavailable the deployment will fail with an out of capacity error.
```
articles/ai-services/openai/how-to/provisioned-get-started.md (new file)

Lines changed: 181 additions & 0 deletions
---
title: 'Quickstart - Get started using Provisioned Deployments with Azure OpenAI Service'
titleSuffix: Azure OpenAI Service
description: Walkthrough on how to get started with provisioned deployments on the Azure OpenAI Service.
manager: nitinme
ms.service: azure-ai-openai
ms.custom: openai
ms.topic: how-to
author: ChrisHMSFT
ms.author: chrhoder
ms.date: 12/15/2023
recommendations: false
---
# Get started using Provisioned Deployments on the Azure OpenAI Service

The following guide walks you through setting up a provisioned deployment with your Azure OpenAI Service resource.

## Prerequisites

- An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services?azure-portal=true)
- Access granted to Azure OpenAI in the desired Azure subscription. Currently, access to this service is granted by application. You can apply for access to Azure OpenAI Service by completing the form at [https://aka.ms/oai/access](https://aka.ms/oai/access?azure-portal=true).
- Obtained quota for a provisioned deployment and purchased a commitment.

> [!NOTE]
> Provisioned Throughput Units (PTU) are different from standard quota in Azure OpenAI and aren't available by default. To learn more about this offering, contact your Microsoft account team.
## Create your provisioned deployment

After you purchase a commitment on your quota, you can create a deployment. To create a provisioned deployment, follow these steps; the choices described reflect the entries shown in the screenshot.

:::image type="content" source="../media/provisioned/deployment-screen.jpg" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.jpg":::

1. Sign in to [Azure OpenAI Studio](https://oai.azure.com).
2. Choose the subscription that was enabled for provisioned deployments and select the desired resource in a region where you have the quota.
3. Under **Management** in the left nav, select **Deployments**.
4. Select **Create new deployment** and configure the following fields. Expand the **advanced options** drop-down.
5. Fill out the values in each field. Here's an example:

| Field | Description | Example |
|--|--|--|
| Select a model | Choose the specific model you wish to deploy. | GPT-4 |
| Model version | Choose the version of the model to deploy. | 0613 |
| Deployment Name | The deployment name is used in your code to call the model by using the client libraries and the REST APIs. | gpt-4 |
| Content filter | Specify the filtering policy to apply to the deployment. Learn more in our [Content Filtering](../concepts/content-filter.md) how-to. | Default |
| Deployment Type | This impacts the throughput and performance. Choose Provisioned-Managed for your provisioned deployment. | Provisioned-Managed |
| Provisioned Throughput Units | Choose the amount of throughput you wish to include in the deployment. | 100 |
If you wish to create your deployment programmatically, you can do so with the following Azure CLI command. Update the `sku-capacity` with the desired number of provisioned throughput units.

```cli
az cognitiveservices account deployment create \
--name <myResourceName> \
--resource-group <myResourceGroupName> \
--deployment-name MyModel \
--model-name GPT-4 \
--model-version 0613 \
--model-format OpenAI \
--sku-capacity 100 \
--sku-name Provisioned-Managed
```
REST, ARM template, Bicep, and Terraform can also be used to create deployments. See the section on automating deployments in the [Managing Quota](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota?tabs=rest#automate-deployment) how-to guide and replace `sku.name` with "Provisioned-Managed" rather than "Standard."
## Make your first calls

The inferencing code for provisioned deployments is the same as for the standard deployment type. The following code snippet shows a chat completions call to a GPT-4 model. For your first time using these models programmatically, we recommend starting with our [quickstart guide](../quickstart.md). Our recommendation is to use the OpenAI library with version 1.0 or greater since this includes retry logic within the library.
```python
#Note: The openai-python library support for Azure OpenAI is in preview.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-05-15"
)

response = client.chat.completions.create(
    model="gpt-4", # model = "deployment_name".
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure AI services support this too?"}
    ]
)

print(response.choices[0].message.content)
```
> [!IMPORTANT]
> For production, use a secure way of storing and accessing your credentials like [Azure Key Vault](../../../key-vault/general/overview.md). For more information about credential security, see the Azure AI services [security](../../security-features.md) article.
## Understanding expected throughput

The amount of throughput that you can achieve on the endpoint is a function of the number of PTUs deployed, input size, output size, and call rate. The number of concurrent calls and total tokens processed can vary based on these values. Our recommended way for determining the throughput for your deployment is as follows (a rough estimation sketch follows this list):

1. Use the Capacity calculator for a sizing estimate. You can find the capacity calculator in the Azure OpenAI Studio under the quotas page and Provisioned tab.
2. Benchmark the load using real traffic workload. For more information about benchmarking, see the [benchmarking](#run-a-benchmark) section.
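As a back-of-the-envelope complement to the calculator, the sketch below estimates a sustainable request rate from a per-PTU throughput figure. This is an illustrative helper, not part of the service: the `tokens_per_minute_per_ptu` value is an assumption you'd take from the capacity calculator for your model and traffic shape, and the example numbers are made up.

```python
# A rough sizing sketch. The per-PTU throughput figure is an assumption you
# would read off the capacity calculator for your model and traffic shape;
# it is not a published constant.

def estimate_max_requests_per_minute(
    ptus_deployed: int,
    avg_prompt_tokens: int,
    avg_generation_tokens: int,
    tokens_per_minute_per_ptu: float,  # assumption from the capacity calculator
) -> float:
    """Rough upper bound on sustainable calls/minute for a provisioned deployment."""
    total_tokens_per_minute = ptus_deployed * tokens_per_minute_per_ptu
    tokens_per_call = avg_prompt_tokens + avg_generation_tokens
    return total_tokens_per_minute / tokens_per_call

# Example: 100 PTUs, 1,000-token prompts, 200-token generations, and an
# assumed 2,000 tokens/minute per PTU taken from the calculator.
print(estimate_max_requests_per_minute(100, 1000, 200, 2000.0))  # ~166.7 calls/min
```

Treat the result as a starting point for the benchmark runs described below, not as a guarantee of throughput.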
## Measuring your deployment utilization

When you deploy a specified number of provisioned throughput units (PTUs), a set amount of inference throughput is made available to that endpoint. Utilization of this throughput is a complex formula based on the model, model version, call rate, prompt size, and generation size. To simplify this calculation, we provide a utilization metric in Azure Monitor. Your deployment returns a 429 on any new calls after the utilization rises above 100%. The provisioned utilization is defined as follows:

PTU deployment utilization = (PTUs consumed in the time period) / (PTUs deployed in the time period)

For example, a deployment of 200 PTUs that consumes 150 PTUs over the measurement period reports 75% utilization.

You can find the utilization measure in the Azure Monitor section for your resource. To access the monitoring dashboards, sign in to [https://portal.azure.com](https://portal.azure.com), go to your Azure OpenAI resource, and select the Metrics page from the left nav. On the metrics page, select the 'Provisioned-managed utilization' measure. If you have more than one deployment in the resource, you should also split the values by each deployment by selecting the 'Apply Splitting' button.

:::image type="content" source="../media/provisioned/azure-monitor-utilization.jpg" alt-text="Screenshot of the provisioned managed utilization on the resource's metrics blade in the Azure portal." lightbox="../media/provisioned/azure-monitor-utilization.jpg":::

For more information about monitoring your deployments, see the [Monitoring Azure OpenAI Service](./monitoring.md) page.
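If you prefer to pull the same metric programmatically rather than through the portal, a sketch along these lines should work with the `azure-monitor-query` library. The metric name `AzureOpenAIProvisionedManagedUtilization` and the resource ID are assumptions for illustration; confirm the exact metric name in the portal's metric picker before relying on it.

```python
# A sketch for reading the utilization metric with azure-monitor-query.
# The metric name below is an assumption; confirm the exact name in the
# portal's metric picker for your resource before relying on it.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Hypothetical resource ID for illustration.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<aoai-resource-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

response = client.query_resource(
    resource_id,
    metric_names=["AzureOpenAIProvisionedManagedUtilization"],  # assumed name
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            if point.average is not None:
                print(f"{point.time_stamp}: {point.average:.1f}% utilization")
```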
## Handling high utilization

Provisioned deployments provide you with an allocated amount of compute capacity to run a given model. The 'Provisioned-Managed Utilization' metric in Azure Monitor measures the utilization of the deployment in one-minute increments. Provisioned-Managed deployments are also optimized so that accepted calls are processed with a consistent per-call max latency. When the workload exceeds its allocated capacity, the service returns a 429 HTTP status code until the utilization drops below 100%. The time before retrying is provided in the `retry-after` and `retry-after-ms` response headers, which provide the time in seconds and milliseconds respectively. This approach maintains the per-call latency targets while giving the developer control over how to handle high-load situations, for example retry or divert to another experience or endpoint.

### What should I do when I receive a 429 response?

A 429 response indicates that the allocated PTUs are fully consumed at the time of the call. The response includes the `retry-after-ms` and `retry-after` headers that tell you the time to wait before the next call will be accepted. How you choose to handle a 429 response depends on your application requirements. Here are some considerations (a minimal retry sketch appears at the end of this section):

- If you are okay with longer per-call latencies, implement client-side retry logic to wait the `retry-after-ms` time and retry. This approach lets you maximize the throughput on the deployment. Microsoft-supplied client SDKs already handle it with reasonable defaults. You might still need further tuning based on your use-cases.
- Consider redirecting the traffic to other models, deployments, or experiences. This approach is the lowest-latency solution because this action can be taken as soon as you receive the 429 signal.

The 429 signal isn't an unexpected error response when pushing to high utilization but instead part of the design for managing queuing and high load for provisioned deployments.
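As a concrete illustration of the first consideration, here's a minimal sketch of manual retry logic that honors `retry-after-ms`. It assumes you're calling the REST API directly with the `requests` library rather than an SDK (which already retries for you); the deployment name and retry bounds are placeholders you'd tune for your workload.

```python
# Minimal manual-retry sketch for 429 responses, honoring retry-after-ms.
# Assumes direct REST calls with the requests library; the SDKs already
# implement equivalent logic. Deployment name and bounds are placeholders.
import os
import time

import requests

url = (
    f"{os.getenv('AZURE_OPENAI_ENDPOINT')}/openai/deployments/gpt-4"
    "/chat/completions?api-version=2023-05-15"
)
headers = {"api-key": os.getenv("AZURE_OPENAI_KEY")}
body = {"messages": [{"role": "user", "content": "Hello"}]}

for attempt in range(5):  # bound the retries
    resp = requests.post(url, headers=headers, json=body, timeout=60)
    if resp.status_code != 429:
        break
    # Prefer the millisecond header; fall back to the seconds header.
    wait_ms = resp.headers.get("retry-after-ms")
    wait_s = float(wait_ms) / 1000 if wait_ms else float(resp.headers.get("retry-after", 1))
    time.sleep(wait_s)

resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```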
### Modifying retry logic within the client libraries

The Azure OpenAI SDKs retry 429 responses by default and behind the scenes in the client (up to the maximum retries). The libraries respect the `retry-after` time. You can also modify the retry behavior to better suit your experience. Here's an example with the Python library.

You can use the `max_retries` option to configure or disable retry settings:
```python
import os
from openai import AzureOpenAI

# Configure the default for all requests:
client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-05-15",
    max_retries=5, # default is 2
)

# Or, configure per-request:
client.with_options(max_retries=5).chat.completions.create(
    model="gpt-4", # model = "deployment_name".
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure AI services support this too?"}
    ]
)
```
## Run a benchmark

The exact performance and throughput capabilities of your instance depend on the kind of requests you make and the exact workload. The best way to determine the throughput for your workload is to run a benchmark on your own data.

To assist you in this work, the benchmarking tool provides a way to easily run benchmarks on your deployment. The tool comes with several possible preconfigured workload shapes and outputs key performance metrics. Learn more about the tool and configuration settings in our GitHub repo: [https://aka.ms/aoai/benchmarking](https://aka.ms/aoai/benchmarking).

We recommend the following workflow:

1. Estimate your throughput PTUs using the capacity calculator.
1. Run a benchmark with this traffic shape for an extended period of time (10+ min) to observe the results in a steady state.
1. Observe the utilization, tokens processed, and call rate values from the benchmark tool and Azure Monitor.
1. Run a benchmark with your own traffic shape and workloads using your client implementation. Be sure to implement retry logic using either an Azure OpenAI client library or custom logic.
## Next Steps

* For more information on cloud application best practices, check out [Best practices in cloud applications](https://learn.microsoft.com/azure/architecture/best-practices/index-best-practices)
* For more information on provisioned deployments, check out [What is provisioned throughput?](../concepts/provisioned-throughput.md)
* For more information on retry logic within each SDK, check out:
    * [Python reference documentation](https://github.com/openai/openai-python?tab=readme-ov-file#retries)
    * [.NET reference documentation](https://learn.microsoft.com/dotnet/api/azure.ai.openai.openaiclientoptions?view=azure-dotnet-preview)
    * [Java reference documentation](https://learn.microsoft.com/java/api/com.azure.ai.openai.openaiclientbuilder?view=azure-java-preview#com-azure-ai-openai-openaiclientbuilder-retryoptions(com-azure-core-http-policy-retryoptions))
    * [JavaScript reference documentation](https://learn.microsoft.com/javascript/api/@azure/openai/openaiclientoptions?view=azure-node-preview#@azure-openai-openaiclientoptions-retryoptions)
    * [Go reference documentation](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai#ChatCompletionsOptions)
articles/ai-services/openai/media/provisioned/deployment-screen.jpg (282 KB, binary)

articles/ai-services/openai/media/provisioned/azure-monitor-utilization.jpg (90.1 KB, binary)

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -128,6 +128,8 @@ items:
       href: ./how-to/quota.md
     - name: Monitor Azure OpenAI
       href: ./how-to/monitoring.md
+    - name: Get started with Provisioned Deployments
+      href: ./how-to/provisioned-get-started.md
     - name: Plan and manage costs
       href: ./how-to/manage-costs.md
     - name: Performance & latency
```

articles/ai-services/speech-service/batch-transcription-audio-data.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -31,7 +31,6 @@ The batch transcription API supports a number of different formats and codecs, s
 - WAV
 - MP3
 - OPUS/OGG
-- AAC
 - FLAC
 - WMA
 - ALAW in WAV container
```

articles/ai-services/speech-service/how-to-custom-speech-display-text-format.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 12/14/2023
+ms.date: 1/10/2024
 ms.author: eur
 ---

@@ -156,7 +156,7 @@ Here are the grammar punctuation rules:

 #### Spelling correction

-The name `CVOID-19` might be recognized as `covered 19`. To make sure that `COVID-19 is a virus` is displayed instead of `covered 19 is a virus`, use the following rewrite rule:
+The name `COVID-19` might be recognized as `covered 19`. To make sure that `COVID-19 is a virus` is displayed instead of `covered 19 is a virus`, use the following rewrite rule:

 ```text
 #rewrite
````

articles/ai-services/speech-service/includes/previews/preview-personal-voice.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@
 ms.author: eric-urban
 ms.service: azure-ai-services
 ms.topic: include
-ms.date: 12/1/2023
+ms.date: 1/10/2024
 ms.custom: include
 ---
```
