Commit 517c563

Merge pull request #259111 from MicrosoftDocs/main
11/20/2023 PM Publish
2 parents 35651a6 + fb9d9a8 commit 517c563

72 files changed: +643 -741 lines changed

.openpublishing.redirection.azure-kubernetes-service.json

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 {
 "redirections": [
+{
+"source_path_from_root": "/articles/aks/managed-azure-ad.md",
+"redirect_url": "/azure/aks/enable-authentication-microsoft-entra-id.md",
+"redirect_document_id": false
+},
 {
 "source_path_from_root": "/articles/aks/stop-api-upgrade.md",
 "redirect_url": "/azure/aks/upgrade-cluster",

.openpublishing.redirection.json

Lines changed: 10 additions & 0 deletions
@@ -10232,6 +10232,11 @@
 "redirect_url": "/azure/vpn-gateway/add-remove-site-to-site-connections",
 "redirect_document_id": false
 },
+{
+"source_path_from_root": "/articles/vpn-gateway/tutorial-protect-vpn-gateway.md",
+"redirect_url": "/azure/vpn-gateway/tutorial-create-gateway-portal",
+"redirect_document_id": false
+},
 {
 "source_path_from_root": "/articles/vpn-gateway/vpn-gateway-howto-openvpn-clients.md",
 "redirect_url": "/azure/vpn-gateway/point-to-site-vpn-client-cert-windows",
@@ -10537,6 +10542,11 @@
 "redirect_url": "/azure/bastion/tutorial-create-host-portal",
 "redirect_document_id": false
 },
+{
+"source_path_from_root": "/articles/bastion/tutorial-protect-bastion-host-ddos.md",
+"redirect_url": "/azure/bastion/tutorial-create-host-portal",
+"redirect_document_id": false
+},
 {
 "source_path_from_root": "/articles/bastion/bastion-connect-vm-rdp.md",
 "redirect_url": "/azure/bastion/bastion-connect-vm-rdp-windows",
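The redirection entries added in these hunks all share one shape: a source path, a redirect URL, and a `redirect_document_id` flag. As a minimal sketch, assuming those three keys are the ones a reviewer wants to see on every entry (this is inferred from the hunks above, not from any publishing-system specification), a small Python check could flag incomplete or duplicated entries before a merge. The function name `check_redirections` and the embedded sample are hypothetical.

```python
import json

# Hypothetical sample that mirrors the entry shape shown in the hunks above.
SAMPLE = """
{
  "redirections": [
    {
      "source_path_from_root": "/articles/vpn-gateway/tutorial-protect-vpn-gateway.md",
      "redirect_url": "/azure/vpn-gateway/tutorial-create-gateway-portal",
      "redirect_document_id": false
    },
    {
      "source_path_from_root": "/articles/bastion/tutorial-protect-bastion-host-ddos.md",
      "redirect_url": "/azure/bastion/tutorial-create-host-portal",
      "redirect_document_id": false
    }
  ]
}
"""

# Assumption: these are the keys every entry should carry (taken from the diff).
REQUIRED_KEYS = {"source_path_from_root", "redirect_url", "redirect_document_id"}


def check_redirections(text):
    """Return a list of human-readable problems found in a redirection file."""
    problems = []
    seen = set()
    for index, entry in enumerate(json.loads(text)["redirections"]):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        source = entry["source_path_from_root"]
        if source in seen:
            problems.append(f"entry {index}: duplicate source {source}")
        seen.add(source)
    return problems


print(check_redirections(SAMPLE))
```

Two different sources pointing at the same `redirect_url` is treated as valid here, since the bastion hunk above shows exactly that pattern.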
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+---
+title: Azure OpenAI Service provisioned throughput
+description: Learn about provisioned throughput and Azure OpenAI.
+ms.service: azure-ai-openai
+ms.topic: conceptual
+ms.date: 11/20/2023
+ms.custom:
+manager: nitinme
+author: mrbullwinkle #ChrisHMSFT
+ms.author: mbullwin #chrhoder
+recommendations: false
+keywords:
+---
+
+# What is provisioned throughput?
+
+The provisioned throughput capability allows you to specify the amount of throughput you require for your application. The service then provisions the necessary compute and ensures it is ready for you. Throughput is defined in terms of provisioned throughput units (PTU), which is a normalized way of representing an amount of throughput for your deployment. Each model-version pair requires a different amount of PTU to deploy and provides a different amount of throughput per PTU.
+
+## What does the provisioned deployment type provide?
+
+- **Predictable performance:** stable max latency and throughput for uniform workloads.
+- **Reserved processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
+- **Cost savings:** High throughput workloads will result in cost savings vs token-based consumption.
+
+An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates additional features like Content Moderation ([See content moderation documentation](content-filter.md)).
+
+> [!NOTE]
+> Provisioned throughput units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering contact your Microsoft Account Team.
+
+## What do you get?
+
+|Topic | Provisioned|
+|---|---|
+| What is it? | Provides guaranteed throughput at smaller increments than the existing provisioned offer. Deployments will have a consistent max latency for a given model-version |
+| Who is it for? | Customers who want guaranteed throughput with minimal latency variance. |
+| Quota | Provisioned-managed throughput Units |
+| Latency | Max latency constrained |
+| Utilization | Provisioned-managed Utilization measure provided in Azure Monitor |
+| Estimating size | Provided calculator in the studio & load test script |
+
+## Key concepts
+
+### Provisioned throughput units
+
+Provisioned throughput Units (PTU) are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
+
+### Deployment types
+
+We introduced a new deployment type called **ProvisionedManaged** which provides smaller increments of PTU per deployment. Both types have their own quota, and you will only see the options you have been enabled for.
+
+### Quota
+
+Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level, meaning that it can be consumed by different resources within that subscription.
+
+Quota is specific to a (deployment type, model, region) triplet and isn't interchangeable, meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. Customers can raise a support request to move the quota across deployment types, models, or regions but we can't guarantee that it will be possible.
+
+While we make every attempt to ensure that quota is always deployable, quota does not represent a guarantee that the underlying capacity is available for the customer to use. The service assigns capacity to the customer at deployment time and if capacity is unavailable the deployment will fail with an out of capacity error.
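The quota rules in the new article can be illustrated with a toy model. The sketch below is not the Azure OpenAI API. It is a hypothetical in-memory ledger that assumes quota is tracked per (deployment type, model, region) key at the subscription level and that capacity is checked separately at deployment time. The class names, the `eastus` region, and the PTU figures are all invented for illustration.

```python
from dataclasses import dataclass, field


class QuotaError(Exception):
    """Raised when the requested PTU exceeds the remaining quota for a key."""


class OutOfCapacityError(Exception):
    """Raised when quota exists but no capacity can be assigned."""


@dataclass
class Subscription:
    # Remaining quota, keyed by (deployment_type, model, region).
    quota: dict = field(default_factory=dict)
    # Toy stand-in for capacity the service can assign, using the same key.
    capacity: dict = field(default_factory=dict)

    def deploy(self, deployment_type, model, region, ptu):
        key = (deployment_type, model, region)
        # Quota is not interchangeable: only the exact triplet counts.
        if self.quota.get(key, 0) < ptu:
            raise QuotaError(f"not enough quota for {key}")
        # Quota alone is no guarantee: capacity is assigned at deployment time.
        if self.capacity.get(key, 0) < ptu:
            raise OutOfCapacityError(f"out of capacity for {key}")
        self.quota[key] -= ptu
        self.capacity[key] -= ptu


# Hypothetical figures: 300 PTU of quota, but only 200 PTU of free capacity.
KEY = ("ProvisionedManaged", "gpt-4", "eastus")
sub = Subscription(quota={KEY: 300}, capacity={KEY: 200})

sub.deploy(*KEY, ptu=100)  # succeeds and consumes quota and capacity
for args in [("ProvisionedManaged", "gpt-35-turbo", "eastus"), KEY]:
    try:
        sub.deploy(*args, ptu=200)
    except (QuotaError, OutOfCapacityError) as err:
        print(type(err).__name__, err)
```

The first failing call has no quota for the GPT-35-turbo triplet. The second has enough quota left but not enough capacity, which corresponds to the out of capacity failure the article describes.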

articles/ai-services/openai/includes/chatgpt-dotnet.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 11/15/2023
 keywords:
 ---
 
-[Source code](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/src) | [Package (NuGet)](https://www.nuget.org/packages/Azure.AI.OpenAI/) | [Samples](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/tests/Samples)| [Enterprise chat app template](/dotnet/azure/ai/get-started-app-chat-template) |
+[Source code](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/src) | [Package (NuGet)](https://www.nuget.org/packages/Azure.AI.OpenAI/) | [Samples](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/tests/Samples)| [Retrieval Augmented Generation (RAG) enterprise chat template](/dotnet/azure/ai/get-started-app-chat-template) |
 
 ## Prerequisites
 

articles/ai-services/openai/includes/chatgpt-java.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 07/26/2023
 keywords:
 ---
 
-[Source code](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai) | [Artifact (Maven)](https://central.sonatype.com/artifact/com.azure/azure-ai-openai/1.0.0-beta.3) | [Samples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai/src/samples) | [Enterprise chat app template](/azure/developer/java/quickstarts/get-started-app-chat-template) |
+[Source code](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai) | [Artifact (Maven)](https://central.sonatype.com/artifact/com.azure/azure-ai-openai/1.0.0-beta.3) | [Samples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai/src/samples) | [Retrieval Augmented Generation (RAG) enterprise chat template](/azure/developer/java/quickstarts/get-started-app-chat-template) |
 
 ## Prerequisites
 

articles/ai-services/openai/includes/chatgpt-javascript.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 07/26/2023
 keywords:
 ---
 
-[Source code](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/openai/openai) | [Package (npm)](https://www.npmjs.com/package/@azure/openai) | [Samples](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/tests/Samples) | [Enterprise chat app template](/azure/developer/javascript/get-started-app-chat-template)|
+[Source code](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/openai/openai) | [Package (npm)](https://www.npmjs.com/package/@azure/openai) | [Samples](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/tests/Samples) | [Retrieval Augmented Generation (RAG) enterprise chat template](/azure/developer/javascript/get-started-app-chat-template)|
 
 ## Prerequisites
 

articles/ai-services/openai/includes/chatgpt-python.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 11/15/2023
 keywords:
 ---
 
-[Library source code](https://github.com/openai/openai-python?azure-portal=true) | [Package (PyPi)](https://pypi.org/project/openai?azure-portal=true) | [Enterprise chat app template](/azure/developer/python/get-started-app-chat-template) |
+[Library source code](https://github.com/openai/openai-python?azure-portal=true) | [Package (PyPi)](https://pypi.org/project/openai?azure-portal=true) | [Retrieval Augmented Generation (RAG) enterprise chat template](/azure/developer/python/get-started-app-chat-template) |
 
 ## Prerequisites
 

articles/ai-services/openai/toc.yml

Lines changed: 19 additions & 0 deletions
@@ -51,6 +51,8 @@ items:
 href: ./concepts/model-versions.md
 - name: Prompt engineering techniques
 href: ./concepts/advanced-prompt-engineering.md
+- name: Provisioned throughput units (PTU)
+href: ./concepts/provisioned-throughput.md
 - name: System message templates
 href: ./concepts/system-message.md
 - name: Using your data (preview)
@@ -159,6 +161,23 @@ items:
 href: /rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2023-10-01-preview&preserve-view=true
 - name: REST API (resource creation & deployment)
 href: /rest/api/cognitiveservices/accountmanagement/deployments/create-or-update?tabs=HTTP
+- name: Templates
+items:
+- name: Retrieval Augmented Generation (RAG) enterprise chat
+displayName: RAG, rag
+items:
+- name: C#
+href: /dotnet/azure/ai/get-started-app-chat-template
+displayName: RAG, rag
+- name: Java
+href: /azure/developer/java/quickstarts/get-started-app-chat-template
+displayName: RAG, rag
+- name: JavaScript
+href: /azure/developer/javascript/get-started-app-chat-template
+displayName: RAG, rag
+- name: Python
+href: /azure/developer/python/get-started-app-chat-template
+displayName: RAG, rag
 - name: Resources
 items:
 - name: Support and help options

articles/aks/azure-ad-integration-cli.md

Lines changed: 1 addition & 1 deletion
@@ -336,5 +336,5 @@ For best practices on identity and resource control, see [Best practices for aut
 [operator-best-practices-identity]: operator-best-practices-identity.md
 [azure-ad-rbac]: azure-ad-rbac.md
 [managed-aad]: managed-azure-ad.md
-[managed-aad-migrate]: managed-azure-ad.md#upgrade-a-legacy-azure-ad-cluster-to-aks-managed-azure-ad-integration
+[managed-aad-migrate]: managed-azure-ad.md#migrate-a-legacy-azure-ad-cluster-to-integration
 [az-aks-show]: /cli/azure/aks#az_aks_show
