Commit 18926cf

Merge pull request #1 from sdgilley/sdg-ai-studio-gh

acrolinx fixes

2 parents 94b90e3 + a30a4f2

7 files changed: +74 −65 lines changed

articles/ai-studio/ai-services/concepts/deployment-types.md

Lines changed: 9 additions & 8 deletions
@@ -2,20 +2,21 @@
 title: Understanding deployment types in Azure AI model inference
 titleSuffix: Azure AI services
 description: Learn how to use deployment types in Azure AI model deployments
-author: mrbullwinkle
-manager: nitinme
+author: sdgilley
+manager: scottpolly
 ms.service: azure-ai-studio
 ms.topic: conceptual
-ms.date: 10/11/2024
+ms.date: 10/24/2024
 ms.author: fasantia
+ms.reviewer: fasantia
 ms.custom: ignite-2024, github-universe-2024
 ---
 
 # Deployment types in Azure AI model inference
 
 Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
 
-All deployments can perform the exact same inference operations, however the billing, scale and performance are substantially different. As part of your solution design, you will need to make two key decisions:
+All deployments can perform the exact same inference operations, however the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:
 
 - **Data residency needs**: global vs. regional resources
 - **Call volume**: standard vs. provisioned
@@ -26,9 +27,9 @@ Deployment types support varies by model and model provider.
 
 For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point.
 
-Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer’s inference requests. This means you will get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
+Global deployments use Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer’s inference requests. This means you get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
 
-Our global deployments will be the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
+Our global deployments are the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
 
 ## Standard
 
@@ -40,12 +41,12 @@ Only Azure OpenAI models support this deployment type.
 
 ## Global standard
 
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
+Global deployments are available in the same Azure AI services resources as nonglobal deployment types but allow you to use Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
 
 Customers with high consistent volume may experience greater latency variability. The threshold is set per model. For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.
 
 ## Global provisioned
 
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
+Global deployments are available in the same Azure AI services resources as nonglobal deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
 
 Only Azure OpenAI models support this deployment type.
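The deployment types described in this file are chosen when a model deployment is created on an Azure AI services resource. As a rough illustration (not part of this commit), the standard vs. global standard choice maps to the deployment SKU passed to the Azure CLI's `az cognitiveservices account deployment create` command; the resource group, resource, deployment, and model version names below are placeholders:

```shell
# Sketch: create a global standard deployment on an existing
# Azure AI services resource. All names here are placeholders.
az cognitiveservices account deployment create \
  --resource-group my-resource-group \
  --name my-ai-services-resource \
  --deployment-name gpt-4o-global \
  --model-format OpenAI \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --sku-name GlobalStandard \
  --sku-capacity 10
```

Swapping `--sku-name` to a regional or provisioned SKU (for example `Standard`) selects one of the other deployment types discussed above, subject to the model supporting that type.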

articles/ai-studio/ai-services/concepts/endpoints.md

Lines changed: 5 additions & 4 deletions
@@ -2,12 +2,13 @@
 title: Use the Azure AI model inference endpoint
 titleSuffix: Azure AI studio
 description: Learn about to use the Azure AI model inference endpoint and how to configure it.
-author: mrbullwinkle
-manager: nitinme
 ms.service: azure-ai-studio
 ms.topic: conceptual
-ms.date: 10/11/2024
-ms.author: fasantia
+author: sdgilley
+manager: scottpolly
+ms.date: 10/24/2024
+ms.author: sgilley
+ms.reviewer: fasantia
 ms.custom: ignite-2024, github-universe-2024
 ---
 

articles/ai-studio/ai-services/concepts/quotas-limits.md

Lines changed: 5 additions & 3 deletions
@@ -2,13 +2,15 @@
 title: Azure AI model inference quotas and limits
 titleSuffix: Azure AI services
 description: Quick reference, detailed description, and best practices on the quotas and limits for the Azure AI models service in Azure AI services.
-author: santiagxf
-manager: nitinme
 ms.service: azure-ai-studio
 ms.custom: ignite-2024, github-universe-2024
 ms.topic: conceptual
 ms.date: 10/21/2024
-ms.author: fasantia
+author: sdgilley
+manager: scottpolly
+ms.date: 10/24/2024
+ms.author: sgilley
+ms.reviewer: fasantia
 ---
 
 # Azure AI model inference quotas and limits
