articles/ai-studio/ai-services/concepts/deployment-types.md
---
title: Understanding deployment types in Azure AI model inference
titleSuffix: Azure AI services
description: Learn how to use deployment types in Azure AI model deployments
author: sdgilley
manager: scottpolly
ms.service: azure-ai-studio
ms.topic: conceptual
ms.date: 10/24/2024
ms.author: fasantia
ms.reviewer: fasantia
ms.custom: ignite-2024, github-universe-2024
---

# Deployment types in Azure AI model inference
Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
All deployments can perform the exact same inference operations; however, billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:

- **Data residency needs**: global vs. regional resources
- **Call volume**: standard vs. provisioned
Deployment type support varies by model and model provider.

For standard and provisioned deployments, you can choose between two types of configurations within your resource: **global** or **regional**. Global standard is the recommended starting point.
Global deployments use Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for each inference request. This means you get the highest initial throughput limits and best model availability while still providing our uptime SLA and low latency. For high-volume workloads above the specified usage tiers on standard and global standard, you might experience increased latency variation. For customers that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput.

Our global deployments are the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
## Standard
Only Azure OpenAI models support this deployment type.
## Global standard
Global deployments are available in the same Azure AI services resources as nonglobal deployment types, but they allow you to use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.

Customers with high, consistent volume might experience greater latency variability. The threshold is set per model. For applications that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.
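As a concrete illustration, a global standard deployment can be created with the Azure CLI on an existing Azure AI services resource. This is a minimal sketch: the resource name, resource group, deployment name, and model version below are placeholders, and it assumes the chosen model supports the `GlobalStandard` SKU.

```shell
# Sketch: create a global standard deployment (all names are placeholders).
# Requires an existing Azure AI services resource and an authenticated CLI session.
az cognitiveservices account deployment create \
  --name my-ai-services-resource \
  --resource-group my-resource-group \
  --deployment-name gpt-4o-global \
  --model-name gpt-4o \
  --model-format OpenAI \
  --model-version "2024-05-13" \
  --sku-name GlobalStandard \
  --sku-capacity 100
```

Because the request is routed globally, no region-specific load balancing across multiple resources is needed; quota is managed at the deployment's SKU capacity.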
## Global provisioned
Global deployments are available in the same Azure AI services resources as nonglobal deployment types, but they allow you to use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
Only Azure OpenAI models support this deployment type.
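A global provisioned deployment can be created the same way by swapping the SKU. In this sketch, all names remain placeholders; it assumes the `GlobalProvisionedManaged` SKU is available for the model and that `--sku-capacity` is expressed in provisioned throughput units (PTUs) to reserve.

```shell
# Sketch: create a global provisioned deployment (all names are placeholders).
az cognitiveservices account deployment create \
  --name my-ai-services-resource \
  --resource-group my-resource-group \
  --deployment-name gpt-4o-provisioned \
  --model-name gpt-4o \
  --model-format OpenAI \
  --model-version "2024-05-13" \
  --sku-name GlobalProvisionedManaged \
  --sku-capacity 50
```

Reserved capacity trades pay-per-token billing for predictable throughput and lower latency variance, which is why it is the recommended option for very large, steady workloads.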