articles/ai-services/openai/how-to/deployment-types.md (1 addition, 1 deletion)

@@ -33,7 +33,7 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
 |**Best suited for**| Applications that don’t require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
 |**How it works**| Traffic may be routed anywhere in the world |||
-|**Cost**|[Baseline](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
+|**Cost**|[Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
 |**What you get**| Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost)|
 |**Per-call Latency**| Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |
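
The deployment type in this table maps to the SKU specified when a model deployment is created. A minimal Azure CLI sketch, assuming an existing Azure OpenAI resource; the resource names, model version, and capacity below are illustrative placeholders, not recommendations:

```bash
# Create a global standard deployment (traffic may be routed anywhere in the world).
# Swap --sku-name to "Standard" for a regional deployment, or "ProvisionedManaged"
# for provisioned throughput. All names and values here are placeholders.
az cognitiveservices account deployment create \
  --resource-group my-resource-group \
  --name my-aoai-resource \
  --deployment-name gpt-4o \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name GlobalStandard \
  --sku-capacity 10
```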
articles/ai-studio/how-to/model-catalog-overview.md (9 additions, 15 deletions)

@@ -47,12 +47,12 @@ Some models in the **Curated by Azure AI** and **Open models from the Hugging Fa
 Model Catalog offers two distinct ways to deploy models from the catalog for your use: managed compute and serverless APIs. The deployment options available for each model vary; learn more about the features of the deployment options, and the options available for specific models, in the following tables. Learn more about [data processing](concept-data-privacy.md) with the deployment options.
 <!-- docutune:disable -->

-Features | Managed compute | serverless API (pay-as-you-go)
+Features | Managed compute | Serverless API (pay-as-you-go)
 --|--|--
-Deployment experience and billing | Model weights are deployed to dedicated Virtual Machines with Managed Online Endpoints. The managed online endpoint, which can have one or more deployments, makes available a REST API for inference. You're billed for the Virtual Machine core hours used by the deployments. | Access to models is through a deployment that provisions an API to access the model. The API provides access to the model hosted and managed by Microsoft, for inference. This mode of access is referred to as "Models as a Service". You're billed for inputs and outputs to the APIs, typically in tokens; pricing information is provided before you deploy.
+Deployment experience and billing | Model weights are deployed to dedicated Virtual Machines with Managed Online Endpoints. The managed online endpoint, which can have one or more deployments, makes available a REST API for inference. You're billed for the Virtual Machine core hours used by the deployments. | Access to models is through a deployment that provisions an API to access the model. The API provides access to the model hosted and managed by Microsoft, for inference. You're billed for inputs and outputs to the APIs, typically in tokens; pricing information is provided before you deploy.
 | API authentication | Keys and Microsoft Entra ID authentication.| Keys only.
-Content safety | Use Azure Content Safety service APIs. | Azure AI Content Safety filters are available integrated with inference APIs. Azure AI Content Safety filters may be billed separately.
-Network isolation | [Configure managed networks for Azure AI Studio hubs.](configure-managed-network.md) | MaaS endpoint will follow your hub's public network access (PNA) flag setting. For more information, see the [Network isolation for models deployed via Serverless APIs](#network-isolation-for-models-deployed-via-serverless-apis) section.
+Content safety | Use Azure Content Safety service APIs. | Azure AI Content Safety filters are available integrated with inference APIs. Azure AI Content Safety filters are billed separately.
+Network isolation | [Configure managed networks for Azure AI Studio hubs.](configure-managed-network.md) | Endpoints will follow your hub's public network access (PNA) flag setting. For more information, see the [Network isolation for models deployed via Serverless APIs](#network-isolation-for-models-deployed-via-serverless-apis) section.

 Model | Managed compute | Serverless API (pay-as-you-go)
 --|--|--
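
For the managed compute column above, inference goes through the managed online endpoint's REST API; the endpoint can also be invoked from the Azure ML CLI. A minimal sketch, assuming the `ml` CLI extension is installed and all names are placeholders:

```bash
# Score a model deployed to a managed online endpoint (managed compute).
# Endpoint, workspace, resource group, and file names are illustrative placeholders.
az ml online-endpoint invoke \
  --name my-endpoint \
  --resource-group my-resource-group \
  --workspace-name my-workspace \
  --request-file sample-request.json
```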
@@ -100,25 +100,19 @@ Prompt flow offers a great experience for prototyping. You can use models deploy
 ## Serverless APIs with Pay-as-you-go billing

-Certain models in the Model Catalog can be deployed as serverless APIs with pay-as-you-go billing; this method of deployment is called Models-as-a Service (MaaS), providing a way to consume them as an API without hosting them on your subscription. Models available through MaaS are hosted in infrastructure managed by Microsoft, which enables API-based access to the model provider's model. API based access can dramatically reduce the cost of accessing a model and significantly simplify the provisioning experience. Most MaaS models come with token-based pricing.
+Certain models in the Model Catalog can be deployed as serverless APIs with pay-as-you-go billing, providing a way to consume them as an API without hosting them on your subscription. Models are hosted in infrastructure managed by Microsoft, which enables API-based access to the model provider's model. API based access can dramatically reduce the cost of accessing a model and significantly simplify the provisioning experience.

-### How are third-party models made available in MaaS?
+Models that are available for deployment as serverless APIs with pay-as-you-go billing are offered by the model provider but hosted in Microsoft-managed Azure infrastructure and accessed via API. Model providers define the license terms and set the price for use of their models, while Azure Machine Learning service manages the hosting infrastructure, makes the inference APIs available, and acts as the data processor for prompts submitted and content output by models deployed via MaaS. Learn more about data processing for MaaS at the [data privacy](concept-data-privacy.md) article.

 :::image type="content" source="../media/explore/model-publisher-cycle.png" alt-text="A diagram showing model publisher service cycle." lightbox="../media/explore/model-publisher-cycle.png":::

-Models that are available for deployment as serverless APIs with pay-as-you-go billing are offered by the model provider but hosted in Microsoft-managed Azure infrastructure and accessed via API. Model providers define the license terms and set the price for use of their models, while Azure Machine Learning service manages the hosting infrastructure, makes the inference APIs available, and acts as the data processor for prompts submitted and content output by models deployed via MaaS. Learn more about data processing for MaaS at the [data privacy](concept-data-privacy.md) article.
-
-### Pay for model usage in MaaS
+### Billing

 The discovery, subscription, and consumption experience for models deployed via MaaS is in the Azure AI Studio and Azure Machine Learning studio. Users accept license terms for use of the models, and pricing information for consumption is provided during deployment. Models from third party providers are billed through Azure Marketplace, in accordance with the [Commercial Marketplace Terms of Use](/legal/marketplace/marketplace-terms); models from Microsoft are billed using Azure meters as First Party Consumption Services. As described in the [Product Terms](https://www.microsoft.com/licensing/terms/welcome/welcomepage), First Party Consumption Services are purchased using Azure meters but aren't subject to Azure service terms; use of these models is subject to the license terms provided.

-### Deploy models for inference through MaaS
-
-Deploying a model through MaaS allows users to get access to ready to use inference APIs without the need to configure infrastructure or provision GPUs, saving engineering time and resources. These APIs can be integrated with several LLM tools and usage is billed as described in the previous section.
-
-### Fine-tune models through MaaS with Pay-as-you-go
+### Fine-tune models

-For models that are available through MaaS and support fine-tuning, users can take advantage of hosted fine-tuning with pay-as-you-go billing to tailor the models using data they provide. For more information, see the [fine-tuning overview](../concepts/fine-tuning-overview.md).
+Certain models also support serverless fine-tuning, where users can take advantage of hosted fine-tuning with pay-as-you-go billing to tailor the models using data they provide. For more information, see the [fine-tuning overview](../concepts/fine-tuning-overview.md).
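
For a sense of the consumption experience these sections describe: a serverless API deployment exposes an HTTPS endpoint and a key, so a chat-completions call is a single authenticated request. A hedged sketch only — the exact host, route, and payload shape vary by model and are shown on the deployment's details page, and all values here are placeholders:

```bash
# Call a serverless API (MaaS) deployment; usage is billed by input/output tokens.
# ENDPOINT_URL and ENDPOINT_KEY come from the deployment's details page; the
# /v1/chat/completions route is illustrative and varies by model.
curl -sS -X POST "$ENDPOINT_URL/v1/chat/completions" \
  -H "Authorization: Bearer $ENDPOINT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Summarize MaaS billing in one sentence."}],
        "max_tokens": 128
      }'
```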
articles/aks/dapr-overview.md (9 additions, 6 deletions)
@@ -1,12 +1,12 @@
 ---
-title: Dapr extension for Azure Kubernetes Service (AKS) overview
+title: Dapr extension for Azure Kubernetes Service (AKS) and Arc-enabled Kubernetes
 description: Learn more about using Dapr on your Azure Kubernetes Service (AKS) cluster to develop applications.
 ms.author: nickoman
 ms.topic: overview
 ms.date: 04/22/2024
 ---

-# Dapr
+# Dapr extension for Azure Kubernetes Service (AKS) and Arc-enabled Kubernetes

 [Distributed Application Runtime (Dapr)][dapr-docs] offers APIs that help you write and implement simple, portable, resilient, and secured microservices. Dapr APIs run as a sidecar process in tandem with your applications and abstract away common complexities you may encounter when building distributed applications, such as:
 - Service discovery

@@ -19,7 +19,7 @@ Dapr is incrementally adoptable. You can use any of the API building blocks as n
 ## Capabilities and features

-[Using the Dapr extension to provision Dapr on your AKS or Arc-enabled Kubernetes cluster](../azure-arc/kubernetes/conceptual-extensions.md) eliminates the overhead of:
+[Using the Dapr extension to provision Dapr on your AKS or Arc-enabled Kubernetes cluster][dapr-create-extension] eliminates the overhead of:
 - Downloading Dapr tooling
 - Manually installing and managing the Dapr runtime on your AKS cluster
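
For context, provisioning the extension is a single CLI call. A minimal sketch, assuming the `k8s-extension` Azure CLI extension is installed; the cluster and resource group names are placeholders:

```bash
# Install the Dapr cluster extension on an existing AKS cluster.
# For an Arc-enabled Kubernetes cluster, use --cluster-type connectedClusters.
az k8s-extension create \
  --cluster-type managedClusters \
  --cluster-name myAKSCluster \
  --resource-group myResourceGroup \
  --name dapr \
  --extension-type Microsoft.Dapr
```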
@@ -63,7 +63,7 @@ Microsoft provides best-effort support for [the latest version of Dapr and two p
 - 1.12.x
 - 1.11.x

-You can run Azure CLI commands to retreive a list of available versions in [a cluster](/cli/azure/k8s-extension/extension-types#az-k8s-extension-extension-types-list-versions-by-cluster) or [a location](/cli/azure/k8s-extension/extension-types#az-k8s-extension-extension-types-list-versions-by-location).
+You can run Azure CLI commands to retrieve a list of available versions in [a cluster](/cli/azure/k8s-extension/extension-types#az-k8s-extension-extension-types-list-versions-by-cluster) or [a location](/cli/azure/k8s-extension/extension-types#az-k8s-extension-extension-types-list-versions-by-location).

 To view a list of the stable Dapr versions available to your managed AKS cluster, run the following command:
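
The command itself falls outside this hunk's context, but based on the CLI reference linked above it presumably resembles the following sketch; the parameter names are assumptions drawn from that reference, and the cluster and resource group names are placeholders:

```bash
# List the stable Dapr extension versions available to a specific cluster.
az k8s-extension extension-types list-versions-by-cluster \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --cluster-type managedClusters \
  --extension-type microsoft.dapr
```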
@@ -221,7 +221,9 @@ If you install Dapr through the AKS extension, our recommendation is to continue
 ## Next Steps

-After learning about Dapr and some of the challenges it solves, try [Deploying an application with the Dapr cluster extension][dapr-quickstart].
+> [!div class="nextstepaction"]
+> [Walk through the Dapr extension quickstart to demo how it works][dapr-quickstart]