In this article, you learn about the Phi-3 family of small language models (SLMs). You also learn to use Azure AI Studio to deploy models from this family as serverless APIs with pay-as-you-go token-based billing.
The Phi-3 family of SLMs is a collection of instruction-tuned generative text models. Phi-3 models are the most capable and cost-effective SLMs available, outperforming models of the same size and the next size up across various language, reasoning, coding, and math benchmarks.
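Once deployed as a serverless API, a Phi-3 model is consumed over HTTP with a chat-completions-style payload. As a rough sketch of what a request body can look like (the endpoint URL, header names, and field set shown here are illustrative assumptions; the deployment's details page in Azure AI Studio shows the real values):

```python
import json

# Hypothetical placeholder values: after deploying Phi-3 as a serverless API,
# copy the actual endpoint URL and key from the deployment's details page.
ENDPOINT = "https://<your-deployment-name>.<region>.models.ai.azure.com/chat/completions"
API_KEY = "<your-api-key>"

def build_chat_request(user_message: str, max_tokens: int = 256) -> dict:
    """Build a chat-completions-style payload for a serverless endpoint."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Summarize the Phi-3 model family in one sentence.")
print(json.dumps(payload, indent=2))
# To send the request, POST this payload to ENDPOINT with an
# authorization header carrying the key (e.g. via the requests library).
```

Token-based billing means you pay for the tokens counted in payloads like this one and in the model's responses, rather than for provisioned infrastructure.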
To create a deployment:
1. Select the project in which you want to deploy your model. To deploy the Phi-3 model, your project must be in the *EastUS2* or *Sweden Central* region.
1. In the deployment wizard, select the link to **Azure Marketplace Terms** to learn more about the terms of use.
1. Select the **Pricing and terms** tab to learn about pricing for the selected model.
1. Give the deployment a name. This name becomes part of the deployment API URL, which must be unique in each Azure region.
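To illustrate why the name must be unique per region, here is a hypothetical sketch of how a deployment name could appear in an endpoint URL (the exact host format is an assumption; the wizard displays the actual URL after deployment):

```python
# Hypothetical URL shape; the deployment wizard shows the real endpoint.
def endpoint_url(deployment_name: str, region: str = "eastus2") -> str:
    return f"https://{deployment_name}.{region}.models.ai.azure.com"

url = endpoint_url("my-phi-3-deployment")
print(url)
```

Because the name is embedded in the host, two deployments in the same region cannot share it.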
Models deployed as a service can be consumed using the chat API.
### Cost and quota considerations for Phi-3 models deployed as a service
You can find the pricing information on the **Pricing and terms** tab of the deployment wizard when deploying the model.
Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
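When a deployment's rate limits are exceeded, requests are typically rejected with an HTTP 429 status. A minimal client-side sketch of exponential backoff (the retry policy and response handling here are assumptions for illustration, not part of the service contract):

```python
import time

# Per-deployment limits described in this article.
TOKENS_PER_MINUTE = 200_000
REQUESTS_PER_MINUTE = 1_000

def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` and retry with exponential backoff on HTTP 429.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (for example, a requests.Response).
    """
    response = send()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
        response = send()
    return response

# Demo with a fake sender that is rate limited twice, then succeeds.
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

_responses = iter([FakeResponse(429), FakeResponse(429), FakeResponse(200)])
result = call_with_backoff(lambda: next(_responses), sleep=lambda s: None)
print(result.status_code)
```

If backoff alone is not enough for your workload, contact Microsoft Azure Support about raising the limits, as noted above.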