articles/ai-services/openai/concepts/model-router.md (+3 -3)
@@ -21,9 +21,9 @@ Model router intelligently selects the best underlying model for a given prompt
 
 ## Versioning
 
-Each version of model router is associated with a specific set of underlying models and their versions. This set won't change—only newer versions of model router can expose new underlying models.
+Each version of model router is associated with a specific set of underlying models and their versions. This set is fixed—only newer versions of model router can expose new underlying models.
 
-If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model will automatically update when new versions become available. When that happens, the set of underlying models will also change, and this could affect the overall performance of the model as well as costs.
+If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model automatically updates when new versions become available. When that happens, the set of underlying models also changes, which could affect the overall performance of the model and costs.
 
 ## Underlying models
 
@@ -42,7 +42,7 @@ Global Standard region support.
 
 ## Billing information
 
-When you use Azure OpenAI model router, you are only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
+When you use Azure OpenAI model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
 
 You can monitor the overall costs of your model router deployment in the Azure portal. TBD
articles/ai-services/openai/how-to/model-router.md (+4 -4)
@@ -31,19 +31,19 @@ Model router is packaged as a single OpenAI model that you deploy. Follow the st
 
 You can use model router through the [chat completions API](/azure/ai-services/openai/chatgpt-quickstart) in the same way you'd use other OpenAI chat models. Set the `model` parameter to the name of your model router deployment, and set the `messages` parameter to the messages you want to send to the model.
 
-In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message will show which underlying model was selected to respond.
+In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message shows which underlying model was selected to respond.
 
 > [!IMPORTANT]
-> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it will ignore the `Temperature` and `Top_P` input parameters.
+> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.
 
 > [!IMPORTANT]
-> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) is not supported in model router. If the model router selects a reasoning model for your prompt, it will also select a `reasoning_effort` input value based on the complexity of the prompt.
+> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If the model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.
 
 ## Evaluate model router performance
 
-
+TBD
 You can create a custom metric and submit a job to compare the router to other models. Then, in the Azure AI Foundry portal, you can compare their performance.
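For reference alongside this change, here's a minimal sketch of calling a model router deployment through the chat completions API with the `openai` Python SDK. The endpoint and key environment variables, API version, and deployment name are illustrative placeholders, not values taken from this PR.

```python
import os
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # assumed API version; use one your resource supports
)

response = client.chat.completions.create(
    model="my-model-router",  # hypothetical *deployment name* of your model router
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    ],
    temperature=0.7,  # ignored if the router picks a reasoning (o-series) model
)

print(response.choices[0].message.content)
# The response's model field is one place to see which underlying model answered,
# mirroring what the playground shows per response message.
print(response.model)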
articles/ai-services/openai/quotas-limits.md (+5 -5)
@@ -90,7 +90,7 @@ The following sections provide you with a quick guide to the default quotas and
 |`gpt-4` (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
 |`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |
 
-## modelrouter rate limits
+## model-router rate limits
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
@@ -193,7 +193,7 @@ M = million | K = thousand
 
 ### gpt-4o audio
 
-The rate limits for each `gpt-4o` audio model deployment are 100K TPM and 1K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
+The rate limits for each `gpt-4o` audio model deployment are 100 K TPM and 1 K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100 K TPM and 1 K RPM.
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
@@ -206,7 +206,7 @@ M = million | K = thousand
 
 ## Usage tiers
 
-Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
+Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to use Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
 
 The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
@@ -244,7 +244,7 @@ If your Azure subscription is linked to certain [offer types](https://azure.micr
 
 <sup>*</sup>This only applies to a small number of legacy CSP sandbox subscriptions. Use the query below to determine what `quotaId` is associated with your subscription.
 
-To determine the offer type that is associated with your subscription you can check your `quotaId`. If your `quotaId` isn't listed in this table your subscription qualifies for default quota.
+To determine the offer type that is associated with your subscription, you can check your `quotaId`. If your `quotaId` isn't listed in this table, your subscription qualifies for default quota.
 
 # [REST](#tab/REST)
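For the `quotaId` check referenced in this hunk, here's a hedged sketch of the kind of query the REST tab describes: an ARM GET on the subscription resource returns a `subscriptionPolicies.quotaId` field. The api-version and the exact response field names are assumptions, not values taken from this PR.

```python
import requests
from azure.identity import DefaultAzureCredential  # pip install azure-identity requests

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

subscription_id = "00000000-0000-0000-0000-000000000000"  # your subscription ID
resp = requests.get(
    f"https://management.azure.com/subscriptions/{subscription_id}",
    params={"api-version": "2020-01-01"},  # assumed ARM api-version
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
# Compare this value against the offer-type table to see which quota tier applies.
print(resp.json()["subscriptionPolicies"]["quotaId"])
```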
@@ -322,7 +322,7 @@ You can view quota availability by region for your subscription in the [Azure AI
 Alternatively, to view quota capacity by region for a specific model/version, you can query the [capacity API](/rest/api/aiservices/accountmanagement/model-capacities/list) for your subscription. Provide a `subscriptionId`, `model_name`, and `model_version`, and the API returns the available capacity for that model across all regions and deployment types for your subscription.
 
 > [!NOTE]
-> Currently both the Azure AI Foundry portal and the capacity API will return quota/capacity information for models that are [retired](./concepts/model-retirements.md) and no longer available.
+> Currently both the Azure AI Foundry portal and the capacity API return quota/capacity information for models that are [retired](./concepts/model-retirements.md) and no longer available.
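For the capacity API mentioned in this last hunk, a minimal sketch of the list call: the `modelCapacities` path and the `modelFormat`/`modelName`/`modelVersion` query parameters follow the linked REST reference, but the api-version, example model values, and response field names here are assumptions.

```python
import requests
from azure.identity import DefaultAzureCredential  # pip install azure-identity requests

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

subscription_id = "00000000-0000-0000-0000-000000000000"  # your subscription ID
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.CognitiveServices/modelCapacities"
)
resp = requests.get(
    url,
    params={
        "api-version": "2024-06-01-preview",  # assumed; check the REST reference
        "modelFormat": "OpenAI",
        "modelName": "gpt-4o",         # example model name
        "modelVersion": "2024-05-13",  # example model version
    },
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
# Each entry should describe capacity for one region/deployment-type combination.
for item in resp.json().get("value", []):
    print(item.get("location"), item.get("properties", {}).get("availableCapacity"))
```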