
Commit 8543eca

acrolinx
1 parent 128d373 commit 8543eca

3 files changed: +12 -12 lines changed

articles/ai-services/openai/concepts/model-router.md

Lines changed: 3 additions & 3 deletions
@@ -21,9 +21,9 @@ Model router intelligently selects the best underlying model for a given prompt

## Versioning

-Each version of model router is associated with a specific set of underlying models and their versions. This set won't change—only newer versions of model router can expose new underlying models.
+Each version of model router is associated with a specific set of underlying models and their versions. This set is fixed—only newer versions of model router can expose new underlying models.

-If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model will automatically update when new versions become available. When that happens, the set of underlying models will also change, and this could affect the overall performance of the model as well as costs.
+If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model automatically updates when new versions become available. When that happens, the set of underlying models also changes, which could affect the overall performance of the model and costs.

## Underlying models

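For illustration, here's a hedged Python sketch of creating a model router deployment with auto-update enabled through the control-plane REST API. The resource name placeholders, the model name and version, and the `api-version` are assumptions, not the documented procedure; follow the linked Manage models article for the supported steps.

```python
# A hedged sketch, assuming placeholder resource names and an api-version that
# supports deployments; see the Manage models article for the supported steps.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder
account_name = "<aoai-resource-name>"   # placeholder

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices"
    f"/accounts/{account_name}/deployments/model-router"
)
body = {
    "sku": {"name": "GlobalStandard", "capacity": 1},
    "properties": {
        # Model name and version are assumed placeholders.
        "model": {"format": "OpenAI", "name": "model-router", "version": "<model-version>"},
        # The REST-level equivalent of selecting Auto-update at the deployment step.
        "versionUpgradeOption": "OnceNewDefaultVersionAvailable",
    },
}
resp = requests.put(
    url,
    params={"api-version": "2023-05-01"},  # assumed; a newer version may be required
    json=body,
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json()["properties"]["provisioningState"])
```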

@@ -42,7 +42,7 @@ Global Standard region support.

## Billing information

-When you use Azure OpenAI model router, you are only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
+When you use Azure OpenAI model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.

You can monitor the overall costs of your model router deployment in the Azure portal. TBD


articles/ai-services/openai/how-to/model-router.md

Lines changed: 4 additions & 4 deletions
@@ -31,19 +31,19 @@ Model router is packaged as a single OpenAI model that you deploy. Follow the st

You can use model router through the [chat completions API](/azure/ai-services/openai/chatgpt-quickstart) in the same way you'd use other OpenAI chat models. Set the `model` parameter to the name of your model router deployment, and set the `messages` parameter to the messages you want to send to the model.

-In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message will show which underlying model was selected to respond.
+In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message shows which underlying model was selected to respond.

> [!IMPORTANT]
-> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it will ignore the `Temperature` and `Top_P` input parameters.
+> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.

> [!IMPORTANT]
-> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) is not supported in model router. If the model router selects a reasoning model for your prompt, it will also select a `reasoning_effort` input value based on the complexity of the prompt.
+> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If the model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.

## Evaluate model router performance

-
+TBD

You can create a custom metric and submit a job to compare the router to other models. Then, in the Azure AI Foundry portal, you can compare their performance.

We provide custom metric tests via notebooks.
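To make the call pattern described in this hunk concrete, here's a minimal sketch of a chat completions request against a model router deployment. The deployment name `model-router`, the environment variable names, and the `api-version` are assumptions; substitute your own values.

```python
# A minimal sketch, assuming a deployment named "model-router", endpoint and key
# in environment variables, and the openai Python package (v1+).
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # assumed GA API version
)

response = client.chat.completions.create(
    model="model-router",  # your model router deployment name (assumed)
    messages=[{"role": "user", "content": "Summarize Hamlet in two sentences."}],
    temperature=0.7,  # ignored if the router picks a reasoning (o-series) model
    top_p=0.95,       # likewise ignored for reasoning models
)

print(response.choices[0].message.content)
# The model field reports which underlying model the router selected.
print(response.model)
```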

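For the evaluation idea, a hedged sketch of what a custom metric might look like, using the `azure-ai-evaluation` package. The package choice, dataset file, and column names are assumptions; the notebooks mentioned above are the authoritative reference.

```python
# A hedged sketch of a custom metric; the azure-ai-evaluation package, the
# dataset file, and the "response" column name are assumptions.
from azure.ai.evaluation import evaluate

def brevity_metric(*, response: str, **kwargs):
    """Toy custom metric: shorter responses score higher."""
    return {"brevity": 1.0 / (1.0 + len(response.split()))}

result = evaluate(
    data="router_vs_baseline.jsonl",  # hypothetical prompt/response dataset
    evaluators={"brevity": brevity_metric},
)
print(result["metrics"])  # aggregate scores to compare the router with other models
```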
articles/ai-services/openai/quotas-limits.md

Lines changed: 5 additions & 5 deletions
@@ -90,7 +90,7 @@ The following sections provide you with a quick guide to the default quotas and
|`gpt-4` (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
|`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |

-## model router rate limits
+## model-router rate limits

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
@@ -193,7 +193,7 @@ M = million | K = thousand

### gpt-4o audio

-The rate limits for each `gpt-4o` audio model deployment are 100K TPM and 1K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
+The rate limits for each `gpt-4o` audio model deployment are 100 K TPM and 1 K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100 K TPM and 1 K RPM.

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
@@ -206,7 +206,7 @@ M = million | K = thousand

## Usage tiers

-Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
+Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to use Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.

The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.

@@ -244,7 +244,7 @@ If your Azure subscription is linked to certain [offer types](https://azure.micr

<sup>*</sup>This only applies to a small number of legacy CSP sandbox subscriptions. Use the query below to determine what `quotaId` is associated with your subscription.

-To determine the offer type that is associated with your subscription you can check your `quotaId`. If your `quotaId` isn't listed in this table your subscription qualifies for default quota.
+To determine the offer type that is associated with your subscription, you can check your `quotaId`. If your `quotaId` isn't listed in this table, your subscription qualifies for default quota.

# [REST](#tab/REST)

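The query under the REST tab falls outside this hunk. As a hedged illustration of that kind of lookup, the following Python sketch reads `quotaId` from Azure Resource Manager; the `api-version` and the credential flow are assumptions, and the article's own query is authoritative.

```python
# A hedged sketch: read the subscription's quotaId from Azure Resource Manager.
# The api-version and the use of DefaultAzureCredential are assumptions.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

resp = requests.get(
    f"https://management.azure.com/subscriptions/{subscription_id}",
    params={"api-version": "2022-12-01"},  # assumed ARM API version
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# subscriptionPolicies.quotaId identifies the offer type tied to the subscription.
print(resp.json()["subscriptionPolicies"]["quotaId"])
```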
@@ -322,7 +322,7 @@ You can view quota availability by region for your subscription in the [Azure AI
Alternatively, to view quota capacity by region for a specific model/version, you can query the [capacity API](/rest/api/aiservices/accountmanagement/model-capacities/list) for your subscription. Provide a `subscriptionId`, `model_name`, and `model_version`, and the API returns the available capacity for that model across all regions and deployment types for your subscription.

> [!NOTE]
-> Currently both the Azure AI Foundry portal and the capacity API will return quota/capacity information for models that are [retired](./concepts/model-retirements.md) and no longer available.
+> Currently both the Azure AI Foundry portal and the capacity API return quota/capacity information for models that are [retired](./concepts/model-retirements.md) and no longer available.

[API Reference](/rest/api/aiservices/accountmanagement/model-capacities/list)

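For illustration, a hedged Python sketch of calling the capacity API. The `api-version`, the example model name and version, and the response field names are assumptions; confirm them against the linked API reference.

```python
# A hedged sketch of the capacity API call; api-version and response field
# names are assumptions, so check the linked API reference.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

resp = requests.get(
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.CognitiveServices/modelCapacities",
    params={
        "api-version": "2024-06-01-preview",  # assumed; check the API reference
        "modelFormat": "OpenAI",
        "modelName": "gpt-4o",         # example model_name
        "modelVersion": "2024-05-13",  # example model_version
    },
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# One entry per region/deployment-type combination with capacity for the model.
for item in resp.json().get("value", []):
    props = item.get("properties", {})
    print(item.get("location"), item.get("name"), props.get("availableCapacity"))
```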
