Commit fff34e3

Merge pull request #5038 from PatrickFarley/aoai-build
Aoai build
2 parents d35ec07 + 0ec3d38 commit fff34e3

5 files changed: +30 additions, -22 deletions

articles/ai-services/openai/concepts/model-router.md

Lines changed: 18 additions & 10 deletions
```diff
@@ -1,5 +1,5 @@
 ---
-title: Azure OpenAI model router (preview) concepts
+title: Model router for Azure AI Foundry (preview) concepts
 titleSuffix: Azure OpenAI
 description: Learn about the model router feature in Azure OpenAI Service.
 author: PatrickFarley
@@ -11,13 +11,13 @@ ms.custom:
 manager: nitinme
 ---
 
-# Azure OpenAI model router (preview)
+# Model router for Azure AI Foundry (preview)
 
-Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model.
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
 
 ## Why use model router?
 
-Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single chat experience that combines the best features from all of the underlying chat models.
+Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.
 
 ## Versioning
 
@@ -29,24 +29,32 @@ If you select **Auto-update** at the deployment step (see [Manage models](/azure
 
 |Model router version|Underlying models (version)|
 |---|---|
-|`2025-04-15`|GPT-4.1 (`2025-04-14`)</br>GPT-4.1-mini (`2025-04-14`)</br>GPT-4.1-nano (`2025-04-14`) </br>o4-mini (`2025-04-16`) |
+|`2025-05-19`|GPT-4.1 (`2025-04-14`)</br>GPT-4.1-mini (`2025-04-14`)</br>GPT-4.1-nano (`2025-04-14`) </br>o4-mini (`2025-04-16`) |
 
 
 ## Limitations
 
+### Resource limitations
+
+See the [Models](../concepts/models.md#model-router) page for the region availability and deployment types for model router.
+
+### Technical limitations
+
 See [Quotas and limits](/azure/ai-services/openai/quotas-limits) for rate limit information.
 
-The context window limit listed on the [Models](../concepts/models.md) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
-- Summarize the prompt before passing it to the model
-- Truncate the prompt into more relevant parts
-- Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
+> [!NOTE]
+> The context window limit listed on the [Models](../concepts/models.md#model-router) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
+> - Summarize the prompt before passing it to the model
+> - Truncate the prompt into more relevant parts
+> - Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
 
+Model router accepts image inputs for [Vision enabled chats](/azure/ai-services/openai/how-to/gpt-with-vision) (all of the underlying models can accept image input), but the routing decision is based on the text input only.
 
 Model router doesn't process audio input.
 
 ## Billing information
 
-When you use Azure OpenAI model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
+When you use model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model routing function itself doesn't incur any extra charges.
 
 You can monitor the costs of your model router deployment in the Azure portal.
```

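The concepts diff above lists three ways to shorten a prompt that may exceed the smallest underlying model's context window (summarize, truncate, or retrieve with embeddings). The truncation option can be sketched as follows; this is a minimal illustration, not part of any Azure SDK, and it assumes a crude ~4-characters-per-token estimate (a real tokenizer such as tiktoken would be more accurate) and the 200,000-token window documented for model router.

```python
CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate a prompt's token count from its character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def truncate_to_budget(prompt: str, max_tokens: int = 200_000) -> str:
    """Keep the tail of the prompt so it fits the smallest model's window.

    Keeping the tail preserves the most recent chat turns; the other two
    mitigations (summarization, embeddings-based retrieval) preserve more
    meaning at the cost of an extra processing step.
    """
    if estimate_tokens(prompt) <= max_tokens:
        return prompt
    return prompt[-(max_tokens * CHARS_PER_TOKEN):]


long_prompt = "word " * 300_000  # ~1.5 M characters, well over budget
trimmed = truncate_to_budget(long_prompt)
print(estimate_tokens(trimmed) <= 200_000)  # True
```

A summarization or retrieval step would replace `truncate_to_budget` here; the point is only that the request must fit the smallest underlying model before routing, because the routing decision controls which context window actually applies.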
articles/ai-services/openai/concepts/models.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -56,13 +56,13 @@ A model that intelligently selects from a set of underlying chat models to respo
 
 | Model | Region |
 |---|---|
-| `model-router` (2025-04-15) | East US 2 (Global Standard), Sweden Central (Global Standard)|
+| `model-router` (2025-05-19) | East US 2 (Global Standard), Sweden Central (Global Standard)|
 
 ### Capabilities
 
 | Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
 | --- | :--- |:--- |:---|:---: |
-| `model-router` (2025-04-15) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000* | 32768 (GPT 4.1 series)</br> 100 K (o4-mini) | May 31, 2024 |
+| `model-router` (2025-05-19) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000* | 32768 (GPT 4.1 series)</br> 100 K (o4-mini) | May 31, 2024 |
 
 *Larger context windows are compatible with _some_ of the underlying models, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail.
```

articles/ai-services/openai/how-to/model-router.md

Lines changed: 7 additions & 7 deletions
````diff
@@ -1,5 +1,5 @@
 ---
-title: How to use model router (preview) in Azure OpenAI Service
+title: How to use model router for Azure AI Foundry (preview)
 titleSuffix: Azure OpenAI Service
 description: Learn how to use the model router in Azure OpenAI Service to select the best model for your task.
 author: PatrickFarley
@@ -11,20 +11,20 @@ ms.date: 04/17/2025
 manager: nitinme
 ---
 
-# Use Azure OpenAI model router (preview)
+# Use model router for Azure AI Foundry (preview)
 
-Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible, all packaged as a single model deployment. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
 
-You can access model router through the Completions API just as you would use a single base model like GPT-4.
+You can access model router through the Completions API just as you would use a single base model like GPT-4. The steps are the same as in the [Chat completions guide](/azure/ai-services/openai/how-to/chatgpt).
 
 ## Deploy a model router model
 
-Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `Azure OpenAI model router` in the **Model** list. Select it, and then complete the rest of the deployment steps.
+Model router is packaged as a single Azure AI Foundry model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource). In the **Create new deployment** step, find `model-router` in the **Models** list. Select it, and then complete the rest of the deployment steps.
 
 > [!NOTE]
 > Consider that your deployment settings apply to all underlying chat models that model router uses.
 > - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
-> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from the model router: you don't set content filters for each of the underlying chat models.
+> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all content passed to and from the model router: you don't set content filters for each of the underlying chat models.
 > - Your tokens-per-minute rate limit setting is applied to all activity to and from the model router: you don't set rate limits for each of the underlying chat models.
 
 ## Use model router in chats
@@ -44,7 +44,7 @@ In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to you
 
 ### Output format
 
-The JSON response you receive from a model router model looks like the following.
+The JSON response you receive from a model router model is identical to the standard chat completions API response. Note that the `"model"` field reveals which underlying model was selected to respond to the prompt.
 
 ```json
 {
````

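The output-format change above states that a model router response has the standard chat completions shape and that its `"model"` field names the underlying model chosen for the prompt. A minimal sketch of reading that field, using an illustrative payload (the field values here are made up, not taken from a real deployment):

```python
import json

# Illustrative chat completions response as it might come back from a
# model-router deployment; only the overall shape is assumed, per the
# docs change above.
sample_response = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ]
}
"""

data = json.loads(sample_response)
routed_model = data["model"]  # which underlying model handled this prompt
answer = data["choices"][0]["message"]["content"]
print(routed_model)  # gpt-4.1-nano-2025-04-14
print(answer)        # Hello!
```

Logging `routed_model` per request is a cheap way to see how often the router escalates to the larger or reasoning models, which is useful when estimating costs.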
articles/ai-services/openai/quotas-limits.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -94,8 +94,8 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-| `model-router` (2025-04-15) | Enterprise Tier | 10 M | 10 K |
-| `model-router` (2025-04-15) | Default | 1 M | 1 K |
+| `model-router` (2025-05-19) | Enterprise Tier | 10 M | 10 K |
+| `model-router` (2025-05-19) | Default | 1 M | 1 K |
 
 
 ## computer-use-preview global standard rate limits
```

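The quota table above gives `model-router` a default limit of 1 K requests per minute. The service enforces this server-side; a client that wants to smooth bursts before hitting the limit could use a sliding-window throttle along these lines (a hypothetical helper, not part of any Azure SDK):

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side guard for a requests-per-minute budget.

    Sleeps just long enough to stay under a per-minute cap such as the
    1 K default for model-router. The service still enforces its own
    limits; this only reduces 429 responses from bursty clients.
    """

    def __init__(self, max_requests_per_minute: int):
        self.max_requests = max_requests_per_minute
        self.window = deque()  # monotonic timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than the 60-second window.
        while self.window and now - self.window[0] >= 60.0:
            self.window.popleft()
        if len(self.window) >= self.max_requests:
            # Wait until the oldest request ages out of the window.
            time.sleep(60.0 - (now - self.window[0]))
            self.window.popleft()
        self.window.append(time.monotonic())


throttle = RequestThrottle(max_requests_per_minute=1_000)
for _ in range(3):
    throttle.acquire()  # would wrap each chat completions call
print(len(throttle.window))  # 3
```

Note that the TPM limit (1 M tokens per minute by default) usually binds before the request limit for long prompts, so a production client would track both.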
articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ Spotlighting is a sub-feature of prompt shields that enhances protection against
 
 ### Model router (preview)
 
-Azure OpenAI model router is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](./concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](./concepts/model-router.md).
+Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](./concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](./concepts/model-router.md).
 
 ## April 2025
```

0 commit comments