Commit 419064a

Merge branch 'release-build-2025-release' of https://github.com/MicrosoftDocs/azure-ai-docs-pr into aoai-build

2 parents ba49a8a + 38cbe57

6 files changed: +283 −27 lines
Lines changed: 56 additions & 0 deletions

---
title: Azure OpenAI model router (preview) concepts
titleSuffix: Azure OpenAI
description: Learn about the model router feature in Azure OpenAI Service.
author: PatrickFarley
ms.author: pafarley
ms.service: azure-ai-openai
ms.topic: conceptual
ms.date: 05/08/2025
ms.custom:
manager: nitinme
---
# Azure OpenAI model router (preview)

Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model.

## Why use model router?

Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single chat experience that combines the best features from all of the underlying chat models.
21+
22+
## Versioning
23+
24+
Each version of model router is associated with a specific set of underlying models and their versions. This set is fixed—only newer versions of model router can expose new underlying models.
25+
26+
If you select **Auto-update** at the deployment step (see [Manage models](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#model-updates)), then your model router model automatically updates when new versions become available. When that happens, the set of underlying models also changes, which could affect the overall performance of the model and costs.
27+
28+
## Underlying models
29+
30+
|Model router version|Underlying models (version)|
31+
|---|---|
32+
|`2025-04-15`|GPT-4.1 (`2025-04-14`)</br>GPT-4.1-mini (`2025-04-14`)</br>GPT-4.1-nano (`2025-04-14`) </br>o4-mini (`2025-04-16`) |
33+
34+
35+
## Limitations
36+
37+
See [Quotas and limits](/azure/ai-services/openai/quotas-limits) for rate limit information.
38+
39+
The context window limit listed on the [Models](../concepts/models.md) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
40+
- Summarize the prompt before passing it to the model
41+
- Truncate the prompt into more relevant parts
42+
- Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
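As a rough sketch of the truncation option, you can cap the prompt before sending it. Exact token counts depend on the underlying model's tokenizer, so this example uses a conservative characters-per-token heuristic (an assumption, not the service's actual tokenizer):

```python
# Rough prompt-capping sketch. The ~4 characters-per-token ratio is a
# heuristic assumption; the real tokenizers of the underlying models differ.
CHARS_PER_TOKEN = 4  # conservative estimate for English text

def truncate_prompt(prompt: str, max_tokens: int) -> str:
    """Keep the tail of the prompt so the most recent context survives."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(prompt) <= max_chars:
        return prompt
    return prompt[-max_chars:]

short = truncate_prompt("word " * 100_000, max_tokens=1000)
print(len(short))  # at most 4,000 characters
```

Keeping the tail (rather than the head) is a design choice that suits chat history, where the most recent turns usually matter most; for single documents, summarization or retrieval is usually the better option.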
Model router doesn't process audio input.

## Billing information

When you use Azure OpenAI model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.

You can monitor the costs of your model router deployment in the Azure portal.

## Next step

> [!div class="nextstepaction"]
> [How to use model router](../how-to/model-router.md)

articles/ai-services/openai/concepts/models.md

Lines changed: 22 additions & 4 deletions

```diff
@@ -19,6 +19,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 | Models | Description |
 |--|--|
 | [GPT-4.1 series](#gpt-41-series) | Latest model release from Azure OpenAI |
+| [model-router](#model-router) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. |
 | [computer-use-preview](#computer-use-preview) | An experimental model trained for use with the Responses API computer use tool. |
 | [GPT-4.5 Preview](#gpt-45-preview) | The latest GPT model that excels at diverse text and image tasks. |
 | [o-series models](#o-series-models) | [Reasoning models](../how-to/reasoning.md) with advanced problem-solving and increased focus and capability. |
```
```diff
@@ -31,7 +32,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 
 ## GPT 4.1 series
 
-### Region Availability
+### Region availability
 
 | Model | Region |
 |---|---|
```
```diff
@@ -47,6 +48,23 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 | `gpt-4.1-nano` (2025-04-14) <br><br> **Fastest 4.1 model** | - Text & image input <br> - Text output <br> - Chat completions API <br> - Responses API <br> - Streaming <br> - Function calling <br> - Structured outputs (chat completions) | 1,047,576 | 32,768 | May 31, 2024 |
 | `gpt-4.1-mini` (2025-04-14) | - Text & image input <br> - Text output <br> - Chat completions API <br> - Responses API <br> - Streaming <br> - Function calling <br> - Structured outputs (chat completions) | 1,047,576 | 32,768 | May 31, 2024 |
 
+## model-router
+
+A model that intelligently selects from a set of underlying chat models to respond to a given prompt.
+
+### Region availability
+
+| Model | Region |
+|---|---|
+| `model-router` (2025-04-15) | East US 2 (Global Standard), Sweden Central (Global Standard) |
+
+### Capabilities
+
+| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
+| --- | :--- |:--- |:---|:---: |
+| `model-router` (2025-04-15) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000* | 32,768 (GPT-4.1 series)<br>100,000 (o4-mini) | May 31, 2024 |
+
+*Larger context windows are compatible with _some_ of the underlying models, which means an API call with a larger context succeeds only if the prompt happens to be routed to a model that supports it; otherwise, the call fails.
 
 ## computer-use-preview
 
```
```diff
@@ -63,7 +81,7 @@ Request access: [`computer-use-preview` limited access model application](https:
 
 Once access has been granted, you will need to create a deployment for the model.
 
-### Region Availability
+### Region availability
 
 | Model | Region |
 |---|---|
```

```diff
@@ -78,7 +96,7 @@ Once access has been granted, you will need to create a deployment for the model
 
 ## GPT-4.5 Preview
 
-### Region Availability
+### Region availability
 
 | Model | Region |
 |---|---|
```

```diff
@@ -88,7 +106,7 @@ Once access has been granted, you will need to create a deployment for the model
 
 | Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
 | --- | :--- |:--- |:---|:---: |
-| `gpt-4.5-preview` (2025-02-27) <br> **GPT-4.5 Preview** | [GPT 4.1](#gpt-41-series) is the recommended replacement for this model. Excels at diverse text and image tasks. <br>-Structured outputs <br>-Prompt caching <br>-Tools <br>-Streaming<br>-Text(input/output)<br>- Image(input) | 128,000 | 16,384 | Oct 2023 |
+| `gpt-4.5-preview` (2025-02-27) <br> **GPT-4.5 Preview** | [GPT 4.1](#gpt-41-series) is the recommended replacement for this model. Excels at diverse text and image tasks. <br>- Structured outputs <br>- Prompt caching <br>- Tools <br>- Streaming<br>- Text(input/output)<br>- Image(input) | 128,000 | 16,384 | Oct 2023 |
 
 > [!NOTE]
 > It's expected behavior that the model can't answer questions about itself. For the knowledge cutoff of the model's training data, or other details about the model, refer to the model documentation above.
```
Lines changed: 157 additions & 0 deletions

---
title: How to use model router (preview) in Azure OpenAI Service
titleSuffix: Azure OpenAI Service
description: Learn how to use the model router in Azure OpenAI Service to select the best model for your task.
author: PatrickFarley
ms.author: pafarley
#customer intent:
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 04/17/2025
manager: nitinme
---

# Use Azure OpenAI model router (preview)

Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works, and on its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).

You can access model router through the chat completions API just as you would use a single base model like GPT-4.
19+
20+
## Deploy a model router model
21+
22+
Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `Azure OpenAI model router` in the **Model** list. Select it, and then complete the rest of the deployment steps.
23+
24+
> [!NOTE]
25+
> Consider that your deployment settings apply to all underlying chat models that model router uses.
26+
> - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
27+
> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from the model router: you don't set content filters for each of the underlying chat models.
28+
> - Your tokens-per-minute rate limit setting is applied to all activity to and from the model router: you don't set rate limits for each of the underlying chat models.
## Use model router in chats

You can use model router through the [chat completions API](/azure/ai-services/openai/chatgpt-quickstart) in the same way you'd use other OpenAI chat models. Set the `model` parameter to the name of your model router deployment, and set the `messages` parameter to the messages you want to send to the model.
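As a minimal sketch, a call to a model router deployment looks the same as a call to any other deployed chat model. The deployment name, environment variable names, and `api_version` shown here are assumptions; substitute your own values:

```python
# Minimal sketch: assumes the `openai` Python package and placeholder
# endpoint/deployment values.
def build_chat_request(deployment: str, user_text: str) -> dict:
    """The `model` parameter takes your *deployment* name, not a base model ID."""
    return {
        "model": deployment,  # e.g. your model router deployment name
        "messages": [{"role": "user", "content": user_text}],
    }

request = build_chat_request("model-router", "How are you?")

# Sending the request requires a live Azure OpenAI resource:
# import os
# from openai import AzureOpenAI
# client = AzureOpenAI(
#     azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
#     api_key=os.environ["AZURE_OPENAI_API_KEY"],
#     api_version="2024-10-21",  # assumed; use the version your resource supports
# )
# response = client.chat.completions.create(**request)
# print(response.model)  # the underlying model that answered
```

The `model` field of each response reports which underlying model the router selected for that turn.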
In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message shows which underlying model was selected to respond.

> [!IMPORTANT]
> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.
>
> The parameters `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, and `logprobs` are similarly dropped for o-series models but applied otherwise.

> [!IMPORTANT]
> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If the model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.

### Output format

The JSON response you receive from a model router model looks like the following.
```json
{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "detected": false,
          "filtered": false
        },
        "protected_material_text": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I'm doing well, thank you! How can I assist you today?",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1745308617,
  "id": "xxxx-yyyy-zzzz",
  "model": "gpt-4.1-nano-2025-04-14",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "prompt_index": 0
    }
  ],
  "system_fingerprint": "xxxx",
  "usage": {
    "completion_tokens": 15,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 21,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    },
    "total_tokens": 36
  }
}
```
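Because the `model` field names the underlying model that actually served the request, you can log it alongside token usage to track routing behavior over time. A small sketch, using a trimmed stand-in for the response above (the field names match the sample; the logging approach is an illustration, not a prescribed pattern):

```python
import json

# `raw` stands in for the serialized chat completion shown above,
# trimmed to the two fields this sketch reads.
raw = """
{
  "model": "gpt-4.1-nano-2025-04-14",
  "usage": {"prompt_tokens": 21, "completion_tokens": 15, "total_tokens": 36}
}
"""

response = json.loads(raw)
served_by = response["model"]                     # which underlying model answered
total_tokens = response["usage"]["total_tokens"]  # billable tokens for this call
print(f"{served_by}: {total_tokens} tokens")      # gpt-4.1-nano-2025-04-14: 36 tokens
```

Aggregating these two fields per request gives you a per-model cost breakdown that complements the portal views described below.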
## Monitor model router metrics

### Monitor performance

You can monitor the performance of your model router deployment in Azure Monitor in the Azure portal.

1. Go to the **Monitoring** > **Metrics** page for your Azure OpenAI resource in the Azure portal.
1. Filter by the deployment name of your model router model.
1. Optionally, split up the metrics by underlying models.

### Monitor costs

You can monitor the costs of model router, which is the sum of the costs incurred by the underlying models.

1. Go to the **Resource Management** > **Cost analysis** page in the Azure portal.
1. If needed, filter by Azure resource.
1. Filter by deployment name: filter by **Tag**, select **Deployment** as the tag type, and then select your model router deployment name as the value.
