---
title: How to use model router (preview) in Azure OpenAI Service
titleSuffix: Azure OpenAI Service
description: Learn how to use the model router in Azure OpenAI Service to select the best model for your task.
author: PatrickFarley
ms.author: pafarley
#customer intent:
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 04/17/2025
manager: nitinme
---

# Use Azure OpenAI model router (preview)

Azure OpenAI model router is a deployable AI chat model that's trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works, including its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).

You can access model router through the Chat Completions API just as you would use a single base model like GPT-4.

## Deploy a model router model

Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `Azure OpenAI model router` in the **Model** list. Select it, and then complete the rest of the deployment steps.

> [!NOTE]
> Your deployment settings apply to all of the underlying chat models that model router uses.
> - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from model router; you don't set content filters for each of the underlying chat models.
> - Your tokens-per-minute rate limit setting is applied to all activity to and from model router; you don't set rate limits for each of the underlying chat models.

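If you prefer to script the deployment instead of using the portal, here's a hedged sketch with the [Azure management library for Python](https://pypi.org/project/azure-mgmt-cognitiveservices/). The model name (`model-router`), model version, and SKU are assumptions, not confirmed values; check the portal's model list for the actual identifiers before using them.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# Placeholders: your subscription, resource group, and Azure OpenAI resource.
client = CognitiveServicesManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<resource-name>",
    deployment_name="model-router",  # the deployment name you'll call later
    deployment=Deployment(
        properties=DeploymentProperties(
            # Assumed model identifier and version; confirm in the Model list.
            model=DeploymentModel(
                format="OpenAI", name="model-router", version="2025-01-01"
            ),
        ),
        sku=Sku(name="GlobalStandard", capacity=1),
    ),
)
print(poller.result().name)
```
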
## Use model router in chats

You can use model router through the [chat completions API](/azure/ai-services/openai/chatgpt-quickstart) in the same way you'd use other OpenAI chat models. Set the `model` parameter to the name of your model router deployment, and set the `messages` parameter to the messages you want to send to the model.
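
For example, here's a minimal sketch that calls a model router deployment with the [OpenAI Python library](https://github.com/openai/openai-python). The endpoint, API version, and deployment name (`model-router`) are placeholder values; substitute the ones for your own resource and deployment.

```python
import os
from openai import AzureOpenAI

# Placeholder values: replace with your resource's endpoint and your
# model router deployment name.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # the deployment name you chose, not a base model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How are you today?"},
    ],
)

print(response.choices[0].message.content)
print(response.model)  # reports which underlying model answered the prompt
```
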
In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message shows which underlying model was selected to respond.

> [!IMPORTANT]
> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.
>
> The parameters `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, and `logprobs` are similarly dropped for o-series models but used otherwise. A sketch of this behavior follows this note.

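
To illustrate, here's a hedged sketch that reuses the `client` object from the earlier example; `model-router` remains a placeholder deployment name:

```python
# temperature and top_p are honored when model router picks a standard chat
# model, and silently dropped when it routes to an o-series reasoning model.
response = client.chat.completions.create(
    model="model-router",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    temperature=0.7,
    top_p=0.9,
)
print(response.model)  # check which underlying model handled the request
```
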
> [!IMPORTANT]
> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.

### Output format

The JSON response you receive from a model router deployment looks like the following example. The `model` property shows which underlying model was selected to generate the response; a short sketch of reading these fields in code follows the example.

```json
{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "detected": false,
          "filtered": false
        },
        "protected_material_text": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I'm doing well, thank you! How can I assist you today?",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1745308617,
  "id": "xxxx-yyyy-zzzz",
  "model": "gpt-4.1-nano-2025-04-14",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "prompt_index": 0
    }
  ],
  "system_fingerprint": "xxxx",
  "usage": {
    "completion_tokens": 15,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 21,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    },
    "total_tokens": 36
  }
}
```
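
If you're using the Python library from the earlier example, here's a short sketch of pulling the routing and usage details out of this response (attribute names follow the library's response object):

```python
# Inspect which underlying model was chosen and what the request cost.
chosen_model = response.model                  # e.g. "gpt-4.1-nano-2025-04-14"
answer = response.choices[0].message.content
total_tokens = response.usage.total_tokens

print(f"{chosen_model} used {total_tokens} tokens to answer:\n{answer}")
```
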
## Monitor model router metrics

### Monitor performance

You can monitor the performance of your model router deployment in Azure Monitor in the Azure portal. A programmatic sketch follows these steps.

1. Go to the **Monitoring** > **Metrics** page for your Azure OpenAI resource in the Azure portal.
1. Filter by the deployment name of your model router model.
1. Optionally, split the metrics by underlying model.
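
If you'd rather query the same metrics from code, here's a hedged sketch using the [azure-monitor-query](https://pypi.org/project/azure-monitor-query/) library. The resource ID is a placeholder, and `AzureOpenAIRequests` is an assumed metric name; check the portal's Metrics page for the exact metric names your resource exposes.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Placeholder: the full resource ID of your Azure OpenAI resource.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<resource-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# "AzureOpenAIRequests" is an assumption; substitute the metric name you see
# on the portal's Metrics page.
result = client.query_resource(
    resource_id,
    metric_names=["AzureOpenAIRequests"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.total)
```
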
### Monitor costs

You can monitor the costs of model router, which are the sum of the costs incurred by the underlying models.

1. Go to the **Resource Management** > **Cost analysis** page in the Azure portal.
1. If needed, filter by Azure resource.
1. Filter by deployment name: filter by **Tag**, select **Deployment** as the tag type, and then select your model router deployment name as the value.