---
title: How to use model router (preview) in Azure OpenAI Service
titleSuffix: Azure OpenAI Service
description: Learn how to use the model router in Azure OpenAI Service to select the best model for your task.
author: PatrickFarley
ms.author: pafarley
#customer intent:
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 04/17/2025
manager: nitinme
---

# Use Azure OpenAI model router (preview)

Azure OpenAI model router is a deployable AI chat model that's trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works, including its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).

You can access model router through the Chat Completions API just as you would use a single base model like GPT-4.

## Deploy a model router model

Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `Azure OpenAI model router` in the **Model** list. Select it, and then complete the rest of the deployment steps.

> [!NOTE]
> Your deployment settings apply to all of the underlying chat models that model router uses.
> - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from model router; you don't set content filters for each of the underlying chat models.
> - Your tokens-per-minute rate limit setting is applied to all activity to and from model router; you don't set rate limits for each of the underlying chat models.

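If you prefer to script the deployment instead of using the portal, here's a hedged sketch with the [Azure management library for Python](https://pypi.org/project/azure-mgmt-cognitiveservices/). The model name (`model-router`), model version, and SKU are assumptions, not confirmed values; check the portal's model list for the actual identifiers before using them.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# Placeholders: your subscription, resource group, and Azure OpenAI resource.
client = CognitiveServicesManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<resource-name>",
    deployment_name="model-router",  # the deployment name you'll call later
    deployment=Deployment(
        properties=DeploymentProperties(
            # Assumed model identifier and version; confirm in the Model list.
            model=DeploymentModel(
                format="OpenAI", name="model-router", version="2025-01-01"
            ),
        ),
        sku=Sku(name="GlobalStandard", capacity=1),
    ),
)
print(poller.result().name)
```
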
## Use model router in chats

You can use model router through the [chat completions API](/azure/ai-services/openai/chatgpt-quickstart) in the same way you'd use other OpenAI chat models. Set the `model` parameter to the name of your model router deployment, and set the `messages` parameter to the messages you want to send to the model.
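
For example, here's a minimal sketch that calls a model router deployment with the [OpenAI Python library](https://github.com/openai/openai-python). The endpoint, API version, and deployment name (`model-router`) are placeholder values; substitute the ones for your own resource and deployment.

```python
import os
from openai import AzureOpenAI

# Placeholder values: replace with your resource's endpoint and your
# model router deployment name.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # the deployment name you chose, not a base model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How are you today?"},
    ],
)

print(response.choices[0].message.content)
print(response.model)  # reports which underlying model answered the prompt
```
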
In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to your model router deployment on the **Models + endpoints** page and select it to enter the model playground. In the playground experience, you can enter messages and see the model's responses. Each response message shows which underlying model was selected to respond.

> [!IMPORTANT]
> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.
>
> The parameters `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, and `logprobs` are similarly dropped for o-series models but used otherwise. A sketch of this behavior follows this note.

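
To illustrate, here's a hedged sketch that reuses the `client` object from the earlier example; `model-router` remains a placeholder deployment name:

```python
# temperature and top_p are honored when model router picks a standard chat
# model, and silently dropped when it routes to an o-series reasoning model.
response = client.chat.completions.create(
    model="model-router",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    temperature=0.7,
    top_p=0.9,
)
print(response.model)  # check which underlying model handled the request
```
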
> [!IMPORTANT]
> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.

### Output format

The JSON response you receive from a model router deployment looks like the following example. The `model` property shows which underlying model was selected to generate the response; a short sketch of reading these fields in code follows the example.

```json
{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "detected": false,
          "filtered": false
        },
        "protected_material_text": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I'm doing well, thank you! How can I assist you today?",
        "refusal": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1745308617,
  "id": "xxxx-yyyy-zzzz",
  "model": "gpt-4.1-nano-2025-04-14",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "detected": false,
          "filtered": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "prompt_index": 0
    }
  ],
  "system_fingerprint": "xxxx",
  "usage": {
    "completion_tokens": 15,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 21,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    },
    "total_tokens": 36
  }
}
```
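
If you're using the Python library from the earlier example, here's a short sketch of pulling the routing and usage details out of this response (attribute names follow the library's response object):

```python
# Inspect which underlying model was chosen and what the request cost.
chosen_model = response.model                  # e.g. "gpt-4.1-nano-2025-04-14"
answer = response.choices[0].message.content
total_tokens = response.usage.total_tokens

print(f"{chosen_model} used {total_tokens} tokens to answer:\n{answer}")
```
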
## Monitor model router metrics

### Monitor performance

You can monitor the performance of your model router deployment in Azure Monitor in the Azure portal. A programmatic sketch follows these steps.

1. Go to the **Monitoring** > **Metrics** page for your Azure OpenAI resource in the Azure portal.
1. Filter by the deployment name of your model router model.
1. Optionally, split the metrics by underlying model.
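
If you'd rather query the same metrics from code, here's a hedged sketch using the [azure-monitor-query](https://pypi.org/project/azure-monitor-query/) library. The resource ID is a placeholder, and `AzureOpenAIRequests` is an assumed metric name; check the portal's Metrics page for the exact metric names your resource exposes.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Placeholder: the full resource ID of your Azure OpenAI resource.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<resource-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# "AzureOpenAIRequests" is an assumption; substitute the metric name you see
# on the portal's Metrics page.
result = client.query_resource(
    resource_id,
    metric_names=["AzureOpenAIRequests"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.total)
```
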
### Monitor costs

You can monitor the costs of model router, which are the sum of the costs incurred by the underlying models.

1. Go to the **Resource Management** > **Cost analysis** page in the Azure portal.
1. If needed, filter by Azure resource.
1. Filter by deployment name: filter by **Tag**, select **Deployment** as the tag type, and then select your model router deployment name as the value.