
Commit 0ec3d38: updates per team review
Parent: 6fdc54c

File tree: 2 files changed, +20 -12 lines

articles/ai-services/openai/concepts/model-router.md

Lines changed: 15 additions & 7 deletions
```diff
@@ -13,11 +13,11 @@ manager: nitinme
 # Model router for Azure AI Foundry (preview)
 
-Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model.
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
 
 ## Why use model router?
 
-Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single chat experience that combines the best features from all of the underlying chat models.
+Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.
 
 ## Versioning
```
```diff
@@ -34,19 +34,27 @@ If you select **Auto-update** at the deployment step (see [Manage models](/azure
 ## Limitations
 
+### Resource limitations
+
+See the [Models](../concepts/models.md#model-router) page for the region availability and deployment types for model router.
+
+### Technical limitations
+
 See [Quotas and limits](/azure/ai-services/openai/quotas-limits) for rate limit information.
 
-The context window limit listed on the [Models](../concepts/models.md) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
-- Summarize the prompt before passing it to the model
-- Truncate the prompt into more relevant parts
-- Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
+> [!NOTE]
+> The context window limit listed on the [Models](../concepts/models.md#model-router) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model; otherwise, the call will fail. To shorten the context window, you can do one of the following:
+> - Summarize the prompt before passing it to the model
+> - Truncate the prompt into more relevant parts
+> - Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
 
+Model router accepts image inputs for [Vision enabled chats](/azure/ai-services/openai/how-to/gpt-with-vision) (all of the underlying models can accept image input), but the routing decision is based on the text input only.
 
 Model router doesn't process audio input.
 
 ## Billing information
 
-When you use model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
+When you use model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model routing function itself doesn't incur any extra charges.
 
 You can monitor the costs of your model router deployment in the Azure portal.
```
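The prompt-shortening options in the Limitations section above (summarize, truncate, or retrieve relevant sections) can be sketched in code. A minimal truncation example, using a rough 4-characters-per-token heuristic rather than a real tokenizer (the function name and heuristic are illustrative assumptions, not part of the docs):

```python
def truncate_prompt(prompt: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Roughly shorten a prompt to fit a context window.

    Assumes ~4 characters per token as a heuristic; production code should
    count tokens with the target model's actual tokenizer.
    """
    max_chars = max_tokens * chars_per_token
    if len(prompt) <= max_chars:
        return prompt
    # Keep the tail of the text, which in a chat transcript is usually
    # the most recent and most relevant part.
    return prompt[-max_chars:]

print(len(truncate_prompt("x" * 10_000, max_tokens=100)))  # prints 400
```

Summarization or embedding-based retrieval (for example, via Azure AI Search) would replace the naive slice here with a semantic selection step.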

articles/ai-services/openai/how-to/model-router.md

Lines changed: 5 additions & 5 deletions
```diff
@@ -13,18 +13,18 @@ manager: nitinme
 # Use model router for Azure AI Foundry (preview)
 
-Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible, all packaged as a single model deployment. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
 
-You can access model router through the Completions API just as you would use a single base model like GPT-4.
+You can access model router through the Completions API just as you would use a single base model like GPT-4. The steps are the same as in the [Chat completions guide](/azure/ai-services/openai/how-to/chatgpt).
 
 ## Deploy a model router model
 
-Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `model-router` in the **Models** list. Select it, and then complete the rest of the deployment steps.
+Model router is packaged as a single Azure AI Foundry model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource). In the **Create new deployment** step, find `model-router` in the **Models** list. Select it, and then complete the rest of the deployment steps.
 
 > [!NOTE]
 > Consider that your deployment settings apply to all underlying chat models that model router uses.
 > - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
-> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from the model router: you don't set content filters for each of the underlying chat models.
+> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all content passed to and from the model router: you don't set content filters for each of the underlying chat models.
 > - Your tokens-per-minute rate limit setting is applied to all activity to and from the model router: you don't set rate limits for each of the underlying chat models.
 
 ## Use model router in chats
```
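Since model router is called like any single-model deployment, only the deployment name in the request differs. A minimal sketch of the request payload, assuming a deployment named `model-router` (the helper function is illustrative; in practice the kwargs would go to a chat completions call such as the openai SDK's `client.chat.completions.create(**request)`):

```python
from typing import Any


def build_chat_request(deployment: str, user_text: str) -> dict[str, Any]:
    # Standard chat completions request shape; "model" holds the
    # *deployment* name you chose when deploying model router.
    return {
        "model": deployment,
        "messages": [{"role": "user", "content": user_text}],
    }


request = build_chat_request("model-router", "What's the best way to boil an egg?")
print(request["model"])  # prints model-router
```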
````diff
@@ -44,7 +44,7 @@ In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to you
 ### Output format
 
-The JSON response you receive from a model router model looks like the following.
+The JSON response you receive from a model router model is identical to the standard chat completions API response. Note that the `"model"` field reveals which underlying model was selected to respond to the prompt.
 
 ```json
 {
````
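Because the response follows the standard chat completions shape, reading the routed model out of it is straightforward. A sketch with a hypothetical, abbreviated response body (the model name shown is illustrative, not a guaranteed routing outcome):

```python
import json

# Hypothetical, abbreviated model router response; field names follow the
# standard chat completions response shape.
raw = """{
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "Hello!"}}
  ]
}"""

response = json.loads(raw)
# The "model" field reveals which underlying model handled this prompt.
print(response["model"])                             # prints gpt-4o-mini-2024-07-18
print(response["choices"][0]["message"]["content"])  # prints Hello!
```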
