Commit fff34e3

Merge pull request #5038 from PatrickFarley/aoai-build
Aoai build
2 parents d35ec07 + 0ec3d38 commit fff34e3

5 files changed: +30 additions, -22 deletions

articles/ai-services/openai/concepts/model-router.md

Lines changed: 18 additions & 10 deletions
```diff
@@ -1,5 +1,5 @@
 ---
-title: Azure OpenAI model router (preview) concepts
+title: Model router for Azure AI Foundry (preview) concepts
 titleSuffix: Azure OpenAI
 description: Learn about the model router feature in Azure OpenAI Service.
 author: PatrickFarley
@@ -11,13 +11,13 @@ ms.custom:
 manager: nitinme
 ---
 
-# Azure OpenAI model router (preview)
+# Model router for Azure AI Foundry (preview)
 
-Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model.
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
 
 ## Why use model router?
 
-Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single chat experience that combines the best features from all of the underlying chat models.
+Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.
 
 ## Versioning
 
@@ -29,24 +29,32 @@ If you select **Auto-update** at the deployment step (see [Manage models](/azure
 
 |Model router version|Underlying models (version)|
 |---|---|
-|`2025-04-15`|GPT-4.1 (`2025-04-14`)</br>GPT-4.1-mini (`2025-04-14`)</br>GPT-4.1-nano (`2025-04-14`) </br>o4-mini (`2025-04-16`) |
+|`2025-05-19`|GPT-4.1 (`2025-04-14`)</br>GPT-4.1-mini (`2025-04-14`)</br>GPT-4.1-nano (`2025-04-14`) </br>o4-mini (`2025-04-16`) |
 
 
 ## Limitations
 
+### Resource limitations
+
+See the [Models](../concepts/models.md#model-router) page for the region availability and deployment types for model router.
+
+### Technical limitations
+
 See [Quotas and limits](/azure/ai-services/openai/quotas-limits) for rate limit information.
 
-The context window limit listed on the [Models](../concepts/models.md) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
-- Summarize the prompt before passing it to the model
-- Truncate the prompt into more relevant parts
-- Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
+> [!NOTE]
+> The context window limit listed on the [Models](../concepts/models.md#model-router) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail. To shorten the context window, you can do one of the following:
+> - Summarize the prompt before passing it to the model
+> - Truncate the prompt into more relevant parts
+> - Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)
 
+Model router accepts image inputs for [Vision enabled chats](/azure/ai-services/openai/how-to/gpt-with-vision) (all of the underlying models can accept image input), but the routing decision is based on the text input only.
 
 Model router doesn't process audio input.
 
 ## Billing information
 
-When you use Azure OpenAI model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model router itself doesn't incur any extra charges.
+When you use model router, you're only billed for the use of the underlying models as they're recruited to respond to prompts. The model routing function itself doesn't incur any extra charges.
 
 You can monitor the costs of your model router deployment in the Azure portal.
```

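The concepts diff above lists three ways to shorten a prompt that may exceed the smallest underlying model's context window (summarize, truncate, or retrieve with embeddings). The truncation option can be sketched as follows; this is a minimal illustration, not part of any Azure SDK, and it assumes a crude ~4-characters-per-token estimate (a real tokenizer such as tiktoken would be more accurate) and the 200,000-token window documented for model router.

```python
CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Roughly estimate a prompt's token count from its character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def truncate_to_budget(prompt: str, max_tokens: int = 200_000) -> str:
    """Keep the tail of the prompt so it fits the smallest model's window.

    Keeping the tail preserves the most recent chat turns; the other two
    mitigations (summarization, embeddings-based retrieval) preserve more
    meaning at the cost of an extra processing step.
    """
    if estimate_tokens(prompt) <= max_tokens:
        return prompt
    return prompt[-(max_tokens * CHARS_PER_TOKEN):]


long_prompt = "word " * 300_000  # ~1.5 M characters, well over budget
trimmed = truncate_to_budget(long_prompt)
print(estimate_tokens(trimmed) <= 200_000)  # True
```

A summarization or retrieval step would replace `truncate_to_budget` here; the point is only that the request must fit the smallest underlying model before routing, because the routing decision controls which context window actually applies.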
articles/ai-services/openai/concepts/models.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -56,13 +56,13 @@ A model that intelligently selects from a set of underlying chat models to respo
 
 | Model | Region |
 |---|---|
-| `model-router` (2025-04-15) | East US 2 (Global Standard), Sweden Central (Global Standard)|
+| `model-router` (2025-05-19) | East US 2 (Global Standard), Sweden Central (Global Standard)|
 
 ### Capabilities
 
 | Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
 | --- | :--- |:--- |:---|:---: |
-| `model-router` (2025-04-15) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000* | 32768 (GPT 4.1 series)</br> 100 K (o4-mini) | May 31, 2024 |
+| `model-router` (2025-05-19) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000* | 32768 (GPT 4.1 series)</br> 100 K (o4-mini) | May 31, 2024 |
 
 *Larger context windows are compatible with _some_ of the underlying models, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model, otherwise the call will fail.
```

articles/ai-services/openai/how-to/model-router.md

Lines changed: 7 additions & 7 deletions
````diff
@@ -1,5 +1,5 @@
 ---
-title: How to use model router (preview) in Azure OpenAI Service
+title: How to use model router for Azure AI Foundry (preview)
 titleSuffix: Azure OpenAI Service
 description: Learn how to use the model router in Azure OpenAI Service to select the best model for your task.
 author: PatrickFarley
@@ -11,20 +11,20 @@ ms.date: 04/17/2025
 manager: nitinme
 ---
 
-# Use Azure OpenAI model router (preview)
+# Use model router for Azure AI Foundry (preview)
 
-Azure OpenAI model router is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
+Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. It uses a combination of preexisting models to provide high performance while saving on compute costs where possible, all packaged as a single model deployment. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](../concepts/model-router.md).
 
-You can access model router through the Completions API just as you would use a single base model like GPT-4.
+You can access model router through the Completions API just as you would use a single base model like GPT-4. The steps are the same as in the [Chat completions guide](/azure/ai-services/openai/how-to/chatgpt).
 
 ## Deploy a model router model
 
-Model router is packaged as a single OpenAI model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource), and in the **Create new deployment** step, find `Azure OpenAI model router` in the **Model** list. Select it, and then complete the rest of the deployment steps.
+Model router is packaged as a single Azure AI Foundry model that you deploy. Follow the steps in the [resource deployment guide](/azure/ai-services/openai/how-to/create-resource). In the **Create new deployment** step, find `model-router` in the **Models** list. Select it, and then complete the rest of the deployment steps.
 
 > [!NOTE]
 > Consider that your deployment settings apply to all underlying chat models that model router uses.
 > - You don't need to deploy the underlying chat models separately. Model router works independently of your other deployed models.
-> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all activity to and from the model router: you don't set content filters for each of the underlying chat models.
+> - You select a content filter when you deploy the model router model (or you can apply a filter later). The content filter is applied to all content passed to and from the model router: you don't set content filters for each of the underlying chat models.
 > - Your tokens-per-minute rate limit setting is applied to all activity to and from the model router: you don't set rate limits for each of the underlying chat models.
 
 ## Use model router in chats
@@ -44,7 +44,7 @@ In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to you
 
 ### Output format
 
-The JSON response you receive from a model router model looks like the following.
+The JSON response you receive from a model router model is identical to the standard chat completions API response. Note that the `"model"` field reveals which underlying model was selected to respond to the prompt.
 
 ```json
 {
````

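The output-format change above states that a model router response has the standard chat completions shape and that its `"model"` field names the underlying model chosen for the prompt. A minimal sketch of reading that field, using an illustrative payload (the field values here are made up, not taken from a real deployment):

```python
import json

# Illustrative chat completions response as it might come back from a
# model-router deployment; only the overall shape is assumed, per the
# docs change above.
sample_response = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ]
}
"""

data = json.loads(sample_response)
routed_model = data["model"]  # which underlying model handled this prompt
answer = data["choices"][0]["message"]["content"]
print(routed_model)  # gpt-4.1-nano-2025-04-14
print(answer)        # Hello!
```

Logging `routed_model` per request is a cheap way to see how often the router escalates to the larger or reasoning models, which is useful when estimating costs.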
articles/ai-services/openai/quotas-limits.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -94,8 +94,8 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-| `model-router` (2025-04-15) | Enterprise Tier | 10 M | 10 K |
-| `model-router` (2025-04-15) | Default | 1 M | 1 K |
+| `model-router` (2025-05-19) | Enterprise Tier | 10 M | 10 K |
+| `model-router` (2025-05-19) | Default | 1 M | 1 K |
 
 
 ## computer-use-preview global standard rate limits
```

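The quota table above gives `model-router` a default limit of 1 K requests per minute. The service enforces this server-side; a client that wants to smooth bursts before hitting the limit could use a sliding-window throttle along these lines (a hypothetical helper, not part of any Azure SDK):

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side guard for a requests-per-minute budget.

    Sleeps just long enough to stay under a per-minute cap such as the
    1 K default for model-router. The service still enforces its own
    limits; this only reduces 429 responses from bursty clients.
    """

    def __init__(self, max_requests_per_minute: int):
        self.max_requests = max_requests_per_minute
        self.window = deque()  # monotonic timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than the 60-second window.
        while self.window and now - self.window[0] >= 60.0:
            self.window.popleft()
        if len(self.window) >= self.max_requests:
            # Wait until the oldest request ages out of the window.
            time.sleep(60.0 - (now - self.window[0]))
            self.window.popleft()
        self.window.append(time.monotonic())


throttle = RequestThrottle(max_requests_per_minute=1_000)
for _ in range(3):
    throttle.acquire()  # would wrap each chat completions call
print(len(throttle.window))  # 3
```

Note that the TPM limit (1 M tokens per minute by default) usually binds before the request limit for long prompts, so a production client would track both.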
articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ Spotlighting is a sub-feature of prompt shields that enhances protection against
 
 ### Model router (preview)
 
-Azure OpenAI model router is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](./concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](./concepts/model-router.md).
+Model router for Azure AI Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the [Model router concepts guide](./concepts/model-router.md). To use model router with the Completions API, follow the [How-to guide](./concepts/model-router.md).
 
 ## April 2025
```

0 commit comments