
Commit ff85c89

add more param info
1 parent 7e9be09 commit ff85c89

3 files changed: +12 -2 lines changed


articles/ai-services/openai/concepts/model-router.md

Lines changed: 7 additions & 1 deletion
@@ -34,7 +34,13 @@ If you select **Auto-update** at the deployment step (see [Manage models](/azure
## Limitations

- See [Quotas and limits](/azure/ai-services/openai/quotas-limits).
+ See [Quotas and limits](/azure/ai-services/openai/quotas-limits) for rate limit information.
+
+ The context window limit listed on the [Models](../concepts/models.md) page is the limit of the smallest underlying model. Other underlying models are compatible with larger context windows, which means an API call with a larger context succeeds only if the prompt happens to be routed to the right model; otherwise, the call fails. To shorten your prompt, you can do one of the following:
+ - Summarize the prompt before passing it to the model
+ - Truncate the prompt to its most relevant parts
+ - Use document embeddings and have the chat model retrieve relevant sections: see [Azure AI Search](/azure/search/search-what-is-azure-search)

Model router doesn't process input images or audio.
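
The truncation option above is straightforward to sketch in code. The following is a minimal, hypothetical example, not part of this commit, assuming the `tiktoken` package, the `o200k_base` encoding as a rough stand-in for the underlying models' tokenizers, and a budget derived from the 128,000-token window and 32,768 max output tokens listed for `model-router`.

```python
# Hypothetical sketch of the "truncate the prompt" option above.
# Assumptions: tiktoken is installed, o200k_base approximates the underlying
# models' tokenizers, and room is reserved for the completion tokens.
import tiktoken

GUARANTEED_CONTEXT = 128_000   # context window of the smallest underlying model
RESERVED_FOR_OUTPUT = 32_768   # headroom for the model's response

def truncate_prompt(text: str,
                    max_tokens: int = GUARANTEED_CONTEXT - RESERVED_FOR_OUTPUT) -> str:
    """Keep only the first max_tokens tokens so the call succeeds regardless of routing."""
    enc = tiktoken.get_encoding("o200k_base")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])
```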

articles/ai-services/openai/concepts/models.md

Lines changed: 3 additions & 1 deletion
@@ -62,7 +62,9 @@ A model that intelligently selects from a set of underlying chat models to respo
| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
| --- | :--- |:--- |:---|:---: |
- | `model-router` (2025-04-15) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 128,000 | 32,768 (GPT-4.1 series)<br> 100,000 (o4-mini) | May 31, 2024 |
+ | `model-router` (2025-04-15) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 128,000* | 32,768 (GPT-4.1 series)<br> 100,000 (o4-mini) | May 31, 2024 |
+
+ *Larger context windows are compatible with _some_ of the underlying models, which means an API call with a larger context succeeds only if the prompt happens to be routed to the right model; otherwise, the call fails.
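
As a complement to the footnote, a caller can verify up front that a prompt stays inside the 128,000-token window that every underlying model supports, rather than relying on the router picking a larger model. This is a hypothetical sketch, not from the docs, assuming `tiktoken` with the `o200k_base` encoding as an approximation of the underlying tokenizers.

```python
# Hypothetical pre-flight check against the guaranteed 128,000-token window.
# Assumes tiktoken; o200k_base only approximates the underlying tokenizers.
import tiktoken

def fits_guaranteed_window(messages: list[dict], limit: int = 128_000) -> bool:
    """Return True when the combined message text stays within the guaranteed window."""
    enc = tiktoken.get_encoding("o200k_base")
    total = sum(len(enc.encode(m.get("content", "") or "")) for m in messages)
    return total <= limit
```

If the check fails, shorten the prompt (summarize, truncate, or retrieve only relevant sections) as described in the model router limitations.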

## computer-use-preview

articles/ai-services/openai/how-to/model-router.md

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ In the [Azure AI Foundry portal](https://ai.azure.com/), you can navigate to you
> [!IMPORTANT]
> You can set the `Temperature` and `Top_P` parameters to the values you prefer (see the [concepts guide](/azure/ai-services/openai/concepts/prompt-engineering?tabs=chat#temperature-and-top_p-parameters)), but note that reasoning models (o-series) don't support these parameters. If model router selects a reasoning model for your prompt, it ignores the `Temperature` and `Top_P` input parameters.
+ >
+ > The parameters `stop`, `presence_penalty`, `frequency_penalty`, `logit_bias`, and `logprobs` are similarly dropped when an o-series model is selected but are used otherwise.

> [!IMPORTANT]
> The `reasoning_effort` parameter (see the [Reasoning models guide](/azure/ai-services/openai/how-to/reasoning?tabs=python-secure#reasoning-effort)) isn't supported in model router. If the model router selects a reasoning model for your prompt, it also selects a `reasoning_effort` input value based on the complexity of the prompt.
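
These two notes are easier to see in a request. The following is a hedged sketch using the OpenAI Python SDK against an Azure OpenAI resource; the deployment name `model-router`, the environment variable names, and the API version are placeholders rather than values from this commit.

```python
# Hypothetical request to a model router deployment. The endpoint, key,
# API version, and deployment name below are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",              # deployment name (assumed)
    messages=[{"role": "user", "content": "Summarize this meeting transcript in five bullets."}],
    temperature=0.7,                   # ignored if an o-series model is selected
    top_p=0.95,                        # ignored if an o-series model is selected
    presence_penalty=0.2,              # dropped for o-series models, used otherwise
    # reasoning_effort is intentionally omitted: model router chooses it for reasoning models
)
print(response.choices[0].message.content)
```

Because routing happens per request, the same call can land on a GPT-4.1-series model (where the sampling parameters apply) or an o-series model (where they're dropped) without any code change.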
