
Commit 97109fb

Merge pull request #4984 from santiagxf/santiagxf/openai

OpenAI API for Foundry Models

2 parents: 3de5dad + 783fc26

36 files changed: +652 −305 lines

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 22 additions & 28 deletions

@@ -1,5 +1,5 @@
 ---
-title: Endpoint for Azure AI Foundry Models
+title: Endpoints for Azure AI Foundry Models
 titleSuffix: Azure AI Foundry
 description: Learn about the Azure AI Foundry Models endpoint
 author: santiagxf
@@ -11,7 +11,7 @@ ms.author: fasantia
 ms.custom: ignite-2024, github-universe-2024
 ---
 
-# Endpoint for Azure AI Foundry Models
+# Endpoints for Azure AI Foundry Models
 
 Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.
 
@@ -36,19 +36,21 @@ An Azure AI Foundry resource can have as many model deployments as needed and th
 
 To learn more about how to create deployments see [Add and configure model deployments](../how-to/create-model-deployments.md).
 
-## Foundry Models inference endpoint
+## Endpoints
 
-The Foundry Models inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:
+Azure AI Foundry Services (formerly known as Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
 
-* Text embeddings
-* Image embeddings
-* Chat completions
+> [!div class="checklist"]
+> * Azure AI inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`)
+> * Azure OpenAI endpoint (usually with the form `https://<resource-name>.openai.azure.com`)
+
+The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md).
 
-You can see the endpoint URL and credentials in the **Overview** section:
+The **Azure OpenAI API** exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. Non-OpenAI models may also be exposed in this route.
 
-:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+To learn more about how to apply the **Azure OpenAI endpoint**, see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
 
-### Routing
+## Using Azure AI inference endpoint
 
 The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.
 
@@ -58,32 +60,24 @@ For example, if you create a deployment named `Mistral-large`, then such deploym
 
 [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]
 
-[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
-
-> [!TIP]
-> Deployment routing isn't case sensitive.
+For a chat model, you can create a request as follows:
 
-### SDKs
-
-The Foundry Models endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**; which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.
-
-## Azure OpenAI inference endpoint
-
-Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.
+[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Azure OpenAI inference endpoints work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md)
+If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](../how-to/create-model-deployments.md).
 
-:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
+## Key-less authentication
 
-Each deployment has a URL that is the concatenations of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
+Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
 
-> [!IMPORTANT]
-> There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.
+To use key-less authentication, [configure your resource and grant access to users](../how-to/configure-entra-id.md) to perform inference. Once configured, you can authenticate as follows:
 
-### SDKs
+[!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
 
-The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.
+## Limitations
 
+* Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
+* Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.
 
 ## Next steps
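
The routing behavior this file documents (a single `/models` endpoint that dispatches on the `model` field of the request body) can be sketched at the HTTP level. The following is an illustrative sketch, not code from the commit: the resource name `contoso`, the deployment name, and the `api-version` value are assumptions.

```python
import json
import urllib.request

# Illustrative values, not taken from the commit: the resource name,
# deployment name, and api-version here are assumptions for this sketch.
ENDPOINT = "https://contoso.services.ai.azure.com/models"

def build_chat_request(deployment: str, question: str) -> urllib.request.Request:
    # The single /models endpoint routes on the "model" field of the
    # payload: it must match a *deployment* name in the resource, so
    # switching models is a payload change rather than a URL change.
    payload = {
        "model": deployment,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{ENDPOINT}/chat/completions?api-version=2024-05-01-preview",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": "<api-key>"},
        method="POST",
    )

# Build (but don't send) a request routed to the "Mistral-large" deployment.
req = build_chat_request("Mistral-large", "Say hello.")
```

The `[!INCLUDE]` snippets referenced in the diff presumably wrap this same pattern behind the Azure SDK clients; the point of the sketch is only that the deployment name travels in the body, never in the URL.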

articles/ai-foundry/model-inference/faq.yml

Lines changed: 2 additions & 2 deletions

@@ -17,9 +17,9 @@ sections:
   - name: General
     questions:
       - question: |
-          What's the difference between Azure OpenAI and Foundry Models?
+          What's the difference between Azure OpenAI and Azure AI Foundry Models?
         answer: |
-          Azure OpenAI gives customers access to advanced language models from OpenAI. Foundry Models extends such capability giving customers access to all the flagship models in Azure AI Foundry under the same service, endpoint, and credentials. It includes Azure OpenAI, Cohere, Mistral AI, Meta Llama, AI21 labs, etc. Customers can seamlessly switch between models without changing their code.
+          Azure OpenAI gives customers access to advanced language models from OpenAI. Azure AI Foundry Models extends such capability giving customers access to all the flagship models in Azure AI Foundry under the same service, endpoint, and credentials. It includes Azure OpenAI, Cohere, Mistral AI, Meta Llama, AI21 labs, etc. Customers can seamlessly switch between models without changing their code.
 
           Azure OpenAI is included in the Models Sold Directly by Azure family in Foundry Models.
       - question: |

articles/ai-foundry/model-inference/how-to/inference.md

Lines changed: 21 additions & 27 deletions

@@ -1,5 +1,5 @@
 ---
-title: How to use the Azure AI Foundry Models inference endpoint to consume models
+title: How to use the Azure AI Foundry Models inference endpoints to consume models
 titleSuffix: Azure AI Foundry
 description: Learn how to use the Azure AI Foundry Models inference endpoint to consume models
 manager: scottpolly
@@ -12,27 +12,19 @@ ms.author: mopeakande
 ms.reviewer: fasantia
 ---
 
-# Use the Azure AI Foundry Models inference endpoints
+# Use Foundry Models
 
-Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.
+Once you have [deployed a model in Azure AI Foundry](create-model-deployments.md), you can consume its capabilities via Azure AI Foundry APIs. There are two different endpoints and APIs to use models in Azure AI Foundry Models.
 
-This article explains how to use the inference endpoint to invoke them.
+## Models inference endpoint
 
-## Endpoints
+The models inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md), which all the models in Foundry Models support. It supports the following modalities:
 
-Azure AI Foundry Services (formerly known Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
+* Text embeddings
+* Image embeddings
+* Chat completions
 
-> [!div class="checklist"]
-> * Foundry Models endpoint
-> * Azure OpenAI endpoint
-
-The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md).
-
-**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
-
-To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
-
-## Using the routing capability in the Foundry Models endpoint
+### Routing
 
 The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.
 
@@ -42,24 +34,26 @@ For example, if you create a deployment named `Mistral-large`, then such deploym
 
 [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]
 
-For a chat model, you can create a request as follows:
-
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).
+> [!TIP]
+> Deployment routing isn't case sensitive.
+
+## Azure OpenAI inference endpoint
+
+Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used for compatible functionalities.
 
-## Key-less authentication
+Azure OpenAI endpoints (usually with the form `https://<resource-name>.openai.azure.com`) work at the deployment level and have their own URL associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).
 
-Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
+:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
 
-To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, then you can authenticate as follows:
+Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
 
-[!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
+[!INCLUDE [code-create-openai-client](../includes/code-create-openai-client.md)]
 
-## Limitations
+[!INCLUDE [code-create-openai-chat-completion](../includes/code-create-openai-chat-completion.md)]
 
-* Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
-* Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.
 
 ## Next steps
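
The per-deployment URL rule stated in this file ("the Azure OpenAI base URL plus the route `/deployments/<model-deployment-name>`") can be sketched as plain string construction. The resource and deployment names below are illustrative assumptions, not values from the commit.

```python
def azure_openai_deployment_url(base_url: str, deployment: str) -> str:
    # Per the doc text: each Azure OpenAI deployment gets its own URL,
    # formed from the resource base URL plus /deployments/<name>.
    # There is no routing parameter; the deployment is fixed by the URL.
    return f"{base_url.rstrip('/')}/deployments/{deployment}"

url = azure_openai_deployment_url("https://contoso.openai.azure.com", "gpt-4o-mini")
print(url)  # https://contoso.openai.azure.com/deployments/gpt-4o-mini
```

The `code-create-openai-client` include referenced in the diff presumably handles this construction through the SDK; with the raw REST API, the full request path also carries an operation suffix (for example `/chat/completions`) and an `api-version` query parameter.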

articles/ai-foundry/model-inference/how-to/quickstart-ai-project.md

Lines changed: 3 additions & 3 deletions

@@ -21,7 +21,7 @@ You can change this behavior and deploy both types of models to Azure AI Foundry
 Additionally, deploying models to Azure AI Foundry Models brings the extra benefits of:
 
 > [!div class="checklist"]
-> * [Routing capability](../concepts/endpoints.md#routing).
+> * [Routing capability](inference.md#routing).
 > * [Custom content filters](../concepts/content-filter.md).
 > * Global capacity deployment type.
 > * [Key-less authentication](configure-entra-id.md) with role-based access control.
@@ -79,7 +79,7 @@ To configure the project to use the Foundry Models capability in Azure AI Foundr
 :::image type="content" source="../media/quickstart-ai-project/overview-endpoint-and-key.png" alt-text="Screenshot of the landing page for the project, highlighting the location of the connected resource and the associated inference endpoint." lightbox="../media/quickstart-ai-project/overview-endpoint-and-key.png":::
 
 > [!TIP]
-> Each Azure AI Foundry Services resource has a single **Foundry Models endpoint** which can be used to access any model deployment on it. The same endpoint serves multiple models depending on which ones are configured. Learn about [how the endpoint works](../concepts/endpoints.md#azure-openai-inference-endpoint).
+> Each Azure AI Foundry Services resource has a single **Foundry Models endpoint** which can be used to access any model deployment on it. The same endpoint serves multiple models depending on which ones are configured. Learn about [how the endpoint works](inference.md#azure-openai-inference-endpoint).
 
 5. Take note of the endpoint URL and credentials.
 
@@ -136,7 +136,7 @@ Generate your first chat completion:
 
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Use the parameter `model="<deployment-name>` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See [Routing](../concepts/endpoints.md#routing) concept page to learn how Azure AI Services route deployments.
+Use the parameter `model="<deployment-name>"` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See the [Routing](inference.md#routing) page to learn how Azure AI Foundry Models routes deployments.
 
 
 ## Move from standard deployments to Foundry Models

articles/ai-foundry/model-inference/how-to/quickstart-github-models.md

Lines changed: 1 addition & 1 deletion

@@ -75,7 +75,7 @@ Generate your first chat completion:
 
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Use the parameter `model="<deployment-name>` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See [Routing](../concepts/endpoints.md#routing) concept page to learn how Azure AI Services route deployments.
+Use the parameter `model="<deployment-name>"` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See the [Routing](inference.md#routing) concept page to learn how Azure AI Services route deployments.
 
 > [!IMPORTANT]
 > As opposite to GitHub Models where all the models are already configured, the Azure AI Services resource allows you to control which models are available in your endpoint and under which configuration. Add as many models as you plan to use before indicating them in the `model` parameter. Learn how to [add more models](create-model-deployments.md) to your resource.

articles/ai-foundry/model-inference/how-to/use-chat-completions.md

Lines changed: 7 additions & 1 deletion

@@ -20,35 +20,41 @@ zone_pivot_groups: azure-ai-inference-samples
 ::: zone pivot="programming-language-python"
 
 [!INCLUDE [python](../includes/use-chat-completions/python.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-javascript"
 
 [!INCLUDE [javascript](../includes/use-chat-completions/javascript.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-java"
 
 [!INCLUDE [java](../includes/use-chat-completions/java.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-csharp"
 
 [!INCLUDE [csharp](../includes/use-chat-completions/csharp.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-rest"
 
 [!INCLUDE [rest](../includes/use-chat-completions/rest.md)]
+
 ::: zone-end
 
+
 ## Related content
 
 * [Use embeddings models](use-embeddings.md)
 * [Use image embeddings models](use-image-embeddings.md)
 * [Use reasoning models](use-chat-reasoning.md)
-* [Azure AI Foundry Models API](.././reference/reference-model-inference-api.md)
+* [Azure AI Model Inference API](.././reference/reference-model-inference-api.md)

articles/ai-foundry/model-inference/includes/code-create-chat-completion.md

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ var response = await client.path("/chat/completions").post({
     }
 });
 
-console.log(response.choices[0].message.content)
+console.log(response.body.choices[0].message.content)
 ```
 
 # [C#](#tab/csharp)

articles/ai-foundry/model-inference/includes/code-create-chat-reasoning.md

Lines changed: 1 addition & 1 deletion

@@ -36,7 +36,7 @@ var response = await client.path("/chat/completions").post({
     }
 });
 
-console.log(response.choices[0].message.content)
+console.log(response.body.choices[0].message.content)
 ```
 
 # [C#](#tab/csharp)
