Commit 9421ccb

committed
fixes
1 parent c44cb44 commit 9421ccb

22 files changed: +337 −110 lines changed

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 22 additions & 28 deletions
@@ -1,5 +1,5 @@
  ---
- title: Endpoint for Azure AI Foundry Models
+ title: Endpoints for Azure AI Foundry Models
  titleSuffix: Azure AI Foundry
  description: Learn about the Azure AI Foundry Models endpoint
  author: santiagxf
@@ -11,7 +11,7 @@ ms.author: fasantia
  ms.custom: ignite-2024, github-universe-2024
  ---

- # Endpoint for Azure AI Foundry Models
+ # Endpoints for Azure AI Foundry Models

  Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

@@ -36,19 +36,21 @@ An Azure AI Foundry resource can have as many model deployments as needed and th

  To learn more about how to create deployments see [Add and configure model deployments](../how-to/create-model-deployments.md).

- ## Foundry Models inference endpoint
+ ## Endpoints

- The Foundry Models inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:
+ Azure AI Foundry Services (formerly known as Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:

- * Text embeddings
- * Image embeddings
- * Chat completions
+ > [!div class="checklist"]
+ > * Azure AI inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`)
+ > * Azure OpenAI endpoint (usually with the form `https://<resource-name>.openai.azure.com`)
+
+ The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md).

- You can see the endpoint URL and credentials in the **Overview** section:
+ The **Azure OpenAI API** exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. Non-OpenAI models may also be exposed in this route.

- :::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+ To learn more about how to apply the **Azure OpenAI endpoint**, see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).

- ### Routing
+ ## Using Azure AI inference endpoint

  The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.

@@ -58,32 +60,24 @@ For example, if you create a deployment named `Mistral-large`, then such deploym

  [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

- [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
-
- > [!TIP]
- > Deployment routing isn't case sensitive.
+ For a chat model, you can create a request as follows:

- ### SDKs
-
- The Foundry Models endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**; which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.
-
- ## Azure OpenAI inference endpoint
-
- Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.
+ [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

- Azure OpenAI inference endpoints work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md)
+ If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).

- :::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
+ ## Key-less authentication

- Each deployment has a URL that is the concatenations of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
+ Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. This makes it a strong choice for organizations adopting secure and scalable identity management solutions.

- > [!IMPORTANT]
- > There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.
+ To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, you can authenticate as follows:

- ### SDKs
+ [!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]

- The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.
+ ## Limitations

+ * Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
+ * Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.

  ## Next steps

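The shared-endpoint routing described in this change can be sketched in plain Python. This is an illustrative sketch, not SDK code: `contoso` is a hypothetical resource name and the two helper functions exist only for this example; the endpoint form and the deployment-name routing mirror the article text above.

```python
# Sketch of how the Azure AI inference endpoint addresses deployments.
# `contoso` is a hypothetical resource name used for illustration.

def inference_endpoint(resource_name: str) -> str:
    """One shared endpoint for every deployment in the resource."""
    return f"https://{resource_name}.services.ai.azure.com/models"

def chat_request(deployment_name: str, prompt: str) -> dict:
    """The `model` field names the deployment, which acts as a model alias."""
    return {
        "model": deployment_name,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

print(inference_endpoint("contoso"))
# https://contoso.services.ai.azure.com/models
print(chat_request("Mistral-large", "Hello")["model"])
# Mistral-large
```

Deploying the same model twice under two names would give two valid `model` values against the same URL, which is the aliasing behavior the article describes.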
articles/ai-foundry/model-inference/how-to/inference.md

Lines changed: 21 additions & 23 deletions
@@ -1,5 +1,5 @@
  ---
- title: How to use the Azure AI Foundry Models inference endpoint to consume models
+ title: How to use the Azure AI Foundry Models inference endpoints to consume models
  titleSuffix: Azure AI Foundry
  description: Learn how to use the Azure AI Foundry Models inference endpoint to consume models
  manager: scottpolly
@@ -12,27 +12,23 @@ ms.author: mopeakande
  ms.reviewer: fasantia
  ---

- # Use the Azure AI Foundry Models inference endpoints
+ # Use Foundry Models

  Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

  This article explains how to use the inference endpoint to invoke them.

- ## Endpoints
+ There are two different APIs to use models in Azure AI Foundry Models:

- Azure AI Foundry Services (formerly known Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
+ ## Models inference endpoint

- > [!div class="checklist"]
- > * Foundry Models endpoint
- > * Azure OpenAI endpoint
+ The models inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:

- The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md).
+ * Text embeddings
+ * Image embeddings
+ * Chat completions

- **Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
-
- To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
-
- ## Using the routing capability in the Foundry Models endpoint
+ ### Routing

  The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.

@@ -42,24 +38,26 @@ For example, if you create a deployment named `Mistral-large`, then such deploym

  [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

- For a chat model, you can create a request as follows:
-
  [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

- If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).
+ > [!TIP]
+ > Deployment routing isn't case sensitive.
+
+ ## Azure OpenAI inference endpoint
+
+ Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used for compatible functionalities.

- ## Key-less authentication
+ Azure OpenAI endpoints (usually with the form `https://<resource-name>.openai.azure.com`) work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md).

- Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
+ :::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::

- To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, then you can authenticate as follows:
+ Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

- [!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
+ [!INCLUDE [code-create-openai-client](../includes/code-create-openai-client.md)]

- ## Limitations
+ [!INCLUDE [code-create-openai-chat-completion](../includes/code-create-openai-chat-completion.md)]

- * Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
- * Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.

  ## Next steps

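The deployment-level URL scheme this change documents can be sketched as follows. This is an illustrative sketch only: `contoso` and `deepseek-v3-0324` are hypothetical values, and the helpers simply mirror the concatenation rule stated in the article (base URL plus `/deployments/<model-deployment-name>`), not an official SDK API.

```python
# Sketch of the Azure OpenAI deployment-level URL scheme: unlike the shared
# inference endpoint, each deployment gets its own URL.

def openai_base_url(resource_name: str) -> str:
    """Base URL of the Azure OpenAI endpoint for a resource."""
    return f"https://{resource_name}.openai.azure.com"

def deployment_url(resource_name: str, deployment_name: str) -> str:
    """Per-deployment URL: base URL concatenated with /deployments/<name>."""
    return f"{openai_base_url(resource_name)}/deployments/{deployment_name}"

print(deployment_url("contoso", "deepseek-v3-0324"))
# https://contoso.openai.azure.com/deployments/deepseek-v3-0324
```

Because each URL is exclusive to one deployment, there is no name-based routing step on this endpoint; selecting a model means selecting a URL.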
articles/ai-foundry/model-inference/how-to/use-chat-completions.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ description: Learn how to generate chat completions with Azure AI Foundry Models
  manager: scottpolly
  author: msakande
  reviewer: santiagxf
- ms.service: azure-ai-model-../includes/use-chat-completions
+ ms.service: azure-ai-model-inference
  ms.topic: how-to
  ms.date: 1/21/2025
  ms.author: mopeakande
@@ -57,4 +57,4 @@ zone_pivot_groups: azure-ai-inference-samples
  * [Use embeddings models](use-embeddings.md)
  * [Use image embeddings models](use-image-embeddings.md)
  * [Use reasoning models](use-chat-reasoning.md)
- * [Azure AI Foundry Models API](.././reference/reference-model-../includes/use-chat-completions-api.md)
+ * [Azure AI Model Inference API](.././reference/reference-model-inference-api.md)

articles/ai-foundry/model-inference/how-to/use-openai.md

Whitespace-only changes.

articles/ai-foundry/model-inference/includes/code-create-chat-completion.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ var response = await client.path("/chat/completions").post({
  }
  });

- console.log(response.choices[0].message.content)
+ console.log(response.body.choices[0].message.content)
  ```

  # [C#](#tab/csharp)
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+ ---
+ manager: nitinme
+ ms.service: azure-ai-model-inference
+ ms.topic: include
+ ms.date: 1/21/2025
+ ms.author: fasantia
+ author: santiagxf
+ ---
+
+ # [Python](#tab/python)
+
+ ```python
+ response = client.chat.completions.create(
+     model="deepseek-v3-0324",  # Replace with your model deployment name.
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
+     ]
+ )
+
+ print(response.model_dump_json(indent=2))
+ ```
+
+ # [JavaScript](#tab/javascript)
+
+ ```javascript
+ var messages = [
+     { role: "system", content: "You are a helpful assistant" },
+     { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
+ ];
+
+ const response = await client.chat.completions.create({ messages, model: "deepseek-v3-0324" });
+
+ console.log(response.choices[0].message.content)
+ ```
+
+ # [C#](#tab/csharp)
+
+ ```csharp
+ ChatCompletion response = chatClient.CompleteChat(
+     [
+         new SystemChatMessage("You are a helpful assistant."),
+         new UserChatMessage("Explain Riemann's conjecture in 1 paragraph"),
+     ]);
+
+ Console.WriteLine($"{response.Role}: {response.Content[0].Text}");
+ ```
+
+ # [Java](#tab/java)
+
+ ```java
+ List<ChatRequestMessage> chatMessages = new ArrayList<>();
+ chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
+ chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));
+
+ ChatCompletions chatCompletions = client.getChatCompletions("deepseek-v3-0324",
+     new ChatCompletionsOptions(chatMessages));
+
+ System.out.printf("Model ID=%s is created at %s.%n", chatCompletions.getId(), chatCompletions.getCreatedAt());
+ for (ChatChoice choice : chatCompletions.getChoices()) {
+     ChatResponseMessage message = choice.getMessage();
+     System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
+     System.out.println("Message:");
+     System.out.println(message.getContent());
+ }
+ ```
+
+ Here, `deepseek-v3-0324` is the name of a model deployment in the Azure AI Foundry resource.
+
+ # [REST](#tab/rest)
+
+ __Request__
+
+ ```HTTP/1.1
+ POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
+ api-key: <api-key>
+ Content-Type: application/json
+ ```
+
+ ```JSON
+ {
+     "messages": [
+         {
+             "role": "system",
+             "content": "You are a helpful assistant"
+         },
+         {
+             "role": "user",
+             "content": "Explain Riemann's conjecture in 1 paragraph"
+         }
+     ]
+ }
+ ```
+
+ Here, `deepseek-v3-0324` is the name of a model deployment in the Azure AI Foundry resource.
+
+ ---

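The REST tab in the new include can be sketched with the Python standard library alone. This is a hedged sketch: `contoso` is a hypothetical resource name, and actually sending the request requires a live resource and API key, so the example stops at assembling the URL, headers, and JSON body from the sample above.

```python
# Assemble the REST request from the include's REST tab; nothing is sent.
import json

resource = "contoso"             # hypothetical resource name
deployment = "deepseek-v3-0324"  # deployment name from the sample
url = (
    f"https://{resource}.services.ai.azure.com/openai/deployments/"
    f"{deployment}/chat/completions?api-version=2024-10-21"
)
headers = {"api-key": "<api-key>", "Content-Type": "application/json"}
body = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"},
    ]
})

print(url)
# With a real key, you could POST this, e.g. using the `requests` package:
# requests.post(url, headers=headers, data=body)
```

The deployment name appears in the URL path here, while the shared inference endpoint instead takes the deployment name in the request body.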