Commit 9421ccb

committed
fixes
1 parent c44cb44 commit 9421ccb

22 files changed: +337 −110 lines changed

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 22 additions & 28 deletions
@@ -1,5 +1,5 @@
  ---
- title: Endpoint for Azure AI Foundry Models
+ title: Endpoints for Azure AI Foundry Models
  titleSuffix: Azure AI Foundry
  description: Learn about the Azure AI Foundry Models endpoint
  author: santiagxf
@@ -11,7 +11,7 @@ ms.author: fasantia
  ms.custom: ignite-2024, github-universe-2024
  ---

- # Endpoint for Azure AI Foundry Models
+ # Endpoints for Azure AI Foundry Models

  Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

@@ -36,19 +36,21 @@ An Azure AI Foundry resource can have as many model deployments as needed and th

  To learn more about how to create deployments see [Add and configure model deployments](../how-to/create-model-deployments.md).

- ## Foundry Models inference endpoint
+ ## Endpoints

- The Foundry Models inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:
+ Azure AI Foundry Services (formerly known as Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:

- * Text embeddings
- * Image embeddings
- * Chat completions
+ > [!div class="checklist"]
+ > * Azure AI inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`)
+ > * Azure OpenAI endpoint (usually with the form `https://<resource-name>.openai.azure.com`)
+
+ The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md).

- You can see the endpoint URL and credentials in the **Overview** section:
+ The **Azure OpenAI API** exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. Non-OpenAI models may also be exposed in this route.

- :::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+ To learn more about how to apply the **Azure OpenAI endpoint**, see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).

- ### Routing
+ ## Using Azure AI inference endpoint

  The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.

@@ -58,32 +60,24 @@ For example, if you create a deployment named `Mistral-large`, then such deploym

  [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

- [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
-
- > [!TIP]
- > Deployment routing isn't case sensitive.
+ For a chat model, you can create a request as follows:

- ### SDKs
-
- The Foundry Models endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**; which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.
-
- ## Azure OpenAI inference endpoint
-
- Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.
+ [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

- Azure OpenAI inference endpoints work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md)
+ If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).

- :::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
+ ## Key-less authentication

- Each deployment has a URL that is the concatenations of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
+ Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. This makes it a strong choice for organizations adopting secure and scalable identity management solutions.

- > [!IMPORTANT]
- > There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.
+ To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, you can authenticate as follows:

- ### SDKs
+ [!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]

- The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.
+ ## Limitations

+ * Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
+ * Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.

  ## Next steps

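The shared-endpoint routing described in this change can be sketched in plain Python. This is an illustrative sketch, not SDK code: `contoso` is a hypothetical resource name and the two helper functions exist only for this example; the endpoint form and the deployment-name routing mirror the article text above.

```python
# Sketch of how the Azure AI inference endpoint addresses deployments.
# `contoso` is a hypothetical resource name used for illustration.

def inference_endpoint(resource_name: str) -> str:
    """One shared endpoint for every deployment in the resource."""
    return f"https://{resource_name}.services.ai.azure.com/models"

def chat_request(deployment_name: str, prompt: str) -> dict:
    """The `model` field names the deployment, which acts as a model alias."""
    return {
        "model": deployment_name,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

print(inference_endpoint("contoso"))
# https://contoso.services.ai.azure.com/models
print(chat_request("Mistral-large", "Hello")["model"])
# Mistral-large
```

Deploying the same model twice under two names would give two valid `model` values against the same URL, which is the aliasing behavior the article describes.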
articles/ai-foundry/model-inference/how-to/inference.md

Lines changed: 21 additions & 23 deletions
@@ -1,5 +1,5 @@
  ---
- title: How to use the Azure AI Foundry Models inference endpoint to consume models
+ title: How to use the Azure AI Foundry Models inference endpoints to consume models
  titleSuffix: Azure AI Foundry
  description: Learn how to use the Azure AI Foundry Models inference endpoint to consume models
  manager: scottpolly
@@ -12,27 +12,23 @@ ms.author: mopeakande
  ms.reviewer: fasantia
  ---

- # Use the Azure AI Foundry Models inference endpoints
+ # Use Foundry Models

  Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

  This article explains how to use the inference endpoint to invoke them.

- ## Endpoints
+ There are two different APIs to use models in Azure AI Foundry Models:

- Azure AI Foundry Services (formerly known Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
+ ## Models inference endpoint

- > [!div class="checklist"]
- > * Foundry Models endpoint
- > * Azure OpenAI endpoint
+ The models inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:

- The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md).
+ * Text embeddings
+ * Image embeddings
+ * Chat completions

- **Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
-
- To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
-
- ## Using the routing capability in the Foundry Models endpoint
+ ### Routing

  The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.

@@ -42,24 +38,26 @@ For example, if you create a deployment named `Mistral-large`, then such deploym

  [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

- For a chat model, you can create a request as follows:
-
  [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

- If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).
+ > [!TIP]
+ > Deployment routing isn't case sensitive.
+
+ ## Azure OpenAI inference endpoint
+
+ Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used for compatible functionalities.

- ## Key-less authentication
+ Azure OpenAI endpoints (usually with the form `https://<resource-name>.openai.azure.com`) work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md).

- Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
+ :::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::

- To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, then you can authenticate as follows:
+ Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

- [!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
+ [!INCLUDE [code-create-openai-client](../includes/code-create-openai-client.md)]

- ## Limitations
+ [!INCLUDE [code-create-openai-chat-completion](../includes/code-create-openai-chat-completion.md)]

- * Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
- * Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.

  ## Next steps

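The deployment-level URL scheme this change documents can be sketched as follows. This is an illustrative sketch only: `contoso` and `deepseek-v3-0324` are hypothetical values, and the helpers simply mirror the concatenation rule stated in the article (base URL plus `/deployments/<model-deployment-name>`), not an official SDK API.

```python
# Sketch of the Azure OpenAI deployment-level URL scheme: unlike the shared
# inference endpoint, each deployment gets its own URL.

def openai_base_url(resource_name: str) -> str:
    """Base URL of the Azure OpenAI endpoint for a resource."""
    return f"https://{resource_name}.openai.azure.com"

def deployment_url(resource_name: str, deployment_name: str) -> str:
    """Per-deployment URL: base URL concatenated with /deployments/<name>."""
    return f"{openai_base_url(resource_name)}/deployments/{deployment_name}"

print(deployment_url("contoso", "deepseek-v3-0324"))
# https://contoso.openai.azure.com/deployments/deepseek-v3-0324
```

Because each URL is exclusive to one deployment, there is no name-based routing step on this endpoint; selecting a model means selecting a URL.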
articles/ai-foundry/model-inference/how-to/use-chat-completions.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ description: Learn how to generate chat completions with Azure AI Foundry Models
  manager: scottpolly
  author: msakande
  reviewer: santiagxf
- ms.service: azure-ai-model-../includes/use-chat-completions
+ ms.service: azure-ai-model-inference
  ms.topic: how-to
  ms.date: 1/21/2025
  ms.author: mopeakande
@@ -57,4 +57,4 @@ zone_pivot_groups: azure-ai-inference-samples
  * [Use embeddings models](use-embeddings.md)
  * [Use image embeddings models](use-image-embeddings.md)
  * [Use reasoning models](use-chat-reasoning.md)
- * [Azure AI Foundry Models API](.././reference/reference-model-../includes/use-chat-completions-api.md)
+ * [Azure AI Model Inference API](.././reference/reference-model-inference-api.md)

articles/ai-foundry/model-inference/how-to/use-openai.md

Whitespace-only changes.

articles/ai-foundry/model-inference/includes/code-create-chat-completion.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ var response = await client.path("/chat/completions").post({
  }
  });

- console.log(response.choices[0].message.content)
+ console.log(response.body.choices[0].message.content)
  ```

  # [C#](#tab/csharp)
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+ ---
+ manager: nitinme
+ ms.service: azure-ai-model-inference
+ ms.topic: include
+ ms.date: 1/21/2025
+ ms.author: fasantia
+ author: santiagxf
+ ---
+
+ # [Python](#tab/python)
+
+ ```python
+ response = client.chat.completions.create(
+     model="deepseek-v3-0324",  # Replace with your model deployment name.
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
+     ]
+ )
+
+ print(response.model_dump_json(indent=2))
+ ```
+
+ # [JavaScript](#tab/javascript)
+
+ ```javascript
+ var messages = [
+     { role: "system", content: "You are a helpful assistant" },
+     { role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
+ ];
+
+ const response = await client.chat.completions.create({ messages, model: "deepseek-v3-0324" });
+
+ console.log(response.choices[0].message.content)
+ ```
+
+ # [C#](#tab/csharp)
+
+ ```csharp
+ ChatCompletion response = chatClient.CompleteChat(
+     [
+         new SystemChatMessage("You are a helpful assistant."),
+         new UserChatMessage("Explain Riemann's conjecture in 1 paragraph"),
+     ]);
+
+ Console.WriteLine($"{response.Role}: {response.Content[0].Text}");
+ ```
+
+ # [Java](#tab/java)
+
+ ```java
+ List<ChatRequestMessage> chatMessages = new ArrayList<>();
+ chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
+ chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));
+
+ ChatCompletions chatCompletions = client.getChatCompletions("deepseek-v3-0324",
+     new ChatCompletionsOptions(chatMessages));
+
+ System.out.printf("Model ID=%s is created at %s.%n", chatCompletions.getId(), chatCompletions.getCreatedAt());
+ for (ChatChoice choice : chatCompletions.getChoices()) {
+     ChatResponseMessage message = choice.getMessage();
+     System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
+     System.out.println("Message:");
+     System.out.println(message.getContent());
+ }
+ ```
+
+ Here, `deepseek-v3-0324` is the name of a model deployment in the Azure AI Foundry resource.
+
+ # [REST](#tab/rest)
+
+ __Request__
+
+ ```HTTP/1.1
+ POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
+ api-key: <api-key>
+ Content-Type: application/json
+ ```
+
+ ```JSON
+ {
+     "messages": [
+         {
+             "role": "system",
+             "content": "You are a helpful assistant"
+         },
+         {
+             "role": "user",
+             "content": "Explain Riemann's conjecture in 1 paragraph"
+         }
+     ]
+ }
+ ```
+
+ Here, `deepseek-v3-0324` is the name of a model deployment in the Azure AI Foundry resource.
+
+ ---

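The REST tab in the new include can be sketched with the Python standard library alone. This is a hedged sketch: `contoso` is a hypothetical resource name, and actually sending the request requires a live resource and API key, so the example stops at assembling the URL, headers, and JSON body from the sample above.

```python
# Assemble the REST request from the include's REST tab; nothing is sent.
import json

resource = "contoso"             # hypothetical resource name
deployment = "deepseek-v3-0324"  # deployment name from the sample
url = (
    f"https://{resource}.services.ai.azure.com/openai/deployments/"
    f"{deployment}/chat/completions?api-version=2024-10-21"
)
headers = {"api-key": "<api-key>", "Content-Type": "application/json"}
body = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"},
    ]
})

print(url)
# With a real key, you could POST this, e.g. using the `requests` package:
# requests.post(url, headers=headers, data=body)
```

The deployment name appears in the URL path here, while the shared inference endpoint instead takes the deployment name in the request body.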