
Commit 97109fb

Merge pull request #4984 from santiagxf/santiagxf/openai

OpenAI API for Foundry Models

2 parents: 3de5dad + 783fc26

36 files changed: +652 −305 lines

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 22 additions & 28 deletions

@@ -1,5 +1,5 @@
 ---
-title: Endpoint for Azure AI Foundry Models
+title: Endpoints for Azure AI Foundry Models
 titleSuffix: Azure AI Foundry
 description: Learn about the Azure AI Foundry Models endpoint
 author: santiagxf
@@ -11,7 +11,7 @@ ms.author: fasantia
 ms.custom: ignite-2024, github-universe-2024
 ---
 
-# Endpoint for Azure AI Foundry Models
+# Endpoints for Azure AI Foundry Models
 
 Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.
 
@@ -36,19 +36,21 @@ An Azure AI Foundry resource can have as many model deployments as needed and th
 
 To learn more about how to create deployments see [Add and configure model deployments](../how-to/create-model-deployments.md).
 
-## Foundry Models inference endpoint
+## Endpoints
 
-The Foundry Models inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md) which all the models in Foundry Models support. It supports the following modalities:
+Azure AI Foundry Services (formerly known as Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
 
-* Text embeddings
-* Image embeddings
-* Chat completions
+> [!div class="checklist"]
+> * Azure AI inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`)
+> * Azure OpenAI endpoint (usually with the form `https://<resource-name>.openai.azure.com`)
+
+The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md).
 
-You can see the endpoint URL and credentials in the **Overview** section:
+The **Azure OpenAI API** exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. Non-OpenAI models may also be exposed in this route.
 
-:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+To learn more about how to apply the **Azure OpenAI endpoint**, see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
 
-### Routing
+## Using Azure AI inference endpoint
 
 The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.
 
@@ -58,32 +60,24 @@ For example, if you create a deployment named `Mistral-large`, then such deploym
 
 [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]
 
-[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
-
-> [!TIP]
-> Deployment routing isn't case sensitive.
+For a chat model, you can create a request as follows:
 
-### SDKs
-
-The Foundry Models endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**; which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.
-
-## Azure OpenAI inference endpoint
-
-Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.
+[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Azure OpenAI inference endpoints work at the deployment level and they have their own URL that is associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for [Azure OpenAI API](../../../ai-services/openai/reference.md)
+If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](../how-to/create-model-deployments.md).
 
-:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
+## Key-less authentication
 
-Each deployment has a URL that is the concatenations of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
+Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
 
-> [!IMPORTANT]
-> There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.
+To use key-less authentication, [configure your resource and grant access to users](../how-to/configure-entra-id.md) to perform inference. Once configured, you can authenticate as follows:
 
-### SDKs
+[!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
 
-The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.
+## Limitations
 
+* Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
+* Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.
 
 ## Next steps
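
The routing behavior this file documents (a single `/models` endpoint that dispatches on the `model` field of the request body) can be sketched at the HTTP level. The following is an illustrative sketch, not code from the commit: the resource name `contoso`, the deployment name, and the `api-version` value are assumptions.

```python
import json
import urllib.request

# Illustrative values, not taken from the commit: the resource name,
# deployment name, and api-version here are assumptions for this sketch.
ENDPOINT = "https://contoso.services.ai.azure.com/models"

def build_chat_request(deployment: str, question: str) -> urllib.request.Request:
    # The single /models endpoint routes on the "model" field of the
    # payload: it must match a *deployment* name in the resource, so
    # switching models is a payload change rather than a URL change.
    payload = {
        "model": deployment,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{ENDPOINT}/chat/completions?api-version=2024-05-01-preview",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": "<api-key>"},
        method="POST",
    )

# Build (but don't send) a request routed to the "Mistral-large" deployment.
req = build_chat_request("Mistral-large", "Say hello.")
```

The `[!INCLUDE]` snippets referenced in the diff presumably wrap this same pattern behind the Azure SDK clients; the point of the sketch is only that the deployment name travels in the body, never in the URL.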

articles/ai-foundry/model-inference/faq.yml

Lines changed: 2 additions & 2 deletions

@@ -17,9 +17,9 @@ sections:
   - name: General
     questions:
       - question: |
-          What's the difference between Azure OpenAI and Foundry Models?
+          What's the difference between Azure OpenAI and Azure AI Foundry Models?
         answer: |
-          Azure OpenAI gives customers access to advanced language models from OpenAI. Foundry Models extends such capability giving customers access to all the flagship models in Azure AI Foundry under the same service, endpoint, and credentials. It includes Azure OpenAI, Cohere, Mistral AI, Meta Llama, AI21 labs, etc. Customers can seamlessly switch between models without changing their code.
+          Azure OpenAI gives customers access to advanced language models from OpenAI. Azure AI Foundry Models extends such capability giving customers access to all the flagship models in Azure AI Foundry under the same service, endpoint, and credentials. It includes Azure OpenAI, Cohere, Mistral AI, Meta Llama, AI21 labs, etc. Customers can seamlessly switch between models without changing their code.
 
           Azure OpenAI is included in the Models Sold Directly by Azure family in Foundry Models.
       - question: |

articles/ai-foundry/model-inference/how-to/inference.md

Lines changed: 21 additions & 27 deletions

@@ -1,5 +1,5 @@
 ---
-title: How to use the Azure AI Foundry Models inference endpoint to consume models
+title: How to use the Azure AI Foundry Models inference endpoints to consume models
 titleSuffix: Azure AI Foundry
 description: Learn how to use the Azure AI Foundry Models inference endpoint to consume models
 manager: scottpolly
@@ -12,27 +12,19 @@ ms.author: mopeakande
 ms.reviewer: fasantia
 ---
 
-# Use the Azure AI Foundry Models inference endpoints
+# Use Foundry Models
 
-Azure AI Foundry Models allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.
+Once you have [deployed a model in Azure AI Foundry](create-model-deployments.md), you can consume its capabilities via Azure AI Foundry APIs. There are two different endpoints and APIs to use models in Azure AI Foundry Models.
 
-This article explains how to use the inference endpoint to invoke them.
+## Models inference endpoint
 
-## Endpoints
+The models inference endpoint (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI Model Inference API](.././reference/reference-model-inference-api.md), which all the models in Foundry Models support. It supports the following modalities:
 
-Azure AI Foundry Services (formerly known Azure AI Services) expose multiple endpoints depending on the type of work you're looking for:
+* Text embeddings
+* Image embeddings
+* Chat completions
 
-> [!div class="checklist"]
-> * Foundry Models endpoint
-> * Azure OpenAI endpoint
-
-The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Foundry Models API](.././reference/reference-model-inference-api.md).
-
-**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
-
-To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI in Azure AI Foundry Models documentation](../../../ai-services/openai/overview.md).
-
-## Using the routing capability in the Foundry Models endpoint
+### Routing
 
 The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.
 
@@ -42,24 +34,26 @@ For example, if you create a deployment named `Mistral-large`, then such deploym
 
 [!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]
 
-For a chat model, you can create a request as follows:
-
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-If you specify a model name that doesn't match any given model deployment, you get an error that the model doesn't exist. You can control which models are available for users by creating model deployments as explained at [add and configure model deployments](create-model-deployments.md).
+> [!TIP]
+> Deployment routing isn't case sensitive.
+
+## Azure OpenAI inference endpoint
+
+Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used for compatible functionalities.
 
-## Key-less authentication
+Azure OpenAI endpoints (usually with the form `https://<resource-name>.openai.azure.com`) work at the deployment level and have their own URL associated with each of them. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).
 
-Models deployed to Azure AI Foundry Models in Azure AI Services support key-less authorization using Microsoft Entra ID. Key-less authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development. It makes it a strong choice for organizations adopting secure and scalable identity management solutions.
+:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::
 
-To use key-less authentication, [configure your resource and grant access to users](configure-entra-id.md) to perform inference. Once configured, then you can authenticate as follows:
+Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.
 
-[!INCLUDE [code-create-chat-client-entra](../includes/code-create-chat-client-entra.md)]
+[!INCLUDE [code-create-openai-client](../includes/code-create-openai-client.md)]
 
-## Limitations
+[!INCLUDE [code-create-openai-chat-completion](../includes/code-create-openai-chat-completion.md)]
 
-* Azure OpenAI Batch can't be used with the Foundry Models endpoint. You have to use the dedicated deployment URL as explained at [Batch API support in Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
-* Real-time API isn't supported in the inference endpoint. Use the dedicated deployment URL.
 
 ## Next steps
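
The per-deployment URL rule stated in this file ("the Azure OpenAI base URL plus the route `/deployments/<model-deployment-name>`") can be sketched as plain string construction. The resource and deployment names below are illustrative assumptions, not values from the commit.

```python
def azure_openai_deployment_url(base_url: str, deployment: str) -> str:
    # Per the doc text: each Azure OpenAI deployment gets its own URL,
    # formed from the resource base URL plus /deployments/<name>.
    # There is no routing parameter; the deployment is fixed by the URL.
    return f"{base_url.rstrip('/')}/deployments/{deployment}"

url = azure_openai_deployment_url("https://contoso.openai.azure.com", "gpt-4o-mini")
print(url)  # https://contoso.openai.azure.com/deployments/gpt-4o-mini
```

The `code-create-openai-client` include referenced in the diff presumably handles this construction through the SDK; with the raw REST API, the full request path also carries an operation suffix (for example `/chat/completions`) and an `api-version` query parameter.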

articles/ai-foundry/model-inference/how-to/quickstart-ai-project.md

Lines changed: 3 additions & 3 deletions

@@ -21,7 +21,7 @@ You can change this behavior and deploy both types of models to Azure AI Foundry
 Additionally, deploying models to Azure AI Foundry Models brings the extra benefits of:
 
 > [!div class="checklist"]
-> * [Routing capability](../concepts/endpoints.md#routing).
+> * [Routing capability](inference.md#routing).
 > * [Custom content filters](../concepts/content-filter.md).
 > * Global capacity deployment type.
 > * [Key-less authentication](configure-entra-id.md) with role-based access control.
@@ -79,7 +79,7 @@ To configure the project to use the Foundry Models capability in Azure AI Foundr
 :::image type="content" source="../media/quickstart-ai-project/overview-endpoint-and-key.png" alt-text="Screenshot of the landing page for the project, highlighting the location of the connected resource and the associated inference endpoint." lightbox="../media/quickstart-ai-project/overview-endpoint-and-key.png":::
 
 > [!TIP]
-> Each Azure AI Foundry Services resource has a single **Foundry Models endpoint** which can be used to access any model deployment on it. The same endpoint serves multiple models depending on which ones are configured. Learn about [how the endpoint works](../concepts/endpoints.md#azure-openai-inference-endpoint).
+> Each Azure AI Foundry Services resource has a single **Foundry Models endpoint** which can be used to access any model deployment on it. The same endpoint serves multiple models depending on which ones are configured. Learn about [how the endpoint works](inference.md#azure-openai-inference-endpoint).
 
 5. Take note of the endpoint URL and credentials.
 
@@ -136,7 +136,7 @@ Generate your first chat completion:
 
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Use the parameter `model="<deployment-name>` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See [Routing](../concepts/endpoints.md#routing) concept page to learn how Azure AI Services route deployments.
+Use the parameter `model="<deployment-name>"` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See the [Routing](inference.md#routing) page to learn how Azure AI Foundry Models routes deployments.
 
 
 ## Move from standard deployments to Foundry Models

articles/ai-foundry/model-inference/how-to/quickstart-github-models.md

Lines changed: 1 addition & 1 deletion

@@ -75,7 +75,7 @@ Generate your first chat completion:
 
 [!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
 
-Use the parameter `model="<deployment-name>` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See [Routing](../concepts/endpoints.md#routing) concept page to learn how Azure AI Services route deployments.
+Use the parameter `model="<deployment-name>"` to route your request to this deployment. *Deployments work as an alias of a given model under certain configurations*. See the [Routing](inference.md#routing) concept page to learn how Azure AI Services route deployments.
 
 > [!IMPORTANT]
 > As opposite to GitHub Models where all the models are already configured, the Azure AI Services resource allows you to control which models are available in your endpoint and under which configuration. Add as many models as you plan to use before indicating them in the `model` parameter. Learn how to [add more models](create-model-deployments.md) to your resource.

articles/ai-foundry/model-inference/how-to/use-chat-completions.md

Lines changed: 7 additions & 1 deletion

@@ -20,35 +20,41 @@ zone_pivot_groups: azure-ai-inference-samples
 ::: zone pivot="programming-language-python"
 
 [!INCLUDE [python](../includes/use-chat-completions/python.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-javascript"
 
 [!INCLUDE [javascript](../includes/use-chat-completions/javascript.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-java"
 
 [!INCLUDE [java](../includes/use-chat-completions/java.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-csharp"
 
 [!INCLUDE [csharp](../includes/use-chat-completions/csharp.md)]
+
 ::: zone-end
 
 
 ::: zone pivot="programming-language-rest"
 
 [!INCLUDE [rest](../includes/use-chat-completions/rest.md)]
+
 ::: zone-end
 
+
 ## Related content
 
 * [Use embeddings models](use-embeddings.md)
 * [Use image embeddings models](use-image-embeddings.md)
 * [Use reasoning models](use-chat-reasoning.md)
-* [Azure AI Foundry Models API](.././reference/reference-model-inference-api.md)
+* [Azure AI Model Inference API](.././reference/reference-model-inference-api.md)

articles/ai-foundry/model-inference/includes/code-create-chat-completion.md

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ var response = await client.path("/chat/completions").post({
     }
 });
 
-console.log(response.choices[0].message.content)
+console.log(response.body.choices[0].message.content)
 ```
 
 # [C#](#tab/csharp)

articles/ai-foundry/model-inference/includes/code-create-chat-reasoning.md

Lines changed: 1 addition & 1 deletion

@@ -36,7 +36,7 @@ var response = await client.path("/chat/completions").post({
     }
 });
 
-console.log(response.choices[0].message.content)
+console.log(response.body.choices[0].message.content)
 ```
 
 # [C#](#tab/csharp)
