---
title: Use the Azure AI model inference endpoint
titleSuffix: Azure AI Studio
description: Learn how to use the Azure AI model inference endpoint and how to configure it.
ms.service: azure-ai-studio
ms.topic: conceptual
author: sdgilley
manager: scottpolly
ms.date: 10/24/2024
ms.author: sgilley
ms.reviewer: fasantia
ms.custom: github-universe-2024
---

# Use the Azure AI model inference endpoint

The Azure AI inference service in Azure AI services allows customers to consume the most powerful models from flagship model providers using a single endpoint and set of credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

This article explains how models are organized within the service and how to use the inference endpoint to invoke them.

## Deployments

The Azure AI model inference service makes models available through the concept of a **deployment**. A deployment gives a model a name under a certain configuration. You can then invoke that model configuration by indicating the deployment name in your requests.

Deployments capture:

> [!div class="checklist"]
> * A model name
> * A model version
> * A provisioning/capacity type<sup>1</sup>
> * A content filtering configuration<sup>1</sup>
> * A rate limiting configuration<sup>1</sup>

<sup>1</sup> Configurations may vary depending on the model you have selected.

An Azure AI services resource can have as many model deployments as needed, and they don't incur costs unless inference is performed for those models. Deployments are Azure resources, so they're subject to Azure policies.

To learn more about how to create deployments, see [Add and configure model deployments](../how-to/create-model-deployments.md).

## Azure AI inference endpoint

The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../reference/reference-model-inference-api.md), which all the models in the Azure AI model inference service support.

You can see the endpoint URL and credentials in the **Overview** section. The endpoint usually has the form `https://<resource-name>.services.ai.azure.com/models`:

:::image type="content" source="../../media/ai-services/overview/overview-endpoint-and-key.png" alt-text="A screenshot showing how to get the URL and key associated with the resource." lightbox="../../media/ai-services/overview/overview-endpoint-and-key.png":::

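For instance, with the Python `azure-ai-inference` package you can create a client that targets this endpoint with a key credential. This is a minimal sketch; the environment variable names `AZURE_AI_ENDPOINT` and `AZURE_AI_API_KEY` are illustrative, not values the service defines.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# The endpoint has the form https://<resource-name>.services.ai.azure.com/models.
# The environment variable names below are placeholders for this example.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)
```
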
### Routing

The inference endpoint routes requests to a given deployment by matching the parameter `model` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service, each time under a different configuration if needed.

:::image type="content" source="../../media/ai-services/endpoint/endpoint-routing.png" alt-text="An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model by indicating such name in the parameter 'model' inside of the payload request." lightbox="../../media/ai-services/endpoint/endpoint-routing.png":::

For example, if you create a deployment named `Mistral-large`, you can invoke it as follows:

[!INCLUDE [code-create-chat-completion](../../includes/ai-services/code-create-chat-completion.md)]

> [!TIP]
> Deployment routing is not case sensitive.

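To illustrate how routing lets the same code target different deployments, the following Python sketch reuses the client created earlier and changes only the `model` value between calls. The second deployment name, `Mistral-large-2`, is hypothetical.

```python
from azure.ai.inference.models import SystemMessage, UserMessage

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="Summarize what a deployment is in one sentence."),
]

# The `model` parameter is matched against the deployment name.
response = client.complete(messages=messages, model="Mistral-large")

# Target another deployment by changing only the deployment name;
# the client, credentials, and code stay the same.
response = client.complete(messages=messages, model="Mistral-large-2")
print(response.choices[0].message.content)
```
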
### Supported languages and SDKs

All models deployed in the Azure AI model inference service support the [Azure AI model inference API](https://aka.ms/aistudio/modelinference) and its associated family of SDKs, which are available in the following languages:

| Language   | Documentation | Package | Examples |
|------------|---------------|---------|----------|
| C# | [Reference](https://aka.ms/azsdk/azure-ai-inference/csharp/reference) | [azure-ai-inference (NuGet)](https://www.nuget.org/packages/Azure.AI.Inference/) | [C# examples](https://aka.ms/azsdk/azure-ai-inference/csharp/samples) |
| Java | [Reference](https://aka.ms/azsdk/azure-ai-inference/java/reference) | [azure-ai-inference (Maven)](https://central.sonatype.com/artifact/com.azure/azure-ai-inference/) | [Java examples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/ai/azure-ai-inference/src/samples) |
| JavaScript | [Reference](https://aka.ms/AAp1kxa) | [@azure/ai-inference (npm)](https://www.npmjs.com/package/@azure/ai-inference) | [JavaScript examples](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/ai/ai-inference-rest/samples) |
| Python | [Reference](https://aka.ms/azsdk/azure-ai-inference/python/reference) | [azure-ai-inference (PyPI)](https://pypi.org/project/azure-ai-inference/) | [Python examples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) |

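The clients in this family share the same endpoint and credential pattern across operations. As an illustration, here's a minimal Python sketch that uses the `EmbeddingsClient` from the same `azure-ai-inference` package; the embeddings deployment name `Cohere-embed-v3-english` is a placeholder, and the environment variables are the same illustrative ones used earlier.

```python
import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

# Same endpoint and key as the chat completions example above.
embeddings_client = EmbeddingsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

# Route to an embeddings deployment by name, just like chat completions.
response = embeddings_client.embed(
    input=["The Azure AI model inference endpoint uses a single URL."],
    model="Cohere-embed-v3-english",
)
print(len(response.data[0].embedding))
```
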
## Azure OpenAI inference endpoint

Azure OpenAI models also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.

Each Azure OpenAI model deployment has its own URL under the Azure OpenAI inference endpoint; however, the same authentication mechanism is used to consume it. URLs usually take the form `https://<resource-name>.openai.azure.com/openai/deployments/<model-deployment-name>`. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).

:::image type="content" source="../../media/ai-services/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../../media/ai-services/endpoint/endpoint-openai.png":::

Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

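For illustration, here's a minimal Python sketch that targets a deployment through the `AzureOpenAI` class from the `openai` package. The deployment name `gpt-4o-deployment`, the API version, and the environment variable name are placeholder values for this example.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource-name>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

# The deployment name is passed as `model`; the SDK appends
# /openai/deployments/<model-deployment-name> to the base URL.
response = client.chat.completions.create(
    model="gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
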
> [!IMPORTANT]
> There is no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.

### Supported languages and SDKs

The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages:

| Language   | Source code | Package | Examples |
|------------|-------------|---------|----------|
| C# | [Source code](https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/openai/Azure.AI.OpenAI) | [Azure.AI.OpenAI (NuGet)](https://www.nuget.org/packages/Azure.AI.OpenAI/) | [C# examples](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/openai/Azure.AI.OpenAI/tests/Samples) |
| Go | [Source code](https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/ai/azopenai) | [azopenai (Go)](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai) | [Go examples](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai#pkg-examples) |
| Java | [Source code](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai) | [azure-ai-openai (Maven)](https://central.sonatype.com/artifact/com.azure/azure-ai-openai/) | [Java examples](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/openai/azure-ai-openai/src/samples) |
| JavaScript | [Source code](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/openai/openai) | [@azure/openai (npm)](https://www.npmjs.com/package/@azure/openai) | [JavaScript examples](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/openai/openai/samples/) |
| Python | [Source code](https://github.com/openai/openai-python) | [openai (PyPI)](https://pypi.org/project/openai/) | [Python examples](https://github.com/openai/openai-cookbook) |

## Next steps

- [Deployment types](deployment-types.md)