---
title: How to use the Azure AI model inference endpoint to consume models
titleSuffix: Azure AI Foundry
description: Learn how to use the Azure AI model inference endpoint to consume models
manager: scottpolly
author: msakande
reviewer: santiagxf
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: mopeakande
ms.reviewer: fasantia
---

# Use the Azure AI model inference endpoint to consume models

Azure AI model inference in Azure AI services allows customers to consume the most powerful models from flagship model providers through a single endpoint and set of credentials. You can switch between models and consume them from your application without changing a single line of code.

This article explains how to use the inference endpoint to invoke deployed models.

## Endpoints

Azure AI services expose multiple endpoints, depending on the type of work you need:

> [!div class="checklist"]
> * Azure AI model inference endpoint
> * Azure OpenAI endpoint

The **Azure AI model inference endpoint** allows customers to use a single endpoint, with the same authentication and schema, to generate inference for the deployed models in the resource. All the models in the resource support this capability. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md).

**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API. This endpoint exposes the full capabilities of OpenAI models and supports more features, such as assistants, threads, files, and batch inference.

To learn more about how to use the **Azure OpenAI endpoint**, see the [Azure OpenAI service documentation](../../../ai-services/openai/overview.md).

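The two endpoints differ mainly in their URL shape. The following sketch is illustrative only: the resource name and deployment name are hypothetical, and the exact hostnames depend on how your Azure AI services resource is configured.

```python
# Illustrative only: "my-resource" and "gpt-4o-deployment" are hypothetical.
resource = "my-resource"

# One shared endpoint serves every model deployed in the resource.
inference_endpoint = f"https://{resource}.services.ai.azure.com/models"

# Azure OpenAI models additionally expose a deployment-scoped endpoint.
deployment = "gpt-4o-deployment"
openai_endpoint = (
    f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
)

print(inference_endpoint)
print(openai_endpoint)
```

Because the first URL doesn't name a deployment, the request itself must indicate which deployment to use; the next section explains how that routing works.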
## Using the routing capability in the Azure AI model inference endpoint

The inference endpoint routes requests to a given deployment by matching the parameter `model` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service, each under a different configuration if needed.

:::image type="content" source="../media/endpoint/endpoint-routing.png" alt-text="An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model by indicating such name in the parameter 'model' inside of the payload request." lightbox="../media/endpoint/endpoint-routing.png":::

For example, if you create a deployment named `Mistral-large`, you can invoke it as follows:

[!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

For a chat model, you can create a request as follows:

[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]
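Regardless of the SDK you use, the request body ultimately carries the deployment name in the `model` field, which is what the endpoint matches against. A minimal sketch of such a payload (the deployment name and prompt are hypothetical):

```python
import json

# Sketch of a chat completions request body for the shared inference
# endpoint. The "model" field carries the *deployment name* that the
# endpoint uses for routing ("Mistral-large" is a hypothetical example).
payload = {
    "model": "Mistral-large",
    "messages": [
        {"role": "user", "content": "How many languages are in the world?"},
    ],
}

print(json.dumps(payload, indent=2))
```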

If you specify a model name that doesn't match any model deployment, you get an error indicating that the model doesn't exist. You can control which models are available to users by creating model deployments, as explained in [Add and configure model deployments](create-model-deployments.md).
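Because routing is purely a name match, a client can guard against typos by checking the requested name against the deployments it expects before sending the request. The helper below is a hypothetical illustration, not part of any SDK:

```python
def resolve_deployment(requested: str, deployments: set[str]) -> str:
    """Return the deployment name if it exists in the resource;
    otherwise raise, mirroring the endpoint's "model doesn't exist"
    error. Purely illustrative client-side validation."""
    if requested not in deployments:
        raise ValueError(f"Model deployment '{requested}' doesn't exist")
    return requested

# Hypothetical set of deployments created in the resource.
available = {"Mistral-large", "Meta-Llama-3.2-8B-Instruct"}

print(resolve_deployment("Mistral-large", available))
```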

## Limitations

* Azure OpenAI Batch can't be used with the Azure AI model inference endpoint. You have to use the dedicated deployment URL, as explained in [Batch API support in the Azure OpenAI documentation](../../../ai-services/openai/how-to/batch.md#api-support).
* The Realtime API isn't supported in the inference endpoint. Use the dedicated deployment URL.

## Next steps

* [Use embedding models](use-embeddings.md)
* [Use chat completion models](use-chat-completions.md)