Merge pull request #284741 from santiagxf/santiagxf-patch-3

prmerger-automator[bot] · web-flow · commit 094f7385ee69 · 2024-08-15T04:15:06.000Z
Update llm-tool.md
diff --git a/articles/machine-learning/prompt-flow/tools-reference/llm-tool.md b/articles/machine-learning/prompt-flow/tools-reference/llm-tool.md
@@ -16,12 +16,12 @@ ms.date: 11/02/2023
 
 # LLM tool
 
-The large language model (LLM) tool in prompt flow enables you to take advantage of widely used large language models like [OpenAI](https://platform.openai.com/) or [Azure OpenAI Service](../../../cognitive-services/openai/overview.md) for natural language processing.
+The large language model (LLM) tool in prompt flow enables you to take advantage of widely used large language models like [OpenAI](https://platform.openai.com/), [Azure OpenAI Service](../../../cognitive-services/openai/overview.md), or any language model supported by the [Azure AI model inference API](https://aka.ms/azureai/modelinference) for natural language processing.
 
 Prompt flow provides a few different large language model APIs:
 
 - [Completion](https://platform.openai.com/docs/api-reference/completions): OpenAI's completion models generate text based on provided prompts.
-- [Chat](https://platform.openai.com/docs/api-reference/chat): OpenAI's chat models facilitate interactive conversations with text-based inputs and responses.
+- [Chat](https://platform.openai.com/docs/api-reference/chat): OpenAI's chat models and the [Azure AI](https://aka.ms/azureai/modelinference) chat models facilitate interactive conversations with text-based inputs and responses.
 
 > [!NOTE]
 > We removed the `embedding` option from the LLM tool API setting. You can use an embedding API with the [embedding tool](embedding-tool.md).
@@ -41,6 +41,11 @@ Create OpenAI resources:
 
     - Create Azure OpenAI resources with [these instructions](../../../ai-services/openai/how-to/create-resource.md).
 
+- **Models deployed to Serverless API endpoints**
+
+  - Select the model you are interested in from the catalog and [deploy it to a serverless API endpoint](../../how-to-deploy-models-serverless.md).
+  - To use models deployed to serverless API endpoints supported by the [Azure AI model inference API](https://aka.ms/azureai/modelinference), such as the Mistral, Cohere, Meta Llama, or Microsoft families of models (among others), you need to [create a connection in your project to your endpoint](../../how-to-connect-models-serverless.md?#create-a-serverless-api-endpoint-connection).
+
 ## Connections
 
 Set up connections to provisioned resources in prompt flow.
@@ -50,6 +55,7 @@ Set up connections to provisioned resources in prompt flow.
 | OpenAI      | Required | Required | -        | -           |
 | Azure OpenAI - API key| Required | Required | Required | Required    |
 | Azure OpenAI - Microsoft Entra ID| Required | - | - | Required    |
+| Serverless model | Required | Required | - | - |
 
   > [!TIP]
   > - To use Microsoft Entra ID auth type for Azure OpenAI connection, you need assign either the `Cognitive Services OpenAI User` or `Cognitive Services OpenAI Contributor role` to user or user assigned managed identity.
@@ -83,7 +89,7 @@ The following sections show various inputs.
 | Name                   | Type        | Description                                                                                    | Required |
 |------------------------|-------------|------------------------------------------------------------------------------------------------|----------|
 | prompt                 | string      | Text prompt that the language model uses for a response.                                              | Yes      |
-| model, deployment_name | string      | Language model to use.                                                                      | Yes      |
+| model, deployment_name | string | Language model to use. This parameter is not required if the model is deployed to a serverless API endpoint. | Yes* |
 | max\_tokens            | integer     | Maximum number of tokens to generate in the response. Default is inf.                      | No       |
 | temperature            | float       | Randomness of the generated text. Default is 1.                                            | No       |
 | stop                   | list        | Stopping sequence for the generated text. Default is null.                                 | No       |
@@ -101,6 +107,6 @@ The following sections show various inputs.
 
 ## Use the LLM tool
 
-1. Set up and select the connections to OpenAI resources.
+1. Set up and select the connections to OpenAI resources or to a serverless API endpoint.
 1. Configure the large language model API and its parameters.
 1. Prepare the prompt with [guidance](prompt-tool.md#write-a-prompt).