Merge pull request #284 from santiagxf/santiagxf/llamaindex-sdk

denrea · web-flow · commit ac6fa20f3bb2 · 2024-09-19T12:23:08.000-07:00
LlamaIndex integration and Inference SDK
diff --git a/articles/ai-studio/how-to/develop/llama-index.md b/articles/ai-studio/how-to/develop/llama-index.md
@@ -0,0 +1,202 @@
+---
+title: Develop applications with LlamaIndex and Azure AI Studio
+titleSuffix: Azure AI Studio
+description: This article explains how to use LlamaIndex with models deployed in Azure AI Studio to build advanced intelligent applications.
+manager: nitinme
+ms.service: azure-ai-studio
+ms.topic: how-to
+ms.date: 9/14/2024
+ms.reviewer: fasantia
+ms.author: eur
+author: eric-urban
+---
+
+# Develop applications with LlamaIndex and Azure AI Studio
+
+In this article, you learn how to use [LlamaIndex](https://github.com/run-llama/llama_index) with models from the Azure AI model catalog deployed to Azure AI Studio.
+
+Models deployed to Azure AI Studio can be used with LlamaIndex in two ways:
+
+- **Using the Azure AI model inference API:** All models deployed to Azure AI Studio support the [Azure AI model inference API](../../reference/reference-model-inference-api.md), which offers a common set of functionalities that can be used for most of the models in the catalog. Because the API is the same for all models, switching from one model to another is as simple as changing the model deployment being used; no further code changes are required. When working with LlamaIndex, install the extensions `llama-index-llms-azure-inference` and `llama-index-embeddings-azure-inference`.
+
+- **Using the model provider's specific API:** Some models, like OpenAI, Cohere, or Mistral, offer their own set of APIs and extensions for LlamaIndex. Those extensions might include functionality specific to the model, so they're suitable if you want to take advantage of it. When working with `llama-index`, install the extension specific to the model you want to use, like `llama-index-llms-openai` or `llama-index-llms-cohere`.
+
+In this example, we are working with the **Azure AI model inference API**.
+
+## Prerequisites
+
+To run this tutorial, you need:
+
+1. An [Azure subscription](https://azure.microsoft.com).
+2. An Azure AI hub resource as explained at [How to create and manage an Azure AI Studio hub](../create-azure-ai-resource.md).
+3. A deployed model that supports the [Azure AI model inference API](https://aka.ms/azureai/modelinference). In this example, we use a `Mistral-Large` deployment, but you can use any model you prefer. To use embedding capabilities in LlamaIndex, you need an embedding model like `cohere-embed-v3-multilingual`.
+
+    * You can follow the instructions at [Deploy models as serverless APIs](../deploy-models-serverless.md).
+
+4. Python 3.8 or later installed, including pip.
+5. LlamaIndex installed. You can do it with:
+
+    ```bash
+    pip install llama-index
+    ```
+
+6. The packages for the Azure AI model inference API, which this example uses. Install them with:
+
+    ```bash
+    pip install -U llama-index-llms-azure-inference
+    pip install -U llama-index-embeddings-azure-inference
+    ``` 
+
+## Configure the environment
+
+To use LLMs deployed in Azure AI Studio, you need the endpoint and credentials to connect to them. The `model_name` parameter isn't required for endpoints that serve a single model, like Managed Online Endpoints. Follow these steps to get the information you need from the model you want to use:
+
+1. Go to [Azure AI Studio](https://ai.azure.com/).
+2. Go to deployments and select the model you deployed as indicated in the prerequisites.
+3. Copy the endpoint URL and the key.
+
+    :::image type="content" source="../../media/how-to/inference/serverless-endpoint-url-keys.png" alt-text="Screenshot of the option to copy endpoint URI and keys from an endpoint." lightbox="../../media/how-to/inference/serverless-endpoint-url-keys.png":::
+    
+    > [!TIP]
+    > If your model was deployed with Microsoft Entra ID support, you don't need a key.
+
+In this scenario, we placed both the endpoint URL and key in the following environment variables:
+
+```bash
+export AZURE_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
+export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
+```
+
+Once configured, create a client to connect to the endpoint:
+
+```python
+import os
+from llama_index.llms.azure_inference import AzureAICompletionsModel
+
+llm = AzureAICompletionsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
+)
+```
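+
+Because `os.environ[...]` raises a bare `KeyError` when a variable is missing, a small check before creating the client can give a clearer message. The following is a minimal sketch that uses only the standard library; the helper name `read_inference_settings` is hypothetical and isn't part of LlamaIndex or the Azure SDK:
+
+```python
+import os
+
+
+def read_inference_settings(env=os.environ):
+    """Return (endpoint, credential), or raise with a readable message."""
+    names = ("AZURE_INFERENCE_ENDPOINT", "AZURE_INFERENCE_CREDENTIAL")
+    # Treat unset and empty variables the same way.
+    missing = [name for name in names if not env.get(name)]
+    if missing:
+        raise RuntimeError(
+            "Missing environment variables: " + ", ".join(missing)
+        )
+    return env[names[0]], env[names[1]]
+```
+
+The returned values can then be passed as the `endpoint` and `credential` arguments shown earlier.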
+
+Alternatively, if your endpoint supports Microsoft Entra ID, you can use the following code to create the client:
+
+```python
+from azure.identity import DefaultAzureCredential
+
+llm = AzureAICompletionsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=DefaultAzureCredential(),
+)
+```
+
+> [!NOTE]
+> When using Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the required permissions to invoke it.
+
+If you plan to make asynchronous calls, it's a best practice to use the asynchronous version of the credentials:
+
+```python
+from azure.identity.aio import (
+    DefaultAzureCredential as DefaultAzureCredentialAsync,
+)
+
+llm = AzureAICompletionsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=DefaultAzureCredentialAsync(),
+)
+```
+
+### Inference parameters
+
+You can configure how inference is performed for all the operations that use this client by setting extra parameters. This way, you don't have to specify them on each call you make to the model.
+
+```python
+llm = AzureAICompletionsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
+    temperature=0.0,
+    model_kwargs={"top_p": 1.0},
+)
+```
+
+For parameters that aren't supported by the Azure AI model inference API ([reference](../../reference/reference-model-inference-chat-completions.md)) but are available in the underlying model, use the `model_extras` argument. In the following example, the parameter `safe_prompt`, which is only available for Mistral models, is passed.
+
+```python
+llm = AzureAICompletionsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
+    temperature=0.0,
+    model_kwargs={"model_extras": {"safe_prompt": True}},
+)
+```
+
+## Use LLMs
+
+Use the `chat` method for chat instruction models. The `complete` method is still available for models of type `chat-completions`. In those cases, your input text is converted to a message with `role="user"`.
+
+```python
+from llama_index.core.llms import ChatMessage
+
+messages = [
+    ChatMessage(
+        role="system", content="You are a pirate with colorful personality."
+    ),
+    ChatMessage(role="user", content="Hello"),
+]
+
+response = llm.chat(messages)
+print(response)
+```
+
+You can also stream the outputs:
+
+```python
+response = llm.stream_chat(messages)
+for r in response:
+    print(r.delta, end="")
+```
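+
+Each item yielded by `stream_chat` carries the newly generated text in `delta`, as the loop above shows. If you also need the full text once streaming finishes, you can collect the deltas as you print them. The sketch below uses a hypothetical stand-in generator, `fake_stream`, in place of `llm.stream_chat(messages)` so that it runs without an endpoint; `collect_stream` is likewise an illustrative helper, not a LlamaIndex function:
+
+```python
+from types import SimpleNamespace
+
+
+def fake_stream():
+    # Stand-in for llm.stream_chat(messages): yields objects with a `delta` attribute.
+    for piece in ["Ahoy", ", ", "matey!"]:
+        yield SimpleNamespace(delta=piece)
+
+
+def collect_stream(stream):
+    """Print each delta as it arrives and return the concatenated text."""
+    parts = []
+    for chunk in stream:
+        # Assumption: some chunks may carry an empty or missing delta, so skip those.
+        if chunk.delta:
+            print(chunk.delta, end="")
+            parts.append(chunk.delta)
+    print()
+    return "".join(parts)
+
+
+full_text = collect_stream(fake_stream())
+```
+
+With a real client, you would pass `llm.stream_chat(messages)` to `collect_stream` instead of `fake_stream()`.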
+
+## Use embeddings models
+
+In the same way you create an LLM client, you can connect to an embeddings model. In the following example, we set the environment variables again, this time pointing to an embeddings model:
+
+```bash
+export AZURE_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
+export AZURE_INFERENCE_CREDENTIAL="<your-key-goes-here>"
+```
+
+Then create the client:
+
+```python
+from llama_index.embeddings.azure_inference import AzureAIEmbeddingsModel
+
+embed_model = AzureAIEmbeddingsModel(
+    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
+    credential=os.environ["AZURE_INFERENCE_CREDENTIAL"],
+)
+```
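+
+Embeddings are lists of floats, so you can compare them with ordinary arithmetic. The following minimal sketch computes cosine similarity with the standard library only. The two short vectors are made-up stand-ins for real embeddings, and the helper `cosine_similarity` is illustrative; the `get_text_embedding` call is shown only in a comment because it needs a live endpoint:
+
+```python
+import math
+
+
+def cosine_similarity(a, b):
+    """Cosine of the angle between two equal-length, nonzero vectors."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(y * y for y in b))
+    return dot / (norm_a * norm_b)
+
+
+# With a live endpoint, the vectors would come from calls such as:
+#   vec_a = embed_model.get_text_embedding("first text")
+vec_a = [1.0, 0.0, 1.0]
+vec_b = [1.0, 0.0, 0.0]
+score = cosine_similarity(vec_a, vec_b)
+print(round(score, 4))
+```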
+
+## Configure the models used by your code
+
+You can use the LLM or embeddings model client individually in the code you develop with LlamaIndex or you can configure the entire session using the `Settings` options. Configuring the session has the advantage of all your code using the same models for all the operations.
+
+```python
+from llama_index.core import Settings
+
+Settings.llm = llm
+Settings.embed_model = embed_model
+```
+
+However, there are scenarios where you want to use a general model for most operations but a specific one for a given task. In those cases, it's useful to set the LLM or embeddings model for each LlamaIndex construct. In the following example, we set a specific model:
+
+```python
+from llama_index.core.evaluation import RelevancyEvaluator
+
+relevancy_evaluator = RelevancyEvaluator(llm=llm)
+```
+
+In general, you use a combination of both strategies.
+
+## Related content
+
+* [How to get started with Azure AI SDKs](sdk-overview.md)
diff --git a/articles/ai-studio/how-to/develop/sdk-overview.md b/articles/ai-studio/how-to/develop/sdk-overview.md
@@ -15,7 +15,7 @@ author: eric-urban
 
 # Overview of the Azure AI SDKs
 
-Microsoft offers a variety of packages that you can use for building generative AI applications in the cloud. In most applications, you need to use a combination of packages to manage and use various Azure services that provide AI functionality. We also offer integrations with open-source libraries like LangChain and mlflow for use with Azure. In this article we'll give an overview of the main services and SDKs you can use with Azure AI Studio.
+Microsoft offers a variety of packages that you can use for building generative AI applications in the cloud. In most applications, you need to use a combination of packages to manage and use various Azure services that provide AI functionality. We also offer integrations with open-source libraries like LangChain and MLflow for use with Azure. In this article we'll give an overview of the main services and SDKs you can use with Azure AI Studio.
 
 For building generative AI applications, we recommend using the following services and SDKs:
  * [Azure Machine Learning](/azure/machine-learning/overview-what-is-azure-machine-learning) for the hub and project infrastructure used in AI Studio to organize your work into projects, manage project artifacts (data, evaluation runs, traces), fine-tune & deploy models, and connect to external services and resources.
@@ -54,6 +54,9 @@ Azure AI services
 Prompt flow
  * [Prompt flow SDK](https://microsoft.github.io/promptflow/how-to-guides/quick-start.html)
 
+Agentic frameworks
+ * [LlamaIndex](llama-index.md)
+
 ## Related content
 
 - [Get started building a chat app using the prompt flow SDK](../../quickstarts/get-started-code.md)
diff --git a/articles/ai-studio/media/how-to/inference/serverless-endpoint-url-keys.png b/articles/ai-studio/media/how-to/inference/serverless-endpoint-url-keys.png
diff --git a/articles/ai-studio/toc.yml b/articles/ai-studio/toc.yml
@@ -252,6 +252,8 @@ items:
         href: how-to/develop/vscode.md
       - name: Start with an AI template
         href: how-to/develop/ai-template-get-started.md
+      - name: Develop with LlamaIndex and Azure AI Studio
+        href: how-to/develop/llama-index.md
       - name: Trace your application with prompt flow
         href: how-to/develop/trace-local-sdk.md
         displayName: code,sdk