---
title: Develop applications with Semantic Kernel and Azure AI Foundry
titleSuffix: Azure AI Foundry
description: Develop applications with Semantic Kernel and Azure AI Foundry.
author: lgayhardt
ms.author: lagayhar
ms.reviewer: taochen
ms.date: 12/04/2024
ms.topic: how-to
ms.service: azure-ai-studio
manager: scottpolly
---

# Develop applications with Semantic Kernel and Azure AI Foundry

In this article, you learn how to use [Semantic Kernel](/semantic-kernel/overview/) with models deployed from the Azure AI model catalog in Azure AI Foundry portal.

## Prerequisites

- An [Azure subscription](https://azure.microsoft.com).
- An Azure AI project as explained at [Create a project in Azure AI Foundry portal](../create-projects.md).
- A model supporting the [Azure AI model inference API](../../reference/reference-model-inference-api.md?tabs=python) deployed. In this example, we use a `Mistral-Large` deployment, but you can use any model of your preference. To use embeddings capabilities in Semantic Kernel, you need an embedding model such as `cohere-embed-v3-multilingual`.

  - You can follow the instructions at [Deploy models as serverless APIs](../deploy-models-serverless.md).

- Python **3.10** or later installed, including pip.
- Semantic Kernel installed. You can install it with:

  ```bash
  pip install semantic-kernel
  ```

- This example works with the Azure AI model inference API, so you also need the relevant Azure dependencies. You can install them with the following command (a quick verification check follows this list):

  ```bash
  pip install semantic-kernel[azure]
  ```
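
To confirm that the installation succeeded, you can inspect the installed distribution. For example:

```bash
pip show semantic-kernel
```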

## Configure the environment

To use LLMs deployed in Azure AI Foundry portal, you need the endpoint and credentials to connect to it. Follow these steps to get the information you need from the model you want to use:

1. Go to the [Azure AI Foundry portal](https://ai.azure.com/).
1. Open the project where the model is deployed, if it isn't already open.
1. Go to **Models + endpoints** and select the model you deployed as indicated in the prerequisites.
1. Copy the endpoint URL and the key.

    :::image type="content" source="../../media/how-to/inference/serverless-endpoint-url-keys.png" alt-text="Screenshot of the option to copy endpoint URI and keys from an endpoint." lightbox="../../media/how-to/inference/serverless-endpoint-url-keys.png":::

    > [!TIP]
    > If your model was deployed with Microsoft Entra ID support, you don't need a key.

In this scenario, we placed both the endpoint URL and key in the following environment variables:

```bash
export AZURE_AI_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
export AZURE_AI_INFERENCE_API_KEY="<your-key-goes-here>"
```

Once configured, create a client to connect to the endpoint:

```python
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="<deployment-name>")
```

> [!TIP]
> The client automatically reads the environment variables `AZURE_AI_INFERENCE_ENDPOINT` and `AZURE_AI_INFERENCE_API_KEY` to connect to the model. However, you can also pass the endpoint and key directly to the client via the `endpoint` and `api_key` parameters on the constructor.
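
For example, here's a minimal sketch that passes the values explicitly, reading them from the same environment variables used earlier:

```python
import os

from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

# Pass the endpoint and key to the constructor instead of relying on the implicit lookup.
chat_completion_service = AzureAIInferenceChatCompletion(
    ai_model_id="<deployment-name>",
    endpoint=os.environ["AZURE_AI_INFERENCE_ENDPOINT"],
    api_key=os.environ["AZURE_AI_INFERENCE_API_KEY"],
)
```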

Alternatively, if your endpoint supports Microsoft Entra ID, you can use the following code to create the client:

```bash
export AZURE_AI_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
```

```python
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="<deployment-name>")
```

> [!NOTE]
> When using Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the required permissions to invoke it.
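
If you're developing locally, the default Azure credential chain typically picks up an existing Azure CLI session, so signing in first is often the simplest way to satisfy this requirement (one possible approach; other credential sources also work):

```bash
az login
```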

### Azure OpenAI models

If you're using an Azure OpenAI model, you can use the following code to create the client:

```python
from azure.ai.inference.aio import ChatCompletionsClient
from azure.identity.aio import DefaultAzureCredential

from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

chat_completion_service = AzureAIInferenceChatCompletion(
    ai_model_id="<deployment-name>",
    client=ChatCompletionsClient(
        endpoint=f"{str(<your-azure-open-ai-endpoint>).strip('/')}/openai/deployments/{<deployment_name>}",
        credential=DefaultAzureCredential(),
        credential_scopes=["https://cognitiveservices.azure.com/.default"],
    ),
)
```
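
The endpoint in the previous snippet is built from placeholders. As a purely hypothetical illustration, with an Azure OpenAI resource named `contoso` and a deployment named `my-deployment`, the constructed value would look like this:

```python
# Hypothetical values for illustration only; substitute your own resource and deployment names.
azure_openai_endpoint = "https://contoso.openai.azure.com"
deployment_name = "my-deployment"

endpoint = f"{azure_openai_endpoint.strip('/')}/openai/deployments/{deployment_name}"
# -> https://contoso.openai.azure.com/openai/deployments/my-deployment
```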

## Inference parameters

You can configure how inference is performed by using the `AzureAIInferenceChatPromptExecutionSettings` class:

```python
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatPromptExecutionSettings

execution_settings = AzureAIInferenceChatPromptExecutionSettings(
    max_tokens=100,
    temperature=0.5,
    top_p=0.9,
    # extra_parameters={...}, # model-specific parameters
)
```
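
The `extra_parameters` field forwards model-specific settings that aren't part of the common inference API. As a sketch, the following passes `safe_prompt`, a parameter accepted by some Mistral deployments; check your model's documentation for the parameters it actually supports:

```python
execution_settings = AzureAIInferenceChatPromptExecutionSettings(
    temperature=0.5,
    # Model-specific setting, forwarded as-is to the deployment.
    # `safe_prompt` is only an example and isn't supported by every model.
    extra_parameters={"safe_prompt": True},
)
```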

## Calling the service

Let's first call the chat completion service with a simple chat history:

> [!TIP]
> Semantic Kernel is an asynchronous library, so you need to use the asyncio library to run the code.
>
> ```python
> import asyncio
>
> async def main():
>     ...
>
> if __name__ == "__main__":
>     asyncio.run(main())
> ```

```python
from semantic_kernel.contents.chat_history import ChatHistory

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = await chat_completion_service.get_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)
print(response)
```
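
Putting the tip and the snippet together, a minimal end-to-end script might look like the following sketch, which assumes the environment variables set earlier:

```python
import asyncio

from semantic_kernel.connectors.ai.azure_ai_inference import (
    AzureAIInferenceChatCompletion,
    AzureAIInferenceChatPromptExecutionSettings,
)
from semantic_kernel.contents.chat_history import ChatHistory


async def main():
    # The client reads AZURE_AI_INFERENCE_ENDPOINT and AZURE_AI_INFERENCE_API_KEY.
    chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="<deployment-name>")
    execution_settings = AzureAIInferenceChatPromptExecutionSettings(max_tokens=100)

    chat_history = ChatHistory()
    chat_history.add_user_message("Hello, how are you?")

    response = await chat_completion_service.get_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```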

Alternatively, you can stream the response from the service:

```python
chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = chat_completion_service.get_streaming_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

chunks = []
async for chunk in response:
    chunks.append(chunk)
    print(chunk, end="")

# Adding the streamed chunks together reconstructs the complete response.
full_response = sum(chunks[1:], chunks[0])
```

### Create a long-running conversation

You can create a long-running conversation by using a loop:

```python
while True:
    response = await chat_completion_service.get_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )
    print(response)
    chat_history.add_message(response)

    # Prompt for the next user turn and add it to the history.
    user_input = input("User:> ")
    chat_history.add_user_message(user_input)
```
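
In practice, you usually want a way to end the conversation. Here's a sketch of the same loop with a simple exit check:

```python
while True:
    user_input = input("User:> ")
    if user_input.strip().lower() in ("exit", "quit"):
        break
    chat_history.add_user_message(user_input)

    response = await chat_completion_service.get_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )
    print(response)
    chat_history.add_message(response)
```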

If you're streaming the response, you can use the following code:

```python
while True:
    response = chat_completion_service.get_streaming_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )

    chunks = []
    async for chunk in response:
        chunks.append(chunk)
        print(chunk, end="")

    full_response = sum(chunks[1:], chunks[0])
    chat_history.add_message(full_response)

    # Prompt for the next user turn and add it to the history.
    user_input = input("User:> ")
    chat_history.add_user_message(user_input)
```

## Use embeddings models

Configure your environment similarly to the previous steps, but use the `AzureAIInferenceTextEmbedding` class:

```python
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceTextEmbedding

embedding_generation_service = AzureAIInferenceTextEmbedding(ai_model_id="<deployment-name>")
```

The following code shows how to get embeddings from the service:

```python
embeddings = await embedding_generation_service.generate_embeddings(
    texts=["My favorite color is blue.", "I love to eat pizza."],
)

for embedding in embeddings:
    print(embedding)
```
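
Each item in `embeddings` is a numeric vector, so you can compare texts directly. For example, here's a small sketch that computes the cosine similarity between the two results, assuming NumPy is installed:

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings[0], embeddings[1]))
```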

## Related content

- [How to get started with Azure AI SDKs](sdk-overview.md)
- [Reference for Semantic Kernel model integration](/semantic-kernel/concepts/ai-services/)