|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Online Evaluations: Evaluating in the Cloud on a Schedule\n", |
| 8 | + "\n", |
| 9 | + "## Objective\n", |
| 10 | + "\n", |
| 11 | + "This tutorial provides a step-by-step guide on how to evaluate data generated by LLMs online on a schedule. \n", |
| 12 | + "\n", |
| 13 | + "This tutorial uses the following Azure AI services:\n", |
| 14 | + "\n", |
| 15 | + "- [Azure AI Safety Evaluation](https://aka.ms/azureaistudiosafetyeval)\n", |
| 16 | + "- [azure-ai-evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk)\n", |
| 17 | + "\n", |
| 18 | + "## Time\n", |
| 19 | + "\n", |
| 20 | + "You should expect to spend 30 minutes running this sample. \n", |
| 21 | + "\n", |
| 22 | + "## About this example\n", |
| 23 | + "\n", |
| 24 | + "This example demonstrates the online evaluation of a LLM. It is important to have access to AzureOpenAI credentials and an AzureAI project. This example demonstrates: \n", |
| 25 | + "\n", |
| 26 | + "- Recurring, Online Evaluation (to be used to monitor LLMs once they are deployed)\n", |
| 27 | + "\n", |
| 28 | + "## Before you begin\n", |
| 29 | + "### Prerequesite\n", |
| 30 | + "- Configure resources to support Online Evaluation as per [Online Evaluation documentation](https://aka.ms/GenAIMonitoringDoc)" |
| 31 | + ] |
| 32 | + }, |
| 33 | + { |
| 34 | + "cell_type": "code", |
| 35 | + "execution_count": null, |
| 36 | + "metadata": {}, |
| 37 | + "outputs": [], |
| 38 | + "source": [ |
| 39 | + "%pip install -U azure-identity\n", |
| 40 | + "%pip install -U azure-ai-project\n", |
| 41 | + "%pip install -U azure-ai-evaluation" |
| 42 | + ] |
| 43 | + }, |
| 44 | + { |
| 45 | + "cell_type": "code", |
| 46 | + "execution_count": null, |
| 47 | + "metadata": {}, |
| 48 | + "outputs": [], |
| 49 | + "source": [ |
| 50 | + "from azure.ai.project import AIProjectClient\n", |
| 51 | + "from azure.identity import DefaultAzureCredential\n", |
| 52 | + "from azure.ai.project.models import (\n", |
| 53 | + " ApplicationInsightsConfiguration,\n", |
| 54 | + " EvaluatorConfiguration,\n", |
| 55 | + " ConnectionType,\n", |
| 56 | + " EvaluationSchedule,\n", |
| 57 | + " RecurrenceTrigger,\n", |
| 58 | + ")\n", |
| 59 | + "from azure.ai.evaluation import F1ScoreEvaluator, ViolenceEvaluator" |
| 60 | + ] |
| 61 | + }, |
| 62 | + { |
| 63 | + "cell_type": "markdown", |
| 64 | + "metadata": {}, |
| 65 | + "source": [ |
| 66 | + "### Connect to your Azure Open AI deployment\n", |
| 67 | + "To evaluate your LLM-generated data remotely in the cloud, we must connect to your Azure Open AI deployment. This deployment must be a GPT model which supports `chat completion`, such as `gpt-4`. To see the proper value for `conn_str`, navigate to the connection string at the \"Project Overview\" page for your Azure AI project. " |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "code", |
| 72 | + "execution_count": null, |
| 73 | + "metadata": {}, |
| 74 | + "outputs": [], |
| 75 | + "source": [ |
| 76 | + "project_client = AIProjectClient.from_connection_string(\n", |
| 77 | + " credential=DefaultAzureCredential(),\n", |
| 78 | + " conn_str=\"<connection_string>\", # At the moment, it should be in the format \"<Region>.api.azureml.ms;<AzureSubscriptionId>;<ResourceGroup>;<HubName>\" Ex: eastus2.api.azureml.ms;xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxx;rg-sample;sample-project-eastus2\n", |
| 79 | + ")" |
| 80 | + ] |
| 81 | + }, |
| 82 | + { |
| 83 | + "cell_type": "markdown", |
| 84 | + "metadata": {}, |
| 85 | + "source": [ |
| 86 | + "Please see [Online Evaluation documentation](https://aka.ms/GenAIMonitoringDoc) for configuration of Application Insights. `service_name` is a unique name you provide to define your Generative AI application and identify it within your Application Insights resource. This property will be logged in the `traces` table in Application Insights and can be found in the `customDimensions[\"service.name\"]` field. `evaluation_name` is a unique name you provide for your Online Evaluation schedule. " |
| 87 | + ] |
| 88 | + }, |
| 89 | + { |
| 90 | + "cell_type": "code", |
| 91 | + "execution_count": null, |
| 92 | + "metadata": {}, |
| 93 | + "outputs": [], |
| 94 | + "source": [ |
| 95 | + "# Your Application Insights resource ID\n", |
| 96 | + "# At the moment, it should be something in the format \"/subscriptions/<AzureSubscriptionId>/resourceGroups/<ResourceGroup>/providers/Microsoft.Insights/components/<ApplicationInsights>\"\"\n", |
| 97 | + "app_insights_resource_id = \"<app_insights_resource_id>\"\n", |
| 98 | + "\n", |
| 99 | + "# Name of your generative AI application (will be available in trace data in Application Insights)\n", |
| 100 | + "service_name = \"<service_name>\"\n", |
| 101 | + "\n", |
| 102 | + "# Name of your online evaluation schedule\n", |
| 103 | + "evaluation_name = \"<evaluation_name>\"" |
| 104 | + ] |
| 105 | + }, |
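|  | +  { |
|  | +   "cell_type": "markdown", |
|  | +   "metadata": {}, |
|  | +   "source": [ |
|  | +    "Optionally, before creating the schedule you can sanity-check that trace data tagged with your `service_name` is reaching Application Insights. The cell below is a minimal sketch: it only builds and prints a KQL query (assuming traces are logged with `customDimensions[\"service.name\"]` as described above); paste the printed query into the \"Logs\" blade of your Application Insights resource to run it." |
|  | +   ] |
|  | +  }, |
|  | +  { |
|  | +   "cell_type": "code", |
|  | +   "execution_count": null, |
|  | +   "metadata": {}, |
|  | +   "outputs": [], |
|  | +   "source": [ |
|  | +    "# Optional sanity check (sketch only): build a KQL query that looks for recent traces tagged\n", |
|  | +    "# with your service name. Run the printed query in the Logs blade of your Application Insights\n", |
|  | +    "# resource; it assumes traces carry customDimensions[\"service.name\"] as described above.\n", |
|  | +    "verification_query = (\n", |
|  | +    "    \"traces\\n\"\n", |
|  | +    "    f'| where tostring(customDimensions[\"service.name\"]) == \"{service_name}\"\\n'\n", |
|  | +    "    \"| take 10\"\n", |
|  | +    ")\n", |
|  | +    "print(verification_query)" |
|  | +   ] |
|  | +  }, |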
| 106 | + { |
| 107 | + "cell_type": "markdown", |
| 108 | + "metadata": {}, |
| 109 | + "source": [ |
| 110 | + "Below is the Kusto Query Language (KQL) query to query data from Application Insights resource. This query is compatible with data logged by the Azure AI Inferencing Tracing SDK (linked in [documentation](https://aka.ms/GenAIMonitoringDoc)). You can modify it depending on your data schema. The KQL query must output several columns: `operation_ID`, `operation_ParentID`, and `gen_ai_response_id`. You can choose which other columns to output as required by the evaluators you are using." |
| 111 | + ] |
| 112 | + }, |
| 113 | + { |
| 114 | + "cell_type": "code", |
| 115 | + "execution_count": null, |
| 116 | + "metadata": {}, |
| 117 | + "outputs": [], |
| 118 | + "source": [ |
| 119 | + "kusto_query = 'let gen_ai_spans=(dependencies | where isnotnull(customDimensions[\"gen_ai.system\"]) | extend response_id = tostring(customDimensions[\"gen_ai.response.id\"]) | project id, operation_Id, operation_ParentId, timestamp, response_id); let gen_ai_events=(traces | where message in (\"gen_ai.choice\", \"gen_ai.user.message\", \"gen_ai.system.message\") or tostring(customDimensions[\"event.name\"]) in (\"gen_ai.choice\", \"gen_ai.user.message\", \"gen_ai.system.message\") | project id= operation_ParentId, operation_Id, operation_ParentId, user_input = iff(message == \"gen_ai.user.message\" or tostring(customDimensions[\"event.name\"]) == \"gen_ai.user.message\", parse_json(iff(message == \"gen_ai.user.message\", tostring(customDimensions[\"gen_ai.event.content\"]), message)).content, \"\"), system = iff(message == \"gen_ai.system.message\" or tostring(customDimensions[\"event.name\"]) == \"gen_ai.system.message\", parse_json(iff(message == \"gen_ai.system.message\", tostring(customDimensions[\"gen_ai.event.content\"]), message)).content, \"\"), llm_response = iff(message == \"gen_ai.choice\", parse_json(tostring(parse_json(tostring(customDimensions[\"gen_ai.event.content\"])).message)).content, iff(tostring(customDimensions[\"event.name\"]) == \"gen_ai.choice\", parse_json(parse_json(message).message).content, \"\")) | summarize operation_ParentId = any(operation_ParentId), Input = maxif(user_input, user_input != \"\"), System = maxif(system, system != \"\"), Output = maxif(llm_response, llm_response != \"\") by operation_Id, id); gen_ai_spans | join kind=inner (gen_ai_events) on id, operation_Id | project Input, System, Output, operation_Id, operation_ParentId, gen_ai_response_id = response_id'\n", |
| 120 | + "\n", |
| 121 | + "# AzureMSIClientId is the clientID of the User-assigned managed identity created during set-up - see documentation for how to find it\n", |
| 122 | + "properties = {\"AzureMSIClientId\": \"your_client_id\"}" |
| 123 | + ] |
| 124 | + }, |
| 125 | + { |
| 126 | + "cell_type": "code", |
| 127 | + "execution_count": null, |
| 128 | + "metadata": {}, |
| 129 | + "outputs": [], |
| 130 | + "source": [ |
| 131 | + "# Connect to your Application Insights resource\n", |
| 132 | + "app_insights_config = ApplicationInsightsConfiguration(\n", |
| 133 | + " resource_id=app_insights_resource_id, query=kusto_query, service_name=service_name\n", |
| 134 | + ")" |
| 135 | + ] |
| 136 | + }, |
| 137 | + { |
| 138 | + "cell_type": "code", |
| 139 | + "execution_count": null, |
| 140 | + "metadata": {}, |
| 141 | + "outputs": [], |
| 142 | + "source": [ |
| 143 | + "# Connect to your AOAI resource, you must use an AOAI GPT model\n", |
| 144 | + "deployment_name = \"gpt-4\"\n", |
| 145 | + "api_version = \"2024-06-01\"\n", |
| 146 | + "default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)\n", |
| 147 | + "model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)" |
| 148 | + ] |
| 149 | + }, |
| 150 | + { |
| 151 | + "cell_type": "markdown", |
| 152 | + "metadata": {}, |
| 153 | + "source": [ |
| 154 | + "### Configure Evaluators to Run\n", |
| 155 | + "The code below demonstrates how to configure the evaluators you want to run. In this example, we use the `F1ScoreEvaluator`, `RelevanceEvaluator` and the `ViolenceEvaluator`, but all evaluators supported by [Azure AI Evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning) are supported by Online Evaluation and can be configured here. You can either import the classes from the SDK and reference them with the `.id` property, or you can find the fully formed `id` of the evaluator in the AI Studio registry of evaluators, and use it here. " |
| 156 | + ] |
| 157 | + }, |
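|  | +  { |
|  | +   "cell_type": "markdown", |
|  | +   "metadata": {}, |
|  | +   "source": [ |
|  | +    "As a quick, optional illustration of the first option, the cell below prints the fully formed registry `id` exposed by the imported evaluator classes - the same kind of identifier you would otherwise copy from the AI Studio evaluator registry." |
|  | +   ] |
|  | +  }, |
|  | +  { |
|  | +   "cell_type": "code", |
|  | +   "execution_count": null, |
|  | +   "metadata": {}, |
|  | +   "outputs": [], |
|  | +   "source": [ |
|  | +    "# Optional: print the fully formed registry id exposed by each imported evaluator class.\n", |
|  | +    "# These are the identifiers you would otherwise look up in the AI Studio evaluator registry.\n", |
|  | +    "print(F1ScoreEvaluator.id)\n", |
|  | +    "print(ViolenceEvaluator.id)" |
|  | +   ] |
|  | +  }, |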
| 158 | + { |
| 159 | + "cell_type": "code", |
| 160 | + "execution_count": null, |
| 161 | + "metadata": {}, |
| 162 | + "outputs": [], |
| 163 | + "source": [ |
| 164 | + "# id for each evaluator can be found in your AI Studio registry - please see documentation for more information\n", |
| 165 | + "# init_params is the configuration for the model to use to perform the evaluation\n", |
| 166 | + "# data_mapping is used to map the output columns of your query to the names required by the evaluator\n", |
| 167 | + "evaluators = {\n", |
| 168 | + " \"f1_score\": EvaluatorConfiguration(\n", |
| 169 | + " id=F1ScoreEvaluator.id,\n", |
| 170 | + " ),\n", |
| 171 | + " \"relevance\": EvaluatorConfiguration(\n", |
| 172 | + " id=\"azureml://registries/azureml-staging/models/Relevance-Evaluator/versions/4\",\n", |
| 173 | + " init_params={\"model_config\": model_config},\n", |
| 174 | + " data_mapping={\"query\": \"${data.Input}\", \"response\": \"${data.Output}\"},\n", |
| 175 | + " ),\n", |
| 176 | + " \"violence\": EvaluatorConfiguration(\n", |
| 177 | + " id=ViolenceEvaluator.id,\n", |
| 178 | + " init_params={\"azure_ai_project\": project_client.scope},\n", |
| 179 | + " data_mapping={\"query\": \"${data.Input}\", \"response\": \"${data.Output}\"},\n", |
| 180 | + " ),\n", |
| 181 | + "}" |
| 182 | + ] |
| 183 | + }, |
| 184 | + { |
| 185 | + "cell_type": "markdown", |
| 186 | + "metadata": {}, |
| 187 | + "source": [ |
| 188 | + "### Evaluate in the Cloud on a Schedule with Online Evaluation\n", |
| 189 | + "\n", |
| 190 | + "You can configure the `RecurrenceTrigger` based on the class definition [here](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.recurrencetrigger?view=azure-python)." |
| 191 | + ] |
| 192 | + }, |
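|  | +  { |
|  | +   "cell_type": "markdown", |
|  | +   "metadata": {}, |
|  | +   "source": [ |
|  | +    "For example, assuming the same constructor parameters as the linked class definition, a more frequent cadence could be configured as shown below (illustration only); the daily schedule actually used by this sample is created in the next cell." |
|  | +   ] |
|  | +  }, |
|  | +  { |
|  | +   "cell_type": "code", |
|  | +   "execution_count": null, |
|  | +   "metadata": {}, |
|  | +   "outputs": [], |
|  | +   "source": [ |
|  | +    "# Illustration only (not used below): assuming the RecurrenceTrigger parameters documented in\n", |
|  | +    "# the linked class definition, this would run the evaluation every 6 hours instead of once a day.\n", |
|  | +    "hourly_trigger = RecurrenceTrigger(frequency=\"hour\", interval=6)" |
|  | +   ] |
|  | +  }, |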
| 193 | + { |
| 194 | + "cell_type": "code", |
| 195 | + "execution_count": null, |
| 196 | + "metadata": {}, |
| 197 | + "outputs": [], |
| 198 | + "source": [ |
| 199 | + "# Frequency to run the schedule\n", |
| 200 | + "recurrence_trigger = RecurrenceTrigger(frequency=\"day\", interval=1)\n", |
| 201 | + "\n", |
| 202 | + "# Configure the online evaluation schedule\n", |
| 203 | + "evaluation_schedule = EvaluationSchedule(\n", |
| 204 | + " data=app_insights_config,\n", |
| 205 | + " evaluators=evaluators,\n", |
| 206 | + " trigger=recurrence_trigger,\n", |
| 207 | + " description=f\"{service_name} evaluation schedule\",\n", |
| 208 | + " properties=properties,\n", |
| 209 | + ")\n", |
| 210 | + "\n", |
| 211 | + "# Create the online evaluation schedule\n", |
| 212 | + "created_evaluation_schedule = project_client.evaluations.create_or_replace_schedule(service_name, evaluation_schedule)\n", |
| 213 | + "print(\n", |
| 214 | + " f\"Successfully submitted the online evaluation schedule creation request - {created_evaluation_schedule.name}, currently in {created_evaluation_schedule.provisioning_state} state.\"\n", |
| 215 | + ")" |
| 216 | + ] |
| 217 | + }, |
| 218 | + { |
| 219 | + "cell_type": "markdown", |
| 220 | + "metadata": {}, |
| 221 | + "source": [ |
| 222 | + "### Next steps \n", |
| 223 | + "\n", |
| 224 | + "Navigate to the \"Tracing\" tab in [Azure AI Studio](https://ai.azure.com/) to view your logged trace data alongside the evaluations produced by the Online Evaluation schedule. You can use the reference link provided in the \"Tracing\" tab to navigate to a comprehensive workbook in Application Insights for more details on how your application is performing. " |
| 225 | + ] |
| 226 | + } |
| 227 | + ], |
| 228 | + "metadata": { |
| 229 | + "kernelspec": { |
| 230 | + "display_name": "azureai-samples313", |
| 231 | + "language": "python", |
| 232 | + "name": "python3" |
| 233 | + }, |
| 234 | + "language_info": { |
| 235 | + "codemirror_mode": { |
| 236 | + "name": "ipython", |
| 237 | + "version": 3 |
| 238 | + }, |
| 239 | + "file_extension": ".py", |
| 240 | + "mimetype": "text/x-python", |
| 241 | + "name": "python", |
| 242 | + "nbconvert_exporter": "python", |
| 243 | + "pygments_lexer": "ipython3" |
| 244 | + } |
| 245 | + }, |
| 246 | + "nbformat": 4, |
| 247 | + "nbformat_minor": 2 |
| 248 | +} |