|
| 1 | +--- |
| 2 | +title: How to run an evaluation in Azure DevOps |
| 3 | +titleSuffix: Azure AI Foundry |
| 4 | +description: Learn how to run offline evaluations of AI models within your CI/CD pipelines in Azure DevOps by using the Azure AI evaluation extension. |
| 5 | +manager: scottpolly |
| 6 | +ms.service: azure-ai-foundry |
| 7 | +ms.topic: how-to |
| 8 | +ms.date: 05/19/2025 |
| 9 | +ms.reviewer: hanch |
| 10 | +ms.author: lagayhar |
| 11 | +author: lgayhardt |
| 12 | +--- |
| 13 | + |
| 14 | +# How to run an evaluation in Azure DevOps (preview) |
| 15 | + |
| 16 | +[!INCLUDE [feature-preview](../includes/feature-preview.md)] |
| 17 | + |
| 18 | +Similar to the Azure AI evaluation GitHub Action, an Azure DevOps extension is available in the Azure DevOps Marketplace. It enables offline evaluation of AI models within your CI/CD pipelines in Azure DevOps. For the supported features and evaluators, see [GitHub Action](evaluation-github-action.md). |
| 19 | + |
| 20 | +[!INCLUDE [features](../includes/evaluation-github-action-azure-devops-features.md)] |
| 21 | + |
| 22 | +## Prerequisites |
| 23 | + |
| 24 | +[!INCLUDE [hub-only-prereq](../includes/hub-only-prereq.md)] |
| 25 | + |
| 26 | +- Install the Azure AI evaluation extension. |
| 27 | + - Go to [Azure DevOps Marketplace](https://marketplace.visualstudio.com/azuredevops). |
| 28 | + - Search for Azure AI evaluation and install the extension into your Azure DevOps organization. |
| 29 | + |
| 30 | +## Set up YAML configuration file |
| 31 | + |
| 32 | +1. Create a new YAML file in your repository. |
| 33 | + You can use the sample YAML provided in the README or clone from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file). |
| 34 | +2. Configure the following inputs: |
| 35 | + - Set up the [Azure CLI](/azure/devops/pipelines/tasks/reference/azure-cli-v2) task with a [service connection](/azure/devops/pipelines/library/service-endpoints?view=azure-devops&preserve-view=true) and Azure login. |
| 36 | + - Azure AI project connection string |
| 37 | + - Dataset and evaluators |
| 38 | + - Specify the evaluator names you want to use for this evaluation run. |
| 39 | + - Queries (required) and Ground Truth (optional). |
| 40 | + |
| 41 | + See the following sample dataset: |
| 42 | + |
| 43 | + ```JSON |
| 44 | + { |
| 45 | + "name": "MyTestData", |
| 46 | + "evaluators": [ |
| 47 | + "FluencyEvaluator", |
| 48 | + "ViolenceEvaluator" |
| 49 | + ], |
| 50 | + "data": [ |
| 51 | + |
| 52 | + { |
| 53 | + "query": "Tell me about Tokyo?", |
| 54 | + "ground_truth": "Tokyo is the capital of Japan and the largest city in the country. It is located on the eastern coast of Honshu, the largest of Japan's four main islands. Tokyo is the political, economic, and cultural center of Japan and is one of the world's most populous cities. It is also one of the world's most important financial centers and is home to the Tokyo Stock Exchange." |
| 55 | + }, |
| 56 | + { |
| 57 | + "query": "Where is Italy?", |
| 58 | + "ground_truth": "Italy is a country in southern Europe, located on the Italian Peninsula and the two largest islands in the Mediterranean Sea, Sicily and Sardinia. It is a unitary parliamentary republic with its capital in Rome, the largest city in Italy. Other major cities include Milan, Naples, Turin, and Palermo." |
| 59 | + } |
| 60 | + ] |
| 61 | + } |
| 62 | + ``` |
| 63 | + |
| 64 | + - Agent IDs |
| 65 | + Retrieve agent identifiers from the Azure AI Foundry portal. |
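
The dataset rules listed above (evaluator names, required queries, optional ground truth) can be checked locally before you commit the file. The following is a minimal sketch, not part of the extension: `validate_dataset` is a hypothetical helper, and treating `evaluators` and `data` as required non-empty lists is an assumption based on the sample dataset.

```python
import json


def validate_dataset(dataset):
    """Return a list of problems found in a parsed dataset (hypothetical helper).

    Assumption: 'evaluators' and 'data' must be non-empty lists, each data
    row needs a non-empty 'query', and 'ground_truth' is optional.
    """
    problems = []

    evaluators = dataset.get("evaluators")
    if not isinstance(evaluators, list) or not evaluators:
        problems.append("'evaluators' must be a non-empty list")

    data = dataset.get("data")
    if not isinstance(data, list) or not data:
        problems.append("'data' must be a non-empty list")
        return problems

    for i, row in enumerate(data):
        if not isinstance(row, dict):
            problems.append(f"data[{i}] must be an object")
            continue
        query = row.get("query")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"data[{i}] is missing a non-empty 'query'")
        # ground_truth is optional, so only check its type when present.
        if "ground_truth" in row and not isinstance(row["ground_truth"], str):
            problems.append(f"data[{i}].ground_truth must be a string")

    return problems


# A shortened version of the sample dataset; the second row omits ground_truth.
sample = json.loads("""
{
  "name": "MyTestData",
  "evaluators": ["FluencyEvaluator", "ViolenceEvaluator"],
  "data": [
    {
      "query": "Tell me about Tokyo?",
      "ground_truth": "Tokyo is the capital of Japan and the largest city in the country."
    },
    {"query": "Where is Italy?"}
  ]
}
""")

for problem in validate_dataset(sample):
    print(problem)
```

An empty result means the file matches the assumed structure. It doesn't confirm that the evaluator names are ones the extension supports.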
| 66 | + |
| 67 | +A sample YAML file: |
| 68 | + |
| 69 | +```yml |
| 70 | + |
| 71 | +trigger: |
| 72 | +- main |
| 73 | +pool: |
| 74 | + |
| 75 | +  vmImage: 'windows-latest' |
| 76 | + |
| 77 | +steps: |
| 78 | + |
| 79 | +- task: AzureCLI@2 |
| 80 | +  inputs: |
| 81 | +    addSpnToEnvironment: true |
| 82 | +    azureSubscription: 'az-dev-gh-aprilk-test-connection' # name of your service connection |
| 83 | +    scriptType: bash |
| 84 | +    scriptLocation: inlineScript |
| 85 | + |
| 86 | +    inlineScript: | |
| 87 | +      echo "##vso[task.setvariable variable=ARM_CLIENT_ID]$servicePrincipalId" |
| 88 | +      echo "##vso[task.setvariable variable=ARM_ID_TOKEN]$idToken" |
| 89 | +      echo "##vso[task.setvariable variable=ARM_TENANT_ID]$tenantId" |
| 90 | + |
| 91 | +- bash: | |
| 92 | + |
| 93 | +    az login --service-principal -u $(ARM_CLIENT_ID) --tenant $(ARM_TENANT_ID) --allow-no-subscriptions --federated-token $(ARM_ID_TOKEN) |
| 94 | + |
| 95 | +  displayName: 'Login Azure' |
| 96 | + |
| 97 | +- task: UsePythonVersion@0 |
| 98 | +  inputs: |
| 99 | +    versionSpec: '3.11' |
| 100 | +- task: AIAgentEvaluation@0 |
| 101 | +  inputs: |
| 102 | +    azure-aiproject-connection-string: 'azure-ai-project-connection-string-sample' |
| 103 | +    deployment-name: "gpt-4o-mini" |
| 104 | +    api-version: "2024-08-01-preview" |
| 105 | +    data-path: $(Build.SourcesDirectory)\tests\data\golden-dataset-medium.json |
| 106 | +    agent-ids: 'agent-id1, agent-id2' |
| 107 | + |
| 108 | +``` |
| 109 | + |
| 110 | +## Set up a new pipeline and trigger an evaluation run |
| 111 | + |
| 112 | +Commit the YAML file to your repository, create a pipeline from it in Azure DevOps, and run the pipeline to trigger an evaluation. |
| 113 | + |
| 114 | +## View results |
| 115 | + |
| 116 | +- Select the run and go to the "Azure AI Evaluation" tab. |
| 117 | +- The results are shown in the same format as GitHub Action results. |
| 118 | + - The top section summarizes the AI agent variants being compared. Select an agent ID link to go to the agent settings page in the Azure AI Foundry portal. Select the Evaluation Results link to view individual results in detail in the Azure AI Foundry portal. |
| 119 | + - The second section shows the evaluation scores, with a statistical significance comparison between variants (for multiple agents) or confidence intervals (for a single agent). |
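
This article doesn't describe the statistical method that the extension uses. As a rough illustration of what a confidence interval over per-query scores means, the following sketch computes a 95% interval for the mean score with a normal approximation. The `mean_confidence_interval` helper, the choice of approximation, and the example scores are all assumptions made for illustration.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Return (mean, lower, upper) for the mean of scores.

    Assumption: a normal approximation with z = 1.96 (about 95% coverage).
    This is not necessarily the method the extension applies.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical per-query fluency scores for a single agent.
example_scores = [4, 5, 3, 4, 4]
mean, lower, upper = mean_confidence_interval(example_scores)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

A wider interval means the mean score is less certain, which usually points to a dataset that is small or has highly variable scores.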
| 120 | + |
| 121 | +Multi-agent evaluation result: |
| 122 | +:::image type="content" source="../media/evaluations/azure-devops-multi-agent-result.png" alt-text="Screenshot of multi-agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-multi-agent-result.png"::: |
| 123 | + |
| 124 | +Single-agent evaluation result: |
| 125 | +:::image type="content" source="../media/evaluations/azure-devops-single-agent-result.png" alt-text="Screenshot of single-agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-single-agent-result.png"::: |
| 126 | + |
| 127 | +## Related content |
| 128 | + |
| 129 | +- [How to evaluate generative AI models and applications with Azure AI Foundry](./evaluate-generative-ai-app.md) |
| 130 | +- [How to view evaluation results in Azure AI Foundry portal](./evaluate-results.md) |