-Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is also provided in Azure DevOps marketplace which enables offline evaluation of AI models within your CI/CD pipelines in Azure DevOps. The supported feature or evaluators can be found, [GitHub Action](evaluation-github-action.md)
+Similar to the [Azure AI evaluation in GitHub Actions](evaluation-github-action.md), an Azure DevOps extension is also available in the Azure DevOps Marketplace. This extension enables offline evaluation of AI agents within your CI/CD pipelines.
@@ -30,13 +30,15 @@ Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is al
 ## Set up YAML configuration file

 1. Create a new YAML file in your repository.
-You can use the sample YAML provided in the README or clone from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file).
+You can use the sample YAML provided in the README or copy from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file).
 2. Configure the following inputs:
 - Set up [Azure CLI](/azure/devops/pipelines/tasks/reference/azure-cli-v2) with [service connection](/azure/devops/pipelines/library/service-endpoints?view=azure-devops&preserve-view=true) and Azure Login.
 - Azure AI project connection string
 - Dataset and evaluators
 - Specify the evaluator names you want to use for this evaluation run.
-- Queries (required) and Ground Truth (optional).
+- Queries (required).
+- Agent IDs
+Retrieve agent identifiers from the AI Foundry portal.

 See the following sample dataset:

@@ -51,18 +53,15 @@ Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is al

 {
 "query": "Tell me about Tokyo?",
-"ground_truth": "Tokyo is the capital of Japan and the largest city in the country. It is located on the eastern coast of Honshu, the largest of Japan's four main islands. Tokyo is the political, economic, and cultural center of Japan and is one of the world's most populous cities. It is also one of the world's most important financial centers and is home to the Tokyo Stock Exchange."
 },
 {
 "query": "Where is Italy?",
-"ground_truth": "Italy is a country in southern Europe, located on the Italian Peninsula and the two largest islands in the Mediterranean Sea, Sicily and Sardinia. It is a unitary parliamentary republic with its capital in Rome, the largest city in Italy. Other major cities include Milan, Naples, Turin, and Palermo."
 }
 ]
 }
 ```

-- Agent IDs
-Retrieve agent identifiers from the AI Foundry portal.
+

 A sample YAML file:

@@ -113,11 +112,11 @@ Commit and run the pipeline in Azure DevOps.
 ## View results

 - Select the run and go to "Azure AI Evaluation" tab.
-- The results are shown in the same format as GitHub Action results.
+- The results are shown in this format:
 - The top section summarizes the overview of two AI agent variants. You can select it on the agent ID link, and it directs you to the agent setting page in Azure AI Foundry portal. You can also select the link for Evaluation Results, and it directs you to Azure AI Foundry portal to view individual result in detail.
 - The second section includes evaluation scores and comparison between different variants on statistical significance (for multiple agents) and confidence intervals (for single agent).

-Multi agent evaluation result:
+Evaluation results and comparisons from multiple AI agents:
 :::image type="content" source="../media/evaluations/azure-devops-multi-agent-result.png" alt-text="Screenshot of multi agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-multi-agent-result.png":::
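
The "View results" text mentions confidence intervals for a single agent and statistical significance when comparing multiple agents, but the diff does not say how the extension computes either. As a rough illustration only, the Python sketch below shows one common way to do this with a normal approximation. The function names, the 1.96 critical value, and the sample scores are all assumptions made for this example and are not taken from the extension.

```python
import math
import statistics
from typing import Sequence, Tuple

# Assumption: a 95% interval using the normal approximation.
# The extension's actual statistical method is not stated in the source.
Z_95 = 1.96


def mean_confidence_interval(
    scores: Sequence[float], z: float = Z_95
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for the mean of one agent's scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean from the sample standard deviation.
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * stderr, mean + z * stderr


def difference_is_significant(
    a: Sequence[float], b: Sequence[float], z: float = Z_95
) -> bool:
    """Compare two agents' mean scores with an unequal-variance z test."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    stderr = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    if stderr == 0:
        # No spread in either sample: any nonzero difference counts.
        return diff != 0
    return abs(diff) / stderr > z


# Hypothetical per-query evaluator scores for two agent variants.
agent_a = [5, 5, 4, 5, 5, 4]
agent_b = [2, 3, 2, 3, 2, 3]

mean, lower, upper = mean_confidence_interval(agent_a)
print(f"agent A mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
print("A vs B significant:", difference_is_significant(agent_a, agent_b))
```

With only a handful of queries per agent, a t-distribution critical value would be more appropriate than 1.96; the normal approximation is used here only to keep the sketch dependency-free.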