Commit b30937d

Merge pull request #4857 from lgayhardt/azuredevops
Eval Azure DevOps Updates
2 parents e7a5c08 + e727765 commit b30937d

1 file changed: +8 -9 lines changed

articles/ai-foundry/how-to/evaluation-azure-devops.md

Lines changed: 8 additions & 9 deletions
@@ -15,7 +15,7 @@ author: lgayhardt

 [!INCLUDE [feature-preview](../includes/feature-preview.md)]

-Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is also provided in Azure DevOps marketplace which enables offline evaluation of AI models within your CI/CD pipelines in Azure DevOps. The supported feature or evaluators can be found, [GitHub Action](evaluation-github-action.md)
+Similar to the [Azure AI evaluation in GitHub Actions](evaluation-github-action.md), an Azure DevOps extension is also available in the Azure DevOps Marketplace. This extension enables offline evaluation of AI agents within your CI/CD pipelines.

 [!INCLUDE [features](../includes/evaluation-github-action-azure-devops-features.md)]

@@ -30,13 +30,15 @@ Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is al
 ## Set up YAML configuration file

 1. Create a new YAML file in your repository.
-You can use the sample YAML provided in the README or clone from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file).
+You can use the sample YAML provided in the README or copy from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file).
 2. Configure the following inputs:
 - Set up [Azure CLI](/azure/devops/pipelines/tasks/reference/azure-cli-v2) with [service connection](/azure/devops/pipelines/library/service-endpoints?view=azure-devops&preserve-view=true) and Azure Login.
 - Azure AI project connection string
 - Dataset and evaluators
 - Specify the evaluator names you want to use for this evaluation run.
-- Queries (required) and Ground Truth (optional).
+- Queries (required).
+- Agent IDs
+Retrieve agent identifiers from the AI Foundry portal.

 See the following sample dataset:

@@ -51,18 +53,15 @@ Similar to Azure AI evaluation in GitHub Action, an Azure DevOps extension is al

     {
       "query": "Tell me about Tokyo?",
-      "ground_truth": "Tokyo is the capital of Japan and the largest city in the country. It is located on the eastern coast of Honshu, the largest of Japan's four main islands. Tokyo is the political, economic, and cultural center of Japan and is one of the world's most populous cities. It is also one of the world's most important financial centers and is home to the Tokyo Stock Exchange."
     },
     {
       "query": "Where is Italy?",
-      "ground_truth": "Italy is a country in southern Europe, located on the Italian Peninsula and the two largest islands in the Mediterranean Sea, Sicily and Sardinia. It is a unitary parliamentary republic with its capital in Rome, the largest city in Italy. Other major cities include Milan, Naples, Turin, and Palermo."
     }
   ]
 }
 ```

-- Agent IDs
-Retrieve agent identifiers from the AI Foundry portal.
+

 A sample YAML file:

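The dataset changes in this commit leave `query` as the only required field per item and drop `ground_truth` from the sample. A minimal Python sketch for catching a malformed dataset file before the pipeline runs, assuming the items sit in a top-level `data` array (the enclosing key is outside this diff, so treat it as an assumption); `check_dataset` is a hypothetical helper, not part of the extension.

```python
import json


def check_dataset(text: str) -> list[str]:
    """Return a list of problems found in a dataset JSON string.

    Assumptions (not confirmed by the diff): items live in a top-level
    "data" array, and each item needs a non-empty string "query".
    Other keys, such as "ground_truth", are ignored by this check.
    """
    doc = json.loads(text)
    # The top level must be an object holding a "data" list.
    if not isinstance(doc, dict) or not isinstance(doc.get("data"), list):
        return ['missing top-level "data" array']
    problems: list[str] = []
    for i, item in enumerate(doc["data"]):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not a JSON object")
            continue
        query = item.get("query")
        # "query" is the only field this sketch treats as required.
        if not isinstance(query, str) or not query.strip():
            problems.append(f'item {i}: missing or empty "query"')
    return problems


if __name__ == "__main__":
    sample = '{"data": [{"query": "Tell me about Tokyo?"}, {"query": "Where is Italy?"}]}'
    print(check_dataset(sample) or "dataset OK")
```

Run as a script, it prints `dataset OK` for the two-query sample; an empty list means no problems were found.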
@@ -113,11 +112,11 @@ Commit and run the pipeline in Azure DevOps.
 ## View results

 - Select the run and go to "Azure AI Evaluation" tab.
-- The results are shown in the same format as GitHub Action results.
+- The results are shown in this format:
 - The top section summarizes the overview of two AI agent variants. You can select it on the agent ID link, and it directs you to the agent setting page in Azure AI Foundry portal. You can also select the link for Evaluation Results, and it directs you to Azure AI Foundry portal to view individual result in detail.
 - The second section includes evaluation scores and comparison between different variants on statistical significance (for multiple agents) and confidence intervals (for single agent).

-Multi agent evaluation result:
+Evaluation results and comparisons from multiple AI agents:
 :::image type="content" source="../media/evaluations/azure-devops-multi-agent-result.png" alt-text="Screenshot of multi agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-multi-agent-result.png":::

 Single agent evaluation result:

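The results view reports confidence intervals for a single agent. This diff doesn't say how the extension computes them, so the following is only an illustration of the idea: a 95% interval for a mean evaluator score using a normal approximation (z = 1.96). `mean_confidence_interval` is a hypothetical helper; for small samples a Student's t critical value would be the more conservative choice.

```python
import math
import statistics


def mean_confidence_interval(
    scores: list[float], z: float = 1.96
) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for the mean of `scores`.

    Illustrative only: uses a normal approximation and assumes
    independent scores; it is not the extension's documented method.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.fmean(scores)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * sem, mean + z * sem


if __name__ == "__main__":
    m, lo, hi = mean_confidence_interval([3.0, 4.0, 5.0, 4.0])
    print(f"mean={m:.2f} 95% CI=[{lo:.2f}, {hi:.2f}]")
```

A wider interval means the per-query scores vary more or the dataset is smaller, so adding queries narrows it.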