
Commit f7643e3

Merge pull request #4720 from lgayhardt/evalupdates

Azure DevOps and retire online eval

2 parents a057144 + 53e5ea2

10 files changed: +176 -359 lines

articles/ai-foundry/.openpublishing.redirection.ai-studio.json

Lines changed: 5 additions & 0 deletions
@@ -1167,6 +1167,11 @@
       "source_path_from_root": "/articles/ai-foundry/how-to/develop/visualize-traces.md",
       "redirect_url": "/azure/ai-foundry/how-to/develop/trace-application#visualize-your-traces",
       "redirect_document_id": false
+    },
+    {
+      "source_path_from_root": "/articles/ai-foundry/how-to/online-evaluation.md",
+      "redirect_url": "/azure/ai-foundry/how-to/monitor-applications",
+      "redirect_document_id": false
     }
   ]
 }
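For context, the entry this diff adds follows the same schema as the surrounding entries in the redirection file. A hypothetical sanity check (the keys are taken from the diff; the helper itself is illustrative, not part of the repo) could look like:

```python
# The entry added by this diff, as a Python dict.
entry = {
    "source_path_from_root": "/articles/ai-foundry/how-to/online-evaluation.md",
    "redirect_url": "/azure/ai-foundry/how-to/monitor-applications",
    "redirect_document_id": False,
}

def validate_redirect(e: dict) -> bool:
    """Check the fields every redirection entry in this file carries."""
    required = {"source_path_from_root", "redirect_url", "redirect_document_id"}
    return (
        required <= e.keys()
        and e["source_path_from_root"].startswith("/articles/")
        and e["redirect_url"].startswith("/azure/")
        and isinstance(e["redirect_document_id"], bool)
    )

print(validate_redirect(entry))  # prints True
```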

articles/ai-foundry/concepts/ai-red-teaming-agent.md

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@ The AI Red Teaming Agent leverages Microsoft's open-source framework for Python
 
 Together these components (scanning, evaluating, and reporting) help teams understand how AI systems respond to common attacks, ultimately guiding a comprehensive risk management strategy.
 
+[!INCLUDE [uses-hub-only](../includes/uses-hub-only.md)]
+
 ## When to use the AI Red Teaming Agent's scans
 
 When thinking about AI-related safety risks when developing trustworthy AI systems, Microsoft uses NIST's framework to mitigate risk effectively: Govern, Map, Measure, Manage. We'll focus on the last three parts in relation to the generative AI development lifecycle:

articles/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ This article will guide you through the process of
 - Running automated scans.
 - Visualizing and tracking your results over time in your Azure AI Foundry project.
 
+[!INCLUDE [uses-hub-only](../../includes/uses-hub-only.md)]
+
 ## Getting started
 
 First, install the `redteam` package as an extra from the Azure AI Evaluation SDK; this provides the PyRIT functionality:
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
---
title: How to run an evaluation in Azure DevOps
titleSuffix: Azure AI Foundry
description: Learn how to run an evaluation in Azure DevOps to enable offline evaluation of AI models within your CI/CD pipelines.
manager: scottpolly
ms.service: azure-ai-foundry
ms.topic: how-to
ms.date: 05/19/2025
ms.reviewer: hanch
ms.author: lagayhar
author: lgayhardt
---
# How to run an evaluation in Azure DevOps (preview)

[!INCLUDE [feature-preview](../includes/feature-preview.md)]

Similar to the Azure AI evaluation GitHub Action, an Azure DevOps extension is available in the Azure DevOps Marketplace that enables offline evaluation of AI models within your CI/CD pipelines. For the supported features and evaluators, see [GitHub Action](evaluation-github-action.md).

[!INCLUDE [features](../includes/evaluation-github-action-azure-devops-features.md)]
## Prerequisites

[!INCLUDE [hub-only-prereq](../includes/hub-only-prereq.md)]

- Install the Azure AI evaluation extension:
  - Go to the [Azure DevOps Marketplace](https://marketplace.visualstudio.com/azuredevops).
  - Search for Azure AI evaluation and install the extension into your Azure DevOps organization.
## Set up the YAML configuration file

1. Create a new YAML file in your repository. You can use the sample YAML provided in the README or clone it from the [GitHub repo](https://github.com/microsoft/ai-agent-evals?tab=readme-ov-file).
2. Configure the following inputs:
   - Set up the [Azure CLI](/azure/devops/pipelines/tasks/reference/azure-cli-v2) task with a [service connection](/azure/devops/pipelines/library/service-endpoints?view=azure-devops&preserve-view=true) and Azure login.
   - Azure AI project connection string.
   - Dataset and evaluators: specify the evaluator names you want to use for this evaluation run.
   - Queries (required) and ground truth (optional).

See the following sample dataset:
```json
{
  "name": "MyTestData",
  "evaluators": [
    "FluencyEvaluator",
    "ViolenceEvaluator"
  ],
  "data": [
    {
      "query": "Tell me about Tokyo?",
      "ground_truth": "Tokyo is the capital of Japan and the largest city in the country. It is located on the eastern coast of Honshu, the largest of Japan's four main islands. Tokyo is the political, economic, and cultural center of Japan and is one of the world's most populous cities. It is also one of the world's most important financial centers and is home to the Tokyo Stock Exchange."
    },
    {
      "query": "Where is Italy?",
      "ground_truth": "Italy is a country in southern Europe, located on the Italian Peninsula and the two largest islands in the Mediterranean Sea, Sicily and Sardinia. It is a unitary parliamentary republic with its capital in Rome, the largest city in Italy. Other major cities include Milan, Naples, Turin, and Palermo."
    }
  ]
}
```
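The dataset rules above (at least one evaluator name, `query` required per row, `ground_truth` optional) can be checked before committing the file. A minimal sketch; the helper name is illustrative and not part of the extension:

```python
import json

def validate_dataset(raw: str) -> list[str]:
    """Return a list of problems found in an evaluation dataset; empty means OK."""
    problems = []
    ds = json.loads(raw)
    if not ds.get("evaluators"):
        problems.append("at least one evaluator name is required")
    for i, row in enumerate(ds.get("data", [])):
        if "query" not in row:  # queries are required
            problems.append(f"row {i}: missing 'query'")
        # 'ground_truth' is optional, so its absence is not an error
    return problems

sample = '{"name": "MyTestData", "evaluators": ["FluencyEvaluator"], "data": [{"query": "Where is Italy?"}]}'
print(validate_dataset(sample))  # prints []
```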
- Agent IDs: retrieve the agent identifiers from the Azure AI Foundry portal.

A sample YAML file:

```yml
trigger:
- main

pool:
  vmImage: 'windows-latest'

steps:
- task: AzureCLI@2
  inputs:
    addSpnToEnvironment: true
    azureSubscription: 'az-dev-gh-aprilk-test-connection'
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      echo "##vso[task.setvariable variable=ARM_CLIENT_ID]$servicePrincipalId"
      echo "##vso[task.setvariable variable=ARM_ID_TOKEN]$idToken"
      echo "##vso[task.setvariable variable=ARM_TENANT_ID]$tenantId"

- bash: |
    az login --service-principal -u $(ARM_CLIENT_ID) --tenant $(ARM_TENANT_ID) --allow-no-subscriptions --federated-token $(ARM_ID_TOKEN)
  displayName: 'Login Azure'

- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.11'

- task: AIAgentEvaluation@0
  inputs:
    azure-aiproject-connection-string: 'azure-ai-project-connection-string-sample'
    deployment-name: "gpt-4o-mini"
    api-version: "2024-08-01-preview"
    data-path: $(Build.SourcesDirectory)\tests\data\golden-dataset-medium.json
    agent-ids: 'agent-id1, agent-id2'
```
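The `inlineScript` step relies on Azure Pipelines logging commands: the agent watches stdout for lines of the form `##vso[task.setvariable variable=NAME]VALUE` and exposes `NAME` as `$(NAME)` to later steps. A small sketch of that format (the helper and the sample value are illustrative):

```python
def set_variable_command(name: str, value: str) -> str:
    """Format an Azure Pipelines task.setvariable logging command.

    The pipeline agent parses stdout lines of the form
    ##vso[task.setvariable variable=NAME]VALUE
    and makes NAME available as $(NAME) in subsequent steps.
    """
    return f"##vso[task.setvariable variable={name}]{value}"

# Emitting the command is just printing it to stdout:
print(set_variable_command("ARM_CLIENT_ID", "00000000-0000-0000-0000-000000000000"))
```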
## Set up a new pipeline and trigger an evaluation run

Commit the YAML file and run the pipeline in Azure DevOps.

## View results

- Select the run and go to the **Azure AI Evaluation** tab.
- The results are shown in the same format as GitHub Action results.
- The top section summarizes the two AI agent variants. Select an agent ID link to go to that agent's settings page in the Azure AI Foundry portal, or select the Evaluation Results link to view the individual results in detail in the Azure AI Foundry portal.
- The second section includes evaluation scores and a comparison between variants, with statistical significance (for multiple agents) and confidence intervals (for a single agent).

Multi-agent evaluation result:

:::image type="content" source="../media/evaluations/azure-devops-multi-agent-result.png" alt-text="Screenshot of a multi-agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-multi-agent-result.png":::

Single-agent evaluation result:

:::image type="content" source="../media/evaluations/azure-devops-single-agent-result.png" alt-text="Screenshot of a single-agent evaluation result in Azure DevOps." lightbox="../media/evaluations/azure-devops-single-agent-result.png":::

## Related content

- [How to evaluate generative AI models and applications with Azure AI Foundry](./evaluate-generative-ai-app.md)
- [How to view evaluation results in Azure AI Foundry portal](./evaluate-results.md)

articles/ai-foundry/how-to/evaluation-github-action.md

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,7 @@ description: How to run evaluation in GitHub Action to streamline the evaluation
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.topic: how-to
-ms.date: 05/08/2025
+ms.date: 05/19/2025
 ms.reviewer: hanch
 ms.author: lagayhar
 author: lgayhardt
@@ -27,6 +27,8 @@ Offline evaluation involves testing AI models and agents using test datasets to
 
 ## Prerequisites
 
+[!INCLUDE [hub-only-prereq](../includes/hub-only-prereq.md)]
+
 Two GitHub Actions are available for evaluating AI applications: **ai-agent-evals** and **genai-evals**.
 
 - If your application is already using AI Foundry agents, **ai-agent-evals** is well-suited as it offers a simplified setup process and direct integration with agent-based workflows.
