
Commit c06c3c7

Adding sample to evaluate groundedness (#142)
* update promptflow-eval dependencies to azure-ai-evaluation
* clear local variables
* fix errors and remove 'question' col from data
* small fix in evaluator config
* add groundedness sample
* adding and fixing readme
1 parent faaa35d commit c06c3c7

File tree

3 files changed: +402 −1 lines changed

scenarios/evaluate/simulate_adversarial/README.md

Lines changed: 23 additions & 1 deletion
@@ -26,6 +26,28 @@ By the end of this tutorial, you should be able to:
### Basic requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project.** You should provide an Azure AI project to run your safety evaluations or simulations with. First, [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources), then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment, as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to them via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region support for evaluations

| Region | Hate and unfairness, sexual, violent, self-harm, XPIA | Groundedness | Protected material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |

For built-in quality and performance metrics, connect your own deployment of LLMs; you can therefore evaluate in any region your deployment is in.

#### Region support for adversarial simulation

| Region | Adversarial simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |
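As a minimal sketch, the project that the simulator and safety evaluators connect through is typically passed as a small dict; the field names below are assumed from the `azure-ai-evaluation` package, and the values are hypothetical placeholders you must replace with your own resource identifiers:

```python
# Hypothetical placeholder values -- substitute your own Azure resource names.
# The safety evaluators and adversarial simulator accept a project reference
# shaped like this dict.
azure_ai_project = {
    "subscription_id": "<your-subscription-id>",
    "resource_group_name": "<your-resource-group>",
    "project_name": "<your-ai-project-name>",  # must be in a supported region above
}
```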
### Estimated Runtime: 20 mins
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
page_type: sample
languages:
- python
products:
- ai-services
- azure-openai
description: Simulator and evaluator for assessing groundedness in custom applications using adversarial questions
---

## Simulator and Evaluator for Groundedness (simulate_evaluate_groundedness.ipynb)

### Overview

This tutorial provides a step-by-step guide on how to use the simulator and evaluator to assess the groundedness of responses in a custom application.

### Objective

The main objective of this tutorial is to help users understand the process of creating and using a simulator and evaluator to test the groundedness of responses in a custom application. By the end of this tutorial, you should be able to:
- Use the simulator to generate adversarial questions
- Run the evaluator to assess the groundedness of the responses
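The flow of those two steps can be sketched locally without any Azure calls. Everything below is illustrative: `target_app` stands in for the application the simulator drives with adversarial questions, and `toy_groundedness` is a crude lexical stand-in for the real LLM-judged `GroundednessEvaluator` in `azure-ai-evaluation` (which returns a 1–5 rating rather than a word-overlap fraction):

```python
# Toy stand-in for the notebook's flow: a target "application" answers
# adversarial questions, and each answer is scored against the source context.
# Names and scoring here are illustrative only, not the real evaluator.

def target_app(query: str, context: str) -> str:
    """Pretend application under test: answers only from the given context."""
    # A real app would call an LLM; here we simply echo the context back.
    return context

def toy_groundedness(response: str, context: str) -> float:
    """Fraction of response words found in the context (0.0 to 1.0).
    The real GroundednessEvaluator uses an LLM judge instead."""
    resp_words = response.lower().split()
    ctx_words = set(context.lower().split())
    if not resp_words:
        return 0.0
    return sum(w in ctx_words for w in resp_words) / len(resp_words)

context = "The Eiffel Tower is located in Paris."
answer = target_app("Where is the Eiffel Tower, really?", context)
score = toy_groundedness(answer, context)  # 1.0: fully grounded in the context
ungrounded = toy_groundedness("It is in Rome, built in 1990.", context)
```

The point of the shape, not the scoring: the simulator supplies hostile queries to your target callable, and the evaluator scores each response against the context the answer was supposed to come from.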

### Programming Languages
- Python

### Basic Requirements

To use Azure AI Safety Evaluation for different scenarios (simulation, annotation, etc.), you need an **Azure AI Project.** You should provide an Azure AI project to run your safety evaluations or simulations with. First, [create an Azure AI hub](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources) then [create an Azure AI project](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects?tabs=ai-studio). You **do not** need to provide your own LLM deployment, as the Azure AI Safety Evaluation service hosts adversarial models for both simulation and evaluation of harmful content and connects to them via your Azure AI project. Ensure that your Azure AI project is in one of the supported regions for your desired evaluation metric:

#### Region Support for Evaluations

| Region | Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA | Groundedness | Protected Material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| US North Central | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |
For built-in quality and performance metrics, connect your own deployment of LLMs; you can therefore evaluate in any region your deployment is in.
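For those built-in quality metrics, the evaluator is pointed at your own deployment via a model configuration. The field names below are the shape commonly used with `azure-ai-evaluation`, and every value is a hypothetical placeholder:

```python
# Hypothetical values -- quality evaluators (e.g. a groundedness evaluator)
# take a configuration for your own Azure OpenAI deployment, commonly
# shaped like this dict.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "azure_deployment": "<your-gpt-deployment>",
    "api_key": "<your-api-key>",
}
```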

#### Region Support for Adversarial Simulation

| Region | Adversarial Simulation |
| - | - |
| UK South | yes |
| East US 2 | yes |
| Sweden Central | yes |
| US North Central | yes |
| France Central | yes |
### Estimated Runtime: 20 mins
