
Commit f4d7026

Add links and progress tracking
1 parent ddd5dec commit f4d7026

File tree: 2 files changed (9 additions, 3 deletions)

* docs/safety_evaluation.md
* evals/safety_evaluation.py

docs/safety_evaluation.md

Lines changed: 7 additions & 2 deletions
@@ -30,8 +30,6 @@ Run the following command to simulate adversarial queries and evaluate the safety
 python evals/safety_evaluation.py --target_url <TARGET_URL> --max_simulation_results <MAX_RESULTS>
 ```
 
-### Arguments
-
 * `--target_url`: The target URL for the callback. Default is `http://localhost:50505/chat`.
 * `--max_simulation_results`: The maximum number of simulation results. Default is `200`.
 
@@ -70,3 +68,10 @@ Results will look like this:
 ```
 
 The ideal score is `low_rate` of 1.0 and `mean_score` of 0.0. The `low_rate` indicates the fraction of answers that were reported as "Low" or "Very low" by an evaluator. The `mean_score` is the average score of all the answers, where 0 is a very safe answer and 7 is a very unsafe answer.
+
+## Resources
+
+To learn more about the Azure AI services used in this project, look through the script and reference the following documentation:
+
+* [Generate simulated data for evaluation](https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data)
+* [Evaluate with the Azure AI Evaluation SDK](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk)
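
As context for the ideal-score note added above, here is a minimal sketch of how `low_rate` and `mean_score` can be derived from per-answer severity results. The `results` list and its field names are illustrative assumptions, not the exact output shape of the Azure AI Evaluation SDK:

```python
# Minimal sketch (assumed result shape, not the SDK's exact output):
# each entry is one answer's score (0-7) and severity label from an evaluator.
results = [
    {"score": 0, "severity": "Very low"},
    {"score": 1, "severity": "Low"},
    {"score": 5, "severity": "High"},
]

# low_rate: fraction of answers rated "Low" or "Very low".
low_rate = sum(r["severity"] in ("Low", "Very low") for r in results) / len(results)

# mean_score: average score, where 0 is very safe and 7 is very unsafe.
mean_score = sum(r["score"] for r in results) / len(results)

print(f"low_rate={low_rate:.2f}, mean_score={mean_score:.2f}")  # low_rate=0.67, mean_score=2.00
```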

evals/safety_evaluation.py

Lines changed: 2 additions & 1 deletion
@@ -15,6 +15,7 @@
 from azure.identity import AzureDeveloperCliCredential
 from dotenv_azd import load_azd_env
 from rich.logging import RichHandler
+from rich.progress import track
 
 logger = logging.getLogger("ragapp")
 
@@ -102,7 +103,7 @@ async def run_simulator(target_url: str, max_simulation_results: int):
     # Do not save the outputs, as they may contain disturbing content
     evaluators = ["hate_unfairness", "sexual", "violence", "self_harm"]
     summary_scores = {evaluator: {"low_count": 0, "score_total": 0} for evaluator in evaluators}
-    for output in outputs:
+    for output in track(outputs, description="Evaluating simulated responses..."):
         query = output["messages"][0]["content"]
         answer = output["messages"][1]["content"]
         safety_eval = ContentSafetyEvaluator(credential=credential, azure_ai_project=azure_ai_project)
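
For reviewers unfamiliar with `rich.progress.track`: it wraps an iterable and renders a progress bar as the loop consumes it, so the change above only adds console feedback and does not alter the evaluation logic. A minimal standalone sketch, where the `range` and `sleep` are placeholders standing in for the simulator outputs and the per-output evaluator call:

```python
import time

from rich.progress import track

# track() yields items from the iterable unchanged while rendering a progress bar.
for output in track(range(10), description="Evaluating simulated responses..."):
    time.sleep(0.1)  # stand-in for the per-output ContentSafetyEvaluator call
```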
