
Commit 8e326a4

pull base content,head:MicrosoftDocs:main,into:wwlpublishsync

2 parents: abfaafc + e407778

28 files changed: +436 −400 lines

learn-pr/paths/create-custom-copilots-ai-studio/index.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -3,7 +3,7 @@ uid: learn.wwl.create-custom-copilots-ai-studio
 metadata:
   title: Develop generative AI apps in Azure AI Foundry AI-3016
   description: Learn how to develop generative AI apps in Azure AI Foundry. (AI-3016)
-  ms.date: 02/05/2025
+  ms.date: 04/16/2025
   author: wwlpublish
   ms.author: madiepev
   ms.topic: learning-path
@@ -33,7 +33,7 @@ modules:
 - learn.get-started-prompt-flow-ai-studio
 - learn.wwl.build-copilot-ai-studio
 - learn.wwl.finetune-model-copilot-ai-studio
-- learn.wwl.evaluate-models-azure-ai-studio
 - learn.wwl.responsible-ai-studio
+- learn.wwl.evaluate-models-azure-ai-studio
 trophy:
   uid: learn.wwl.create-custom-copilots-ai-studio.trophy
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.introduction
-title: Introduction
-metadata:
-  title: Introduction
-  description: "Explore model evaluations for generative AI apps in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 2
-content: |
-  [!include[](includes/1-introduction.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Explore model evaluations for generative AI apps in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 2
+content: |
+  [!include[](includes/1-introduction.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.assess-models
-title: Assess the model performance
-metadata:
-  title: Assess the model performance
-  description: "Learn how to assess and compare the performance of language models in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 6
-content: |
-  [!include[](includes/2-assess-models.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.assess-models
+title: Assess the model performance
+metadata:
+  title: Assess the model performance
+  description: "Learn how to assess and compare the performance of language models in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 6
+content: |
+  [!include[](includes/2-assess-models.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.manual-evaluations
-title: Manually evaluate the performance of a model
-metadata:
-  title: Manually evaluate the performance of a model
-  description: "Learn how to manually evaluate the performance of a model in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 7
-content: |
-  [!include[](includes/3-manual-evaluations.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.manual-evaluations
+title: Manually evaluate the performance of a model
+metadata:
+  title: Manually evaluate the performance of a model
+  description: "Learn how to manually evaluate the performance of a model in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 7
+content: |
+  [!include[](includes/3-manual-evaluations.md)]
```
Lines changed: 13 additions & 0 deletions

```diff
@@ -0,0 +1,13 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.automated-evaluations
+title: Automated evaluations
+metadata:
+  title: Automated evaluations
+  description: "Learn how to use automated evaluations in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 4
+content: |
+  [!include[](includes/3b-automated-evaluations.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.evaluation-flows
-title: Assess the performance of your generative AI apps
-metadata:
-  title: Assess the performance of your generative AI apps
-  description: "Learn how to evaluate your generative AI apps in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 7
-content: |
-  [!include[](includes/4-evaluation-flows.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.evaluation-flows
+title: Assess the performance of your generative AI apps
+metadata:
+  title: Assess the performance of your generative AI apps
+  description: "Learn how to evaluate your generative AI apps in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 7
+content: |
+  [!include[](includes/4-evaluation-flows.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.exercise
-title: Exercise - Evaluate the performance of your generative AI app
-metadata:
-  title: Exercise - Evaluate the performance of your generative AI app
-  description: "Evaluate the performance of your generative AI app in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 15
-content: |
-  [!include[](includes/5-exercise.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.exercise
+title: Exercise - Evaluate generative AI model performance
+metadata:
+  title: Exercise - Evaluate generative AI model performance
+  description: "Evaluate the performance of your generative AI app in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 15
+content: |
+  [!include[](includes/5-exercise.md)]
```
Lines changed: 48 additions & 48 deletions

```diff
@@ -1,48 +1,48 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.knowledge-check
-title: Module assessment
-metadata:
-  title: Module assessment
-  description: "Knowledge check to test your knowledge on evaluating models and applications."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 3
-content: |
-quiz:
-  questions:
-  - content: "You have a specific set of questions you want to ensure your chat application answers correctly. What is the best evaluation to verify that?"
-    choices:
-    - content: "Model benchmarks"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Manual evaluations"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Machine learning metrics"
-      isCorrect: false
-      explanation: "Incorrect."
-  - content: "Which model benchmark quantifies the semantic similarity between a ground source and the generated response?"
-    choices:
-    - content: "GPT Similarity"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Coherence"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Accuracy"
-      isCorrect: false
-      explanation: "Incorrect."
-  - content: "You want to evaluate how well the generated text adheres to grammatical rules. Which type of evaluation would be best to use?"
-    choices:
-    - content: "Manual evaluations"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Automated evaluations"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Risk and safety metrics"
-      isCorrect: false
-      explanation: "Incorrect."
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: "Knowledge check to test your knowledge on evaluating models and applications."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 3
+content: |
+quiz:
+  questions:
+  - content: "Which evaluation technique can you use to apply your own judgement about the quality of responses to a set of specific prompts?"
+    choices:
+    - content: "Model benchmarks"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Manual evaluations"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Automated evaluations"
+      isCorrect: false
+      explanation: "Incorrect."
+  - content: "You want to compare generated responses to ground truth based on standard metrics. What kind of metrics should you specify for automated evaluations?"
+    choices:
+    - content: "AI quality (AI-assisted)"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "AI quality (NLP)"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Risk and safety"
+      isCorrect: false
+      explanation: "Incorrect."
+  - content: "You want to evaluate the grammatical and linguistic quality of responses. What kind of metrics should you specify for automated evaluations?"
+    choices:
+    - content: "AI quality (AI-assisted)"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "AI quality (NLP)"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Risk and safety"
+      isCorrect: false
+      explanation: "Incorrect."
+
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.summary
-title: Summary
-metadata:
-  title: Summary
-  description: "Summary of key learning points on evaluating generative AI apps with the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 1
-content: |
-  [!include[](includes/7-summary.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary of key learning points on evaluating generative AI apps with the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 1
+content: |
+  [!include[](includes/7-summary.md)]
```

learn-pr/wwl-data-ai/evaluate-models-azure-ai-studio/includes/2-assess-models.md

Lines changed: 13 additions & 6 deletions

```diff
@@ -6,7 +6,7 @@ When you develop a generative AI app, you use a language model in your chat appl
 
 An input (1) is provided to a language model (2), and a response is generated as output (3). The model is then evaluated by analyzing the input, the output, and optionally comparing it to predefined expected output.
 
-When you develop a generative AI app, you integrate a language model into a chat flow:
+When you develop a generative AI app, you may integrate a language model into a chat flow:
 
 :::image type="content" source="../media/chat-flow-diagram.png" alt-text="Diagram of a chat flow using a language model.":::
 
@@ -33,13 +33,20 @@ In the Azure AI Foundry portal, you can explore the model benchmarks for all ava
 
 Manual evaluations involve human raters who assess the quality of the model's responses. This approach provides insights into aspects that automated metrics might miss, such as context relevance and user satisfaction. Human evaluators can rate responses based on criteria like relevance, informativeness, and engagement.
 
-## Traditional machine learning metrics
-
-Traditional machine learning metrics are also valuable in evaluating model performance. One such metric is the **F1-score**, which measures the ratio of the number of shared words between the generated and ground truth answers. The F1-score is useful for tasks like text classification and information retrieval, where precision and recall are important.
-
 ## AI-assisted metrics
 
 AI-assisted metrics use advanced techniques to evaluate model performance. These metrics can include:
 
-- **Risk and safety metrics**: These metrics assess the potential risks and safety concerns associated with the model's outputs. They help ensure that the model doesn't generate harmful or biased content.
 - **Generation quality metrics**: These metrics evaluate the overall quality of the generated text, considering factors like creativity, coherence, and adherence to the desired style or tone.
+
+- **Risk and safety metrics**: These metrics assess the potential risks and safety concerns associated with the model's outputs. They help ensure that the model doesn't generate harmful or biased content.
+
+## Natural language processing metrics
+
+Natural language processing (NLP) metrics are also valuable in evaluating model performance. One such metric is the **F1-score**, which measures the ratio of the number of shared words between the generated and ground truth answers. The F1-score is useful for tasks like text classification and information retrieval, where precision and recall are important. Other common NLP metrics include:
+
+- **BLEU**: Bilingual Evaluation Understudy metric
+- **METEOR**: Metric for Evaluation of Translation with Explicit Ordering
+- **ROUGE**: Recall-Oriented Understudy for Gisting Evaluation
+
+All of these metrics are used to quantify the level of overlap in the model-generated response and the ground truth (expected response).
```
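The token-overlap F1-score that the new "Natural language processing metrics" section describes can be sketched in a few lines. This is a minimal illustration of the general idea, not the implementation used by Azure AI Foundry; whitespace tokenization and lowercasing are simplifying assumptions.

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a ground-truth answer.

    precision = shared tokens / tokens in the generated answer
    recall    = shared tokens / tokens in the reference answer
    F1        = harmonic mean of precision and recall
    """
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# An exact match scores 1.0; answers with no shared words score 0.0.
print(token_f1("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

BLEU, METEOR, and ROUGE refine this same overlap idea with n-grams, synonym matching, and recall weighting, respectively.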
