
Commit 8e326a4

pull base content,head:MicrosoftDocs:main,into:wwlpublishsync

2 parents: abfaafc + e407778

28 files changed: +436 −400 lines

learn-pr/paths/create-custom-copilots-ai-studio/index.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -3,7 +3,7 @@ uid: learn.wwl.create-custom-copilots-ai-studio
 metadata:
   title: Develop generative AI apps in Azure AI Foundry AI-3016
   description: Learn how to develop generative AI apps in Azure AI Foundry. (AI-3016)
-  ms.date: 02/05/2025
+  ms.date: 04/16/2025
   author: wwlpublish
   ms.author: madiepev
   ms.topic: learning-path
@@ -33,7 +33,7 @@ modules:
 - learn.get-started-prompt-flow-ai-studio
 - learn.wwl.build-copilot-ai-studio
 - learn.wwl.finetune-model-copilot-ai-studio
-- learn.wwl.evaluate-models-azure-ai-studio
 - learn.wwl.responsible-ai-studio
+- learn.wwl.evaluate-models-azure-ai-studio
 trophy:
   uid: learn.wwl.create-custom-copilots-ai-studio.trophy
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.introduction
-title: Introduction
-metadata:
-  title: Introduction
-  description: "Explore model evaluations for generative AI apps in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 2
-content: |
-  [!include[](includes/1-introduction.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Explore model evaluations for generative AI apps in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 2
+content: |
+  [!include[](includes/1-introduction.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.assess-models
-title: Assess the model performance
-metadata:
-  title: Assess the model performance
-  description: "Learn how to assess and compare the performance of language models in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 6
-content: |
-  [!include[](includes/2-assess-models.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.assess-models
+title: Assess the model performance
+metadata:
+  title: Assess the model performance
+  description: "Learn how to assess and compare the performance of language models in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 6
+content: |
+  [!include[](includes/2-assess-models.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.manual-evaluations
-title: Manually evaluate the performance of a model
-metadata:
-  title: Manually evaluate the performance of a model
-  description: "Learn how to manually evaluate the performance of a model in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 7
-content: |
-  [!include[](includes/3-manual-evaluations.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.manual-evaluations
+title: Manually evaluate the performance of a model
+metadata:
+  title: Manually evaluate the performance of a model
+  description: "Learn how to manually evaluate the performance of a model in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 7
+content: |
+  [!include[](includes/3-manual-evaluations.md)]
```
Lines changed: 13 additions & 0 deletions

```diff
@@ -0,0 +1,13 @@
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.automated-evaluations
+title: Automated evaluations
+metadata:
+  title: Automated evaluations
+  description: "Learn how to use automated evaluations in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 4
+content: |
+  [!include[](includes/3b-automated-evaluations.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.evaluation-flows
-title: Assess the performance of your generative AI apps
-metadata:
-  title: Assess the performance of your generative AI apps
-  description: "Learn how to evaluate your generative AI apps in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 7
-content: |
-  [!include[](includes/4-evaluation-flows.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.evaluation-flows
+title: Assess the performance of your generative AI apps
+metadata:
+  title: Assess the performance of your generative AI apps
+  description: "Learn how to evaluate your generative AI apps in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 7
+content: |
+  [!include[](includes/4-evaluation-flows.md)]
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.exercise
-title: Exercise - Evaluate the performance of your generative AI app
-metadata:
-  title: Exercise - Evaluate the performance of your generative AI app
-  description: "Evaluate the performance of your generative AI app in the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 15
-content: |
-  [!include[](includes/5-exercise.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.exercise
+title: Exercise - Evaluate generative AI model performance
+metadata:
+  title: Exercise - Evaluate generative AI model performance
+  description: "Evaluate the performance of your generative AI app in the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 15
+content: |
+  [!include[](includes/5-exercise.md)]
```
Lines changed: 48 additions & 48 deletions

```diff
@@ -1,48 +1,48 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.knowledge-check
-title: Module assessment
-metadata:
-  title: Module assessment
-  description: "Knowledge check to test your knowledge on evaluating models and applications."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 3
-content: |
-quiz:
-  questions:
-  - content: "You have a specific set of questions you want to ensure your chat application answers correctly. What is the best evaluation to verify that?"
-    choices:
-    - content: "Model benchmarks"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Manual evaluations"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Machine learning metrics"
-      isCorrect: false
-      explanation: "Incorrect."
-  - content: "Which model benchmark quantifies the semantic similarity between a ground source and the generated response?"
-    choices:
-    - content: "GPT Similarity"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Coherence"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Accuracy"
-      isCorrect: false
-      explanation: "Incorrect."
-  - content: "You want to evaluate how well the generated text adheres to grammatical rules. Which type of evaluation would be best to use?"
-    choices:
-    - content: "Manual evaluations"
-      isCorrect: false
-      explanation: "Incorrect."
-    - content: "Automated evaluations"
-      isCorrect: true
-      explanation: "Correct."
-    - content: "Risk and safety metrics"
-      isCorrect: false
-      explanation: "Incorrect."
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: "Knowledge check to test your knowledge on evaluating models and applications."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 3
+content: |
+quiz:
+  questions:
+  - content: "Which evaluation technique can you use to apply your own judgement about the quality of responses to a set of specific prompts?"
+    choices:
+    - content: "Model benchmarks"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Manual evaluations"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Automated evaluations"
+      isCorrect: false
+      explanation: "Incorrect."
+  - content: "You want to compare generated responses to ground truth based on standard metrics. What kind of metrics should you specify for automated evaluations?"
+    choices:
+    - content: "AI quality (AI-assisted)"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "AI quality (NLP)"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "Risk and safety"
+      isCorrect: false
+      explanation: "Incorrect."
+  - content: "You want to evaluate the grammatical and linguistic quality of responses. What kind of metrics should you specify for automated evaluations?"
+    choices:
+    - content: "AI quality (AI-assisted)"
+      isCorrect: true
+      explanation: "Correct."
+    - content: "AI quality (NLP)"
+      isCorrect: false
+      explanation: "Incorrect."
+    - content: "Risk and safety"
+      isCorrect: false
+      explanation: "Incorrect."
+
```
Lines changed: 13 additions & 13 deletions

```diff
@@ -1,13 +1,13 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-models-azure-ai-studio.summary
-title: Summary
-metadata:
-  title: Summary
-  description: "Summary of key learning points on evaluating generative AI apps with the Azure AI Foundry portal."
-  ms.date: 11/28/2024
-  author: madiepev
-  ms.author: madiepev
-  ms.topic: unit
-durationInMinutes: 1
-content: |
-  [!include[](includes/7-summary.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-models-azure-ai-studio.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary of key learning points on evaluating generative AI apps with the Azure AI Foundry portal."
+  ms.date: 04/16/2025
+  author: madiepev
+  ms.author: madiepev
+  ms.topic: unit
+durationInMinutes: 1
+content: |
+  [!include[](includes/7-summary.md)]
```

learn-pr/wwl-data-ai/evaluate-models-azure-ai-studio/includes/2-assess-models.md

Lines changed: 13 additions & 6 deletions

```diff
@@ -6,7 +6,7 @@ When you develop a generative AI app, you use a language model in your chat appl
 
 An input (1) is provided to a language model (2), and a response is generated as output (3). The model is then evaluated by analyzing the input, the output, and optionally comparing it to predefined expected output.
 
-When you develop a generative AI app, you integrate a language model into a chat flow:
+When you develop a generative AI app, you may integrate a language model into a chat flow:
 
 :::image type="content" source="../media/chat-flow-diagram.png" alt-text="Diagram of a chat flow using a language model.":::
 
@@ -33,13 +33,20 @@ In the Azure AI Foundry portal, you can explore the model benchmarks for all ava
 
 Manual evaluations involve human raters who assess the quality of the model's responses. This approach provides insights into aspects that automated metrics might miss, such as context relevance and user satisfaction. Human evaluators can rate responses based on criteria like relevance, informativeness, and engagement.
 
-## Traditional machine learning metrics
-
-Traditional machine learning metrics are also valuable in evaluating model performance. One such metric is the **F1-score**, which measures the ratio of the number of shared words between the generated and ground truth answers. The F1-score is useful for tasks like text classification and information retrieval, where precision and recall are important.
-
 ## AI-assisted metrics
 
 AI-assisted metrics use advanced techniques to evaluate model performance. These metrics can include:
 
-- **Risk and safety metrics**: These metrics assess the potential risks and safety concerns associated with the model's outputs. They help ensure that the model doesn't generate harmful or biased content.
 - **Generation quality metrics**: These metrics evaluate the overall quality of the generated text, considering factors like creativity, coherence, and adherence to the desired style or tone.
+
+- **Risk and safety metrics**: These metrics assess the potential risks and safety concerns associated with the model's outputs. They help ensure that the model doesn't generate harmful or biased content.
+
+## Natural language processing metrics
+
+Natural language processing (NLP) metrics are also valuable in evaluating model performance. One such metric is the **F1-score**, which measures the ratio of the number of shared words between the generated and ground truth answers. The F1-score is useful for tasks like text classification and information retrieval, where precision and recall are important. Other common NLP metrics include:
+
+- **BLEU**: Bilingual Evaluation Understudy metric
+- **METEOR**: Metric for Evaluation of Translation with Explicit Ordering
+- **ROUGE**: Recall-Oriented Understudy for Gisting Evaluation
+
+All of these metrics are used to quantify the level of overlap in the model-generated response and the ground truth (expected response).
```
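The token-overlap F1-score that the new "Natural language processing metrics" section describes can be sketched in a few lines. This is a minimal illustration of the general idea, not the implementation used by Azure AI Foundry; whitespace tokenization and lowercasing are simplifying assumptions.

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a ground-truth answer.

    precision = shared tokens / tokens in the generated answer
    recall    = shared tokens / tokens in the reference answer
    F1        = harmonic mean of precision and recall
    """
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# An exact match scores 1.0; answers with no shared words score 0.0.
print(token_f1("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

BLEU, METEOR, and ROUGE refine this same overlap idea with n-grams, synonym matching, and recall weighting, respectively.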
