
Commit 4be04c6

Merge pull request #51340 from theresa-i/evaluate-language-models
Updated module for clarity
2 parents: 4169e65 + b19a529

16 files changed, +253 -284 lines changed
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.introduction
-title: Introduction
-metadata:
-  title: Introduction
-  description: "Introduction"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 2
-content: |
-  [!include[](includes/1-introduction.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.introduction
+title: Introduction
+metadata:
+  title: Introduction
+  description: "Introduction"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 2
+content: |
+  [!include[](includes/1-introduction.md)]
+
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.compare-evaluations
-title: Compare LLM and traditional ML evaluations
-metadata:
-  title: Compare LLM and traditional ML evaluations
-  description: "Compare Large Language Model and traditional Machine Learning evaluations"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 7
-content: |
-  [!include[](includes/2-compare-evaluations.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.compare-evaluations
+title: Explore LLM evaluation
+metadata:
+  title: Explore LLM evaluation
+  description: "Explore Large Language Model evaluation"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 7
+content: |
+  [!include[](includes/2-compare-evaluations.md)]
+
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.ai-systems
-title: Evaluate LLMs and AI systems
-metadata:
-  title: Evaluate LLMs and AI systems
-  description: "Describe the relationship between LLM evaluation and evaluation of entire AI systems"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 5
-content: |
-  [!include[](includes/3-ai-systems.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.ai-systems
+title: Evaluate LLMs and AI systems
+metadata:
+  title: Evaluate LLMs and AI systems
+  description: "Describe the relationship between LLM evaluation and evaluation of entire AI systems"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 5
+content: |
+  [!include[](includes/3-ai-systems.md)]
+
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.standard-metrics
-title: Evaluate LLMs with standard metrics
-metadata:
-  title: Evaluate LLMs with standard metrics
-  description: "Evaluate LLMs with standard metrics"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 7
-content: |
-  [!include[](includes/4-standard-metrics.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.standard-metrics
+title: Evaluate LLMs with standard metrics
+metadata:
+  title: Evaluate LLMs with standard metrics
+  description: "Evaluate LLMs with standard metrics"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 7
+content: |
+  [!include[](includes/4-standard-metrics.md)]
+
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.language-model-judge
-title: Describe LLM-as-a-judge for evaluation
-metadata:
-  title: Describe LLM-as-a-judge for evaluation
-  description: "Describe LLM-as-a-judge for evaluation"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 7
-content: |
-  [!include[](includes/5-language-model-judge.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.language-model-judge
+title: Describe LLM-as-a-judge for evaluation
+metadata:
+  title: Describe LLM-as-a-judge for evaluation
+  description: "Describe LLM-as-a-judge for evaluation"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 7
+content: |
+  [!include[](includes/5-language-model-judge.md)]
+
Lines changed: 15 additions & 15 deletions
@@ -1,15 +1,15 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.exercise
-title: Exercise - Evaluate an Azure OpenAI model
-metadata:
-  title: Exercise - Evaluate an Azure OpenAI model
-  description: "Exercise - Evaluate an Azure OpenAI model"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 30
-content: |
-  [!include[](includes/6-exercise.md)]
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.exercise
+title: Exercise - Evaluate an Azure OpenAI model
+metadata:
+  title: Exercise - Evaluate an Azure OpenAI model
+  description: "Exercise - Evaluate an Azure OpenAI model"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 30
+content: |
+  [!include[](includes/6-exercise.md)]
Lines changed: 49 additions & 49 deletions
@@ -1,49 +1,49 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.knowledge-check
-title: Module assessment
-metadata:
-  title: Module assessment
-  description: "Knowledge check"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-  module_assessment: true
-azureSandbox: false
-labModal: false
-durationInMinutes: 3
-quiz:
-  questions:
-  - content: "What is the primary purpose of evaluating a Large Language Model (LLM)?"
-    choices:
-    - content: "To improve its computational efficiency."
-      isCorrect: false
-      explanation: "Incorrect. Evaluating an LLM doesn't improve its computational efficiency."
-    - content: "To assess its accuracy and performance on specific tasks."
-      isCorrect: true
-      explanation: "Correct. The primary purpose of evaluating an LLM is to determine its effectiveness and accuracy."
-    - content: "To increase its training data size."
-      isCorrect: false
-      explanation: "Incorrect. Evaluating an LLM doesn't increase the training data size."
-  - content: "In the context of evaluating language models, what does perplexity measure?"
-    choices:
-    - content: "The size of the training dataset."
-      isCorrect: false
-      explanation: "Incorrect. Perplexity doesn't measure the size of the training dataset."
-    - content: "The diversity of generated text."
-      isCorrect: false
-      explanation: "Incorrect. Perplexity doesn't measure the diversity of generated text."
-    - content: "The uncertainty of the model in predicting the next word."
-      isCorrect: true
-      explanation: "Correct. Perplexity is a measure of how uncertain a language model is when predicting the next word in a sequence. Lower perplexity indicates a better-performing model."
-  - content: "When you evaluate a large language model (LLM) for bias, what is a common approach?"
-    choices:
-    - content: "Measuring the model's training time"
-      isCorrect: false
-      explanation: "Incorrect. Measuring the model's training time doesn't evaluate an LLM for bias."
-    - content: "Analyzing the model's outputs for harmful stereotypes"
-      isCorrect: true
-      explanation: "Correct. Evaluating a model for bias typically involves analyzing its outputs to identify and mitigate harmful stereotypes or biased predictions, ensuring the model is fair and ethical in its responses."
-    - content: "Counting the number of model parameters"
-      isCorrect: false
-      explanation: "Incorrect. Counting the number of model parameters doesn't evaluate an LLM for bias."
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.knowledge-check
+title: Module assessment
+metadata:
+  title: Module assessment
+  description: "Knowledge check"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+  module_assessment: true
+azureSandbox: false
+labModal: false
+durationInMinutes: 3
+quiz:
+  questions:
+  - content: "What is the primary purpose of evaluating a Large Language Model (LLM)?"
+    choices:
+    - content: "To improve its computational efficiency."
+      isCorrect: false
+      explanation: "Incorrect. Evaluating an LLM doesn't improve its computational efficiency."
+    - content: "To assess its accuracy and performance on specific tasks."
+      isCorrect: true
+      explanation: "Correct. The primary purpose of evaluating an LLM is to determine its effectiveness and accuracy."
+    - content: "To increase its training data size."
+      isCorrect: false
+      explanation: "Incorrect. Evaluating an LLM doesn't increase the training data size."
+  - content: "In the context of evaluating language models, what does perplexity measure?"
+    choices:
+    - content: "The size of the training dataset."
+      isCorrect: false
+      explanation: "Incorrect. Perplexity doesn't measure the size of the training dataset."
+    - content: "The diversity of generated text."
+      isCorrect: false
+      explanation: "Incorrect. Perplexity doesn't measure the diversity of generated text."
+    - content: "The uncertainty of the model in predicting the next word."
+      isCorrect: true
+      explanation: "Correct. Perplexity is a measure of how uncertain a language model is when predicting the next word in a sequence. Lower perplexity indicates a better-performing model."
+  - content: "When you evaluate a large language model (LLM) for bias, what is a common approach?"
+    choices:
+    - content: "Measuring the model's training time"
+      isCorrect: false
+      explanation: "Incorrect. Measuring the model's training time doesn't evaluate an LLM for bias."
+    - content: "Analyzing the model's outputs for harmful stereotypes"
+      isCorrect: true
+      explanation: "Correct. Evaluating a model for bias typically involves analyzing its outputs to identify and mitigate harmful stereotypes or biased predictions, ensuring the model is fair and ethical in its responses."
+    - content: "Counting the number of model parameters"
+      isCorrect: false
+      explanation: "Incorrect. Counting the number of model parameters doesn't evaluate an LLM for bias."
Lines changed: 16 additions & 16 deletions
@@ -1,16 +1,16 @@
-### YamlMime:ModuleUnit
-uid: learn.wwl.evaluate-language-models-azure-databricks.summary
-title: Summary
-metadata:
-  title: Summary
-  description: "Summary"
-  ms.date: 03/20/2025
-  author: wwlpublish
-  ms.author: theresai
-  ms.topic: unit
-azureSandbox: false
-labModal: false
-durationInMinutes: 1
-content: |
-  [!include[](includes/8-summary.md)]
-
+### YamlMime:ModuleUnit
+uid: learn.wwl.evaluate-language-models-azure-databricks.summary
+title: Summary
+metadata:
+  title: Summary
+  description: "Summary"
+  ms.date: 07/10/2025
+  author: theresa-i
+  ms.author: theresai
+  ms.topic: unit
+azureSandbox: false
+labModal: false
+durationInMinutes: 1
+content: |
+  [!include[](includes/8-summary.md)]
+
16+
Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
-Evaluating Large Language Models (LLMs) is crucial in artificial intelligence because they're central to many applications, from natural language processing to automated decision-making systems.
+Large Language Models (LLMs) have transformed how we build applications, powering everything from chatbots to content generation systems. As you deploy these models to production, you need to determine if your LLM is working well.
 
-By assessing their performance, interpretability, and ethical implications, you gain insights into their strengths and limitations, enabling more effective deployment in real-world scenarios.
+Evaluation is essential for successfully deploying LLMs to production. You need to understand how well your model performs, whether it produces reliable outputs, and how it behaves across different scenarios.
 
-This evaluation includes traditional metrics like accuracy and efficiency, as well as broader aspects such as fairness, bias, and generalization across diverse tasks, ensuring that LLMs are reliable, transparent, and aligned with human values.
+In this module, you'll learn to evaluate LLMs by comparing evaluation approaches, and understanding how individual model evaluation fits into broader AI system assessment. You'll also learn about standard metrics like accuracy and perplexity, and implementing LLM-as-a-judge techniques for scalable evaluation.
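
The LLM-as-a-judge technique named in the new introduction reduces to a short loop: prompt a stronger model with a rubric, the question, a reference, and the answer under test, then parse a score from its reply. Below is a minimal sketch of that idea; `query_judge_model` is a hypothetical placeholder for whatever chat-completion client you use, and the prompt template and score parsing are illustrative rather than the module's exact implementation.

```python
import re

# Illustrative rubric: constrain the judge to a bare 1-5 rating so parsing stays trivial.
JUDGE_PROMPT = """You are an impartial evaluator.
Rate the ASSISTANT ANSWER for factual accuracy against the REFERENCE
on a scale of 1 (wrong) to 5 (fully correct). Reply with the number only.

QUESTION: {question}
REFERENCE: {reference}
ASSISTANT ANSWER: {answer}
"""

def query_judge_model(prompt: str) -> str:
    """Hypothetical stand-in: call your judge LLM's chat endpoint here."""
    raise NotImplementedError

def judge_answer(question: str, reference: str, answer: str) -> int:
    """Score another model's answer from 1 to 5 using a 'judge' LLM."""
    reply = query_judge_model(
        JUDGE_PROMPT.format(question=question, reference=reference, answer=answer)
    )
    match = re.search(r"[1-5]", reply)  # pull the first 1-5 digit out of the reply
    if match is None:
        raise ValueError(f"Judge gave no usable score: {reply!r}")
    return int(match.group())
```

Production rubrics usually also ask the judge for a short justification alongside the rating, which costs a slightly more involved parser but makes the scores auditable.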
