Commit 59f8537

Author: Sherry Yang
Merge commit, 2 parents: 3df7cc8 + b2dce87

18 files changed: +239 −72 lines

learn-pr/azure/test-machine-learning-models/1-introduction.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Introduction
 metadata:
   title: Introduction
   description: Introduction to the introduction to regression module.
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/2-normalization-and-standardization.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Normalization and standardization
 metadata:
   title: Normalization and standardization
   description: Conceptual unit introducing normalization and standardization in machine learning
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/4-test-training-datasets.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Test and training datasets
 metadata:
   title: Test and training datasets
   description: Conceptual unit about testing and training datasets in machine learning
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/5-exercise-test-training-datasets.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Exercise - Test and train datasets
 metadata:
   title: Exercise - Test and train datasets
   description: Exercise unit testing and training datasets in machine learning
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/6-nuance-tests.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Nuances of test sets
 metadata:
   title: Nuances of test sets
   description: Conceptual unit about nuances of test sets in machine learning
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/7-exercise-test-set-nuances.yml (3 additions, 3 deletions)

```diff
@@ -1,10 +1,10 @@
 ### YamlMime:ModuleUnit
 uid: learn.machinelearning.test-machine-learning-models.exercise-test-set-nuances
-title: Exercise Test set nuances
+title: Exercise - Test set nuances
 metadata:
-  title: Exercise Test set nuances
+  title: Exercise - Test set nuances
   description: Exercise unit about test set nuances in machine learning
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/8-knowledge-check.yml (2 additions, 2 deletions)

```diff
@@ -4,7 +4,7 @@ title: Module assessment
 metadata:
   title: Module assessment
   description: Multiple-choice questions
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
@@ -28,7 +28,7 @@ quiz:
   - content: "Underfitting has occurred, and your model isn't accurate enough. You should keep training."
     isCorrect: false
     explanation: "Incorrect. Continuing to train your model when you already have good performance on your training set won't improve your performance. You need to find ways to improve performance on your test set."
-  - content: "Overfitting has occurred, and your model isn't performing well on new data outside training. You could stop training earlier, or gather more diverse data."
+  - content: "Overfitting has occurred, and your model isn't performing well on new data outside training. You could stop training earlier or gather more diverse data."
     isCorrect: true
     explanation: "Correct. Overfitting has likely occurred, and you can adjust your training to improve performance on your test set. You should consider if you need more diverse training data, or if you're training for too long."
   - content: "Your model is fine. You need to use your training data to test your model instead."
```

learn-pr/azure/test-machine-learning-models/9-summary.yml (1 addition, 1 deletion)

```diff
@@ -4,7 +4,7 @@ title: Summary
 metadata:
   title: Summary
   description: An overview of the content covered in the module.
-  ms.date: 05/25/2021
+  ms.date: 05/15/2025
   author: s-polly
   ms.author: scottpolly
   ms.topic: unit
```

learn-pr/azure/test-machine-learning-models/includes/1-introduction.md (2 additions, 2 deletions)

This change replaces curly apostrophes (which some pipelines render badly) with straight ones:

```diff
@@ -2,9 +2,9 @@ The way we train models is by no means a perfectly automated process. Training's
 
 ## Scenario: Training avalanche rescue dogs
 
-Throughout this module, we’ll be using the following example scenario to explain underfitting and overfitting. This scenario is designed to provide an example for how you might meet these concepts while programming for yourself. Keep in mind that these principles generally apply to almost all types of models, not just those we work with here.
+Throughout this module, we'll be using the following example scenario to explain underfitting and overfitting. This scenario is designed to provide an example for how you might meet these concepts while programming for yourself. Keep in mind that these principles generally apply to almost all types of models, not just those we work with here.
 
-It’s time for your charity to train a new generation of dogs in how to find hikers swept up by avalanches. There's debate in the office as to which dogs are best; is a large dog better than a smaller dog? Should the dogs be trained when they're young or when they're more mature? Thankfully, you have statistics on rescues performed over the last few years that you can look to. Training dogs is expensive, though, and you need to be sure that your dog-picking criteria are sound.
+It's time for your charity to train a new generation of dogs in how to find hikers swept up by avalanches. There's debate in the office as to which dogs are best; is a large dog better than a smaller dog? Should the dogs be trained when they're young or when they're more mature? Thankfully, you have statistics on rescues performed over the last few years that you can look to. Training dogs is expensive, though, and you need to be sure that your dog-picking criteria are sound.
 
 ## Prerequisites
 
```

learn-pr/azure/test-machine-learning-models/includes/2-normalization-and-standardization.md (7 additions, 7 deletions)

Besides replacing curly punctuation with straight equivalents, this change repairs the "Standardization can bring this optimal intercept is closer to zero" sentence and swaps em-dash asides for parentheses:

```diff
@@ -2,23 +2,23 @@ _Feature Scaling_ is a technique that changes the range of values that a feature
 
 ## Normalization versus standardization
 
-_Normalization_ means to scale values so that they all fit within a certain range, typically 0–1. For example, if you had a list of people’s ages that were 0, 50, and 100 years, you could normalize by dividing the ages by 100, so that your values were 0, 0.5, and 1.
+_Normalization_ means to scale values so that they all fit within a certain range, typically 0–1. For example, if you had a list of people's ages that were 0, 50, and 100 years, you could normalize by dividing the ages by 100 so that your values were 0, 0.5, and 1.
 
-_Standardization_ is similar, but instead, we subtract the mean (also known as the average) of the values and divide by the standard deviation. If you’re not familiar with standard deviation, not to worry, this means that after standardization, our mean value is zero, and about 95% of values fall between -2 and 2.
+_Standardization_ is similar, but instead, we subtract the mean (also known as the average) of the values and divide by the standard deviation. If you're not familiar with standard deviation, not to worry; this means that after standardization, our mean value is zero, and about 95% of values fall between -2 and 2.
 
-There are other ways to scale data, but the nuances of these are beyond what we need to know right now. Let’s explore why we apply _normalization_ or _standardization_.
+There are other ways to scale data, but the nuances of these are beyond what we need to know right now. Let's explore why we apply _normalization_ or _standardization_.
 
 ## Why do we need to scale?
 
-There are many reasons we normalize or standardize data before training. You can understand these more easily with an example. Let’s say we want to train a model to predict whether a dog will be successful at working in the snow. Our data are shown in the following graph as dots, and the trend line we're trying to find is shown as a solid line:
+There are many reasons we normalize or standardize data before training. You can understand these more easily with an example. Let's say we want to train a model to predict whether a dog will be successful at working in the snow. Our data are shown in the following graph as dots, and the trend line we're trying to find is shown as a solid line:
 
 ![Diagram showing scaling in a graph of dog height and rescues starting at 50.](../media/2-normalization-graph.png)
 
 ### Scaling gives learning a better starting point
 
-The optimal line in the preceding graph has two parameters: the intercept, which is 50, the line at x=0, and slope, which is 0.01; each 1000 millimeters increases rescues by 10. Let’s assume we start training with initial estimates of 0 for both of these parameters.
+The optimal line in the preceding graph has two parameters: the intercept, which is 50 (the line's value at x=0), and the slope, which is 0.01 (each 1,000 millimeters increases rescues by 10). Let's assume we start training with initial estimates of 0 for both of these parameters.
 
-If our training iterations are altering parameters by around 0.01 per iteration on average, it takes at least 5000 iterations before the intercept is found: 50 / 0.01 = 5000 iterations. Standardization can bring this optimal intercept is closer to zero, which means we can find it much faster. For example, if we subtract the mean from our label—annual rescues—and our feature—height—the intercept is -0.5, not 50, which we can find about 100 times faster.
+If our training iterations are altering parameters by around 0.01 per iteration on average, it takes at least 5000 iterations before the intercept is found: 50 / 0.01 = 5000 iterations. Standardization can bring this optimal intercept closer to zero, which means we can find it much faster. For example, if we subtract the mean from our label (annual rescues) and our feature (height), the intercept is -0.5, not 50, which we can find about 100 times faster.
 
 ![Diagram showing scaling in a graph of dog height and rescues starting at 0.](../media/2-normalization-graph-2.png)
 
@@ -42,6 +42,6 @@ When we work with multiple features, having these on a different scale can cause
 
 ## Do I always need to scale?
 
-We don’t always need to scale. Some kinds of models, including the preceding models with straight lines, can be fit without an iterative procedure like gradient descent, so they don't mind features being the wrong size. Other models do need scaling to train well, but their libraries often perform feature scaling automatically.
+We don't always need to scale. Some kinds of models, including the preceding models with straight lines, can be fit without an iterative procedure like gradient descent, so they don't mind features being the wrong size. Other models do need scaling to train well, but their libraries often perform feature scaling automatically.
 
 Generally speaking, the only real downsides to normalization or standardization are that it can make it harder to interpret our models and that we have to write slightly more code. For this reason, feature scaling is a standard part of creating machine learning models.
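The two scaling techniques described in the normalization-and-standardization page can be sketched in a few lines of Python. This is a minimal illustration using the ages example from the text, not code from the module's exercises; the helper names are hypothetical:

```python
def normalize(values):
    """Min-max normalization: rescale values to fit the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean, then divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

ages = [0, 50, 100]
print(normalize(ages))    # [0.0, 0.5, 1.0], as in the module's example
print(standardize(ages))  # mean of the result is 0; values roughly [-1.22, 0.0, 1.22]
```

After standardization the values are centered on zero, which is what gives gradient descent the better starting point the page describes: the optimal intercept moves close to zero, so fewer update steps are needed to reach it.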
