
Commit 8cb794b

Merge pull request #33986 from ShawnKupfer/WB881
AB#531593 - Edits from module review
2 parents ff432f7 + f02a35a commit 8cb794b


9 files changed: +208 -183 lines changed


learn-pr/azure/machine-learning-architectures-and-hyperparameters/includes/1-introduction.md

Lines changed: 7 additions & 5 deletions
@@ -2,16 +2,18 @@ Not all models are simple mathematical equations that can be plotted as a line.
 
## Scenario: Predicting sports results using machine learning
 
-Throughout this module, well refer to the following example scenario as we explain concepts surrounding model architecture and hyperparameters. This scenario is designed to appear complex at first but as the exercises progress we'll see how it can be tackled using a little critical thinking and experimentation.
+Throughout this module, we'll refer to the following example scenario as we explain concepts surrounding model architecture and hyperparameters. This scenario is designed to appear complex at first, but as the exercises progress we'll learn how you can tackle it using a little critical thinking and experimentation.
 
-The Games motto consists of three Latin words: Citius - Altius - Fortius. These words mean Faster - Higher - Stronger. Since this motto was established, the variety of games has grown enormously to include shooting, sailing, and team sports. We would like to explore the role that basic physical features still play in predicting who wins a medal at one of the most prestigious sporting events on the planet. To this end, we'll explore rhythmic gymnastics: a modern addition to the games that combines dance, gymnastics, and calisthenics. One might expect that basic characteristics of age, height, and weight play only a limited role, given the need for agility, flexibility, dexterity, and coordination. Lets use some more advanced machine learning models to see how critical these basic factors really are.
+The Games' motto consists of three Latin words: Citius - Altius - Fortius. These words mean Faster - Higher - Stronger. Since this motto was established, the variety of games has grown enormously to include shooting, sailing, and team sports. We'd like to explore the role that basic physical features still play in predicting who wins a medal at one of the most prestigious sporting events on the planet. To this end, we'll explore rhythmic gymnastics: a modern addition to the games that combines dance, gymnastics, and calisthenics. One might expect that basic characteristics of age, height, and weight play only a limited role, given the need for agility, flexibility, dexterity, and coordination. Let's use some more advanced machine learning models to see how critical these basic factors really are.
 
## Prerequisites
 
* Familiarity with machine learning models
 
## Learning objectives
 
-* Discover new model types– decision trees and random forests.
-* Learn how model architecture can affect performance
-* Practice working with hyperparameters to improve training effectiveness
+In this module, you will:
+
+* Discover new model types: decision trees and random forests.
+* Learn how model architecture can affect performance.
+* Practice working with hyperparameters to improve training effectiveness.

learn-pr/azure/machine-learning-architectures-and-hyperparameters/includes/2-decision-trees.md

Lines changed: 12 additions & 12 deletions
@@ -1,20 +1,20 @@
-When we talk of architecture, we often think of buildings. Architecture is responsible for how a building is structured—its height, depth, the number of floors, and how things are connected internally. This architecture also dictates how we use a building—where we enter it, and what we can “get out of it”, practically speaking.
+When we talk about architecture, we often think of buildings. Architecture is responsible for how a building is structured; its height, its depth, the number of floors, and how things are connected internally. This architecture also dictates how we use a building: where we enter it and what we can "get out of it," practically speaking.
 
-In machine learning, architecture refers to a similar concept. How many parameters does it have, and how are they linked together to achieve a calculation? Do we calculate a lot in parallel (width) or do we've serial operations that rely on a previous calculation (depth)? How can we provide inputs to this model, and how can we receive outputs? Such architectural decisions only typically apply to more complex models, and architectural decisions can range from simple to complex. These decisions are usually made before the model is trained, though in some circumstances there's room to make changes post-training.
+In machine learning, architecture refers to a similar concept. How many parameters does it have, and how are they linked together to achieve a calculation? Do we calculate a lot in parallel (width) or do we have serial operations that rely on a previous calculation (depth)? How can we provide inputs to this model, and how can we receive outputs? Such architectural decisions typically apply only to more complex models, and can range from simple to complex. These decisions are usually made before the model is trained, though in some circumstances there's room to make changes post-training.
 
Let’s explore this more concretely with decision trees as an example.
 
## What's a decision tree?
 
In essence, a decision tree is a flow chart. Decision trees are a categorization model that breaks down decisions into multiple steps.
 
-![Diagram showing a decision tree of gender, age and survival rate.](../media/7-2-a.jpg)
+![Diagram showing a decision tree of gender, age, and survival rate.](../media/7-2-a.jpg)
 
-The sample if provided at the entry point (top, in the diagram above) and each exit point has a label (bottom in the diagram). At each node, a simple ‘if’ statement decides which branch the sample passes to next. Once the branch has reached the end of the tree (the leaves), it will be assigned to a label.
+The sample is provided at the entry point (top, in the diagram above) and each exit point has a label (bottom in the diagram). At each node, a simple "if" statement decides which branch the sample passes to next. Once the sample has reached the end of the tree (a leaf), it's assigned a label.
 
### How are decision trees trained?
 
-Decision trees are trained one node, or decision point, at a time. At the first node, the entire training-set is assessed. From there a feature is selected that can best separate the set into two subsets that have more homogenous labels. For example, imagine our training set was as follows:
+Decision trees are trained one node, or decision point, at a time. At the first node, the entire training set is assessed. From there, a feature is selected that can best separate the set into two subsets that have more homogenous labels. For example, imagine our training set was as follows:
 
| Weight (Feature) | Age (Feature) | Won a medal (Label) |
|------------------|---------------|---------------------|
@@ -27,7 +27,7 @@ Decision trees are trained one node, or decision point, at a time. At the first
| 85 | 26 | Yes |
| 90 | 25 | Yes |
 
-If we're to do our best to find a simple rule to split this data, we might split by age, at around 24 years old, because most medal winners were over 24. This split would give us two subsets of data.
+If we're doing our best to find a rule to split this data, we might split by age at around 24 years old, because most medal winners were over 24. This split would give us two subsets of data.
 
**Subset 1**
 
@@ -47,24 +47,24 @@ If we're to do our best to find a simple rule to split this data, we might split
| 85 | 26 | Yes |
| 90 | 25 | Yes |
 
-If we stop here, we've a simple model with one node and two leaves. Leaf 1 contains non-medal winners, and is 75% accurate on our training set. Leaf 2 contains medal winners, and is also 75% accurate on the training set.
+If we stop here, we have a simple model with one node and two leaves. Leaf 1 contains non-medal winners, and is 75% accurate on our training set. Leaf 2 contains medal winners, and is also 75% accurate on the training set.
 
We don’t need to stop here, though. We can continue this process by splitting the leaves further.
 
-In subset 1, the first new node could split by weight, because the only medal winner had a weight less than people who didn't win a medal. The rule might be set to weight < 65. People with weight < 65 are predicted to have won a medal. While anyone with weight ≥65 don't meet this criterion, and might be predicted to not win a medal.
+In subset 1, the first new node could split by weight, because the only medal winner had a weight less than people who didn't win a medal. The rule might be set to "weight < 65". People with weight < 65 are predicted to have won a medal, while anyone with weight ≥ 65 doesn't meet this criterion, and might be predicted not to win a medal.
 
-In subset 2, the second new node might also split by weight, but this time predicts that anyone with a weight over 70 would have won a medal, while those under it would not.
+In subset 2, the second new node might also split by weight, but this time predicts that anyone with a weight over 70 would have won a medal, while those under it wouldn't.
 
This would provide us with a tree that could achieve 100% accuracy on the training set.
 
### Strengths and weaknesses of decision trees
 
Decision trees are considered to have low bias. This means that they're usually good at identifying features that are important in order to label something correctly.
 
-The major weakness of decision trees is overfitting. Consider the example given above: the model gives an exact way to calculate who is likely to win a medal, and this will predict 100% of the training dataset correctly. This level of accuracy is unusual for machine learning models, which normally make numerous errors on training dataset. Good training performance isn't a bad thing in itself, but the tree has become so specialized to the training set that it probably won't do well on the test set. This is because the tree has managed to learn relationships in the training set that probably aren't real—such as that having a weight of 60 kg guarantees a medal if you are under 25 years old.
+The major weakness of decision trees is overfitting. Consider the example given previously: the model gives an exact way to calculate who is likely to win a medal, and this will predict 100% of the training dataset correctly. This level of accuracy is unusual for machine learning models, which normally make numerous errors on the training dataset. Good training performance isn't a bad thing in itself, but the tree has become so specialized to the training set that it probably won't do well on the test set. This is because the tree has managed to learn relationships in the training set that probably aren't real, such as that having a weight of 60 kg guarantees a medal if you're under 25 years old.
 
## Model architecture affects overfitting
 
-How we structure our decision tree is key to avoiding its weaknesses. The deeper the tree is, the more likely it's to overfit the training set. For example, in the simple tree above, if we limited the tree to only the first node, it would make errors on the training set, but probably do better on the test set. This is because it would have more general rules about who wins medal, such as “athletes over 24”, rather than extremely specific rules that might only apply to the training set.
+How we structure our decision tree is key to avoiding its weaknesses. The deeper the tree is, the more likely it is to overfit the training set. For example, in the simple tree above, if we limited the tree to only the first node, it would make errors on the training set, but probably do better on the test set. This is because it would have more general rules about who wins medals, such as "athletes over 24," rather than extremely specific rules that might only apply to the training set.
 
-Although we're focused on trees here, other complex models often have similar weakness that can be mitigated through decisions about how they're structured, or how they're allowed to be manipulated by the training.
+Although we're focused on trees here, other complex models often have similar weaknesses that we can mitigate through decisions about how they're structured or how they're allowed to be manipulated by the training.
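
The training-and-overfitting behavior this file describes maps onto a few lines of code. The following is a minimal sketch, not part of this commit: it assumes scikit-learn is available and uses hypothetical athlete rows shaped like the section's table (the module's full table isn't reproduced in this diff). It shows an unconstrained tree reaching perfect training accuracy, while a depth-limited tree falls back to the more general "over 24"-style rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data shaped like the section's table:
# columns are weight (kg) and age (years); 1 = won a medal.
# Illustrative values only, not the module's real dataset.
X = np.array([
    [65, 18], [70, 20], [75, 21], [80, 22],  # younger athletes, no medal
    [60, 22],                                # the one light, young medal winner
    [65, 25], [85, 26], [90, 25],            # older medal winners
    [60, 25],                                # an older, lighter non-winner
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0])

# An unconstrained tree keeps adding decision nodes until the training set
# is classified perfectly -- the 100% training accuracy the section warns about.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(deep_tree.score(X, y))   # 1.0: a hint the tree may be overfit

# Limiting the tree to a single decision node forces a general rule
# (here, a split by age at about 24), which makes some training errors
# but is more likely to hold up on unseen athletes.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print(stump.score(X, y))       # lower training accuracy, less overfitting
```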
Lines changed: 8 additions & 8 deletions
@@ -1,25 +1,25 @@
-Experimentation with architectures is often a key focus of building effective modern models. We've done so to a basic level with decision trees, but the only limit to this is our imagination—and perhaps our computer’s memory. In fact, thinking more broadly on decision trees resulted in a highly popular model architecture that reduces its decision trees tendency to overfit data.
+Experimentation with architectures is often a key focus of building effective modern models. We've done so to a basic level with decision trees, but the only limit to this is our imagination, and perhaps our computer’s memory. In fact, thinking more broadly on decision trees resulted in a highly popular model architecture that reduces its decision trees' tendency to overfit data.
 
## What’s a random forest?
 
-A random forest is a collection of decision trees, which are used together to estimate which label a sample should be assigned. For example, if we were to train a random forest to predict medal winners, we might train 100 different decision trees. To make a prediction, we would use all trees independently. These would effectively vote for whether the athlete would win a medal, providing a final decision.
+A random forest is a collection of decision trees that are used together to estimate which label a sample should be assigned. For example, if we were to train a random forest to predict medal winners, we might train 100 different decision trees. To make a prediction, we would use all trees independently. These would effectively "vote" for whether the athlete would win a medal, providing a final decision.
 
### How is a random forest trained?
 
-Random forests are built on the idea that while a single decision tree is highly biased, or overfit, if we train several decision trees, they'll be biased in different ways. This requires that each tree is trained independently, and each on a slightly different training set.
+Random forests are built on the idea that while a single decision tree is highly biased, or overfit, if we train several decision trees, they'll be biased in different ways. This requires that each tree is trained independently and each on a slightly different training set.
 
-To train a single decision tree a certain number of samples, athletes in our scenario, are extracted from the full training set. Each sample can be selected more than once, and this takes place randomly. The tree is then trained in the standard way. This process is repeated for each tree. As each tree gets a different combination of training examples, each tree ends up trained, and biased, differently to the others.
+To train a single decision tree, a certain number of samples—athletes in our scenario—are extracted from the full training set. Each sample can be selected more than once, and this takes place randomly. The tree is then trained in the standard way. This process is repeated for each tree. As each tree gets a different combination of training examples, each tree ends up trained, and biased, differently to the others.
 
### Advantages of random forest
 
-The performance of random forests is often impressive and so comparisons are often best made against neural networks, which are another popular and high-performance model type. Unlike neural networks, random forest models are easy to train: modern frameworks provide helpful methods that let you do so in only a few lines of code. Random forests are also fast to train and don't need large datasets to perform well. This separates them from neural networks, which can often take minutes or days to train, substantial experience, and often require very large datasets. The architectural decisions for random forests are, while more complex than models such as linear regression, much simpler than neural networks.
+The performance of random forests is often impressive, and so comparisons are often best made against neural networks, which are another popular and high-performance model type. Unlike neural networks, random-forest models are easy to train: modern frameworks provide helpful methods that let you do so in only a few lines of code. Random forests are also fast to train and don't need large datasets to perform well. This separates them from neural networks, which can often take minutes or days to train, require substantial experience, and often require very large datasets. The architectural decisions for random forests are, while more complex than models such as linear regression, much simpler than neural networks.
 
### Disadvantages of random forest
 
-The major disadvantage of random forests is that they're difficult to understand. Specifically, while these models are fully transparent—each tree can be inspected and understood—they often contain so many trees that doing so is virtually impossible.
+The major disadvantage of random forests is that they're difficult to understand. Specifically, while these models are fully transparent—each tree can be inspected and understood—they often contain so many trees that doing so is virtually impossible.
 
## How can I customize these architectures?
 
-Like several models, random forests have various architectural options. The easiest to consider is the size of the forest—how many trees are involved, along with the size of these trees. For example, it would be possible to request a forest to predict medal winners containing 100 trees, each with a maximum depth of six nodes. This means that the final decision as to whether an athlete will win a medal must be made with no more than six ‘if’ statements.
+Like several models, random forests have various architectural options. The easiest to consider is the size of the forest: how many trees are involved, along with the size of these trees. For example, it would be possible to request a forest to predict medal winners containing 100 trees, each with a maximum depth of six nodes. This means that the final decision as to whether an athlete will win a medal must be made with no more than six "if" statements.
 
-As we’ve already seen, increasing the size of a tree (in terms of depth or number of leaves) makes it more likely to overfit the data it's trained on. This limitation also applies to random forests. However, with random forests we can counter this by increasing the number of trees, assuming that each tree will be biased in a different way. We can also restrict each tree to only a certain number of features, or disallowing leaves to be created when it would make only a marginal difference to the training performance. The ability for a random forest to make good predictions isn't infinite. At some point, increasing the size and number of trees gives no further improvement due to the limited variety of training data that we've.
+As we’ve already learned, increasing the size of a tree (in terms of depth or number of leaves) makes it more likely to overfit the data on which it's trained. This limitation also applies to random forests. However, with random forests we can counter this by increasing the number of trees, assuming that each tree will be biased in a different way. We can also restrict each tree to only a certain number of features, or disallow leaves from being created when they would make only a marginal difference to the training performance. The ability of a random forest to make good predictions isn't infinite. At some point, increasing the size and number of trees gives no further improvement due to the limited variety of training data that we have.
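
The architectural options this file names (forest size, tree depth, features per tree) correspond directly to constructor parameters in common frameworks. Below is a minimal sketch, not part of this commit: it assumes scikit-learn and uses randomly generated stand-in data rather than the module's athlete dataset, mirroring the text's example of 100 trees with depth capped at six.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 200 hypothetical athletes with weight (kg) and age (years).
rng = np.random.default_rng(0)
X = rng.uniform([50, 16], [100, 35], size=(200, 2))
y = (X[:, 1] > 24).astype(int)   # toy "won a medal" label for illustration

forest = RandomForestClassifier(
    n_estimators=100,  # size of the forest: 100 independently trained trees
    max_depth=6,       # each tree decides with at most six "if" statements
    max_features=1,    # optionally restrict each split to a subset of features
    random_state=0,
)
# Internally, each tree is fit on its own random resample of the training set
# (samples can be drawn more than once), so each tree ends up biased differently.
forest.fit(X, y)

# A prediction is effectively a majority vote across all 100 trees.
print(forest.predict([[62.0, 22.0]]))

# Each tree remains individually inspectable -- but reading all 100 of them
# is what makes the ensemble hard to interpret in practice.
print(len(forest.estimators_))
```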
