
Commit cb254bf

Merge pull request #33987 from ShawnKupfer/WB882
AB#531594 - Edits from module review
2 parents 8cb794b + 323a87c commit cb254bf

File tree: 7 files changed (+169, -145 lines changed)
Lines changed: 5 additions & 5 deletions
@@ -1,10 +1,10 @@
1-
We can assess our classification models in terms of the kinds of mistakes that they make, such as false negatives and false positives. This can give insight into the kinds of mistakes a model makes but doesn't necessarily give deep information on how the model could perform if slight adjustments were made to its decision criteria. Here we'll discuss receiver operator characteristic (ROC) curves, which build on the idea of a confusion matrix but provide us with deeper information that lets us improve our models to a greater degree.
1+
We can assess our classification models in terms of the kinds of mistakes that they make, such as false negatives and false positives. This can give insight into the kinds of mistakes a model makes, but doesn't necessarily provide deep information on how the model could perform if slight adjustments were made to its decision criteria. Here, we'll discuss receiver operator characteristic (ROC) curves, which build on the idea of a confusion matrix but provide us with deeper information that lets us improve our models to a greater degree.
22

33
## Scenario:
44

55
Throughout this module, we’ll be using the following example scenario to explain and practice working with ROC curves.
66

7-
Your avalanche-rescue charity has successfully built a machine learning model that can estimate whether an object detected by lightweight sensors is a hiker or a natural object, such as a tree or rock. This lets you keep track of how many people are on the mountain, so you know whether a rescue team is needed when an avalanche strikes. The model does reasonably well, though you wonder if there's room for improvement. Internally, the model must make a binary decision as to whether an object is a hiker or not, but this is based on probabilities. Can this decision-making process be tweaked to improve its performance?
7+
Your avalanche-rescue charity has successfully built a machine learning model that can estimate whether an object detected by lightweight sensors is a hiker or a natural object, such as a tree or a rock. This lets you keep track of how many people are on the mountain, so you know whether a rescue team is needed when an avalanche strikes. The model does reasonably well, though you wonder if there's room for improvement. Internally, the model must make a binary decision as to whether an object is a hiker or not, but this is based on probabilities. Can this decision-making process be tweaked to improve its performance?
88

99
## Prerequisites
1010

@@ -14,6 +14,6 @@ Your avalanche-rescue charity has successfully built a machine learning model th
1414

1515
In this module, you will:
1616

17-
* Understand how to create ROC curves
18-
* Explore how to assess and compare models using these curves
19-
* Practice fine-tuning a model using characteristics plotted on ROC curves
17+
* Understand how to create ROC curves.
18+
* Explore how to assess and compare models using these curves.
19+
* Practice fine-tuning a model using characteristics plotted on ROC curves.
Lines changed: 13 additions & 13 deletions
@@ -1,43 +1,43 @@
11
Classification models must assign a sample to a category. For example, a model must use features such as size, color, and motion to determine whether an object is a hiker or a tree.
22

3-
We can improve classification models many ways. For example, we can ensure our data are balanced, clean, and scaled. We can also alter our model architecture, and use hyperparameters to squeeze as much performance as we possibly can out of our data and architecture. Eventually, we find no better way to improve performance on our test (or hold-out) set and declare our model ready.
3+
We can improve classification models in many ways. For example, we can ensure our data are balanced, clean, and scaled. We can also alter our model architecture and use hyperparameters to squeeze as much performance as we possibly can out of our data and architecture. Eventually, we find no better way to improve performance on our test (or hold-out) set and declare our model ready.
44

5-
Model tuning to this point can be complex, but a final simple step can be used to further improve how well our model works. To understand this, though, we need to go back to basics.
5+
Model tuning to this point can be complex, but we can use a final simple step to further improve how well our model works. To understand this, though, we need to go back to basics.
66

77
## Probabilities and categories
88

9-
Many models have multiple decision-making stages, and the final one often is simply a binarization step. During binarization, probabilities are converted into a hard label. For example, let’s say that the model is provided with features and calculates that there's a 75% chance that it was shown a hiker, and 25% chance it was shown a tree. An object cannot be 75% hiker and 25% tree – it's one or the other! As such, the model applies a threshold, which is normally 50%. As the hiker class is larger than 50%, the object is declared to be a hiker.
9+
Many models have multiple decision-making stages, and the final one often is simply a binarization step. During binarization, probabilities are converted into a hard label. For example, let's say that the model is provided with features and calculates that there's a 75% chance that it was shown a hiker, and 25% chance it was shown a tree. An object can't be 75% hiker and 25% tree; it's one or the other! As such, the model applies a threshold, which is normally 50%. As the hiker class is larger than 50%, the object is declared to be a hiker.
1010

11-
The 50% threshold is logical—it means that the most likely label according to the model is always chosen. If the model is biased, however, this 50% threshold might not be appropriate. For example, if the model has a slight tendency to pick trees more than hikers—picking trees 10% more frequently than it should—we could adjust our decision threshold to account for this.
11+
The 50% threshold is logical; it means that the most likely label according to the model is always chosen. If the model is biased, however, this 50% threshold might not be appropriate. For example, if the model has a slight tendency to pick trees more than hikers, picking trees 10% more frequently than it should, we could adjust our decision threshold to account for this.
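To make the binarization step and the adjustable threshold concrete, here's a minimal sketch in Python. The probabilities and the lowered threshold are invented for illustration and aren't part of this module's exercises.

```python
import numpy as np

# Hypothetical model outputs: probability that each detected object is a hiker
hiker_probability = np.array([0.75, 0.25, 0.55, 0.48, 0.90])

# Binarization with the usual 50% threshold:
# True means "hiker", False means "tree".
default_labels = hiker_probability >= 0.5    # hiker, tree, hiker, tree, hiker

# If the model tends to under-call hikers, we can lower the threshold.
adjusted_labels = hiker_probability >= 0.4   # hiker, tree, hiker, hiker, hiker
```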
1212

1313
## Refresher on decision matrices
1414

1515
Decision matrices are a great way to assess the kinds of mistakes a model is making. This gives us the rates of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
1616

17-
![Screenshot showing a confusion matrix of true positives, true negatives, false positives, and false negatives.](../media/2-decision-matrices.png)
17+
![Diagram showing a confusion matrix of true positives, true negatives, false positives, and false negatives.](../media/2-decision-matrices.png)
1818

1919
We can calculate some handy characteristics from the confusion matrix. Two popular characteristics are:
2020

21-
* True Positive Rate (sensitivity): how often ‘True’ labels are correctly identified as ‘True’. For example, how often the model predicts ‘hiker’ when the sample it's shown is in fact a hiker.
22-
* False Positive Rate (false alarm rate): how often ‘False’ labels are incorrectly identified as ‘True’. For example, how often the model predicts ‘Hiker’ when it's shown a tree.
21+
* **True Positive Rate (sensitivity)**: how often "True" labels are correctly identified as "True." For example, how often the model predicts "hiker" when the sample it's shown is in fact a hiker.
22+
* **False Positive Rate (false alarm rate)**: how often "False" labels are incorrectly identified as "True." For example, how often the model predicts "hiker" when it's shown a tree.
2323

24-
Looking at true positive and false positive rates can help us understand a model’s performance.
24+
Looking at true positive and false positive rates can help us understand a model's performance.
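As a quick worked example of these two rates, the sketch below uses made-up confusion-matrix counts (not data from this module) for the hiker-versus-tree model:

```python
# Made-up confusion-matrix counts for the hiker (True) versus tree (False) model
tp = 80    # hikers correctly labeled "hiker"
fn = 20    # hikers mislabeled "tree"
fp = 30    # trees mislabeled "hiker"
tn = 170   # trees correctly labeled "tree"

true_positive_rate = tp / (tp + fn)    # sensitivity: 80 / 100 = 0.80
false_positive_rate = fp / (fp + tn)   # false alarm rate: 30 / 200 = 0.15

print(f"TPR = {true_positive_rate:.2f}, FPR = {false_positive_rate:.2f}")
```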
2525

26-
Consider our hiker example. Ideally, the true positive rate is very high, and the false positive rate is very low, because this means that the model identifies hikers well, and doesn’t identify trees as hikers very often. Yet, if the true positive rate is very high, but the false positive rate is also very high, then the model is biased: it's identifying almost everything it encounters as hiker. Similarly, we don’t want a model with a low true positive rate, because then when the model encounters a hiker, it'll label them as a tree.
26+
Consider our hiker example. Ideally, the true positive rate is very high, and the false positive rate is very low, because this means that the model identifies hikers well and doesn't identify trees as hikers very often. Yet, if the true positive rate is very high, but the false positive rate is also very high, then the model is biased; it's identifying almost everything it encounters as hiker. Similarly, we don't want a model with a low true positive rate, because then when the model encounters a hiker, it'll label them as a tree.
2727

2828
## ROC curves
2929

3030
A receiver operator characteristic (ROC) curve is a graph where we plot the true positive rate against the false positive rate.
3131

32-
ROC curves can be confusing for beginners for two main reasons. The first reason is that, beginners know that a model only has one value for true positive and true negative rates. So an ROC plot must look like this:
32+
ROC curves can be confusing for beginners for two main reasons. The first reason is that beginners know that a model only has one value for true positive and true negative rates, so an ROC plot must look like this:
3333

3434
![Receiver operator characteristic curve graph with one plot point.](../media/roc-graph.png)
3535

36-
If you’re also thinking this, you’re right. A trained model only produces one point. However, remember that our models have a threshold—normally 50%—that is used to decide whether the true (hiker) or false (tree) label should be used. If we change this threshold to 30% and recalculate true positive and false positive rates, we get another point:
36+
If you're also thinking this, you're right. A trained model only produces one point. However, remember that our models have a threshold—normally 50%—that's used to decide whether the true (hiker) or false (tree) label should be used. If we change this threshold to 30% and recalculate true positive and false positive rates, we get another point:
3737

3838
![Receiver operator characteristic curve graph with two plot points.](../media/roc-graph-2.png)
3939

40-
If we do this for thresholds between 0% - 100%, we might get a graph like this:
40+
If we do this for thresholds between 0%-100%, we might get a graph like this:
4141

4242
![Receiver operator characteristic curve graph with a line of plot points.](../media/roc-graph-3.png)
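If you'd like to see the threshold sweep as code, here's a minimal sketch. The labels and probabilities are invented for illustration; in practice a library routine (for example scikit-learn's `roc_curve`) does the same job, but the loop shows what's happening underneath.

```python
import numpy as np

# Hypothetical ground truth (1 = hiker, 0 = tree) and predicted hiker probabilities
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
y_prob = np.array([0.95, 0.85, 0.70, 0.65, 0.55, 0.45, 0.40, 0.35, 0.20, 0.10])

false_positive_rates, true_positive_rates = [], []
for threshold in np.linspace(0.0, 1.0, 101):   # sweep thresholds from 0% to 100%
    y_pred = y_prob >= threshold               # binarize at this threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    true_positive_rates.append(tp / (tp + fn))
    false_positive_rates.append(fp / (fp + tn))

# Each (false positive rate, true positive rate) pair is one point on the ROC curve;
# plotting the pairs (for example with matplotlib) gives a graph like the one above.
```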
4343

@@ -51,4 +51,4 @@ The second reason these graphs can be confusing is the jargon involved. Remember
5151

5252
## Good ROC, bad ROC
5353

54-
Understanding good and bad ROC curves is something best done in an interactive environment. When you’re ready, jump into the next exercise to explore this topic.
54+
Understanding good and bad ROC curves is something best done in an interactive environment. When you're ready, jump into the next exercise to explore this topic.

learn-pr/azure/optimize-model-performance-roc-auc/includes/4-compare-optimize-curves.md

Lines changed: 7 additions & 7 deletions
@@ -1,28 +1,28 @@
1-
Receiver operator characteristic (ROC) curves let us compare models to one another and tune our selected model. Let’s discuss how and why these are done.
1+
Receiver operator characteristic (ROC) curves let us compare models to one another and tune our selected model. Let's discuss how and why these are done.
22

33
## Tuning a model
44

5-
The most obvious use for an ROC curve is to choose a decision threshold that gives the best performance. Recall that our models provide us with probabilities, such as a 65% chance that the sample is a hiker. The decision threshold is the point above which a sample is assigned true (hiker) or below which it's assigned `false` (tree). If our decision threshold was 50%, then 65% would be assigned to true (hiker). If our decision threshold was 70%, however, a probability of 65% would be too small, and be assigned to false (‘tree’).
5+
The most obvious use for an ROC curve is to choose a decision threshold that gives the best performance. Recall that our models provide us with probabilities, such as a 65% chance that the sample is a hiker. The decision threshold is the point above which a sample is assigned true (hiker) or below which it's assigned `false` (tree). If our decision threshold was 50%, then 65% would be assigned to "true" (hiker). If our decision threshold was 70%, however, a probability of 65% would be too small, and be assigned to "false" (tree).
66

7-
We’ve seen in the previous exercise that when we construct an ROC curve, we're just changing the decision threshold and assessing how well the model works. When we do this, we can find the threshold that gives the optimal results.
7+
We've seen in the previous exercise that when we construct an ROC curve, we're just changing the decision threshold and assessing how well the model works. When we do this, we can find the threshold that gives the optimal results.
88

9-
Usually there isn't a single threshold that gives both the best true positive rate (TPR) and the lower false positive rate (FPR). This means that the optimal threshold depends on what you are trying to achieve. For example, in our scenario, it’s very important to have a high true positive rate because if a hiker isn't identified and an avalanche occurs the team won't know to rescue them. There's a trade-off, though—if the false positive rate is too high, then the rescue team may repeatedly be sent out to rescue people who simply don't exist. In other situations, the false positive rate is considered more important. For example, science has a low tolerance for false-positive results – if the false-positive rate of scientific experiments was higher, there would be an endless flurry of contradictory claims and it would be impossible to make sense of what is real.
9+
Usually there isn't a single threshold that gives both the best true positive rate (TPR) and the lowest false positive rate (FPR). This means that the optimal threshold depends on what you're trying to achieve. For example, in our scenario, it's very important to have a high true positive rate, because if a hiker isn't identified and an avalanche occurs, the team won't know to rescue them. There's a trade-off, though: if the false positive rate is too high, then the rescue team may repeatedly be sent out to rescue people who simply don't exist. In other situations, the false positive rate is considered more important. For example, science has a low tolerance for false-positive results. If the false-positive rate of scientific experiments was higher, there would be an endless flurry of contradictory claims, and it would be impossible to make sense of what's real.
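As an illustration of that trade-off, the sketch below picks a threshold from a handful of made-up ROC points. The numbers and the 0.95 requirement are hypothetical; the point is simply that the "best" threshold follows from what we decide to prioritize.

```python
import numpy as np

# Hypothetical (threshold, TPR, FPR) points taken from an ROC sweep
thresholds = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
tprs = np.array([0.99, 0.97, 0.93, 0.85, 0.70, 0.55])
fprs = np.array([0.60, 0.40, 0.25, 0.15, 0.08, 0.03])

# Avalanche-rescue priority: missing a hiker is costly, so insist on TPR >= 0.95,
# then choose the candidate with the fewest false alarms.
meets_tpr_goal = tprs >= 0.95
best = np.argmin(np.where(meets_tpr_goal, fprs, np.inf))

print(f"Chosen threshold: {thresholds[best]}")  # 0.3 for these made-up numbers
```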
1010

1111
## Comparing models with AUC
1212

13-
ROC curves can be used to compare models to each other, just like cost functions can. ROC curve for a model shows how well it will work for a variety of decision thresholds. At the end of the day, what is most important in a model is how it will perform in the real world—where there's only one decision threshold. Why then, would we want to compare models using thresholds we'll never use? There are two answers for this.
13+
You can use ROC curves to compare models to each other, just like you can with cost functions. An ROC curve for a model shows how well it will work for a variety of decision thresholds. At the end of the day, what's most important in a model is how it will perform in the real world, where there's only one decision threshold. Why then would we want to compare models using thresholds we'll never use? There are two answers for this.
1414

1515
Firstly, comparing ROC curves in particular ways is like performing a statistical test that tells us not just that one model did better on this particular test set, but whether it's likely to continue to perform better in the future. This is out of the scope of this learning material, but it's worth keeping in mind.
1616

17-
Secondly, the ROC curve shows, to some degree, how reliant the model is on having the perfect threshold. For example, if our model only works well when we have a decision threshold of 0.9, but terribly above or below this value, it's not a good design. We would probably prefer to work with a model that works reasonably well for various thresholds, knowing that if the real-world data we come across is slightly different to our test set, our model’s performance won't necessarily collapse.
17+
Secondly, the ROC curve shows, to some degree, how reliant the model is on having the perfect threshold. For example, if our model only works well when we have a decision threshold of 0.9, but terribly above or below this value, it's not a good design. We'd probably prefer to work with a model that works reasonably well for various thresholds, knowing that if the real-world data we come across is slightly different to our test set, our model's performance won't necessarily collapse.
1818

1919
### How to compare ROCs?
2020

2121
The easiest way to compare ROCs numerically is using the area under the curve (AUC). Literally, this is the area of the graph that is below the curve. For example, our perfect model from the last exercise has an AUC of 1:
2222

2323
![Diagram showing a receiver operator characteristic curve graph using area under the curve.](../media/roc-auc-graph.png)
2424

25-
While our model that did not better than chance has an area of about 0.5:
25+
While our model that did no better than chance has an area of about 0.5:
2626

2727
![Diagram showing a receiver operator characteristic curve graph with area under the curve at a sharp angle.](../media/roc-auc-graph-2.png)
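To put a number on these two pictures, the area can be approximated with the trapezoidal rule. Here's a minimal sketch with made-up ROC points sorted by false positive rate; in practice, a helper such as scikit-learn's `roc_auc_score` computes this directly from labels and probabilities.

```python
import numpy as np

# Hypothetical ROC points, sorted by false positive rate
fprs = np.array([0.0, 0.1, 0.2, 0.4, 0.7, 1.0])
tprs = np.array([0.0, 0.55, 0.75, 0.88, 0.96, 1.0])

# Trapezoidal rule: sum the areas of the trapezoids between consecutive points.
# A perfect model scores 1.0; a model no better than chance scores about 0.5.
auc = np.sum(np.diff(fprs) * (tprs[1:] + tprs[:-1]) / 2)

print(f"AUC = {auc:.2f}")  # about 0.8 for these made-up points
```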
2828

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
1-
We’ve covered receiver operator characteristic (ROC) curves in some depth. We learned they graph how often we mistakenly assign a true label against how often we correctly assign a true label. Each point on the graph represents one threshold that was applied.
1+
We've covered receiver operator characteristic (ROC) curves in some depth. We learned they graph how often we mistakenly assign a true label against how often we correctly assign a true label. Each point on the graph represents one threshold that was applied.
22

3-
We learned how we can use ROC curves to tune our decision threshold in the final model. We also saw how area-under the curve (AUC) can give us an idea as to how reliant our model is to having the perfect decision threshold. It is also a handy measure to compare two models to one another.
4-
Congratulations on getting so far! As always, now you have a new technique under your belt the best you can do for your learning is practice using it on data you care about. Through this, you will gain experience and understand nuances that we haven't had time or space to cover here. Good luck!
3+
We learned how we can use ROC curves to tune our decision threshold in the final model. We also saw how the area under the curve (AUC) can give us an idea as to how reliant our model is on having the perfect decision threshold. It's also a handy measure to compare two models to one another.
4+
Congratulations on getting this far! As always, now that you have a new technique under your belt, the best you can do for your learning is practice using it on data you care about. By doing so, you'll gain experience and understand nuances that we haven't had time or space to cover here. Good luck!
