articles/machine-learning/service/tutorial-designer-automobile-price-train-score.md
Lines changed: 51 additions & 51 deletions
@@ -1,7 +1,7 @@
1
1
---
2
2
title: 'Tutorial: Predict automobile price with the designer'
3
3
titleSuffix: Azure Machine Learning
4
-
description: Learn how to train, score, and deploy a machine learning model using a drag-and-drop interface. This tutorial is part one of a two-part series on predicting automobile prices using linear regression.
4
+
description: Learn how to train, score, and deploy a machine learning model by using a drag-and-drop interface. This tutorial is part one of a two-part series on predicting automobile prices by using linear regression.
5
5
6
6
author: peterclu
7
7
ms.author: peterlu
@@ -17,63 +17,63 @@ ms.date: 11/04/2019
17
17
18
18
In this two-part tutorial, you learn how to use the Azure Machine Learning designer to develop and deploy a predictive analytics solution that predicts the price of any car.
19
19
20
-
In part one, you set up your environment, drag-and-drop modules onto an interactive canvas, and connect them together to create an Azure Machine Learning pipeline.
20
+
In part one, you set up your environment, drag modules onto an interactive canvas, and connect them together to create an Azure Machine Learning pipeline.
21
21
22
-
In part one of the tutorial you learn how to:
22
+
In part one of the tutorial, you'll learn how to:
23
23
24
24
> [!div class="checklist"]
25
-
> * Create a new pipeline
26
-
> * Import data
27
-
> * Prepare data
28
-
> * Train a machine learning model
29
-
> * Evaluate a machine learning model
25
+
> * Create a new pipeline.
26
+
> * Import data.
27
+
> * Prepare data.
28
+
> * Train a machine learning model.
29
+
> * Evaluate a machine learning model.
30
30
31
-
In [part two](tutorial-designer-automobile-price-deploy.md) of the tutorial, you learn how to deploy your predictive model as a real-time inferencing endpoint to predict the price of any car based on technical specifications you send it.
31
+
In [part two](tutorial-designer-automobile-price-deploy.md) of the tutorial, you'll learn how to deploy your predictive model as a real-time inferencing endpoint to predict the price of any car based on technical specifications you send it.
32
32
33
-
> [!Note]
33
+
> [!NOTE]
34
34
>A completed version of this tutorial is available as a sample pipeline.
35
35
>
36
-
>To find it, go to the **designer in your workspace**. In the **New pipeline** section, select **Sample 1 - Regression: Automobile Price Prediction(Basic)**.
36
+
>To find it, go to the designer in your workspace. In the **New pipeline** section, select **Sample 1 - Regression: Automobile Price Prediction(Basic)**.
37
37
38
38
## Create a new pipeline
39
39
40
40
Azure Machine Learning pipelines organize multiple, dependent machine learning and data processing steps into a single resource. Pipelines help you organize, manage, and reuse complex machine learning workflows across projects and users. To create an Azure Machine Learning pipeline, you need an Azure Machine Learning workspace. In this section, you learn how to create both these resources.
41
41
42
42
### Create a new workspace
43
43
44
-
If you have an Azure Machine Learning workspace with an **Enterprise edition**, [skip to the next section](#create-the-pipeline).
44
+
If you have an Azure Machine Learning workspace with an Enterprise edition, [skip to the next section](#create-the-pipeline).
1. Sign into [ml.azure.com](https://ml.azure.com) and select the workspace you want to work with.
50
+
1. Sign in to [ml.azure.com](https://ml.azure.com), and select the workspace you want to work with.
51
51
52
52
1. Select **Designer**.
53
53
54
54

55
55
56
56
1. Select **Easy-to-use prebuilt modules**.
57
57
58
-
1. Select the default pipeline name,**"Pipeline-Created-on ..."** at the top of the canvas, and rename it to something meaningful. For example, **"Automobile price prediction"**. The name doesn't need to be unique.
58
+
1. Select the default pipeline name **Pipeline-Created-on** at the top of the canvas. Rename it to something meaningful. An example is *Automobile price prediction*. The name doesn't need to be unique.
59
59
60
60
## Import data
61
61
62
62
There are several sample datasets included in the designer for you to experiment with. For this tutorial, use **Automobile price data (Raw)**.
63
63
64
-
1. To the left of the pipeline canvas is a palette of datasets and modules. Select **Datasets** then view the **Samples** section to view the available sample datasets.
64
+
1. To the left of the pipeline canvas is a palette of datasets and modules. Select **Datasets**, and then go to the **Samples** section to view the available sample datasets.
65
65
66
-
1. Select the dataset,**Automobile price data (Raw)**, and drag it onto the canvas.
66
+
1. Select the dataset **Automobile price data (Raw)**, and drag it onto the canvas.
67
67
68
68

69
69
70
70
### Visualize the data
71
71
72
-
You can visualize the data to understand the dataset you will be using.
72
+
You can visualize the data to understand the dataset that you'll use.
73
73
74
74
1. Select the **Automobile price data (Raw)** module.
75
75
76
-
1. In the **Properties** pane to the right of the canvas, select **Outputs**.
76
+
1. In the properties pane to the right of the canvas, select **Outputs**.
77
77
78
78
1. Select the graph icon to visualize the data.
79
79
@@ -85,17 +85,17 @@ You can visualize the data to understand the dataset you will be using.
85
85
86
86
## Prepare data
87
87
88
-
Datasets typically require some preprocessing before analysis. You might have noticed some missing values when inspect the dataset. These missing values need to be cleaned so that the model can analyze the data correctly.
88
+
Datasets typically require some preprocessing before analysis. You might have noticed some missing values when you inspected the dataset. These missing values must be cleaned so that the model can analyze the data correctly.
89
89
90
90
### Remove a column
91
91
92
-
When you train a model, you have to do something about the data that's missing. In this dataset, the **normalized-losses** column is missing many values, so you'll exclude that column from the model altogether.
92
+
When you train a model, you have to do something about the data that's missing. In this dataset, the **normalized-losses** column is missing many values, so you exclude that column from the model altogether.
93
93
94
-
1. Enter **Select** in the Search box at the top of the palette to find the **Select Columns in Dataset** module.
94
+
1. Enter **Select** in the search box at the top of the palette to find the **Select Columns in Dataset** module.
95
95
96
-
1. Click and drag the **Select Columns in Dataset** module onto the canvas. Drop the module below the dataset module.
96
+
1. Drag the **Select Columns in Dataset** module onto the canvas. Drop the module below the dataset module.
97
97
98
-
1. Connect the **Automobile price data (Raw)** dataset to the **Select Columns in Dataset**. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, to the input port of **Select Columns in Dataset**, which is the small circle at the top of the module.
98
+
1. Connect the **Automobile price data (Raw)** dataset to the **Select Columns in Dataset** module. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, to the input port of **Select Columns in Dataset**, which is the small circle at the top of the module.
99
99
100
100
> [!TIP]
101
101
> You create a flow of data through your pipeline when you connect the output port of one module to an input port of another.
@@ -105,13 +105,13 @@ When you train a model, you have to do something about the data that's missing.
105
105
106
106
1. Select the **Select Columns in Dataset** module.
107
107
108
-
1. In the **Properties** pane to the right of the canvas, select **Parameters** > **Edit column**.
108
+
1. In the properties pane to the right of the canvas, select **Parameters** > **Edit column**.
109
109
110
110
1. Select the **+** to add a new rule.
111
111
112
112
1. From the drop-down menu, select **Exclude** and **Column names**.
113
113
114
-
1. Enter **normalized-losses** into the text box.
114
+
1. Enter *normalized-losses* in the text box.
115
115
116
116
1. In the lower right, select **Save** to close the column selector.
117
117
@@ -121,50 +121,50 @@ When you train a model, you have to do something about the data that's missing.
121
121
122
122
1. Select the **Select Columns in Dataset** module.
123
123
124
-
1. In the **Properties** pane, select **Parameters** > **Comment** and enter "Exclude normalized losses.".
124
+
1. In the properties pane, select **Parameters** > **Comment** and enter *Exclude normalized losses*.
125
125
126
126
### Clean missing data
127
127
128
-
Your dataset still has missing values after removing the **normalized-losses** column. You can remove the remaining missing data using the **Clean Missing Data** module.
128
+
Your dataset still has missing values after you remove the **normalized-losses** column. You can remove the remaining missing data by using the **Clean Missing Data** module.
129
129
130
130
> [!TIP]
131
131
> Cleaning the missing values from input data is a prerequisite for using most of the modules in the designer.
132
132
133
-
1. Enter **Clean** in the Search box to find the **Clean Missing Data** module.
133
+
1. Enter **Clean** in the search box to find the **Clean Missing Data** module.
134
134
135
-
1. Drag the **Clean Missing Data** module to the pipeline canvas and connect it to the **Select Columns in Dataset** module.
135
+
1. Drag the **Clean Missing Data** module to the pipeline canvas. Connect it to the **Select Columns in Dataset** module.
136
136
137
-
1. In the Properties pane, select **Remove entire row** under **Cleaning mode**.
137
+
1. In the properties pane, select **Remove entire row** under **Cleaning mode**.
138
138
139
-
1. In the Properties pane **Comment** box, enter "Remove missing value rows."
139
+
1. In the properties pane **Comment** box, enter *Remove missing value rows*.
140
140
141
141
Your pipeline should now look something like this:
Now that the data is processed, you can train a predictive model.
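
For readers who want to see the preparation steps outside the designer, the following is a minimal sketch of the same two operations: excluding the **normalized-losses** column and removing every row that still has a missing value. It uses only the Python standard library. The sample rows and the helper names `drop_column` and `remove_rows_with_missing` are made up for illustration and are not part of the designer or its modules.

```python
# Hypothetical rows standing in for the automobile dataset.
# None marks a missing value; the numbers are invented for illustration.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "dodge", "normalized-losses": 118, "horsepower": None, "price": 5572},
    {"make": "mazda", "normalized-losses": None, "horsepower": 68, "price": None},
]


def drop_column(rows, name):
    """Return copies of the rows without the named column."""
    return [{key: value for key, value in row.items() if key != name} for row in rows]


def remove_rows_with_missing(rows):
    """Keep only the rows in which every value is present."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Step 1: exclude the column that is missing many values.
selected = drop_column(rows, "normalized-losses")

# Step 2: remove every remaining row that has a missing value.
cleaned = remove_rows_with_missing(selected)

for row in cleaned:
    print(row)
```

Dropping the column first matters here: a row that is missing only its **normalized-losses** value survives, whereas removing rows first would discard it.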
148
148
149
149
### Select an algorithm
150
150
151
-
**Classification** and **regression** are two types of supervised machine learning algorithms. **Classification** predicts an answer from a defined set of categories, such as a color (red, blue, or green). **Regression** is used to predict a number.
151
+
*Classification* and *regression* are two types of supervised machine learning algorithms. Classification predicts an answer from a defined set of categories, such as a color like red, blue, or green. Regression is used to predict a number.
152
152
153
-
Since you want to predict price, which is a number, you can use a regression algorithm. For this example, you'll use a linear regression model.
153
+
Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you use a linear regression model.
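
To make the idea of regression concrete, here is a minimal sketch of ordinary least squares with a single feature, written with the Python standard library only. It is not the implementation behind the designer's **Linear Regression** module, which works with many columns at once. The `fit_line` and `predict` helpers and the horsepower and price values are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data in which price grows linearly with horsepower.
horsepower = [70, 90, 110, 130]
price = [7000, 9000, 11000, 13000]

slope, intercept = fit_line(horsepower, price)


def predict(x):
    """Predict a price (a number) for one horsepower value."""
    return slope * x + intercept


print(slope, intercept, predict(100))
```

The output of `predict` is a number rather than a category, which is what distinguishes regression from classification.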
154
154
155
155
### Split the data
156
156
157
157
Split your data into two separate datasets for training the model and testing it.
158
158
159
-
1. Enter **split data** in the search box to find the **Split Data** module and connect it to the left port of the **Clean Missing Data** module.
159
+
1. Enter **split data** in the search box to find the **Split Data** module. Connect it to the left port of the **Clean Missing Data** module.
160
160
161
161
1. Select the **Split Data** module.
162
162
163
-
1. In the Properties pane, set the **Fraction of rows in the first output dataset** to 0.7.
163
+
1. In the properties pane, set the **Fraction of rows in the first output dataset** to 0.7.
164
164
165
-
This splits 70 percent of the data to train the model and 30 percent for testing it.
165
+
This option allocates 70 percent of the data for training the model and 30 percent for testing it.
166
166
167
-
1. In the Properties **Comment** box, enter "Split the dataset into training set (0.7) and test set (0.3)."
167
+
1. In the properties pane **Comment** box, enter *Split the dataset into training set (0.7) and test set (0.3)*.
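
The 0.7 fraction can be pictured with a short standard-library sketch. The `split_rows` helper, the shuffle, and the fixed seed are assumptions made for illustration; they are not a description of how the **Split Data** module works internally.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Shuffle a copy of the rows and split it at the given fraction."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # round() avoids losing a row to floating-point truncation.
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows; in the tutorial these would be the cleaned car records.
rows = list(range(10))
train, test = split_rows(rows, 0.7)

print(len(train), len(test))
```

Every row lands in exactly one of the two outputs, so the model is never tested on a row it was trained on.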
168
168
169
169
### Train the model
170
170
@@ -174,9 +174,9 @@ Train the model by giving it a set of data that includes the price. The model sc
174
174
175
175
1. Expand **Machine Learning Algorithms**.
176
176
177
-
This displays several categories of modules that you can use to initialize learning algorithms.
177
+
This option displays several categories of modules that you can use to initialize learning algorithms.
178
178
179
-
1. Select **Regression** > **Linear Regression** and drag it to the pipeline canvas.
179
+
1. Select **Regression** > **Linear Regression**, and drag it to the pipeline canvas.
180
180
181
181
1. Find and drag the **Train Model** module to the pipeline canvas.
182
182
@@ -188,25 +188,25 @@ Train the model by giving it a set of data that includes the price. The model sc
188
188
189
189
1. Select the **Train Model** module.
190
190
191
-
1. In the Properties pane, select **Edit column** selector.
191
+
1. In the properties pane, select the **Edit column** selector.
192
192
193
-
1. In the **Label column** dialog, expand the drop-down menu and select **Column names**.
193
+
1. In the **Label column** dialog box, expand the drop-down menu and select **Column names**.
194
194
195
-
1. In the text box, enter **price**. Price is the value that your model is going to predict.
195
+
1. In the text box, enter *price*. Price is the value that your model is going to predict.
196
196
197
197
Your pipeline should look like this:
198
198
199
199

200
200
201
201
## Evaluate a machine learning model
202
202
203
-
After training your model using 70 percent of the data, you can use it to score the other 30 percent to see how well your model functions.
203
+
After you train your model by using 70 percent of the data, you can use it to score the other 30 percent to see how well your model functions.
204
204
205
-
1. Enter **score model** in the search box to find the **Score Model** module and drag the module to the pipeline canvas.
205
+
1. Enter *score model* in the search box to find the **Score Model** module. Drag the module to the pipeline canvas.
206
206
207
207
1. Connect the output of the **Train Model** module to the left input port of **Score Model**. Connect the test data output (right port) of the **Split Data** module to the right input port of **Score Model**.
208
208
209
-
1. Enter **evaluate** in the search box to find the **Evaluate Model** and drag the module to the pipeline canvas.
209
+
1. Enter *evaluate* in the search box to find the **Evaluate Model** module. Drag the module to the pipeline canvas.
210
210
211
211
1. Connect the output of the **Score Model** module to the left input of **Evaluate Model**.
212
212
@@ -224,23 +224,23 @@ After the run completes, you can view the results of the pipeline run.
224
224
225
225
1. Select the **Score Model** module to view its output.
226
226
227
-
1. In the **Properties** pane, select **Outputs** > **Visualize**.
227
+
1. In the properties pane, select **Outputs** > **Visualize**.
228
228
229
229
Here you can see the predicted prices and the actual prices from the testing data.
230
230
231
-

231
+

232
232
233
233
1. Select the **Evaluate Model** module to view its output.
234
234
235
-
1. In the **Properties** pane, select **Output** > **Visualize**.
235
+
1. In the properties pane, select **Output** > **Visualize**.
236
236
237
237
The following statistics are shown for your model:
238
238
239
-
* **Mean Absolute Error (MAE)**: The average of absolute errors (an error is the difference between the predicted value and the actual value).
239
+
* **Mean Absolute Error (MAE)**: The average of absolute errors. An error is the difference between the predicted value and the actual value.
240
240
* **Root Mean Squared Error (RMSE)**: The square root of the average of squared errors of predictions made on the test dataset.
241
241
* **Relative Absolute Error**: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
242
242
* **Relative Squared Error**: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
243
-
* **Coefficient of Determination**: Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.
243
+
* **Coefficient of Determination**: Also known as the R squared value, this statistical metric indicates how well a model fits the data.
244
244
245
245
For each of the error statistics, smaller is better. A smaller value indicates that the predictions are closer to the actual values. For the coefficient of determination, the closer its value is to one (1.0), the better the predictions.
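
The five statistics can be computed by hand, as in the following sketch. It follows the standard textbook definitions, which are assumed (not confirmed by this tutorial) to match what **Evaluate Model** reports. The `regression_metrics` helper and the sample prices are invented for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]

    # Spread of the actual values around their own mean.
    abs_deviation = sum(abs(a - mean_actual) for a in actual)
    sq_deviation = sum((a - mean_actual) ** 2 for a in actual)

    relative_squared_error = sum(sq_errors) / sq_deviation
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "relative_absolute_error": sum(abs_errors) / abs_deviation,
        "relative_squared_error": relative_squared_error,
        # R squared is one minus the relative squared error.
        "r2": 1 - relative_squared_error,
    }


# Made-up actual and predicted prices for four test rows.
actual = [10000, 12000, 14000, 16000]
predicted = [11000, 12000, 13000, 16000]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

In this made-up case the four error statistics are small relative to the prices and the coefficient of determination is close to 1.0, which is the pattern you hope to see for a well-fitting model.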