
Commit a2e8acd

Merge pull request #103931 from likebupt/docsbugbash-update
fix some typo or description errors
2 parents: 7132b06 + 2393b60

12 files changed (+51, -62 lines)

articles/machine-learning/algorithm-module-reference/apply-transformation.md

Lines changed: 5 additions & 5 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 10/22/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Apply Transformation module
@@ -28,9 +28,9 @@ Azure Machine Learning provides support for creating and then applying many diff

 ## How to use Apply Transformation

- 1. Add the **Apply Transformation** module to your pipeline. You can find this module under **Machine Learning**, in the **Score** category.
+ 1. Add the **Apply Transformation** module to your pipeline. You can find this module in the **Model Scoring & Evaluation** category.

- 2. Locate an existing transformation to use as an input. Previously saved transformations can be found in the **Transforms** group in the left navigation pane.
+ 2. Locate an existing transformation to use as an input. Previously saved transformations can be found in the **My Datasets** group under **Datasets** category in the left module tree.
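
The pattern this module implements — fit a transformation once, then reuse it unchanged on new data — is easiest to see in code. A minimal sketch using scikit-learn's fit/transform convention (an assumption for illustration; the designer module itself is configured in the studio UI, not in code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the transformation (here, standardization) on the training data once.
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaler = StandardScaler().fit(X_train)

# Apply the already-fitted transformation to new data without refitting,
# which is what connecting a saved transformation to Apply Transformation does.
X_new = np.array([[1.5, 250.0]])
print(scaler.transform(X_new))
```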

articles/machine-learning/algorithm-module-reference/clean-missing-data.md

Lines changed: 10 additions & 13 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 10/22/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Clean Missing Data module
@@ -20,7 +20,7 @@ Use this module to remove, replace, or infer missing values.

 Data scientists often check data for missing values and then perform various operations to fix the data or insert new values. The goal of such cleaning operations is to prevent problems caused by missing data that can arise when training a model.

- This module supports multiple type of operations for "cleaning" missing values, including:
+ This module supports multiple types of operations for "cleaning" missing values, including:

 + Replacing missing values with a placeholder, mean, or other value
 + Completely removing rows and columns that have missing values
@@ -33,11 +33,11 @@ This module also outputs a definition of the transformation used to clean the mi

 ## How to use Clean Missing Data

- This module lets you define a cleaning operation. You can also save the cleaning operation so that you can apply it later to new data. See the following links for a description of how to create and save a cleaning process:
+ This module lets you define a cleaning operation. You can also save the cleaning operation so that you can apply it later to new data. See the following sections of how to create and save a cleaning process:

- + To replace missing values
+ + [To replace missing values](#replace-missing-values)

- + To apply a cleaning transformation to new data
+ + [To apply a cleaning transformation to new data](#apply-a-saved-cleaning-operation-to-new-data)

 > [!IMPORTANT]
 > The cleaning method that you use for handling missing values can dramatically affect your results. We recommend that you experiment with different methods. Consider both the justification for use of a particular method, and the quality of the results.
@@ -52,12 +52,9 @@ Each time that you apply the [Clean Missing Data](./clean-missing-data.md) modu

 For example, to check for missing values in all numeric columns:

- 1. Open the Column Selector, and select **WITH RULES**.
- 2. For **BEGIN WITH**, select **NO COLUMNS**.
+ 1. Select the **Clean Missing Data** module, and click on **Edit column** in the right panel of the module.

-    You can also start with ALL COLUMNS and then exclude columns. Initially, rules are not shown if you first click **ALL COLUMNS**, but you can click **NO COLUMNS** and then click **ALL COLUMNS** again to start with all columns and then filter out (exclude) columns based on the name, data type, or columns index.
-
- 3. For **Include**, select **Column type** from the dropdown list, and then select **Numeric**, or a more specific numeric type.
+ 3. For **Include**, select **Column types** from the dropdown list, and then select **Numeric**.

 Any cleaning or replacement method that you choose must be applicable to **all** columns in the selection. If the data in any column is incompatible with the specified operation, the module returns an error and stops the pipeline.

@@ -105,7 +102,7 @@ Each time that you apply the [Clean Missing Data](./clean-missing-data.md) modu

 6. The option **Replacement value** is available if you have selected the option, **Custom substitution value**. Type a new value to use as the replacement value for all missing values in the column.

-    Note that you can use this option only in columns that have the Integer, Double, Boolean, or Date data types. For date columns, the replacement value can also be entered as the number of 100-nanosecond ticks since 1/1/0001 12:00 A.M.
+    Note that you can use this option only in columns that have the Integer, Double, Boolean, or String.

 7. **Generate missing value indicator column**: Select this option if you want to output some indication of whether the values in the column met the criteria for missing value cleaning. This option is particularly useful when you are setting up a new cleaning operation and want to make sure it works as designed.
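
The replacement options described in this diff — substitute a custom value or the column mean, optionally emitting an indicator column — map onto familiar imputation code. A rough scikit-learn sketch of the same idea, not the module's own implementation:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 7.0],
              [np.nan, 5.0],
              [3.0, np.nan]])

# Replace missing values with the column mean; add_indicator=True appends
# 0/1 columns, similar to "Generate missing value indicator column".
imputer = SimpleImputer(strategy="mean", add_indicator=True)
print(imputer.fit_transform(X))
```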

articles/machine-learning/algorithm-module-reference/cross-validate-model.md

Lines changed: 3 additions & 5 deletions

@@ -9,7 +9,7 @@ ms.topic: reference

 author: likebupt
 ms.author: keli19
- ms.date: 10/10/2019
+ ms.date: 02/11/2020
 ---
 # Cross Validate Model

@@ -57,22 +57,20 @@ In this scenario, you both train and test the model by using Cross Validate Mode

 2. Connect the output of any classification or regression model.

-    For example, if you're using **Two Class Bayes Point Machine** for classification, configure the model with the parameters that you want. Then, drag a connector from the **Untrained model** port of the classifier to the matching port of Cross Validate Model.
+    For example, if you're using **Two Class Boosted Decision Tree** for classification, configure the model with the parameters that you want. Then, drag a connector from the **Untrained model** port of the classifier to the matching port of Cross Validate Model.

    > [!TIP]
    > You don't have to train the model, because Cross-Validate Model automatically trains the model as part of evaluation.
 3. On the **Dataset** port of Cross Validate Model, connect any labeled training dataset.

- 4. In the **Properties** pane of Cross Validate Model, select **Launch column selector**. Choose the single column that contains the class label, or the predictable value.
+ 4. In the right panel of Cross Validate Model, click **Edit column**. Select the single column that contains the class label, or the predictable value.

 5. Set a value for the **Random seed** parameter if you want to repeat the results of cross-validation across successive runs on the same data.

 6. Run the pipeline.

 7. See the [Results](#results) section for a description of the reports.

-    To get a copy of the model for reuse later, switch to the **Outputs** tab in the right panel of the module that contains the algorithm (for example, the **Two Class Bayes Point Machine**). Then select the **Register dataset** icon to save a copy of the trained model in the module tree.
-
 ## Results

 After all iterations are complete, Cross Validate Model creates scores for the entire dataset. It also creates performance metrics that you can use to assess the quality of the model.
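
Outside the designer, the same workflow — hand an untrained model and a labeled dataset to a cross-validation routine and read back per-fold metrics — looks roughly like this in scikit-learn (an illustrative stand-in, with GradientBoostingClassifier playing the role of Two Class Boosted Decision Tree):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

X, y = load_breast_cancer(return_X_y=True)

# The model is passed in untrained; cross_validate fits it on each fold,
# just as Cross Validate Model trains the connected model during evaluation.
model = GradientBoostingClassifier(random_state=42)  # fixed seed for repeatable results
scores = cross_validate(model, X, y, cv=10, scoring=["accuracy", "roc_auc"])
print(scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
```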

articles/machine-learning/algorithm-module-reference/edit-metadata.md

Lines changed: 5 additions & 5 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 10/22/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Edit Metadata module
@@ -35,9 +35,9 @@ Typical metadata changes might include:

 ## Configure Edit Metadata

- 1. In Azure Machine Learning, add the Edit Metadata module to your pipeline and connect the dataset you want to update. You can find the dataset under **Data Transformation** in the **Manipulate** category.
+ 1. In Azure Machine Learning designer, add the Edit Metadata module to your pipeline and connect the dataset you want to update. You can find the module in the **Data Transformation** category.

- 1. Select **Launch the column selector** and choose the column or set of columns to work with. You can choose columns individually by name or index, or you can choose a group of columns by type.
+ 1. Click **Edit column** in the right panel of the module and choose the column or set of columns to work with. You can choose columns individually by name or index, or you can choose a group of columns by type.

 1. Select the **Data type** option if you need to assign a different data type to the selected columns. You might need to change the data type for certain operations. For example, if your source dataset has numbers handled as text, you must change them to a numeric data type before using math operations.
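
The "numbers handled as text" case in that last step is the classic motivation for changing a column's data type. Outside the designer, the equivalent fix in pandas (hypothetical column name) is a one-liner:

```python
import pandas as pd

df = pd.DataFrame({"price": ["13495", "16500", "18920"]})  # numbers stored as text
print(df["price"].dtype)   # object

# Convert to a numeric type before doing math on the column,
# analogous to setting "Data type" in Edit Metadata.
df["price"] = pd.to_numeric(df["price"])
print(df["price"].mean())  # 16305.0
```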

articles/machine-learning/algorithm-module-reference/evaluate-model.md

Lines changed: 8 additions & 8 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 11/19/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Evaluate Model module
@@ -75,10 +75,10 @@ Because this is a clustering model, the evaluation results are different than if

 This section describes the metrics returned for the specific types of models supported for use with **Evaluate Model**:

- + [classification models](#bkmk_classification)
- + [regression models](#bkmk_regression)
+ + [classification models](#metrics-for-classification-models)
+ + [regression models](#metrics-for-regression-models)

- ### <a name="bkmk_classification"></a> Metrics for classification models
+ ### Metrics for classification models

 The following metrics are reported when evaluating classification models. If you compare models, they are ranked by the metric you select for evaluation.

@@ -96,7 +96,7 @@ The following metrics are reported when evaluating classification models. If you

 - **Training log loss** is a single score that represents the advantage of the classifier over a random prediction. The log loss measures the uncertainty of your model by comparing the probabilities it outputs to the known values (ground truth) in the labels. You want to minimize log loss for the model as a whole.

- ## <a name="bkmk_regression"></a> Metrics for regression models
+ ### Metrics for regression models

 The metrics returned for regression models are designed to estimate the amount of error. A model is considered to fit the data well if the difference between observed and predicted values is small. However, looking at the pattern of the residuals (the difference between any one predicted point and its corresponding actual value) can tell you a lot about potential bias in the model.

@@ -110,7 +110,7 @@ The metrics returned for regression models are designed to estimate the amount o

 - **Relative squared error (RSE)** similarly normalizes the total squared error of the predicted values by dividing by the total squared error of the actual values.

- - **Mean Zero One Error (MZOE)** indicates whether the prediction was correct or not. In other words: `ZeroOneLoss(x,y) = 1` when `x!=y`; otherwise `0`.
+

 - **Coefficient of determination**, often referred to as R<sup>2</sup>, represents the predictive power of the model as a value between 0 and 1. Zero means the model is random (explains nothing); 1 means there is a perfect fit. However, caution should be used in interpreting R<sup>2</sup> values, as low values can be entirely normal and high values can be suspect.

articles/machine-learning/algorithm-module-reference/score-model.md

Lines changed: 4 additions & 4 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 10/22/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Score Model module
@@ -39,7 +39,7 @@ The score, or predicted value, can be in many different formats, depending on th

 - For classification models, [Score Model](./score-model.md) outputs a predicted value for the class, as well as the probability of the predicted value.
 - For regression models, [Score Model](./score-model.md) generates just the predicted numeric value.
- - For image classification models, the score might be the class of object in the image, or a Boolean indicating whether a particular feature was found.
+

 ## Publish scores as a web service
articles/machine-learning/algorithm-module-reference/train-model.md

Lines changed: 5 additions & 5 deletions

@@ -7,9 +7,9 @@ ms.service: machine-learning
 ms.subservice: core
 ms.topic: reference

- author: xiaoharper
- ms.author: zhanxia
- ms.date: 10/22/2019
+ author: likebupt
+ ms.author: keli19
+ ms.date: 02/11/2020
 ---

 # Train Model module
@@ -34,7 +34,7 @@ In Azure Machine Learning, creating and using a machine learning model is typica

 3. After training is completed, use the trained model with one of the [scoring modules](./score-model.md), to make predictions on new data.

- ## How to use **Train Model**
+ ## How to use Train Model

 1. In Azure Machine Learning, configure a classification model or regression model.

@@ -44,7 +44,7 @@ In Azure Machine Learning, creating and using a machine learning model is typica

 The training dataset must contain a label column. Any rows without labels are ignored.

- 4. For **Label column**, click **Launch column selector**, and choose a single column that contains outcomes the model can use for training.
+ 4. For **Label column**, click **Edit column** in the right panel of module, and choose a single column that contains outcomes the model can use for training.

 - For classification problems, the label column must contain either **categorical** values or **discrete** values. Some examples might be a yes/no rating, a disease classification code or name, or an income group. If you pick a noncategorical column, the module will return an error during training.

articles/machine-learning/algorithm-module-reference/tune-model-hyperparameters.md

Lines changed: 2 additions & 8 deletions

@@ -9,7 +9,7 @@ ms.topic: reference

 author: likebupt
 ms.author: keli19
- ms.date: 10/16/2019
+ ms.date: 02/11/2020
 ---
 # Tune Model Hyperparameters

@@ -38,17 +38,13 @@ This section describes how to perform a basic parameter sweep, which trains a mo

 2. Connect an untrained model to the leftmost input.

- 3. Set the **Create trainer mode** option to **Parameter Range**. Use **Range Builder** to specify a range of values to use in the parameter sweep.

-    Almost all the classification and regression modules support an integrated parameter sweep. For learners that don't support configuring a parameter range, you can test only the available parameter values.

-    You can manually set the value for one or more parameters, and then sweep over the remaining parameters. This might save some time.

 4. Add the dataset that you want to use for training, and connect it to the middle input of Tune Model Hyperparameters.

    Optionally, if you have a tagged dataset, you can connect it to the rightmost input port (**Optional validation dataset**). This lets you measure accuracy while training and tuning.

- 5. In the **Properties** pane of Tune Model Hyperparameters, choose a value for **Parameter sweeping mode**. This option controls how the parameters are selected.
+ 5. In the right panel of Tune Model Hyperparameters, choose a value for **Parameter sweeping mode**. This option controls how the parameters are selected.

    - **Entire grid**: When you select this option, the module loops over a grid predefined by the system, to try different combinations and identify the best learner. This option is useful when you don't know what the best parameter settings might be and want to try all possible combinations of values.

@@ -60,8 +56,6 @@ This section describes how to perform a basic parameter sweep, which trains a mo

 1. **Maximum number of runs on random sweep**: If you choose a random sweep, you can specify how many times the model should be trained, by using a random combination of parameter values.

- 2. **Maximum number of runs on random grid**: This option also controls the number of iterations over a random sampling of parameter values, but the values are not generated randomly from the specified range. Instead, the module creates a matrix of all possible combinations of parameter values. It then takes a random sampling over the matrix. This method is more efficient and less prone to regional oversampling or undersampling.

 8. For **Ranking**, choose a single metric to use for ranking the models.

 When you run a parameter sweep, the module calculates all applicable metrics for the model type and returns them in the **Sweep results** report. The module uses separate metrics for regression and classification models.
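
The two sweep modes line up with the two standard search strategies: exhaustive grid search and randomized search with a capped number of runs. A hedged scikit-learn sketch (the parameter grid is illustrative, not the module's own ranges):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
params = {"n_estimators": [50, 100, 200], "learning_rate": [0.05, 0.1, 0.2]}

# "Entire grid": loop over every combination in the predefined grid.
grid = GridSearchCV(GradientBoostingClassifier(), params, scoring="roc_auc", cv=5)
grid.fit(X, y)

# "Random sweep": cap the number of training runs (here, 5 random combinations),
# matching "Maximum number of runs on random sweep".
rand = RandomizedSearchCV(GradientBoostingClassifier(), params, n_iter=5,
                          scoring="roc_auc", cv=5, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)  # ranked by the chosen metric
```
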
articles/machine-learning/how-to-designer-import-data.md

Lines changed: 1 addition & 1 deletion

@@ -41,7 +41,7 @@ Your registered datasets can be found in the module palette, under **Datasets**

 ![Screenshot showing location of saved datasets in the designer palette](media/how-to-designer-import-data/use-datasets-designer.png)

- Any [file dataset](how-to-create-register-datasets.md#dataset-types) registered to your machine learning workspace will appear in the module palette. You aren't limited to using datasets created in the designer.
+

 > [!NOTE]
 > The designer currently only supports processing [tabular datasets](how-to-create-register-datasets.md#dataset-types). If you want to use [file datasets](how-to-create-register-datasets.md#dataset-types), use the Azure Machine Learning SDK available for Python and R.
articles/machine-learning/how-to-designer-sample-regression-automobile-price-basic.md

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ ms.topic: sample
 author: likebupt
 ms.author: keli19
 ms.reviewer: peterlu
- ms.date: 12/25/2019
+ ms.date: 02/11/2020
 ---
 # Use regression to predict car prices with Azure Machine Learning designer

@@ -55,7 +55,7 @@ Use the **Select Columns in Dataset** module to exclude normalized-losses that h

 Machine learning problems vary. Common machine learning tasks include classification, clustering, regression, and recommender systems, each of which might require a different algorithm. Your choice of algorithm often depends on the requirements of the use case. After you pick an algorithm, you need to tune its parameters to train a more accurate model. You then need to evaluate all models based on metrics like accuracy, intelligibility, and efficiency.

- Since the goal of this sample is to predict automobile prices, and because the label column (price) contains real numbers, a regression model is a good choice. Considering that the number of features is relatively small (less than 100) and these features aren't sparse, the decision boundary is likely to be nonlinear. So we use **Decision Forest Regression** for this pipeline.
+ Since the goal of this sample is to predict automobile prices, and because the label column (price) is continuous data, a regression model can be a good choice. We use **Linear Regression** for this pipeline.

 Use the **Split Data** module to randomly divide the input data so that the training dataset contains 70% of the original data and the testing dataset contains 30% of the original data.
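
The pipeline this sample builds — drop the mostly-missing normalized-losses column, split 70/30, fit linear regression on the continuous price label — can be sketched outside the designer as well. File and column names below are placeholders for the automobile dataset, and the remaining features are assumed numeric (categorical columns would need encoding first):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("automobile-prices.csv")             # placeholder file name
X = df.drop(columns=["price", "normalized-losses"])   # label and mostly-missing column removed
y = df["price"]                                       # continuous label -> regression

# 70% training / 30% testing, mirroring the Split Data module settings.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # R², the coefficient of determination
```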
