Commit 6ad3dd3

Merge pull request #106631 from ShawnJackson/five-machine-learning-articles
edit pass: Five machine learning articles
2 parents daa5af0 + 2801f74 commit 6ad3dd3

5 files changed

+257
-271
lines changed

Lines changed: 67 additions & 65 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
---
2-
title: "Create Python Model: Module Reference"
2+
title: "Create Python Model: Module reference"
33
titleSuffix: Azure Machine Learning
4-
description: Learn how to use the Create Python Model model in Azure Machine Learning to create custom modeling or data processing module.
4+
description: Learn how to use the Create Python Model module in Azure Machine Learning to create a custom modeling or data processing module.
55
services: machine-learning
66
ms.service: machine-learning
77
ms.subservice: core
@@ -11,104 +11,106 @@ author: likebupt
1111
ms.author: keli19
1212
ms.date: 11/19/2019
1313
---
14-
# Create Python Model
14+
# Create Python Model module
1515

1616
This article describes a module in Azure Machine Learning designer (preview).
1717

18-
Learn how to use the **Create Python Model** module to create an untrained model from a Python script. You can base the model on any learner that is included in a Python package in the Azure Machine Learning designer environment.
18+
Learn how to use the Create Python Model module to create an untrained model from a Python script. You can base the model on any learner that's included in a Python package in the Azure Machine Learning designer environment.
1919

20-
After you create the model, you can use [Train Model](train-model.md) to train the model on a dataset, like any other learner in Azure Machine Learning. The trained model can be passed to [Score Model](score-model.md) to use the model to make predictions. The trained model can then be saved, and the scoring workflow can be published as a web service.
20+
After you create the model, you can use [Train Model](train-model.md) to train the model on a dataset, like any other learner in Azure Machine Learning. The trained model can be passed to [Score Model](score-model.md) to make predictions. You can then save the trained model and publish the scoring workflow as a web service.
2121

2222
> [!WARNING]
23-
> Currently it is not possible to pass the scored results of a Python model to [Evaluate Model](evaluate-model.md). If you need to evaluate a model, you can write custom Python script and run it using the [Execute Python Script](execute-python-script.md) module.
23+
> Currently, it's not possible to pass the scored results of a Python model to [Evaluate Model](evaluate-model.md). If you need to evaluate a model, you can write a custom Python script and run it by using the [Execute Python Script](execute-python-script.md) module.
2424
2525

26-
## How to configure Create Python Model
26+
## Configure the module
2727

28-
Use of this module requires intermediate or expert knowledge of Python. The module supports use of any learner that is included in the Python packages already installed in Azure Machine Learning. See pre-installed Python package list in [Execute Python Script](execute-python-script.md).
28+
Use of this module requires intermediate or expert knowledge of Python. The module supports use of any learner that's included in the Python packages already installed in Azure Machine Learning. See the preinstalled Python package list in [Execute Python Script](execute-python-script.md).
2929

3030

31-
This article will show how to use the **Create Python Model** with a simple pipeline. Below is the graph of the pipeline.
31+
This article shows how to use Create Python Model with a simple pipeline. Here's a diagram of the pipeline:
3232

33-
![create-python-model](./media/module/create-python-model.png)
33+
![Diagram of Create Python Model](./media/module/create-python-model.png)
3434

35-
1. Click **Create Python Model**, edit the script to implement your modeling or data management process. You can base the model on any learner that is included in a Python package in the Azure Machine Learning environment.
35+
1. Select **Create Python Model**, and edit the script to implement your modeling or data management process. You can base the model on any learner that's included in a Python package in the Azure Machine Learning environment.
3636

37+
The following sample code of the two-class Naive Bayes classifier uses the popular *sklearn* package:
3738

38-
Below is a sample code of two-class Naive Bayes classifier by using the popular *sklearn* package.
39+
```Python
3940

40-
```Python
41+
# The script MUST define a class named AzureMLModel.
42+
# This class MUST at least define the following three methods:
43+
# __init__: in which self.model must be assigned,
44+
# train: which trains self.model, the two input arguments must be pandas DataFrame,
45+
# predict: which generates prediction result, the input argument and the prediction result MUST be pandas DataFrame.
46+
# The signatures (method names and argument names) of all these methods MUST be exactly the same as the following example.
4147

42-
# The script MUST define a class named AzureMLModel.
43-
# This class MUST at least define the following three methods:
44-
# __init__: in which self.model must be assigned,
45-
# train: which trains self.model, the two input arguments must be pandas DataFrame,
46-
# predict: which generates prediction result, the input argument and the prediction result MUST be pandas DataFrame.
47-
# The signatures (method names and argument names) of all these methods MUST be exactly the same as the following example.
4848

49+
import pandas as pd
50+
from sklearn.naive_bayes import GaussianNB
4951

50-
import pandas as pd
51-
from sklearn.naive_bayes import GaussianNB
5252

53+
class AzureMLModel:
54+
    def __init__(self):
55+
        self.model = GaussianNB()
56+
        self.feature_column_names = list()
5357

54-
class AzureMLModel:
55-
    def __init__(self):
56-
        self.model = GaussianNB()
57-
        self.feature_column_names = list()
58+
    def train(self, df_train, df_label):
59+
        self.feature_column_names = df_train.columns.tolist()
60+
        self.model.fit(df_train, df_label)
5861

59-
    def train(self, df_train, df_label):
60-
        self.feature_column_names = df_train.columns.tolist()
61-
        self.model.fit(df_train, df_label)
62+
    def predict(self, df):
63+
        return pd.DataFrame(
64+
            {'Scored Labels': self.model.predict(df[self.feature_column_names]),
65+
             'probabilities': self.model.predict_proba(df[self.feature_column_names])[:, 1]}
66+
        )
6267

63-
    def predict(self, df):
64-
        return pd.DataFrame(
65-
            {'Scored Labels': self.model.predict(df[self.feature_column_names]),
66-
             'probabilities': self.model.predict_proba(df[self.feature_column_names])[:, 1]}
67-
        )
6868

69+
```
6970

70-
```
71+
1. Connect the Create Python Model module that you just created to Train Model and Score Model.
7172

73+
1. If you need to evaluate the model, add an [Execute Python Script](execute-python-script.md) module and edit the Python script.
7274

73-
2. Connect the **Create Python Model** module you just created to a **Train Model** and **Score Model**
75+
The following script is sample evaluation code:
7476

75-
3. If you need to evaluate the model, add a [Execute Python Script](execute-python-script.md) and edit the Python script to implement evaluation.
77+
```Python
7678

77-
Below is sample evaluation code.
7879

79-
```Python
80+
# The script MUST contain a function named azureml_main
81+
# which is the entry point for this module.
8082

83+
# imports up here can be used to
84+
import pandas as pd
8185

82-
# The script MUST contain a function named azureml_main
83-
# which is the entry point for this module.
84-
85-
# imports up here can be used to
86-
import pandas as pd
87-
88-
# The entry point function can contain up to two input arguments:
89-
# Param<dataframe1>: a pandas.DataFrame
90-
# Param<dataframe2>: a pandas.DataFrame
91-
def azureml_main(dataframe1 = None, dataframe2 = None):
86+
# The entry point function can contain up to two input arguments:
87+
# Param<dataframe1>: a pandas.DataFrame
88+
# Param<dataframe2>: a pandas.DataFrame
89+
def azureml_main(dataframe1 = None, dataframe2 = None):
9290

93-
    from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score, roc_curve
94-
    import pandas as pd
95-
    import numpy as np
91+
    from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score, roc_curve
92+
    import pandas as pd
93+
    import numpy as np
9694

97-
    scores = dataframe1.ix[:, ("income", "Scored Labels", "probabilities")]
98-
    ytrue = np.array([0 if val == '<=50K' else 1 for val in scores["income"]])
99-
    ypred = np.array([0 if val == '<=50K' else 1 for val in scores["Scored Labels"]])
100-
    probabilities = scores["probabilities"]
95+
    scores = dataframe1.ix[:, ("income", "Scored Labels", "probabilities")]
96+
    ytrue = np.array([0 if val == '<=50K' else 1 for val in scores["income"]])
97+
    ypred = np.array([0 if val == '<=50K' else 1 for val in scores["Scored Labels"]])
98+
    probabilities = scores["probabilities"]
10199

102-
    accuracy, precision, recall, auc = \
103-
        accuracy_score(ytrue, ypred),\
104-
        precision_score(ytrue, ypred),\
105-
        recall_score(ytrue, ypred),\
106-
        roc_auc_score(ytrue, probabilities)
100+
    accuracy, precision, recall, auc = \
101+
        accuracy_score(ytrue, ypred),\
102+
        precision_score(ytrue, ypred),\
103+
        recall_score(ytrue, ypred),\
104+
        roc_auc_score(ytrue, probabilities)
107105

108-
    metrics = pd.DataFrame();
109-
    metrics["Metric"] = ["Accuracy", "Precision", "Recall", "AUC"];
110-
    metrics["Value"] = [accuracy, precision, recall, auc]
106+
    metrics = pd.DataFrame();
107+
    metrics["Metric"] = ["Accuracy", "Precision", "Recall", "AUC"];
108+
    metrics["Value"] = [accuracy, precision, recall, auc]
111109

112-
    return metrics,
110+
    return metrics,
111+
112+
```
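Reviewer note on the evaluation script above: the `.ix` indexer it relies on was deprecated in pandas 0.20 and removed in pandas 1.0, so on a newer pandas the selection would probably need to become `dataframe1.loc[:, ["income", "Scored Labels", "probabilities"]]`. The sketch below is not part of the commit. It only illustrates, in plain Python, what the label mapping and three of the metrics compute. The function names `binarize` and `basic_metrics` and the sample labels are hypothetical, and the zero-division fallback is a simplification.

```python
# Illustrative only: function names and sample labels are hypothetical.

def binarize(labels, negative_label="<=50K"):
    """Map the negative class to 0 and every other value to 1, as the script does."""
    return [0 if val == negative_label else 1 for val in labels]


def basic_metrics(ytrue, ypred):
    """Return (accuracy, precision, recall) for binary 0/1 labels."""
    pairs = list(zip(ytrue, ypred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (a simplification).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


income = ["<=50K", ">50K", ">50K", "<=50K"]   # stand-in for the "income" column
scored = ["<=50K", ">50K", "<=50K", ">50K"]   # stand-in for "Scored Labels"
accuracy, precision, recall = basic_metrics(binarize(income), binarize(scored))
print(accuracy, precision, recall)
```

AUC is left out because it depends on the ranking of the `probabilities` column rather than on the hard labels; the script delegates that to `roc_auc_score`.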
113+
114+
## Next steps
113115

114-
```
116+
See the [set of modules available](module-reference.md) to Azure Machine Learning.
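
Reviewer note: the sample script's comments state that the class must be named `AzureMLModel` and that the method and argument names must match the example exactly. A rough way to check a script against that contract before uploading it might look like the following. The helper `check_model_contract` is hypothetical and not part of Azure Machine Learning. It checks names only, not whether `self.model` is assigned or whether the arguments are pandas DataFrames.

```python
import inspect

# Hypothetical helper, not part of Azure Machine Learning.
# Expected argument names are taken from the sample script in this article.
REQUIRED_SIGNATURES = {
    "__init__": ["self"],
    "train": ["self", "df_train", "df_label"],
    "predict": ["self", "df"],
}


def check_model_contract(cls):
    """Return a list of problems; an empty list means the names match."""
    problems = []
    if cls.__name__ != "AzureMLModel":
        problems.append("class must be named AzureMLModel")
    for name, expected in REQUIRED_SIGNATURES.items():
        method = cls.__dict__.get(name)
        if not inspect.isfunction(method):
            problems.append(f"missing method: {name}")
            continue
        actual = list(inspect.signature(method).parameters)
        if actual != expected:
            problems.append(f"{name} takes {actual}, expected {expected}")
    return problems


class AzureMLModel:
    def __init__(self):
        self.model = None  # placeholder; a real script assigns a learner here

    def train(self, df_train, df_label):
        pass

    def predict(self, data):  # argument name is wrong on purpose
        return data


print(check_model_contract(AzureMLModel))
```

Here the check reports the `predict` method, because its argument is named `data` rather than `df`.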
Lines changed: 49 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: "Enter Data Manually: Module Reference"
2+
title: "Enter Data Manually: Module reference"
33
titleSuffix: Azure Machine Learning
44
description: Learn how to use the Enter Data Manually module in Azure Machine Learning to create a small dataset by typing values. The dataset can have multiple columns.
55
services: machine-learning
@@ -15,84 +15,80 @@ ms.date: 02/22/2020
1515

1616
This article describes a module in Azure Machine Learning designer (preview).
1717

18-
Use this module to create a small dataset by typing values. The dataset can have multiple columns.
18+
Use the Enter Data Manually module to create a small dataset by typing values. The dataset can have multiple columns.
1919

20-
This module can be helpful in scenarios such as these:
20+
This module can be helpful in scenarios such as:
2121

22-
- Generating a small set of values for testing
23-
24-
- Creating a short list of labels
25-
26-
- Typing a list of column names to insert in a dataset
22+
- Generating a small set of values for testing.
23+
- Creating a short list of labels.
24+
- Typing a list of column names to insert in a dataset.
2725

28-
## Enter Data Manually
29-
30-
1. Add the [Enter Data Manually](./enter-data-manually.md) module to your pipeline. You can find this module in the **Data Input and Output** category in Azure Machine Learning.
31-
32-
2. For **DataFormat**, select one of the following options. These options determine how the data that you provide should be parsed. The requirements for each format differ greatly, so be sure to read the related topics.
26+
## Create a dataset
3327

34-
- **ARFF**: The attribute-relation file format, used by Weka.
28+
1. Add the [Enter Data Manually](./enter-data-manually.md) module to your pipeline. You can find this module in the **Data Input and Output** category in Azure Machine Learning.
3529

36-
- **CSV**: Comma-separated values format. For more information, see [Convert to CSV](./convert-to-csv.md).
30+
1. For **DataFormat**, select one of the following options. These options determine how the data that you provide should be parsed. The requirements for each format differ greatly, so be sure to read the related topics.
3731

38-
- **SVMLight**: A format used by Vowpal Wabbit and other machine learning frameworks.
39-
40-
- **TSV**: Tab-separated values format.
32+
- **ARFF**: Attribute-relation file format used by Weka.
33+
- **CSV**: Comma-separated values format. For more information, see [Convert to CSV](./convert-to-csv.md).
34+
- **SVMLight**: Format used by Vowpal Wabbit and other machine learning frameworks.
35+
- **TSV**: Tab-separated values format.
4136

42-
If you choose a format and do not provide data that meets the format specifications, a run-time error occurs.
37+
If you choose a format and do not provide data that meets the format specifications, a runtime error occurs.
4338

44-
3. Click inside the **Data** text box to start entering data. The following formats require special attention:
39+
1. Click inside the **Data** text box to start entering data. The following formats require special attention:
4540

46-
- **CSV**: To create multiple columns, paste in comma-separated text, or type multiple columns using commas between fields.
41+
- **CSV**: To create multiple columns, paste in comma-separated text, or type multiple columns by using commas between fields.
4742

48-
If you select the **HasHeader** option, you can use the first row of values as the column heading.
43+
If you select the **HasHeader** option, you can use the first row of values as the column heading.
4944

50-
If you deselect this option, the columns names, Col1, Col2, and so forth, are used. You can add or change columns names later using [Edit Metadata](./edit-metadata.md).
45+
If you deselect this option, the column names (Col1, Col2, and so forth) are used. You can add or change columns names later by using [Edit Metadata](./edit-metadata.md).
5146

52-
- **TSV**: To create multiple columns, paste in tab-separated text, or type multiple columns using tabs between fields.
47+
- **TSV**: To create multiple columns, paste in tab-separated text, or type multiple columns by using tabs between fields.
5348

54-
If you select the **HasHeader** option, you can use the first row of values as the column heading.
49+
If you select the **HasHeader** option, you can use the first row of values as the column heading.
5550

56-
If you deselect this option, the columns names, Col1, Col2, and so forth, are used. You can add or change columns names later using [Edit Metadata](./edit-metadata.md).
51+
If you deselect this option, the column names (Col1, Col2, and so forth) are used. You can add or change columns names later by using [Edit Metadata](./edit-metadata.md).
5752

58-
- **ARFF**: Paste in an existing ARFF format file. If you are typing values directly, be sure to add the optional header and required attribute fields at the beginning of the data.
59-
60-
For example, the following header and attribute rows could be added to a simple list. The column heading would be `SampleText`. Note that String type is not supported.
53+
- **ARFF**: Paste in an existing ARFF format file. If you're typing values directly, be sure to add the optional header and required attribute fields at the beginning of the data.
54+
55+
For example, the following header and attribute rows can be added to a simple list. The column heading would be `SampleText`. Note that the String type is not supported.
6156

62-
```text
63-
% Title: SampleText.ARFF
64-
% Source: Enter Data module
65-
@ATTRIBUTE SampleText NUMERIC
66-
@DATA
67-
\<type first data row here>
68-
```
57+
```text
58+
% Title: SampleText.ARFF
59+
% Source: Enter Data module
60+
@ATTRIBUTE SampleText NUMERIC
61+
@DATA
62+
\<type first data row here>
63+
```
6964
70-
- **SVMLight**: Type or paste in values using the SVMLight format.
65+
- **SVMLight**: Type or paste in values by using the SVMLight format.
7166
72-
For example, the following sample represents the first couple lines of the Blood Donation dataset, in SVMight format:
67+
For example, the following sample represents the first couple of lines of the Blood Donation dataset, in SVMLight format:
7368
74-
```text
75-
# features are [Recency], [Frequency], [Monetary], [Time]
76-
1 1:2 2:50 3:12500 4:98
77-
1 1:0 2:13 3:3250 4:28
78-
```
69+
```text
70+
# features are [Recency], [Frequency], [Monetary], [Time]
71+
1 1:2 2:50 3:12500 4:98
72+
1 1:0 2:13 3:3250 4:28
73+
```
7974
80-
When you run the [Enter Data Manually](./enter-data-manually.md) module, these lines are converted to a dataset of columns and index values as follows:
75+
When you run the [Enter Data Manually](./enter-data-manually.md) module, these lines are converted to a dataset of columns and index values as follows:
8176
82-
|Col1|Col2|Col3|Col4|Labels|
83-
|-|-|-|-|-|
84-
|0.00016|0.004|0.999961|0.00784|1|
85-
|0|0.004|0.999955|0.008615|1|
77+
|Col1|Col2|Col3|Col4|Labels|
78+
|-|-|-|-|-|
79+
|0.00016|0.004|0.999961|0.00784|1|
80+
|0|0.004|0.999955|0.008615|1|
8681
87-
4. Press ENTER after each row, to start a new line.
82+
1. Select the Enter key after each row, to start a new line.
8883
89-
If you press ENTER multiple times to add multiple empty trailing rows, the empty rows will be removed trimmed.
84+
If you select Enter multiple times to add multiple empty trailing rows, the empty rows will be removed or trimmed.
9085
91-
If you create rows with missing values, you can always filter them out later.
86+
If you create rows with missing values, you can always filter them out later.
9287
93-
5. Connect the output port to other modules, and run the pipeline.
88+
1. Connect the output port to other modules, and run the pipeline.
9489
95-
To view the dataset, right-click the module and select **Visualize**.
90+
To view the dataset, right-click the module and select **Visualize**.
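
Reviewer note: the **HasHeader** behavior and the trimming of trailing empty rows described in the steps above can be sketched with the standard library. This is only an approximation of the described behavior, not the module's implementation. The function name `parse_rows` and its parameters are hypothetical.

```python
import csv


def parse_rows(text, delimiter=",", has_header=True):
    """Return (column_names, rows) from delimited text.

    Trailing empty rows are trimmed, mirroring the behavior the article describes.
    """
    lines = text.rstrip("\r\n").splitlines()
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        # The first row of values becomes the column headings.
        return rows[0], rows[1:]
    # Without a header, fall back to Col1, Col2, and so forth.
    names = [f"Col{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


text = "age,label\n39,yes\n50,no\n\n\n"
print(parse_rows(text, has_header=True))
print(parse_rows(text, has_header=False))
```

Passing `delimiter="\t"` gives the TSV case. With `has_header=False`, the first row stays in the data and the generated names are used instead.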
91+
9692
## Next steps
9793
9894
See the [set of modules available](module-reference.md) to Azure Machine Learning.
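
Reviewer note: for readers unfamiliar with the SVMLight sample shown in the article, each line is a label followed by `index:value` pairs, and `#` starts a comment. A minimal parsing sketch follows, assuming no `qid:` tokens and numeric labels. The function name `parse_svmlight_line` is hypothetical. The sketch only reads the raw values and does not reproduce the scaled values shown in the article's output table.

```python
def parse_svmlight_line(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value}).

    Returns None for blank lines and comment-only lines.
    """
    body = line.split("#", 1)[0].strip()  # drop anything after '#'
    if not body:
        return None
    tokens = body.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


sample = """# features are [Recency], [Frequency], [Monetary], [Time]
1 1:2 2:50 3:12500 4:98
1 1:0 2:13 3:3250 4:28"""

parsed = [row for row in map(parse_svmlight_line, sample.splitlines()) if row is not None]
for label, features in parsed:
    print(label, features)
```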
