Commit 2ddf57e

Merge pull request #111100 from Bowen-Guo/bowguo/create_python_model
Update doc of module Create Python Model to reduce user error
2 parents 1c63b5b + 1af4b8c commit 2ddf57e

1 file changed: +17 -2 lines changed

articles/machine-learning/algorithm-module-reference/create-python-model.md

Lines changed: 17 additions & 2 deletions
@@ -26,14 +26,22 @@ After you create the model, you can use [Train Model](train-model.md) to train t
 ## Configure the module

 Use of this module requires intermediate or expert knowledge of Python. The module supports use of any learner that's included in the Python packages already installed in Azure Machine Learning. See the preinstalled Python package list in [Execute Python Script](execute-python-script.md).
-

+> [!NOTE]
+> Write your script carefully and make sure it contains no errors, such as using an undeclared object or a module that has not been imported.
+
+> [!NOTE]
+> Also pay close attention to the list of preinstalled modules in [Execute Python Script](execute-python-script.md). Import only preinstalled modules. Do not install extra packages in this script (for example, with `pip install xgboost`); otherwise, errors will be raised when downstream modules read the model.
+
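One way to catch a missing or uninstalled module before running the pipeline is to scan the script's imports first. The sketch below is only an illustration and is not part of the module: `find_unavailable_imports` is a hypothetical helper, and it checks against the Python environment it runs in, which is assumed (not guaranteed) to match the preinstalled package list. Parsing the script with `ast` also surfaces syntax errors early.

```python
import ast
import importlib.util


def find_unavailable_imports(script_source):
    """Return the top-level modules imported by a script that cannot be found locally.

    Hypothetical helper: it only inspects the current environment, so run it
    in an environment that mirrors the preinstalled package list.
    """
    tree = ast.parse(script_source)  # raises SyntaxError if the script is malformed
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" -> top-level package "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> top-level package "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return sorted(name for name in modules if importlib.util.find_spec(name) is None)


# The second module name is made up so that the lookup fails.
sample_script = "import json\nimport a_module_that_is_not_installed_xyz\n"
print(find_unavailable_imports(sample_script))
```

An empty list means every imported top-level module was found in the environment where the check ran.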
 This article shows how to use **Create Python Model** with a simple pipeline. Here's a diagram of the pipeline:

 ![Diagram of Create Python Model](./media/module/create-python-model.png)

 1. Select **Create Python Model**, and edit the script to implement your modeling or data management process. You can base the model on any learner that's included in a Python package in the Azure Machine Learning environment.

+> [!NOTE]
+> Pay close attention to the comments in the sample code and make sure your script strictly follows the requirements, including the class name, the method names, and the method signatures. Violating them leads to exceptions.
+
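The signature requirement in the note above can be checked mechanically before a script is submitted. This is a minimal sketch under stated assumptions: `check_model_interface`, `GoodModel`, and `BadModel` are hypothetical names, and the expected parameter lists are copied from the `train` and `predict` methods shown in the sample code. It does not check the class name or any other method.

```python
import inspect

# Expected parameter names, copied from the train and predict methods in the sample code.
EXPECTED_SIGNATURES = {
    "train": ["self", "df_train", "df_label"],
    "predict": ["self", "df"],
}


def check_model_interface(model_class):
    """Return a list of readable problems; an empty list means the check passed."""
    problems = []
    for method_name, expected_params in EXPECTED_SIGNATURES.items():
        method = getattr(model_class, method_name, None)
        if not callable(method):
            problems.append(f"missing method: {method_name}")
            continue
        actual_params = list(inspect.signature(method).parameters)
        if actual_params != expected_params:
            problems.append(
                f"{method_name} has parameters {actual_params}, expected {expected_params}"
            )
    return problems


# Hypothetical classes used only to exercise the check.
class GoodModel:
    def train(self, df_train, df_label):
        pass

    def predict(self, df):
        pass


class BadModel:
    def train(self, X, y):  # wrong argument names, and predict is missing
        pass


print(check_model_interface(GoodModel))
print(check_model_interface(BadModel))
```

Comparing argument names rather than only counting them matches the stated rule that both method names and argument names must be identical to the sample.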
 The following sample code of the two-class Naive Bayes classifier uses the popular *sklearn* package:

```Python
@@ -45,7 +53,9 @@ This article shows how to use **Create Python Model** with a simple pipeline. He
 # predict: which generates prediction result, the input argument and the prediction result MUST be pandas DataFrame.
 # The signatures (method names and argument names) of all these methods MUST be exactly the same as the following example.

-
+# Please do not install extra packages such as "pip install xgboost" in this script,
+# otherwise errors will be raised when reading models in down-stream modules.
+
 import pandas as pd
 from sklearn.naive_bayes import GaussianNB

@@ -56,10 +66,15 @@ This article shows how to use **Create Python Model** with a simple pipeline. He
         self.feature_column_names = list()

     def train(self, df_train, df_label):
+        # self.feature_column_names records the column names used for training.
+        # It is recommended to set this attribute before training so that the
+        # feature columns used in the predict and train methods have the same names.
         self.feature_column_names = df_train.columns.tolist()
         self.model.fit(df_train, df_label)

     def predict(self, df):
+        # The feature columns used for prediction MUST have the same names as the ones used for training.
+        # The name of the score column ("Scored Labels" in this case) MUST be different from any other column in the input data.
         return pd.DataFrame(
             {'Scored Labels': self.model.predict(df[self.feature_column_names]),
              'probabilities': self.model.predict_proba(df[self.feature_column_names])[:, 1]}
