Merge pull request #111100 from Bowen-Guo/bowguo/create_python_model

PRMerger7 · web-flow · commit 2ddf57ebc77e · 2020-04-20T05:07:43.000-07:00
Update doc of module Create Python Model to reduce user error
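
The signature requirement documented by this change can be checked locally before a script is pasted into the module. The following is a minimal sketch, not part of the module: the helper `check_signatures` and the classes `GoodModel` and `BadModel` are hypothetical names introduced for illustration. Only the method names and argument names (`train(self, df_train, df_label)` and `predict(self, df)`) are taken from the sample code in this diff.

```python
import inspect

# Method and argument names copied from the sample script in this diff.
# The checker itself is a hypothetical helper, not an Azure Machine Learning API.
EXPECTED_SIGNATURES = {
    "train": ["self", "df_train", "df_label"],
    "predict": ["self", "df"],
}


def check_signatures(model_cls):
    """Return a list of problems; an empty list means all names match."""
    problems = []
    for method_name, expected_args in EXPECTED_SIGNATURES.items():
        method = getattr(model_cls, method_name, None)
        if not callable(method):
            problems.append(f"missing method: {method_name}")
            continue
        # Looked up on the class, the method is a plain function,
        # so its parameter list includes "self".
        actual_args = list(inspect.signature(method).parameters)
        if actual_args != expected_args:
            problems.append(
                f"{method_name}: expected {expected_args}, got {actual_args}"
            )
    return problems


class GoodModel:
    """Hypothetical class whose method signatures follow the sample."""

    def train(self, df_train, df_label):
        pass

    def predict(self, df):
        return df


class BadModel:
    """Hypothetical class with renamed arguments and no predict method."""

    def train(self, X, y):
        pass


print(check_signatures(GoodModel))
print(check_signatures(BadModel))
```

A check like this covers only naming. It does not verify that `predict` returns a pandas DataFrame or that the script imports only pre-installed packages.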
diff --git a/articles/machine-learning/algorithm-module-reference/create-python-model.md b/articles/machine-learning/algorithm-module-reference/create-python-model.md
@@ -26,14 +26,22 @@ After you create the model, you can use [Train Model](train-model.md) to train t
 ## Configure the module
 
 Use of this module requires intermediate or expert knowledge of Python. The module supports use of any learner that's included in the Python packages already installed in Azure Machine Learning. See the preinstalled Python package list in [Execute Python Script](execute-python-script.md).
-  
 
+> [!NOTE]
+> Be very careful when writing your script, and make sure it contains no syntax errors, such as using an undeclared object or an unimported module.
+
+> [!NOTE]
+> Also pay extra attention to the pre-installed modules list in [Execute Python Script](execute-python-script.md). Import only pre-installed modules. Do not install extra packages in this script (for example, with "pip install xgboost"); otherwise, errors will be raised when reading models in downstream modules.
+  
 This article shows how to use **Create Python Model** with a simple pipeline. Here's a diagram of the pipeline:
 
 ![Diagram of Create Python Model](./media/module/create-python-model.png)
 
 1. Select **Create Python Model**, and edit the script to implement your modeling or data management process. You can base the model on any learner that's included in a Python package in the Azure Machine Learning environment.
 
+> [!NOTE]
+> Pay extra attention to the comments in the sample code and make sure your script strictly follows the requirements, including the class name, the method names, and the method signatures. Violations will lead to exceptions.
+
    The following sample code of the two-class Naive Bayes classifier uses the popular *sklearn* package:
 
    ```Python
@@ -45,7 +53,9 @@ This article shows how to use **Create Python Model** with a simple pipeline. He
        # predict: which generates prediction result, the input argument and the prediction result MUST be pandas DataFrame.
    # The signatures (method names and argument names) of all these methods MUST be exactly the same as the following example.
 
-
+   # Do not install extra packages in this script (for example, with "pip install xgboost");
+   # otherwise, errors will be raised when reading models in downstream modules.
+   
    import pandas as pd
    from sklearn.naive_bayes import GaussianNB
 
@@ -56,10 +66,15 @@ This article shows how to use **Create Python Model** with a simple pipeline. He
            self.feature_column_names = list()
 
        def train(self, df_train, df_label):
+           # self.feature_column_names records the column names used for training.
+           # It is recommended to set this attribute before training so that the
+           # feature columns used in predict and train methods have the same names.
            self.feature_column_names = df_train.columns.tolist()
            self.model.fit(df_train, df_label)
 
        def predict(self, df):
+           # The feature columns used for prediction MUST have the same names as the ones for training.
+           # The name of the score column ("Scored Labels" in this case) MUST be different from the name of every other column in the input data.
            return pd.DataFrame(
                {'Scored Labels': self.model.predict(df[self.feature_column_names]), 
                 'probabilities': self.model.predict_proba(df[self.feature_column_names])[:, 1]}