Merge pull request #106420 from tcare/ML-to-prod-update

PRMerger12 · web-flow · commit 0b506c886967 · 2020-03-04T10:22:49.000-08:00
Update the ML to Production article
diff --git a/articles/machine-learning/tutorial-convert-ml-experiment-to-production.md b/articles/machine-learning/tutorial-convert-ml-experiment-to-production.md
@@ -11,19 +11,20 @@ ms.date: 02/10/2020
 
 # Tutorial: Convert ML experimental code to production code
 
-A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.  
+A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
 
 In this tutorial, you learn how to:
 
 > [!div class="checklist"]
+>
 > * Clean nonessential code
 > * Refactor Jupyter Notebook code into functions
 > * Create Python scripts for related tasks
 > * Create unit tests
 
 ## Prerequisites
 
-- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate) 
+- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate)
 and use the `experimentation/Diabetes Ridge Regression Training.ipynb` and `experimentation/Diabetes Ridge Regression Scoring.ipynb` notebooks. These notebooks are used as an example of converting from experimentation to production.
 - Install nbconvert. Follow only the installation instructions under section __Installing nbconvert__ on the [Installation](https://nbconvert.readthedocs.io/en/latest/install.html) page.
 
@@ -37,7 +38,7 @@ from sklearn.linear_model import Ridge
 from sklearn.metrics import mean_squared_error
 from sklearn.model_selection import train_test_split
 import joblib
- 
+
 X, y = load_diabetes(return_X_y=True)
 
 X_train, X_test, y_train, y_test = train_test_split(
@@ -60,13 +61,15 @@ joblib.dump(value=reg, filename=model_name)
 ## Refactor code into functions
 
 Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:
+
 - The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
 - The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
 
 ### Refactor Diabetes Ridge Regression Training notebook into functions
+
 In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
 
-1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model. 
+1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
 1. Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function.
 
 The `train_model` function should look like the following code:
@@ -102,7 +105,7 @@ def main():
 
     model_name = "sklearn_regression_model.pkl"
     alpha = 0.5
-    
+
     X, y = load_diabetes(return_X_y=True)
 
     X_train, X_test, y_train, y_test = train_test_split(
@@ -143,7 +146,7 @@ def main():
 
     model_name = "sklearn_regression_model.pkl"
     alpha = 0.5
-    
+
     X, y = load_diabetes(return_X_y=True)
 
     X_train, X_test, y_train, y_test = train_test_split(
@@ -159,6 +162,7 @@ main()
 ```
 
 ### Refactor Diabetes Ridge Regression Scoring notebook into functions
+
 In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
 
 1. Create a new function called `init`, which takes no parameters and returns nothing.
@@ -208,6 +212,7 @@ request_header = {}
 prediction = run(raw_data, request_header)
 print("Test result: ", prediction)
 ```
+
 The previous code sets variables `raw_data` and `request_header`, calls the `run` function with `raw_data` and `request_header`, and prints the predictions.
 
 After refactoring, `experimentation/Diabetes Ridge Regression Scoring.ipynb` should look like the following code without the markdown:
@@ -238,11 +243,14 @@ print("Test result: ", prediction)
 ```
 
 ## Combine related functions in Python files
+
 Third, related functions need to be merged into Python files to better help code reuse. In this section, you'll be creating Python files for the following notebooks:
+
 - The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
 - The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
 
 ### Create Python file for the Diabetes Ridge Regression Training notebook
+
 Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Training.ipynb`:
 
 ```
@@ -270,7 +278,7 @@ def train_model(data, alpha):
 def main():
     model_name = "sklearn_regression_model.pkl"
     alpha = 0.5
-    
+
     X, y = load_diabetes(return_X_y=True)
 
     X_train, X_test, y_train, y_test = train_test_split(
@@ -288,6 +296,7 @@ main()
 The `train.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository supports command-line arguments (namely `build_id`, `model_name`, and `alpha`). Support for command-line arguments can be added to your `train.py` file to support dynamic model names and `alpha` values, but it's not necessary for the code to execute successfully.
 
 ### Create Python file for the Diabetes Ridge Regression Scoring notebook
+
 Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Scoring.ipynb`:
 
 ```
@@ -340,11 +349,13 @@ def init():
 ```
 
 ## Create unit tests for each Python file
+
 Fourth, unit tests need to be created for each Python file, which makes code more robust and easier to maintain. In this section, you'll be creating a unit test for one of the functions in `train.py`.
 
-`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial.  Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
+`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
 
 A unit test usually contains three main actions:
+
 - Arrange object - creating and setting up necessary objects
 - Act on an object
 - Assert what is expected
@@ -375,29 +386,40 @@ class TestTrain:
 ```
 
 ## Use your own model with MLOpsPython code template
-If you have been following the steps in this guide, you'll have a set of scripts that correlate to the train/score/test scripts available in the MLOpsPython repository.  According to the structure mentioned above, the following steps will walk through what is needed to use these files for your own machine learning project:  
 
-1.	Follow the Getting Started Guide
-2.	Replace the Training Code
-3.	Replace the Score Code
-4.	Update the Evaluation Code
+If you have been following the steps in this guide, you'll have a set of scripts that correspond to the train/score/test scripts available in the MLOpsPython repository. Following the structure described above, the steps below walk through what is needed to use these files for your own machine learning project:
+
+1. Follow the MLOpsPython [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide
+2. Follow the MLOpsPython [bootstrap instructions](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) to create your project starting point
+3. Replace the Training Code
+4. Replace the Score Code
+5. Update the Evaluation Code
 
 ### Follow the Getting Started Guide
-Following the getting started guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython.  We recommended deploying the MLOpsPython code as-is before putting in your own code to ensure the structure and pipeline is working properly.  It's also useful to familiarize yourself with the code structure of the repository.
+Following the [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython.
+
+### Follow the Bootstrap Instructions
+
+The [Bootstrap from MLOpsPython repository](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) guide will help you to quickly prepare the repository for your project.
+
+**Note:** Since the bootstrap script will rename the `diabetes_regression` folder to the project name of your choice, we'll refer to your project as `[project name]` when paths are involved.
 
 ### Replace Training Code
-Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code.  Follow these steps specifically:
 
-1. Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
-1. Remove or replace training unit tests found in `tests/unit/code_test.py`
+Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
+
+1. Replace `[project name]/training/train.py`. This script trains your model locally or on Azure ML compute.
+1. Remove or replace the training unit tests found in `[project name]/training/test_train.py`.
 
 ### Replace Score Code
-For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps.  If you want to keep scoring, replace `diabetes_regression/scoring/score.py`.
+
+For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `[project name]/scoring/score.py`.
 
 ### Update Evaluation Code
-The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are stopped. To keep evaluation, replace all instances of `mse` in `diabetes_regression/evaluate/evaluate_model.py` with the metric that you want. 
 
-To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to `false`.
+The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are canceled. To keep evaluation, replace all instances of `mse` in `[project name]/evaluate/evaluate_model.py` with the metric that you want.
+
+To skip evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines/[project name]-variables-template.yml` to `false`.
 
 ## Next steps