Commit 0b506c8

Merge pull request #106420 from tcare/ML-to-prod-update
Update the ML to Production article
2 parents bccb44a + 32456c0 commit 0b506c8

1 file changed: +42 -20 lines changed

articles/machine-learning/tutorial-convert-ml-experiment-to-production.md

Lines changed: 42 additions & 20 deletions
@@ -11,19 +11,20 @@ ms.date: 02/10/2020
1111

1212
# Tutorial: Convert ML experimental code to production code
1313

14-
A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
14+
A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
1515

1616
In this tutorial, you learn how to:
1717

1818
> [!div class="checklist"]
19+
>
1920
> * Clean nonessential code
2021
> * Refactor Jupyter Notebook code into functions
2122
> * Create Python scripts for related tasks
2223
> * Create unit tests
2324
2425
## Prerequisites
2526

26-
- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate)
27+
- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate)
2728
and use the `experimentation/Diabetes Ridge Regression Training.ipynb` and `experimentation/Diabetes Ridge Regression Scoring.ipynb` notebooks. These notebooks are used as an example of converting from experimentation to production.
2829
- Install nbconvert. Follow only the installation instructions under section __Installing nbconvert__ on the [Installation](https://nbconvert.readthedocs.io/en/latest/install.html) page.
2930

@@ -37,7 +38,7 @@ from sklearn.linear_model import Ridge
3738
from sklearn.metrics import mean_squared_error
3839
from sklearn.model_selection import train_test_split
3940
import joblib
40-
41+
4142
X, y = load_diabetes(return_X_y=True)
4243

4344
X_train, X_test, y_train, y_test = train_test_split(
@@ -60,13 +61,15 @@ joblib.dump(value=reg, filename=model_name)
6061
## Refactor code into functions
6162

6263
Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:
64+
6365
- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
6466
- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
6567

6668
### Refactor Diabetes Ridge Regression Training notebook into functions
69+
6770
In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
6871

69-
1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
72+
1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
7073
1. Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function.
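
The two steps above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's exact listing (which this diff elides). It assumes `data` is a nested dictionary with `train` and `test` splits, each holding `X` and `y` arrays; that layout, the `test_size`, and the `random_state` are illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def train_model(data, alpha):
    """Fit a Ridge model on the training split and print the validation MSE."""
    # Assumed layout: data = {"train": {"X", "y"}, "test": {"X", "y"}}
    reg = Ridge(alpha=alpha)
    reg.fit(data["train"]["X"], data["train"]["y"])
    preds = reg.predict(data["test"]["X"])
    print("mse", mean_squared_error(data["test"]["y"], preds))
    return reg


# Build the assumed data dictionary from the diabetes dataset and train.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}
reg = train_model(data, alpha=0.5)
```

Keeping the data preparation outside `train_model` means a unit test can pass in a small hand-built dictionary without loading the full dataset.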
7174

7275
The `train_model` function should look like the following code:
@@ -102,7 +105,7 @@ def main():
102105

103106
model_name = "sklearn_regression_model.pkl"
104107
alpha = 0.5
105-
108+
106109
X, y = load_diabetes(return_X_y=True)
107110

108111
X_train, X_test, y_train, y_test = train_test_split(
@@ -143,7 +146,7 @@ def main():
143146

144147
model_name = "sklearn_regression_model.pkl"
145148
alpha = 0.5
146-
149+
147150
X, y = load_diabetes(return_X_y=True)
148151

149152
X_train, X_test, y_train, y_test = train_test_split(
@@ -159,6 +162,7 @@ main()
159162
```
160163

161164
### Refactor Diabetes Ridge Regression Scoring notebook into functions
165+
162166
In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
163167

164168
1. Create a new function called `init`, which takes no parameters and returns nothing.
@@ -208,6 +212,7 @@ request_header = {}
208212
prediction = run(raw_data, request_header)
209213
print("Test result: ", prediction)
210214
```
215+
211216
The previous code sets variables `raw_data` and `request_header`, calls the `run` function with `raw_data` and `request_header`, and prints the predictions.
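
As a rough illustration of the `init`/`run` split used for scoring, the sketch below keeps the model in a module-level variable that `init` populates and `run` reads. Several details are assumptions made so the example runs on its own: the model is fitted in memory instead of being loaded from `sklearn_regression_model.pkl`, the request payload is assumed to be JSON with a `data` key, and the returned dictionary shape is illustrative.

```python
import json

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

model = None  # populated once by init(), reused by every run() call


def init():
    """Prepare the model; takes no parameters and returns nothing."""
    global model
    # Assumption: fit in memory here; a real script would load a saved model.
    X, y = load_diabetes(return_X_y=True)
    model = Ridge(alpha=0.5).fit(X, y)


def run(raw_data, request_headers):
    """Score a JSON payload and return the predictions as plain lists."""
    # Assumed payload shape: {"data": [[feature values], ...]}
    data = np.array(json.loads(raw_data)["data"])
    result = model.predict(data)
    return {"result": result.tolist()}


init()
raw_data = json.dumps({"data": [[0.0] * 10]})  # one row, ten features
request_header = {}
prediction = run(raw_data, request_header)
print("Test result: ", prediction)
```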
212217

213218
After refactoring, `experimentation/Diabetes Ridge Regression Scoring.ipynb` should look like the following code without the markdown:
@@ -238,11 +243,14 @@ print("Test result: ", prediction)
238243
```
239244

240245
## Combine related functions in Python files
246+
241247
Third, related functions need to be merged into Python files to make the code easier to reuse. In this section, you'll be creating Python files for the following notebooks:
248+
242249
- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
243250
- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
244251

245252
### Create Python file for the Diabetes Ridge Regression Training notebook
253+
246254
Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Training.ipynb`:
247255

248256
```
@@ -270,7 +278,7 @@ def train_model(data, alpha):
270278
def main():
271279
model_name = "sklearn_regression_model.pkl"
272280
alpha = 0.5
273-
281+
274282
X, y = load_diabetes(return_X_y=True)
275283

276284
X_train, X_test, y_train, y_test = train_test_split(
@@ -288,6 +296,7 @@ main()
288296
The `train.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository supports command-line arguments (namely `build_id`, `model_name`, and `alpha`). Support for command-line arguments can be added to your `train.py` file to support dynamic model names and `alpha` values, but it's not necessary for the code to execute successfully.
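
One way to add such command-line support is with the standard library's `argparse` module. The following is a minimal sketch: the argument names follow the ones mentioned above, but the exact flag spellings, defaults, and the `parse_args` helper are assumptions and may differ from the MLOpsPython repository.

```python
import argparse


def parse_args(argv=None):
    """Parse training options; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Train a Ridge regression model")
    # Flag names and defaults below are illustrative assumptions.
    parser.add_argument("--build_id", type=str, default=None,
                        help="Identifier of the build that triggered training")
    parser.add_argument("--model_name", type=str,
                        default="sklearn_regression_model.pkl",
                        help="File name used when saving the model")
    parser.add_argument("--alpha", type=float, default=0.5,
                        help="Regularization strength passed to Ridge")
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to exercise in tests.
args = parse_args(["--alpha", "0.1", "--model_name", "my_model.pkl"])
print(args.model_name, args.alpha)
```

Inside `main`, the hard-coded `model_name` and `alpha` values could then be replaced with `args.model_name` and `args.alpha`.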
289297

290298
### Create Python file for the Diabetes Ridge Regression Scoring notebook
299+
291300
Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Scoring.ipynb`:
292301

293302
```
@@ -340,11 +349,13 @@ def init():
340349
```
341350

342351
## Create unit tests for each Python file
352+
343353
Fourth, unit tests need to be created for each Python file, which makes code more robust and easier to maintain. In this section, you'll be creating a unit test for one of the functions in `train.py`.
344354

345-
`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
355+
`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
346356

347357
A unit test usually contains three main actions:
358+
348359
- Arrange objects: create and set up the necessary objects
349360
- Act on an object
350361
- Assert what is expected
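
The three actions can be seen in a small Pytest-style test. To keep the sketch free of third-party dependencies, it tests a hypothetical stand-in, `fit_ridge_1d`, a closed-form one-feature ridge regression with an unpenalized intercept, instead of the tutorial's `train_model`. The function name and the sample data are assumptions for illustration.

```python
import math


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form one-feature ridge regression with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    coef = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    intercept = y_mean - coef * x_mean
    return coef, intercept


def test_fit_ridge_1d():
    # Arrange: build a small, hand-checkable dataset
    xs = [1, 2, 3, 4, 5, 6]
    ys = [10, 9, 8, 8, 6, 5]

    # Act: fit the model and predict for two inputs
    coef, intercept = fit_ridge_1d(xs, ys, alpha=1.2)
    preds = [coef * x + intercept for x in (1, 2)]

    # Assert: compare against values worked out by hand
    assert math.isclose(preds[0], 9.939393939, rel_tol=1e-6)
    assert math.isclose(preds[1], 9.030303030, rel_tol=1e-6)
```

Saved in a file whose name starts with `test_`, this test is collected automatically when you run `pytest` from the project directory.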
@@ -375,29 +386,40 @@ class TestTrain:
375386
```
376387

377388
## Use your own model with MLOpsPython code template
378-
If you have been following the steps in this guide, you'll have a set of scripts that correlate to the train/score/test scripts available in the MLOpsPython repository. According to the structure mentioned above, the following steps will walk through what is needed to use these files for your own machine learning project:
379389

380-
1. Follow the Getting Started Guide
381-
2. Replace the Training Code
382-
3. Replace the Score Code
383-
4. Update the Evaluation Code
390+
If you have been following the steps in this guide, you'll have a set of scripts that correlate to the train/score/test scripts available in the MLOpsPython repository. According to the structure mentioned above, the following steps will walk through what is needed to use these files for your own machine learning project:
391+
392+
1. Follow the MLOpsPython [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide
393+
2. Follow the MLOpsPython [bootstrap instructions](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) to create your project starting point
394+
3. Replace the Training Code
395+
4. Replace the Score Code
396+
5. Update the Evaluation Code
384397

385398
### Follow the Getting Started Guide
386-
Following the getting started guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython. We recommended deploying the MLOpsPython code as-is before putting in your own code to ensure the structure and pipeline is working properly. It's also useful to familiarize yourself with the code structure of the repository.
399+
Following the [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython.
400+
401+
### Follow the Bootstrap Instructions
402+
403+
The [Bootstrap from MLOpsPython repository](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) guide will help you to quickly prepare the repository for your project.
404+
405+
**Note:** Since the bootstrap script will rename the diabetes_regression folder to the project name of your choice, we'll refer to your project as `[project name]` when paths are involved.
387406

388407
### Replace Training Code
389-
Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
390408

391-
1. Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
392-
1. Remove or replace training unit tests found in `tests/unit/code_test.py`
409+
Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
410+
411+
1. Replace `[project name]/training/train.py`. This script trains your model locally or on the Azure ML compute.
412+
1. Remove or replace training unit tests found in `[project name]/training/test_train.py`
393413

394414
### Replace Score Code
395-
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `diabetes_regression/scoring/score.py`.
415+
416+
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `[project name]/scoring/score.py`.
396417

397418
### Update Evaluation Code
398-
The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are stopped. To keep evaluation, replace all instances of `mse` in `diabetes_regression/evaluate/evaluate_model.py` with the metric that you want.
399419

400-
To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to `false`.
420+
The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are canceled. To keep evaluation, replace all instances of `mse` in `[project name]/evaluate/evaluate_model.py` with the metric that you want.
421+
422+
To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines/[project name]-variables-template.yml` to `false`.
401423

402424
## Next steps
403425
