articles/machine-learning/tutorial-convert-ml-experiment-to-production.md
Lines changed: 42 additions & 20 deletions
Original file line number
Diff line number
Diff line change
@@ -11,19 +11,20 @@ ms.date: 02/10/2020
11
11
12
12
# Tutorial: Convert ML experimental code to production code
13
13
14
-
A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
14
+
A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
15
15
16
16
In this tutorial, you learn how to:
17
17
18
18
> [!div class="checklist"]
19
+
>
19
20
> * Clean nonessential code
20
21
> * Refactor Jupyter Notebook code into functions
21
22
> * Create Python scripts for related tasks
22
23
> * Create unit tests
23
24
24
25
## Prerequisites
25
26
26
-
- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate)
27
+
- Generate the [MLOpsPython template](https://github.com/microsoft/MLOpsPython/generate)
27
28
and use the `experimentation/Diabetes Ridge Regression Training.ipynb` and `experimentation/Diabetes Ridge Regression Scoring.ipynb` notebooks. These notebooks are used as an example of converting from experimentation to production.
28
29
- Install nbconvert. Follow only the installation instructions under section __Installing nbconvert__ on the [Installation](https://nbconvert.readthedocs.io/en/latest/install.html) page.
29
30
@@ -37,7 +38,7 @@ from sklearn.linear_model import Ridge
37
38
from sklearn.metrics import mean_squared_error
38
39
from sklearn.model_selection import train_test_split
Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:
64
+
63
65
- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
64
66
- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
65
67
66
68
### Refactor Diabetes Ridge Regression Training notebook into functions
69
+
67
70
In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
68
71
69
-
1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
72
+
1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
70
73
1. Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function.
71
74
72
75
The `train_model` function should look like the following code:
### Refactor Diabetes Ridge Regression Scoring notebook into functions
165
+
162
166
In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
163
167
164
168
1. Create a new function called `init`, which takes no parameters and returns nothing.
@@ -208,6 +212,7 @@ request_header = {}
208
212
prediction = run(raw_data, request_header)
209
213
print("Test result: ", prediction)
210
214
```
215
+
211
216
The previous code sets variables `raw_data` and `request_header`, calls the `run` function with `raw_data` and `request_header`, and prints the predictions.
212
217
213
218
After refactoring, `experimentation/Diabetes Ridge Regression Scoring.ipynb` should look like the following code without the markdown:
Third, related functions need to be merged into Python files to make the code easier to reuse. In this section, you'll create Python files for the following notebooks:
248
+
242
249
- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
243
250
- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
244
251
245
252
### Create Python file for the Diabetes Ridge Regression Training notebook
253
+
246
254
Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Training.ipynb`:
The `train.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository supports command-line arguments (namely `build_id`, `model_name`, and `alpha`). Support for command-line arguments can be added to your `train.py` file to support dynamic model names and `alpha` values, but it's not necessary for the code to execute successfully.
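As a rough illustration of the optional command-line support described above, the following sketch uses the standard-library `argparse` module. The argument names `build_id`, `model_name`, and `alpha` come from the text; the `parse_args` helper, the default values, and the help strings are assumptions made for this example and are not taken from the MLOpsPython repository.

```python
import argparse


def parse_args(argv=None):
    """Parse the three command-line arguments named in the tutorial text.

    Hypothetical helper: the defaults below are placeholders for this sketch.
    """
    parser = argparse.ArgumentParser(description="Train a model (illustrative sketch)")
    parser.add_argument("--build_id", type=str, default=None,
                        help="Identifier of the build that produced the model")
    parser.add_argument("--model_name", type=str, default="model.pkl",
                        help="File name for the trained model")
    parser.add_argument("--alpha", type=float, default=0.5,
                        help="Regularization strength passed to the training function")
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of the real command line.
args = parse_args(["--model_name", "my_model.pkl", "--alpha", "0.3"])
print(args.build_id, args.model_name, args.alpha)
```

Passing an explicit list to `parse_args` keeps the parsing logic easy to exercise from a unit test; a script's `main` function would typically call it with no arguments.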
289
297
290
298
### Create Python file for the Diabetes Ridge Regression Scoring notebook
299
+
291
300
Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Scoring.ipynb`:
292
301
293
302
```
@@ -340,11 +349,13 @@ def init():
340
349
```
341
350
342
351
## Create unit tests for each Python file
352
+
343
353
Fourth, unit tests need to be created for each Python file, which makes code more robust and easier to maintain. In this section, you'll be creating a unit test for one of the functions in `train.py`.
344
354
345
-
`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
355
+
`train.py` contains two functions: `train_model` and `main`. Each function needs a unit test, but we'll only create a single unit test for the `train_model` function using the Pytest framework in this tutorial. Pytest isn't the only Python unit testing framework, but it's one of the most commonly used. For more information, visit [Pytest](https://pytest.org).
346
356
347
357
A unit test usually contains three main actions:
358
+
348
359
- Arrange object - creating and setting up necessary objects
349
360
- Act on an object
350
361
- Assert what is expected
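The three actions above can be sketched as a small Pytest-style test. This is only an illustration: the `train_model` and `predict` functions below are hypothetical pure-Python stand-ins (a one-feature ridge fit with no intercept) so the example runs without third-party packages, and the shape of the `data` dictionary is an assumption rather than the tutorial's actual structure.

```python
def train_model(data, alpha):
    """Hypothetical stand-in: one-feature ridge regression without an intercept."""
    X, y = data["train"]["X"], data["train"]["y"]
    # Closed-form solution for a single weight: w = sum(x*y) / (sum(x*x) + alpha)
    numerator = sum(x * t for x, t in zip(X, y))
    denominator = sum(x * x for x in X) + alpha
    return {"coef": numerator / denominator}


def predict(model, X):
    """Hypothetical helper: apply the fitted weight to each input value."""
    return [model["coef"] * x for x in X]


class TestTrain:
    def test_train_model(self):
        # Arrange: build a tiny dataset where y is exactly 2 * x.
        data = {"train": {"X": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}}
        alpha = 0.0

        # Act: train the model and predict for an unseen input.
        model = train_model(data, alpha)
        preds = predict(model, [4.0])

        # Assert: with no regularization the fit recovers the slope of 2.
        assert abs(preds[0] - 8.0) < 1e-9
```

Saved in a file whose name starts with `test_`, a test like this is collected automatically when `pytest` runs in that directory.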
@@ -375,29 +386,40 @@ class TestTrain:
375
386
```
376
387
377
388
## Use your own model with MLOpsPython code template
378
-
If you have been following the steps in this guide, you'll have a set of scripts that correlate to the train/score/test scripts available in the MLOpsPython repository. According to the structure mentioned above, the following steps will walk through what is needed to use these files for your own machine learning project:
379
389
380
-
1. Follow the Getting Started Guide
381
-
2. Replace the Training Code
382
-
3. Replace the Score Code
383
-
4. Update the Evaluation Code
390
+
If you have been following the steps in this guide, you'll have a set of scripts that correlate to the train/score/test scripts available in the MLOpsPython repository. According to the structure mentioned above, the following steps will walk through what is needed to use these files for your own machine learning project:
391
+
392
+
1. Follow the MLOpsPython [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide
393
+
2. Follow the MLOpsPython [bootstrap instructions](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) to create your project starting point
394
+
3. Replace the Training Code
395
+
4. Replace the Score Code
396
+
5. Update the Evaluation Code
384
397
385
398
### Follow the Getting Started Guide
386
-
Following the getting started guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython. We recommended deploying the MLOpsPython code as-is before putting in your own code to ensure the structure and pipeline is working properly. It's also useful to familiarize yourself with the code structure of the repository.
399
+
Following the [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide is necessary to have the supporting infrastructure and pipelines to execute MLOpsPython.
400
+
401
+
### Follow the Bootstrap Instructions
402
+
403
+
The [Bootstrap from MLOpsPython repository](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) guide will help you to quickly prepare the repository for your project.
404
+
405
+
**Note:** Since the bootstrap script will rename the diabetes_regression folder to the project name of your choice, we'll refer to your project as `[project name]` when paths are involved.
387
406
388
407
### Replace Training Code
389
-
Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
390
408
391
-
1. Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
392
-
1. Remove or replace training unit tests found in `tests/unit/code_test.py`
409
+
Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
410
+
411
+
1. Replace `[project name]/training/train.py`. This script trains your model locally or on the Azure ML compute.
412
+
1. Remove or replace training unit tests found in `[project name]/training/test_train.py`
393
413
394
414
### Replace Score Code
395
-
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `diabetes_regression/scoring/score.py`.
415
+
416
+
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `[project name]/scoring/score.py`.
396
417
397
418
### Update Evaluation Code
398
-
The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are stopped. To keep evaluation, replace all instances of `mse` in `diabetes_regression/evaluate/evaluate_model.py` with the metric that you want.
399
419
400
-
To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to `false`.
420
+
The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are canceled. To keep evaluation, replace all instances of `mse` in `[project name]/evaluate/evaluate_model.py` with the metric that you want.
421
+
422
+
To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines/[project name]-variables-template.yml` to `false`.
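The comparison described in this section can be sketched as a single function. This is not the code of `evaluate_model.py`; the function name `should_continue_pipeline`, the `lower_is_better` flag, and the behavior when no production metric exists are all assumptions made for illustration.

```python
def should_continue_pipeline(new_metric, production_metric, lower_is_better=True):
    """Return True when the newly trained model beats the production model.

    Hypothetical helper: lower_is_better=True suits error metrics such as
    Mean Squared Error; set it to False for scores such as accuracy.
    """
    if production_metric is None:
        # Assumption for this sketch: with no production model, continue.
        return True
    if lower_is_better:
        return new_metric < production_metric
    return new_metric > production_metric


# A lower MSE than production means the pipeline would continue.
print(should_continue_pipeline(new_metric=3100.0, production_metric=3300.0))
```

Swapping in a different metric then comes down to choosing the right direction for the comparison, which is the kind of change that replacing `mse` in the evaluation script implies.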