articles/machine-learning/tutorial-convert-ml-experiment-to-production.md
Lines changed: 37 additions & 32 deletions
@@ -9,15 +9,15 @@ ms.topic: tutorial
 ms.date: 02/10/2020
 ---

-# Tutorial: Convert ML Experimental Code to Production Code
+# Tutorial: Convert ML experimental code to production code

 A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.

 In this tutorial, you learn how to:

 > [!div class="checklist"]
 > * Clean nonessential code
-> * Refactor Jupyter notebook code into functions
+> * Refactor Jupyter Notebook code into functions
 > * Create Python scripts for related tasks
 > * Create unit tests

@@ -29,7 +29,7 @@ and use the `experimentation/Diabetes Ridge Regression Training.ipynb` and `expe

 ## Remove all nonessential code

-Some code written during experimentation is only intended for exploratory purposes. Therefore, the first step to convert experimental code into production code is to remove this nonessential code. Removing nonessential code will also make the code more maintainable. In this section, you'll remove code from the Diabetes Ridge Regression Training Notebook. The statements printing the shape of `X` and `y` and the cell calling `features.describe` are just for data exploration and can be removed. After removing nonessential code, `experimentation/Diabetes Ridge Regression Training.ipynb` should look like the following code without markdown:
+Some code written during experimentation is only intended for exploratory purposes. Therefore, the first step to convert experimental code into production code is to remove this nonessential code. Removing nonessential code will also make the code more maintainable. In this section, you'll remove code from the Diabetes Ridge Regression Training notebook. The statements printing the shape of `X` and `y` and the cell calling `features.describe` are just for data exploration and can be removed. After removing nonessential code, `experimentation/Diabetes Ridge Regression Training.ipynb` should look like the following code without markdown:
 Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:
-- The Diabetes Ridge Regression Training Notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
-- The Diabetes Ridge Regression Scoring Notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
+- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
+- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)

 ### Refactor Diabetes Ridge Regression Training notebook into functions
 In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
-- Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
-- Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function
+
+1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
+1. Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function.

 The `train_model` function should look like the following code:

@@ -88,9 +89,10 @@ reg = train_model(data, alpha)
 The previous statement calls the `train_model` function, passing the `data` and `alpha` parameters, and returns the model.
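The `train_model` shape described above can be sketched without scikit-learn. This is a minimal illustration, not the tutorial's code: it assumes a hypothetical `data` dictionary whose `"train"` and `"test"` entries each hold `"X"` and `"y"` lists for a single feature, and it swaps the `Ridge` estimator for a closed-form one-feature ridge fit so the example runs with the standard library alone. `RidgeModel1D` and the local `mean_squared_error` helper are hypothetical names, not scikit-learn's.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RidgeModel1D:
    """Hypothetical stand-in for a fitted ridge model with one feature."""

    slope: float
    intercept: float

    def predict(self, xs: List[float]) -> List[float]:
        return [self.intercept + self.slope * x for x in xs]


def mean_squared_error(preds: List[float], ys: List[float]) -> float:
    """Local helper (not scikit-learn's function of the same name)."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def train_model(data: Dict[str, Dict[str, List[float]]], alpha: float) -> RidgeModel1D:
    """Fit on data["train"], report MSE on data["test"], and return the model."""
    xs, ys = data["train"]["X"], data["train"]["y"]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Closed-form ridge solution for one centered feature; the intercept is not penalized.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    model = RidgeModel1D(slope=slope, intercept=mean_y - slope * mean_x)

    # Equivalent of the "Validate Model on Validation Set" cell: score the held-out split.
    preds = model.predict(data["test"]["X"])
    print("mse", mean_squared_error(preds, data["test"]["y"]))
    return model


# Hypothetical toy data on the line y = 2x + 1.
data = {
    "train": {"X": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]},
    "test": {"X": [4.0], "y": [9.0]},
}
reg = train_model(data, alpha=0.0)
```

Keeping training and validation inside one function means a unit test can call `train_model` directly with a tiny dataset, which is the maintainability argument the tutorial gives for refactoring.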

 In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
-- Create a new function called `main`, which takes no parameters and returns nothing.
-- Copy the code under the headings “Load Data”, “Split Data into Training and Validation Sets”, and “Save Model” into the `main` function
-- Copy the newly created call to `train_model` into the `main` function
+
+1. Create a new function called `main`, which takes no parameters and returns nothing.
+1. Copy the code under the headings “Load Data”, “Split Data into Training and Validation Sets”, and “Save Model” into the `main` function.
+1. Copy the newly created call to `train_model` into the `main` function.
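A possible shape for `main` once the "Load Data", "Split Data into Training and Validation Sets", and "Save Model" cells are moved into it is sketched below. Everything here is a hypothetical stand-in: `load_data` generates a noise-free toy dataset, `train_model` is a placeholder that only records `alpha` and the mean training target, and the model is saved with the standard-library `pickle` module to a temporary `MODEL_PATH` rather than wherever your notebook writes it.

```python
import os
import pickle
import random
import tempfile

# Hypothetical output location for this sketch.
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.pkl")


def load_data():
    """Stand-in for the "Load Data" cell: a noise-free toy dataset with y = 3x + 2."""
    xs = [float(i) for i in range(20)]
    return xs, [3.0 * x + 2.0 for x in xs]


def split_data(xs, ys, test_size=0.2, seed=0):
    """Shuffle indices reproducibly, then split into "train" and "test" subsets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_size)

    def subset(ids):
        return {"X": [xs[i] for i in ids], "y": [ys[i] for i in ids]}

    return {"train": subset(idx[n_test:]), "test": subset(idx[:n_test])}


def train_model(data, alpha):
    """Placeholder for the real train_model: records alpha and the mean training target."""
    ys = data["train"]["y"]
    return {"alpha": alpha, "mean_y": sum(ys) / len(ys)}


def main():
    """Load, split, train, and save; takes no parameters and returns nothing."""
    xs, ys = load_data()
    data = split_data(xs, ys)
    reg = train_model(data, alpha=0.5)
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(reg, f)


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard lets the same file run as a script, as the later `train.py` does, while tests can still import `train_model` or `split_data` without triggering a full training run.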

 The `main` function should look like the following code:

@@ -157,8 +159,9 @@ main()

 ### Refactor Diabetes Ridge Regression Scoring notebook into functions
 In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
-- Create a new function called `init`, which takes no parameters and returns nothing
-- Copy the code under the “Load Model” heading into the `init` function
+
+1. Create a new function called `init`, which takes no parameters and returns nothing.
+1. Copy the code under the “Load Model” heading into the `init` function.

 The `init` function should look like the following code:

@@ -176,23 +179,25 @@ init()
 ```

 In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
-- Create a new function called `run`, which takes `raw_data` and `request_headers` as parameters and returns a dictionary of results as follows:

-```python
-{"result": result.tolist()}
-```
-- Copy the code under the “Prepare Data” and “Score Data” headings into the `run` function
+1. Create a new function called `run`, which takes `raw_data` and `request_headers` as parameters and returns a dictionary of results as follows:

-The `run` function should look like the following code(Remember to remove the statements that set the variables `raw_data` and `request_headers`, which will be used later when the `run` function is called):
+```python
+{"result": result.tolist()}
+```

-```python
-def run(raw_data, request_headers):
-    data = json.loads(raw_data)["data"]
-    data = numpy.array(data)
-    result = model.predict(data)
+1. Copy the code under the “Prepare Data” and “Score Data” headings into the `run` function.

-    return {"result": result.tolist()}
-```
+The `run` function should look like the following code (remember to remove the statements that set the variables `raw_data` and `request_headers`, which will be used later when the `run` function is called):
+
+```python
+def run(raw_data, request_headers):
+    data = json.loads(raw_data)["data"]
+    data = numpy.array(data)
+    result = model.predict(data)
+
+    return {"result": result.tolist()}
+```
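The `run` function above depends on `json`, `numpy`, and a global `model` that `init` sets. The same `init`/`run` contract can be exercised end to end with the standard library only. In this sketch, `LinearModel`, its JSON-serialized parameters, and `MODEL_PATH` are hypothetical stand-ins for the serialized Ridge model, and plain lists replace NumPy arrays.

```python
import json
import os
import tempfile

# Hypothetical artifact location; a deployed scoring script resolves its model path differently.
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.json")
model = None


class LinearModel:
    """Hypothetical stand-in for the trained regressor."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]


def init():
    # Load the model once and keep it in a module-level global that run() reads.
    global model
    with open(MODEL_PATH) as f:
        params = json.load(f)
    model = LinearModel(params["weights"], params["bias"])


def run(raw_data, request_headers):
    # raw_data is a JSON string of the form {"data": [[feature, ...], ...]}.
    # request_headers is accepted to match the signature above but unused here.
    data = json.loads(raw_data)["data"]
    result = model.predict(data)
    # Plain lists are already JSON-serializable, so no .tolist() call is needed here.
    return {"result": result}
```

In Azure ML scoring scripts, `init` is typically called once when the service starts and `run` once per request, which is why deserializing the model belongs in `init` rather than in `run`.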

 Once the `run` function has been created, replace all the code under the “Prepare Data” and “Score Data” headings with the following code:
 Third, related functions need to be merged into Python files to better help code reuse. In this section, you'll be creating Python files for the following notebooks:
-- The Diabetes Ridge Regression Training Notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
-- The Diabetes Ridge Regression Scoring Notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
+- The Diabetes Ridge Regression Training notebook (`experimentation/Diabetes Ridge Regression Training.ipynb`)
+- The Diabetes Ridge Regression Scoring notebook (`experimentation/Diabetes Ridge Regression Scoring.ipynb`)

-### Create Python file for the Diabetes Ridge Regression Training Notebook
+### Create Python file for the Diabetes Ridge Regression Training notebook
 Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Training.ipynb`:

 ```
@@ -281,7 +286,7 @@ main()

 The `train.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository supports command-line arguments (namely `build_id`, `model_name`, and `alpha`). Support for command-line arguments can be added to your `train.py` file to support dynamic model names and `alpha` values, but it's not necessary for the code to execute successfully.
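One way to add the optional command-line arguments mentioned above is the standard-library `argparse` module. The flag spellings mirror the argument names in the paragraph, but the defaults (`local`, `model.pkl`, an `alpha` of `0.5`) and the help text are assumptions for illustration, not the MLOpsPython repository's values.

```python
import argparse


def parse_args(argv=None):
    """Parse optional training arguments; the defaults here are illustrative assumptions."""
    parser = argparse.ArgumentParser(description="Train the diabetes ridge regression model.")
    parser.add_argument("--build_id", default="local", help="Identifier of the CI build that triggered training")
    parser.add_argument("--model_name", default="model.pkl", help="File name to save the trained model under")
    parser.add_argument("--alpha", type=float, default=0.5, help="Ridge regularization strength")
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy in tests.
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(f"build {args.build_id}: training {args.model_name} with alpha={args.alpha}")
    # Load data, split it, call train_model(data, args.alpha), and save to args.model_name here.
    return args
```

Call `main()` from an `if __name__ == "__main__":` guard; run with no flags, the defaults apply, so the script still executes as before, while `python train.py --alpha 0.1` overrides a single value.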

-### Create Python file for the Diabetes Ridge Regression Scoring Notebook
+### Create Python file for the Diabetes Ridge Regression Scoring notebook
 Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Scoring.ipynb`:

 ```
@@ -382,16 +387,16 @@ Following the getting started guide is necessary to have the supporting infrastr
 ### Replace Training Code
 Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:

-- Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
-- Remove or replace training unit tests found in `tests/unit/code_test.py`
+1. Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
+1. Remove or replace training unit tests found in `tests/unit/code_test.py`.
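If you replace the training tests, a test can target `train_model` directly with a tiny, known dataset. The sketch below uses the standard-library `unittest` module; the `train_model` inside it is a hypothetical one-feature ridge fit that returns `(slope, intercept)`, standing in for whatever your own training function returns, and it does not reproduce the contents of `tests/unit/code_test.py`.

```python
import unittest


def train_model(data, alpha):
    """Hypothetical function under test: one-feature ridge fit returning (slope, intercept)."""
    xs, ys = data["train"]["X"], data["train"]["y"]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, my - slope * mx


class TrainModelTests(unittest.TestCase):
    def setUp(self):
        # Tiny noise-free dataset on the line y = 2x + 1.
        self.data = {"train": {"X": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]}}

    def test_recovers_line_without_regularization(self):
        slope, intercept = train_model(self.data, alpha=0.0)
        self.assertAlmostEqual(slope, 2.0)
        self.assertAlmostEqual(intercept, 1.0)

    def test_regularization_shrinks_slope(self):
        slope_small, _ = train_model(self.data, alpha=0.1)
        slope_large, _ = train_model(self.data, alpha=10.0)
        self.assertLess(abs(slope_large), abs(slope_small))
```

Saved in a file such as `test_train.py`, this runs with `python -m unittest`; pytest also collects `unittest.TestCase` classes.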

 ### Replace Score Code
 For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `diabetes_regression/scoring/score.py`.

 ### Update Evaluation Code
 The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are stopped. To keep evaluation, replace all instances of `mse` in `diabetes_regression/evaluate/evaluate_model.py` with the metric that you want.
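The comparison the evaluation step performs can be sketched as a small function. This is not the template's `evaluate_model.py`; the names `mse` and `should_promote` are hypothetical. Passing a different metric, and setting `higher_is_better` for metrics such as R², mirrors the advice to replace `mse` with the metric you want.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error; lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def should_promote(new_score: float, production_score: float, higher_is_better: bool = False) -> bool:
    """Continue the pipeline only if the new model strictly beats the production model."""
    if higher_is_better:
        return new_score > production_score
    return new_score < production_score


# Hypothetical predictions from a new model and from the current production model.
y_true = [3.0, 5.0, 7.0]
new_mse = mse(y_true, [3.1, 4.9, 7.0])
production_mse = mse(y_true, [2.5, 5.5, 7.5])
print("continue pipeline" if should_promote(new_mse, production_mse) else "stop pipeline")
```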

-To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to false.
+To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to `false`.