Commit 6834e29

Larry committed: incorporating feedback

1 parent 9332035, commit 6834e29

1 file changed (+37, -32 lines)

articles/machine-learning/tutorial-convert-ml-experiment-to-production.md

Lines changed: 37 additions & 32 deletions
````diff
@@ -9,15 +9,15 @@ ms.topic: tutorial
 ms.date: 02/10/2020
 ---
 
-# Tutorial: Convert ML Experimental Code to Production Code
+# Tutorial: Convert ML experimental code to production code
 
 A machine learning project requires experimentation where hypotheses are tested with agile tools like Jupyter Notebook using real datasets. Once the model is ready for production, the model code should be placed in a production code repository. In some cases, the model code must be converted to Python scripts to be placed in the production code repository. This tutorial covers a recommended approach on how to export experimentation code to Python scripts.
 
 In this tutorial, you learn how to:
 
 > [!div class="checklist"]
 > * Clean nonessential code
-> * Refactor Jupyter notebook code into functions
+> * Refactor Jupyter Notebook code into functions
 > * Create Python scripts for related tasks
 > * Create unit tests
 
````
````diff
@@ -29,7 +29,7 @@ and use the `experimentation/Diabetes Ridge Regression Training.ipynb` and `expe
 
 ## Remove all nonessential code
 
-Some code written during experimentation is only intended for exploratory purposes. Therefore, the first step to convert experimental code into production code is to remove this nonessential code. Removing nonessential code will also make the code more maintainable. In this section, you'll remove code from the Diabetes Ridge Regression Training Notebook. The statements printing the shape of `X` and `y` and the cell calling `features.describe` are just for data exploration and can be removed. After removing nonessential code, `experimentation/Diabetes Ridge Regression Training.ipynb` should look like the following code without markdown:
+Some code written during experimentation is only intended for exploratory purposes. Therefore, the first step to convert experimental code into production code is to remove this nonessential code. Removing nonessential code will also make the code more maintainable. In this section, you'll remove code from the Diabetes Ridge Regression Training notebook. The statements printing the shape of `X` and `y` and the cell calling `features.describe` are just for data exploration and can be removed. After removing nonessential code, `experimentation/Diabetes Ridge Regression Training.ipynb` should look like the following code without markdown:
 
 ```python
 from sklearn.datasets import load_diabetes
````
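The cleanup in this hunk is done by hand in the tutorial. When a notebook has many exploration cells, part of that pass can be scripted. The following is a hypothetical helper, not part of the tutorial or MLOpsPython. It reads the notebook as JSON (the `.ipynb` format keeps cells in a top-level `cells` list, each with a `cell_type` and a `source`) and drops code cells whose every non-blank line matches an exploration pattern. The pattern list and function names are assumptions modeled on the examples in the text (printing `X.shape` and `y.shape`, calling `features.describe`). Cells that mix exploration with real logic are kept and still need manual editing.

```python
import json
import re

# Hypothetical exploration-only patterns, modeled on the examples in the text.
EXPLORATION_PATTERNS = [
    re.compile(r"print\(\s*\w+\.shape\s*\)"),  # e.g. print(X.shape)
    re.compile(r"\.describe\(\s*\)"),           # e.g. features.describe()
]


def is_exploratory(source: str) -> bool:
    """Return True if every non-blank line of a cell matches an exploration pattern."""
    lines = [line for line in source.splitlines() if line.strip()]
    return bool(lines) and all(
        any(p.search(line) for p in EXPLORATION_PATTERNS) for line in lines
    )


def strip_exploration_cells(notebook: dict) -> dict:
    """Return a copy of a notebook dict without purely exploratory code cells."""
    kept = []
    for cell in notebook.get("cells", []):
        # The notebook format allows "source" to be a string or a list of strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code" and is_exploratory(source):
            continue
        kept.append(cell)
    # Keep all other top-level keys (metadata, nbformat, ...) unchanged.
    return {**notebook, "cells": kept}
```

To apply it, load the `.ipynb` file with `json.load`, pass the dictionary to `strip_exploration_cells`, and write the result back with `json.dump`. Comparing the notebook before and after is a quick way to confirm that nothing essential was removed.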
````diff
@@ -60,13 +60,14 @@ joblib.dump(value=reg, filename=model_name)
 ## Refactor code into functions
 
 Second, the Jupyter code needs to be refactored into functions. Refactoring code into functions makes unit testing easier and makes the code more maintainable. In this section, you'll refactor:
-- The Diabetes Ridge Regression Training Notebook(`experimentation/Diabetes Ridge Regression Training.ipynb`)
-- The Diabetes Ridge Regression Scoring Notebook(`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
+- The Diabetes Ridge Regression Training notebook(`experimentation/Diabetes Ridge Regression Training.ipynb`)
+- The Diabetes Ridge Regression Scoring notebook(`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
 
 ### Refactor Diabetes Ridge Regression Training notebook into functions
 In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
-- Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
-- Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function
+
+1. Create a function called `train_model`, which takes the parameters `data` and `alpha` and returns a model.
+1. Copy the code under the headings “Train Model on Training Set” and “Validate Model on Validation Set” into the `train_model` function.
 
 The `train_model` function should look like the following code:
 
````
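To make the refactoring target concrete, here is a dependency-free sketch of a function with the `train_model(data, alpha)` shape these steps describe: it trains, validates, and returns a model. It is not the tutorial's code. Scikit-learn's `Ridge` is swapped for closed-form ridge regression on a single feature (the intercept is left unpenalized, which is how scikit-learn handles the intercept when fitting on centered data), and the nested `data` layout with `train` and `test` keys is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RidgeModel:
    """Single-feature linear model: y = coef * x + intercept."""
    coef: float
    intercept: float

    def predict(self, xs):
        return [self.coef * x + self.intercept for x in xs]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def train_model(data, alpha):
    """Fit one-feature ridge regression on data["train"], report MSE on data["test"].

    Assumed layout (hypothetical):
        data = {"train": {"X": [...], "y": [...]},
                "test":  {"X": [...], "y": [...]}}
    """
    # "Train Model on Training Set"
    xs, ys = data["train"]["X"], data["train"]["y"]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Closed-form ridge slope on centered data; a larger alpha shrinks it toward zero.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    coef = sxy / (sxx + alpha)
    model = RidgeModel(coef=coef, intercept=y_mean - coef * x_mean)

    # "Validate Model on Validation Set"
    preds = model.predict(data["test"]["X"])
    print("mse", mean_squared_error(data["test"]["y"], preds))
    return model
```

Because `train_model` receives its data and hyperparameter as arguments instead of reading notebook globals, a unit test can call it with a tiny hand-made dataset and check the fitted coefficients directly.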
````diff
@@ -88,9 +89,10 @@ reg = train_model(data, alpha)
 The previous statement calls the `train_model` function passing the `data` and `alpha` parameters and returns the model
 
 In `experimentation/Diabetes Ridge Regression Training.ipynb`, complete the following steps:
-- Create a new function called `main`, which takes no parameters and returns nothing.
-- Copy the code under the headings “Load Data”, “Split Data into Training and Validation Sets”, and “Save Model” into the `main` function
-- Copy the newly created call to `train_model` into the `main` function
+
+1. Create a new function called `main`, which takes no parameters and returns nothing.
+1. Copy the code under the headings “Load Data”, “Split Data into Training and Validation Sets”, and “Save Model” into the `main` function.
+1. Copy the newly created call to `train_model` into the `main` function.
 
 The `main` function should look like the following code:
 
````
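The `main` steps above can be sketched as a parameterless function that loads data, splits it, trains, and saves the model. To keep the example self-contained, this hypothetical version inlines a small synthetic dataset in place of `load_diabetes`, uses a deterministic every-fifth-row split in place of scikit-learn's `train_test_split`, trains with a minimal one-feature ridge fit, and saves with the standard-library `pickle` module where the tutorial uses `joblib.dump`. `MODEL_PATH` and the helper names are illustrative assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical output location; the tutorial saves to a model_name file instead.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "ridge_model.pkl")


def load_data():
    """Stand-in for "Load Data": synthetic (x, y) pairs with y = 3x + 1."""
    xs = [float(i) for i in range(20)]
    return xs, [3 * x + 1 for x in xs]


def split_data(xs, ys):
    """Stand-in for the split step: every fifth row goes to the test set."""
    train = {"X": [], "y": []}
    test = {"X": [], "y": []}
    for i, (x, y) in enumerate(zip(xs, ys)):
        target = test if i % 5 == 0 else train
        target["X"].append(x)
        target["y"].append(y)
    return {"train": train, "test": test}


def train_model(data, alpha):
    """Minimal one-feature ridge fit returning (coef, intercept)."""
    xs, ys = data["train"]["X"], data["train"]["y"]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    coef = sxy / (sxx + alpha)
    return coef, y_mean - coef * x_mean


def main():
    """Takes no parameters and returns nothing, as the steps describe."""
    xs, ys = load_data()
    data = split_data(xs, ys)
    reg = train_model(data, alpha=0.0)
    # "Save Model": persist the trained model for the scoring step.
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(reg, f)


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__"` guard lets the same file be imported by tests without starting a training run, which matters once the notebook becomes a script.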
````diff
@@ -157,8 +159,9 @@ main()
 
 ### Refactor Diabetes Ridge Regression Scoring notebook into functions
 In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
-- Create a new function called `init`, which takes no parameters and return nothing
-- Copy the code under the “Load Model” heading into the `init` function
+
+1. Create a new function called `init`, which takes no parameters and return nothing.
+1. Copy the code under the “Load Model” heading into the `init` function.
 
 The `init` function should look like the following code:
 
````
````diff
@@ -176,23 +179,25 @@ init()
 ```
 
 In `experimentation/Diabetes Ridge Regression Scoring.ipynb`, complete the following steps:
-- Create a new function called `run`, which takes raw_data and request_headers as parameters and returns a dictionary of results as follows:
 
-```python
-{"result": result.tolist()}
-```
-- Copy the code under the “Prepare Data” and “Score Data” headings into the `run` function
+1. Create a new function called `run`, which takes raw_data and request_headers as parameters and returns a dictionary of results as follows:
 
-The `run` function should look like the following code(Remember to remove the statements that set the variables `raw_data` and `request_headers`, which will be used later when the `run` function is called):
+```python
+{"result": result.tolist()}
+```
 
-```python
-def run(raw_data, request_headers):
-    data = json.loads(raw_data)["data"]
-    data = numpy.array(data)
-    result = model.predict(data)
+1. Copy the code under the “Prepare Data” and “Score Data” headings into the `run` function.
 
-    return {"result": result.tolist()}
-```
+The `run` function should look like the following code (Remember to remove the statements that set the variables `raw_data` and `request_headers`, which will be used later when the `run` function is called):
+
+```python
+def run(raw_data, request_headers):
+    data = json.loads(raw_data)["data"]
+    data = numpy.array(data)
+    result = model.predict(data)
+
+    return {"result": result.tolist()}
+```
 
 Once the `run` function has been created, replace all the code under the “Prepare Data” and “Score Data” headings with the following code:
 
````
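The `run` contract shown in this hunk (JSON text in, a `{"result": [...]}` dictionary out) can be exercised without NumPy or a trained model. In this sketch, `StubModel` is a hypothetical stand-in for the model that `init` loads, and plain lists replace `numpy.array(...)` and `.tolist()`. The `"data"` key and the `request_headers` parameter follow the tutorial's signature; the parameter is accepted but unused here.

```python
import json


class StubModel:
    """Hypothetical stand-in for the trained model: predicts the sum of each row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


model = StubModel()  # in the tutorial, init() sets this global


def run(raw_data, request_headers):
    # request_headers is kept for signature compatibility but not used.
    data = json.loads(raw_data)["data"]  # expects {"data": [[...], ...]}
    result = model.predict(data)
    # The tutorial calls result.tolist() on a NumPy array; here result is already a list.
    return {"result": list(result)}
```

For example, `run(json.dumps({"data": [[1, 2, 3], [10, 20, 30]]}), {})` returns `{"result": [6, 60]}` with this stub. Returning plain lists rather than arrays keeps the response JSON-serializable.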
````diff
@@ -233,10 +238,10 @@ print("Test result: ", prediction)
 
 ## Combine related functions in Python files
 Third, related functions need to be merged into Python files to better help code reuse. In this section, you'll be creating Python files for the following notebooks:
-- The Diabetes Ridge Regression Training Notebook(`experimentation/Diabetes Ridge Regression Training.ipynb`)
-- The Diabetes Ridge Regression Scoring Notebook(`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
+- The Diabetes Ridge Regression Training notebook(`experimentation/Diabetes Ridge Regression Training.ipynb`)
+- The Diabetes Ridge Regression Scoring notebook(`experimentation/Diabetes Ridge Regression Scoring.ipynb`)
 
-### Create Python file for the Diabetes Ridge Regression Training Notebook
+### Create Python file for the Diabetes Ridge Regression Training notebook
 Convert your notebook to an executable script by running the following statement in a command prompt, which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Training.ipynb`:
 
 ```
````
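This hunk stops just before the conversion command. nbconvert's script exporter (commonly invoked as `jupyter nbconvert --to script <notebook>.ipynb`) writes a notebook's code out as a `.py` file. As a rough illustration of what that conversion produces, the sketch below approximates it with the standard library by reading the notebook JSON and concatenating its code cells. It is not nbconvert: it skips markdown cells entirely and does not reproduce nbconvert's exact output formatting. `notebook_to_script` and `convert_file` are hypothetical names.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Concatenate the code cells of a notebook dict into script text."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped
        source = cell.get("source", "")
        if isinstance(source, list):  # the format allows a list of lines
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end the file with a newline.
    return "\n\n".join(chunks) + "\n"


def convert_file(ipynb_path: str, py_path: str) -> None:
    """Read an .ipynb file and write its extracted code to py_path."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(notebook))
```

In practice the real `nbconvert` command is the better choice. The sketch is only meant to show that the generated script is the notebook's code cells in order, which is why the refactoring into functions should happen before conversion.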
````diff
@@ -281,7 +286,7 @@ main()
 
 The `train.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository supports command-line arguments (namely `build_id`, `model_name`, and `alpha`). Support for command-line arguments can be added to your `train.py` file to support dynamic model names and `alpha` values, but it's not necessary for the code to execute successfully.
 
-### Create Python file for the Diabetes Ridge Regression Scoring Notebook
+### Create Python file for the Diabetes Ridge Regression Scoring notebook
 Covert your notebook to an executable script by running the following statement in a command prompt that which uses the nbconvert package and the path of `experimentation/Diabetes Ridge Regression Scoring.ipynb`:
 
 ```
````
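The `train.py` paragraph in this hunk mentions command-line arguments named `build_id`, `model_name`, and `alpha`. A minimal sketch of that kind of support with the standard-library `argparse` module follows. The flag spellings, types, and defaults are placeholder assumptions, not the MLOpsPython script's actual interface.

```python
import argparse


def parse_args(argv=None):
    """Parse optional training arguments; all defaults are illustrative placeholders."""
    parser = argparse.ArgumentParser(description="Train the diabetes ridge model")
    parser.add_argument("--build_id", type=str, default=None,
                        help="identifier of the build that triggered training")
    parser.add_argument("--model_name", type=str, default="diabetes_model.pkl",
                        help="file name to save the trained model under")
    parser.add_argument("--alpha", type=float, default=0.5,
                        help="regularization strength passed to train_model")
    # argv=None makes argparse read sys.argv; tests can pass an explicit list.
    return parser.parse_args(argv)
```

With this in place, a run such as `python train.py --model_name my_model.pkl --alpha 0.1` could feed `args.model_name` and `args.alpha` into the save step and into `train_model`, while running with no flags falls back to the defaults.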
````diff
@@ -382,16 +387,16 @@ Following the getting started guide is necessary to have the supporting infrastr
 ### Replace Training Code
 Replacing the code used to train the model and removing or replacing corresponding unit tests is required for the solution to function with your own code. Follow these steps specifically:
 
-- Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
-- Remove or replace training unit tests found in `tests/unit/code_test.py`
+1. Replace `diabetes_regression\training\train.py`. This script trains your model locally or on the Azure ML compute.
+1. Remove or replace training unit tests found in `tests/unit/code_test.py`
 
 ### Replace Score Code
 For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps. If you want to keep scoring, replace `diabetes_regression/scoring/score.py`.
 
 ### Update Evaluation Code
 The MLOpsPython template uses the evaluate_model script to compare the performance of the newly trained model and the current production model based on Mean Squared Error. If the performance of the newly trained model is better than the current production model, then the pipelines continue. Otherwise, the pipelines are stopped. To keep evaluation, replace all instances of `mse` in `diabetes_regression/evaluate/evaluate_model.py` with the metric that you want.
 
-To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to false.
+To get rid of evaluation, set the DevOps pipeline variable `RUN_EVALUATION` in `.pipelines\diabetes_regression-variables` to `false`.
 
 ## Next steps
 
````
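The evaluation rule described in this hunk (continue only if the newly trained model beats the current production model on Mean Squared Error) reduces to a small comparison. The sketch below is a hypothetical illustration, not the contents of `evaluate_model.py`. The function names are invented, and the metric is passed in as a function so that replacing `mse` with another lower-is-better metric is a one-argument change. For a higher-is-better metric, the comparison would need to be flipped.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def should_promote(y_true, new_preds, prod_preds, metric=mean_squared_error):
    """Return True if the new model should replace production (lower metric wins)."""
    if prod_preds is None:  # no production model yet, so nothing to beat
        return True
    new_score = metric(y_true, new_preds)
    prod_score = metric(y_true, prod_preds)
    print(f"new={new_score:.4f} production={prod_score:.4f}")
    return new_score < prod_score
```

A pipeline step could call `should_promote` on a held-out set and stop when it returns `False`, which mirrors the continue-or-stop behavior the text describes.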