Commit af2b772

Moving code integration guide into this repo (#243)
* Moving code integration guide into this repo
* dtzar feedback
* add recommendation
1 parent cace90d commit af2b772

5 files changed

+96
-25
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ The build pipelines include DevOps tasks for data sanity tests, unit tests, mode
3434

3535
## Getting Started
3636

37-
To deploy this solution in your subscription, follow the manual instructions in the [getting started](docs/getting_started.md) doc
37+
To deploy this solution in your subscription, follow the manual instructions in the [getting started](docs/getting_started.md) doc. Then optionally follow the guide for [integrating your own code](docs/custom_model.md) with this repository template.
3838

3939
### Repo Details
4040

bootstrap/README.md

Lines changed: 12 additions & 22 deletions
@@ -1,28 +1,18 @@
11
# Bootstrap from MLOpsPython repository
22

3-
To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstrapping will prepare a similar directory structure for your project which includes renaming files and folders, deleting and cleaning up some directories and fixing imports and absolute path based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project.
3+
To reuse this project structure and its scripts for a new ML project, bootstrap from the existing repository to create a template tailored to your project.
44

5-
## Generating the project structure
5+
Bootstrapping will prepare a directory structure for your project which includes:
66

7-
To bootstrap from the existing MLOpsPython repository clone this repository, ensure Python is installed locally, and run bootstrap.py script as below
7+
* renaming files and folders from the base project name `diabetes` to your project name
8+
* fixing imports and absolute paths based on your project name
9+
* deleting and cleaning up some directories
810

9-
`python bootstrap.py --d [dirpath] --n [projectname]`
10-
11-
Where `[dirpath]` is the absolute path to the root of your directory where MLOps repo is cloned and `[projectname]` is the name of your ML project.
12-
13-
The script renames folders, files and files' content from the base project name `diabetes` to your project name. However, you might need to manually rename variables defined in a variable group and their values.
14-
15-
[This article](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) will also assist to use this code template for your own ML project.
16-
17-
### Using an existing dataset
11+
To bootstrap from the existing MLOpsPython repository:
1812

19-
The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data. To use your own data, you need to [create a Dataset](https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets) in your workspace and add a DATASET_NAME variable in the ***devopsforai-aml-vg*** variable group with the Dataset name. You'll also need to modify the test cases in the **ml_service/util/smoke_test_scoring_service.py** script to match the schema of the training features in your dataset.
20-
21-
## Customizing the CI and AML environments
22-
23-
In your project you will want to customize your own Docker image and Conda environment to use only the dependencies and tools required for your use case. This requires you to edit the following environment definition files:
24-
25-
- The Azure ML training and scoring Conda environment defined in [conda_dependencies.yml](diabetes_regression/conda_dependencies.yml).
26-
- The CI Docker image and Conda environment used by the Azure DevOps build agent. See [instructions for customizing the Azure DevOps job container](../docs/custom_container.md).
27-
28-
You will want to synchronize dependency versions as appropriate between both environment definitions (for example, ML libraries used both in training and in unit tests).
13+
1. Ensure Python 3 is installed locally
14+
1. Clone this repository locally
15+
1. Run the `bootstrap.py` script
16+
`python bootstrap.py --d [dirpath] --n [projectname]`
17+
* `[dirpath]` is the absolute path to the root of the directory where MLOpsPython is cloned
18+
* `[projectname]` is the name of your ML project
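
The renaming described above can be pictured with a minimal sketch. This is not the actual `bootstrap.py` implementation; `rename_project` is a hypothetical helper, and it assumes every file under the root is a text file.

```python
from pathlib import Path


def rename_project(root, old_name, new_name):
    """Replace old_name with new_name in file contents, file names and folder names.

    Hypothetical helper for illustration only; assumes all files are text files.
    """
    root = Path(root)

    # Step 1: rewrite file contents (for example, import statements).
    for path in list(root.rglob("*")):
        if path.is_file():
            text = path.read_text()
            if old_name in text:
                path.write_text(text.replace(old_name, new_name))

    # Step 2: rename files and folders, deepest paths first, so that renaming
    # a parent folder never invalidates a child path still waiting in the list.
    all_paths = sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in all_paths:
        if old_name in path.name:
            path.rename(path.with_name(path.name.replace(old_name, new_name)))
```

Calling `rename_project("/path/to/clone", "diabetes", "churn")` would turn a `diabetes_regression` folder into `churn_regression` and update matching imports inside its files.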

docs/code_description.md

Lines changed: 2 additions & 1 deletion
@@ -67,7 +67,8 @@ The repository provides a template with folders structure suitable for maintaini
6767

6868
### Training Step
6969

70-
- `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
70+
- `diabetes_regression/training/train_aml.py`: a training step of an ML training pipeline.
71+
- `diabetes_regression/training/train.py`: ML functionality called by `train_aml.py`.
7172
- `diabetes_regression/training/R/r_train.r` : training a model with R based on a sample dataset (weight_data.csv).
7273
- `diabetes_regression/training/R/train_with_r.py` : a python wrapper (ML Pipeline Step) invoking R training script on ML Compute
7374
- `diabetes_regression/training/R/train_with_r_on_databricks.py` : a python wrapper (ML Pipeline Step) invoking R training script on Databricks Compute

docs/custom_model.md

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# Bring your own code with the MLOpsPython repository template
2+
3+
This document provides steps to follow when using this repository as a template to train models and deploy the models with real-time inference in Azure ML with your own scripts and data.
4+
5+
1. Follow the MLOpsPython [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide
6+
1. Follow the MLOpsPython [bootstrap instructions](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) to create your project starting point
7+
1. Configure training data
8+
1. [If necessary] Convert your ML experimental code into production ready code
9+
1. Replace the training code
10+
1. Update the evaluation code
11+
1. Customize the build agent environment
12+
1. [If appropriate] Replace the score code
13+
14+
## Follow the Getting Started guide
15+
16+
Follow the [Getting Started](https://github.com/microsoft/MLOpsPython/blob/master/docs/getting_started.md) guide to set up the infrastructure and pipelines to execute MLOpsPython.
17+
18+
## Follow the Bootstrap instructions
19+
20+
The [Bootstrap from MLOpsPython repository](https://github.com/microsoft/MLOpsPython/blob/master/bootstrap/README.md) guide will help you to quickly prepare the repository for your project.
21+
22+
**Note:** Since the bootstrap script will rename the `diabetes_regression` folder to the project name of your choice, we'll refer to your project as `[project name]` when paths are involved.
23+
24+
## Configure training data
25+
26+
The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data.
27+
28+
To use your own data:
29+
30+
1. [Create a Dataset](https://docs.microsoft.com/azure/machine-learning/how-to-create-register-datasets) in your Azure ML workspace
31+
1. Update the `DATASET_NAME` and `DATASTORE_NAME` variables in `.pipelines/[project name]-variables-template.yml`
32+
33+
## Convert your ML experimental code into production ready code
34+
35+
The MLOpsPython template creates an Azure Machine Learning (ML) pipeline that invokes a set of [Azure ML pipeline steps](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps) (see `ml_service/pipelines/[project name]_build_train_pipeline.py`). If your experiment is currently in a Jupyter notebook, it will need to be refactored into scripts that can be run independently and dropped into the template, where the existing Azure ML pipeline steps invoke them.
36+
37+
1. Refactor your experiment code into scripts
38+
1. [Recommended] Prepare unit tests
39+
40+
Examples of all these scripts are provided in this repository.
41+
See the [Convert ML experimental code to production code tutorial](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production) for a step-by-step guide and additional details.
42+
43+
## Replace training code
44+
45+
The template contains three scripts in the `[project name]/training` folder. Update these scripts for your experiment code.
46+
47+
* `train.py` contains the platform-agnostic logic required to do basic data preparation and train the model. This script can be invoked against a static data file for local development.
48+
* `train_aml.py` is the entry script for the ML pipeline step. It invokes the functions in `train.py` in an Azure ML context and adds logging. `train_aml.py` loads parameters for training from `[project name]/parameters.json` and passes them to the training function in `train.py`. If your experiment code can be refactored to match the function signatures in `train.py`, this file shouldn't need many changes.
49+
* `test_train.py` contains tests that guard against functional regressions in `train.py`. Remove this file if you have no tests for your own code.
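
The split between `train.py` and `train_aml.py` can be sketched as follows. This is an illustrative outline, not the template's code: the function names, the one-feature ridge regression, and the `"training"` key in the parameters file are all assumptions, and the Azure ML logging that the real entry script adds is omitted.

```python
import json


def train_model(xs, ys, alpha):
    """Platform-agnostic logic (the train.py role).

    Fits a one-feature ridge regression without intercept; a stand-in for
    whatever model your experiment trains.
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + alpha
    return {"weight": numerator / denominator}


def load_training_params(path):
    """Entry-script logic (the train_aml.py role): read training parameters.

    Assumes the JSON file groups training parameters under a "training" key.
    """
    with open(path) as f:
        return json.load(f).get("training", {})


def run_training_step(params_path, xs, ys):
    """Load parameters and pass them to the platform-agnostic function."""
    params = load_training_params(params_path)
    return train_model(xs, ys, **params)
```

Because `train_model` takes plain Python values, it can be called directly against a local data file or from a unit test, while only `run_training_step` needs to know where the parameters come from.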
50+
51+
Add any dependencies required by training to `[project name]/conda_dependencies.yml`. This file will be used to generate the environment that the pipeline steps will run in.
52+
53+
## Update evaluation code
54+
55+
The MLOpsPython template uses the `evaluate_model.py` script to compare the performance of the newly trained model with that of the current production model, based on Mean Squared Error. If the newly trained model performs better than the current production model, the pipelines continue. Otherwise, the pipelines are canceled.
56+
57+
To keep the evaluation step, replace all instances of `mse` in `[project name]/evaluate/evaluate_model.py` with the metric that you want.
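
When swapping the metric, keep in mind that the direction of the comparison may change: a lower MSE is better, but for a metric such as accuracy a higher value is better. The sketch below illustrates the comparison only; the function names and the `lower_is_better` flag are hypothetical and do not appear in `evaluate_model.py`.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def new_model_is_better(new_score, production_score, lower_is_better=True):
    """Decide whether the newly trained model should replace production.

    production_score is None when no production model exists yet, in which
    case the new model is accepted.
    """
    if production_score is None:
        return True
    if lower_is_better:
        return new_score < production_score
    return new_score > production_score
```

For an error metric, call `new_model_is_better(new_mse, production_mse)`; for a score where higher is better, pass `lower_is_better=False`.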
58+
59+
To disable the evaluation step, either:
60+
61+
* set the DevOps pipeline variable `RUN_EVALUATION` to `false`
62+
* uncomment `RUN_EVALUATION` in `.pipelines/[project name]-variables-template.yml` and set the value to `false`
63+
64+
## Customize the build agent environment
65+
66+
The DevOps pipeline definitions in the MLOpsPython template run several steps in a Docker container that contains the dependencies required to work through the Getting Started guide. If additional dependencies are required to run your unit tests or generate your Azure ML pipeline, there are a few options:
67+
68+
* Add a pipeline step to install dependencies required by unit tests to `.pipelines/code-quality-template.yml`. Recommended if you only have a small number of test dependencies.
69+
* Create a new Docker image containing your dependencies. See [docs/custom_container.md](custom_container.md). Recommended if you have a larger number of dependencies, or if the overhead of installing additional dependencies on each run is too high.
70+
* Remove the container references from the pipeline definition files and run the pipelines on self hosted agents with dependencies pre-installed.
71+
72+
## Replace score code
73+
74+
For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web apps.
75+
76+
If you want to keep scoring:
77+
78+
1. Update or replace `[project name]/scoring/score.py`
79+
1. Add any dependencies required by scoring to `[project name]/conda_dependencies.yml`
80+
1. Modify the test cases in the `ml_service/util/smoke_test_scoring_service.py` script to match the schema of the training features in your data
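
As a rough guide to the shape of a replacement score script, Azure ML scoring entry scripts expose an `init()` function that loads the model once and a `run()` function that handles each request. The sketch below follows that shape but is only an illustration: the stand-in model, the `"data"` and `"result"` JSON keys, and the single-argument `run` signature are assumptions, so check them against your own `score.py` and smoke test.

```python
import json

model = None


def _sum_features(rows):
    """Stand-in model: predicts the sum of each row's features."""
    return [sum(row) for row in rows]


def init():
    """Called once when the service starts; load the model here."""
    global model
    # A real script would load the registered model file instead.
    model = _sum_features


def run(raw_data):
    """Called per request with a JSON string; returns predictions."""
    rows = json.loads(raw_data)["data"]
    return {"result": model(rows)}
```

A smoke test for this sketch would send a payload such as `{"data": [[1, 2], [3, 4]]}`, with each inner list matching the feature schema of your training data.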

docs/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -263,7 +263,7 @@ To remove the resources created for this project, use the [/environment_setup/ia
263263

264264
## Next Steps: Integrating your project
265265

266-
* Follow the [bootstrap instructions](../bootstrap/README.md) to create a starting point for your project use case. This guide includes information on bringing your own code to this repository template.
266+
* The [custom model](custom_model.md) guide includes information on bringing your own code to this repository template.
267267
* Consider using [Azure Pipelines self-hosted agents](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install) to speed up your Azure ML pipeline execution. The Docker container image for the Azure ML pipeline is sizable, and having it cached on the agent between runs can trim several minutes from your runs.
268268

269269
### Additional Variables and Configuration
