
Commit b9b8109

Merge branch 'feature/docs' into develop
2 parents f580ae9 + bf47a13

File tree: 11 files changed (+1007, -249 lines)


README.md

Lines changed: 48 additions & 72 deletions
@@ -1,96 +1,72 @@
-### Author: Praneet Singh Solanki
+### Author: | Praneet Singh Solanki | Richin Jain |

-# DevOps For AI
+# DevOps for AI

 [![Build Status](https://dev.azure.com/customai/DevopsForAI-AML/_apis/build/status/Microsoft.DevOpsForAI?branchName=master)](https://dev.azure.com/customai/DevopsForAI-AML/_build/latest?definitionId=1&branchName=master)

-[DevOps for AI template](https://azuredevopsdemogenerator.azurewebsites.net/?name=azure%20machine%20learning) will help you understand how to build a continuous integration and continuous delivery (CI/CD) pipeline for an ML/AI project. We will be using the Azure DevOps Project for build and release pipelines, along with Azure ML services for ML/AI model management and operationalization.
-This template contains code and pipeline definitions for a machine learning project, demonstrating how to automate an end-to-end ML/AI project. The build pipelines include DevOps tasks for data sanity tests, unit tests, model training on different compute targets, model version management, model evaluation/model selection, model deployment as a real-time web service, staged deployment to QA/prod, integration testing, and functional testing.
+DevOps for AI will help you understand how to build a continuous integration and continuous delivery (CI/CD) pipeline for an ML/AI project. We will be using the Azure DevOps Project for the build and release/deployment pipelines, along with Azure ML services for the model retraining pipeline, model management, and operationalization.
+This template contains code and pipeline definitions for a machine learning project, demonstrating how to automate an end-to-end ML/AI project. The build pipelines include DevOps tasks for data sanity tests, unit tests, model training on different compute targets, model version management, model evaluation/model selection, model deployment as a real-time web service, staged deployment to QA/prod, and integration testing.
 ## Prerequisite
 - Active Azure subscription
-- Minimum contributor access to Azure subscription
+- At least Contributor access to the Azure subscription

 ## Getting Started:

-### Import the DevOps for AI solution template from Azure DevOps Demo Generator: [Click here](https://azuredevopsdemogenerator.azurewebsites.net/?name=azure%20machine%20learning)
-
-Skip the above step if already done.
-
-Once the template is imported into your personal Azure DevOps account using the DevOps demo generator, follow the steps below to get the pipeline running:
-
-### Update Pipeline Config:
-
-#### Build Pipeline
-1. Go to **Pipelines -> Builds** in the newly created project and click **Edit** at the top right
-![EditPipeline1](/docs/images/EditPipeline1.png)
-2. Click the **Create or Get Workspace** task, select the Azure subscription where you want to deploy and run the solution, and click **Authorize**
-![EditPipeline2](/docs/images/EditPipeline2.png)
-3. Click each of the other tasks below it and select the same subscription (no need to authorize again)
-4. Once the tasks are updated with the subscription, click **Save & queue** and select **Save**
-![EditPipeline3](/docs/images/EditPipeline3.png)
-
-#### Release Pipeline
-1. Go to **Pipelines -> Releases** and click **Edit** at the top
-![EditPipeline4](/docs/images/EditPipeline4.png)
-2. Click **1 job, 4 tasks** to open the tasks in the **QA stage**
-![EditPipeline5](/docs/images/EditPipeline5.png)
-3. Update the subscription details in the two tasks
-![EditPipeline6](/docs/images/EditPipeline6.png)
-4. Click **Tasks** at the top to switch to the Prod stage, and update the subscription details for the two tasks in prod
-![EditPipeline7](/docs/images/EditPipeline7.png)
-5. Once you fix all the missing subscriptions, **Save** is no longer grayed out; click it to save the changes to the release pipeline
-![EditPipeline8](/docs/images/EditPipeline8.png)
-
-### Update Repo config:
-1. Go to **Repos** in the newly created Azure DevOps project
-2. Open the config file [/aml_config/config.json](/aml_config/config.json) and edit it
-3. Put your Azure subscription ID in place of <>
-4. Change the resource group and AML workspace name if you want
-5. Put the location where you want to deploy your Azure ML service workspace
-6. Save the changes and commit them to the master branch
-7. The commit triggers the build pipeline, which deploys the AML end-to-end solution
-8. Go to **Pipelines -> Builds** to see the pipeline run
-
-## Steps Performed in the Build Pipeline:
-
-1. Prepare the Python environment
-2. Get or create the workspace
-3. Submit the training job on the remote DSVM / local Python env
-4. Register the model to the workspace
-5. Create a Docker image for the scoring web service
-6. Copy and publish the artifacts to the release pipeline
-
-## Steps Performed in the Release Pipeline
-In the release pipeline we deploy the image created by the build pipeline to Azure Container Instances and Azure Kubernetes Service
-
-### Deploy on ACI - QA Stage
-1. Prepare the Python environment
-2. Create ACI and deploy the web service image created in the build pipeline
-3. Test the scoring image
-
-### Deploy on AKS - PreProd/Prod Stage
-1. Prepare the Python environment
-2. Deploy on AKS, either:
-   - Create AKS and create a new web service on AKS with the scoring Docker image, OR
-   - Get the existing AKS and update the web service with the new image created in the build pipeline
-3. Test the scoring image
+To deploy this solution in your subscription, follow the manual instructions in the [getting started](docs/getting_started.md) doc.
+## Architecture Diagram
+
+This reference architecture shows how to implement a continuous integration (CI), continuous delivery (CD), and retraining pipeline for an AI application using Azure DevOps and Azure Machine Learning. The solution is built on the scikit-learn diabetes dataset but can easily be adapted for any AI scenario and for other popular build systems such as Jenkins and Travis.
+
+![Architecture](/docs/images/Architecture_DevOps_AI.png)
+## Architecture Flow
+
+1. The data scientist writes or updates the code and pushes it to the git repo. This triggers the Azure DevOps build pipeline (continuous integration).
+2. Once the Azure DevOps build pipeline is triggered, it runs the following types of tasks:
+   - Run on every code change: each time new code is committed to the repo, the build pipeline performs a data sanity test and unit tests on the new code.
+   - One-time run: these tasks run only on the first build pipeline run. They create the [Azure ML Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) and the [Azure ML Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) used as the model training compute, and publish an [Azure ML Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) with the code. This published Azure ML pipeline is the model training/retraining pipeline.
+
+   `Note: The task Publish Azure ML pipeline currently runs for every code change`
+
+3. The Azure ML retraining pipeline is triggered once the Azure DevOps build pipeline completes. All tasks in this pipeline run on the Azure ML Compute created earlier. The pipeline runs the following tasks:
+   - The **Train Model** task executes the model training script on Azure ML Compute. It outputs a [model](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#model) file which is stored in the [run history](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#run).
+   - The **Evaluate Model** task evaluates the performance of the newly trained model against the model in production. Only if the new model performs better than the production model are the next steps executed; otherwise they are skipped.
+   - The **Register Model** task takes the better-performing new model and registers it with the [Azure ML Model registry](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#model-registry) to version-control it.
+   - The **Package Model** task packages the new model along with the scoring file and Python dependencies into a Docker [image](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#image) and pushes it to [Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro). This image is used to deploy the model as a [web service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#web-service).
+4. Once a new model scoring image is pushed to Azure Container Registry, the Azure DevOps release/deployment pipeline is triggered. This pipeline deploys the model scoring image into the Staging/QA and PROD environments.
+   - In Staging/QA, one task creates an [Azure Container Instance](https://docs.microsoft.com/en-us/azure/container-instances/container-instances-overview) and deploys the scoring image as a [web service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#web-service) on it.
+   - A second task tests this web service by calling its REST endpoint with dummy data.
+5. The deployment to production is a [gated release](https://docs.microsoft.com/en-us/azure/devops/pipelines/release/approvals/gates?view=azure-devops): once the model web service deployment to the Staging/QA environment succeeds, a notification is sent to approvers to manually review and approve the release. Once the release is approved, the model scoring web service is deployed to [Azure Kubernetes Service (AKS)](https://docs.microsoft.com/en-us/azure/aks/intro-kubernetes) and the deployment is tested.
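The "Evaluate Model" step above boils down to a promote-or-skip decision on a metric. The repo's actual evaluation script is not part of this diff, so the sketch below is a hypothetical illustration of that gate; the function name and the choice of MSE as the metric are assumptions.

```python
from typing import Optional


def should_promote(new_mse: float, prod_mse: Optional[float]) -> bool:
    """Hypothetical 'Evaluate Model' gate: promote the newly trained model
    only if its test MSE beats the production model's MSE. When there is no
    production model yet (first run), promote unconditionally."""
    if prod_mse is None:
        return True
    return new_mse < prod_mse


# A new model with lower error passes the gate; a worse one is skipped.
print(should_promote(2850.0, 3100.0))
print(should_promote(3200.0, 3100.0))
```

If the gate returns `False`, the register/package/deploy steps of the retraining pipeline simply do not run, leaving the current production web service untouched.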
 ### Repo Details

-You can find the details of the code ans scripts in the repository [here](/docs/code_description.md)
+You can find the details of the code and scripts in the repository [here](/docs/code_description.md)

 ### References

 - [Azure Machine Learning (Azure ML) Service Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml)
 - [Azure ML Samples](https://docs.microsoft.com/en-us/azure/machine-learning/service/samples-notebooks)
 - [Azure ML Python SDK Quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
 - [Azure DevOps](https://docs.microsoft.com/en-us/azure/devops/?view=vsts)
+- [DevOps for AI template (Old Version)](https://azuredevopsdemogenerator.azurewebsites.net/?name=azure%20machine%20learning)

 # Contributing

azure-pipeline-yaml/simple-azure-pipelines.yml

Lines changed: 0 additions & 64 deletions
This file was deleted.

azure-pipelines.yml

Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,11 @@ pool:
 variables:
 - group: AzureKeyVaultSecrets

+trigger:
+- master
+- releases/*
+- develop
+
 steps:
 - task: UsePythonVersion@0
   inputs:

code/scoring/score.py

Lines changed: 2 additions & 0 deletions
@@ -23,8 +23,10 @@
 ARISING IN ANY WAY OUT OF THE USE OF THE SOFTWARE CODE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
 """
+import pickle
 import json
 import numpy
+from sklearn.ensemble import RandomForestClassifier
 from azureml.core.model import Model
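The file being patched, score.py, is an Azure ML scoring script, and such scripts expose an `init()`/`run()` entry-point contract. Since the full file is not shown in this diff, here is a hedged, self-contained sketch of that contract using a trivial stand-in model; the real script instead unpickles the registered scikit-learn model (via `Model.get_model_path()`), and all names below are illustrative.

```python
import json


class _StubModel:
    """Stand-in for the unpickled scikit-learn model (illustration only)."""

    def predict(self, rows):
        # Dummy "prediction": sum of each feature row.
        return [sum(r) for r in rows]


model = None


def init():
    # Called once when the web service starts.
    # Real script (roughly): model = joblib.load(Model.get_model_path(name))
    global model
    model = _StubModel()


def run(raw_data):
    # Called per request; raw_data is a JSON string: {"data": [[...], ...]}
    data = json.loads(raw_data)["data"]
    result = model.predict(data)
    return json.dumps({"result": list(result)})
```

This is the shape the "test the web service with dummy data" release task relies on: it POSTs a JSON payload to the endpoint, which routes into `run()`.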

code/training/train.py

Lines changed: 57 additions & 29 deletions
@@ -23,6 +23,8 @@
 ARISING IN ANY WAY OUT OF THE USE OF THE SOFTWARE CODE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
 """
+import pickle
+from azureml.core import Workspace
 from azureml.core.run import Run
 import os
 import argparse
@@ -32,41 +34,69 @@
 from sklearn.model_selection import train_test_split
 from sklearn.externals import joblib
 import numpy as np
+import json
+import subprocess
+from typing import Tuple, List
+
+
+parser = argparse.ArgumentParser("train")
+parser.add_argument(
+    "--config_suffix", type=str, help="Datetime suffix for json config files"
+)
+parser.add_argument(
+    "--json_config",
+    type=str,
+    help="Directory to write all the intermediate json configs",
+)
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.config_suffix)
+print("Argument 2: %s" % args.json_config)
+
+if not (args.json_config is None):
+    os.makedirs(args.json_config, exist_ok=True)
+    print("%s created" % args.json_config)
+
+run = Run.get_context()
+exp = run.experiment
+ws = run.experiment.workspace

-# using diabetes dataset from scikit-learn
 X, y = load_diabetes(return_X_y=True)
+columns = ["age", "gender", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
 data = {"train": {"X": X_train, "y": y_train}, "test": {"X": X_test, "y": y_test}}

+print("Running train.py")
+
+# Randomly pick alpha
+alphas = np.arange(0.0, 1.0, 0.05)
+alpha = alphas[np.random.choice(alphas.shape[0], 1, replace=False)][0]
+print(alpha)
+run.log("alpha", alpha)
+reg = Ridge(alpha=alpha)
+reg.fit(data["train"]["X"], data["train"]["y"])
+preds = reg.predict(data["test"]["X"])
+run.log("mse", mean_squared_error(preds, data["test"]["y"]))

-def experiment_code(data_split):
-    run = Run.get_submitted_run()
-    # Randomly pick alpha
-    alphas = np.arange(0.0, 1.0, 0.05)
-    alpha = alphas[np.random.choice(alphas.shape[0], 1, replace=False)][0]
-    print(alpha)
-    # Log alpha metric
-    run.log("alpha", alpha)
-    # train the model with selected value of alpha and log mse
-    reg = Ridge(alpha=alpha)
-    reg.fit(data["train"]["X"], data_split["train"]["y"])
-    preds = reg.predict(data["test"]["X"])
-    run.log("mse", mean_squared_error(preds, data_split["test"]["y"]))
+# Save model as part of the run history
+model_name = "sklearn_regression_model.pkl"

-    # Write model name to the config file
-    model_name = "sklearn_regression_model.pkl"
-    with open(model_name, "wb"):
-        joblib.dump(value=reg, filename=model_name)
+with open(model_name, "wb") as file:
+    joblib.dump(value=reg, filename=model_name)

-    # upload the model file explicitly into artifacts
-    run.upload_file(name="./outputs/" + model_name, path_or_stream=model_name)
-    print("Uploaded the model {} to experiment {}".format(model_name, run.experiment.name))
-    dirpath = os.getcwd()
-    print(dirpath)
+# upload the model file explicitly into artifacts
+run.upload_file(name="./outputs/" + model_name, path_or_stream=model_name)
+print("Uploaded the model {} to experiment {}".format(model_name, run.experiment.name))
+dirpath = os.getcwd()
+print(dirpath)
+print("Following files are uploaded ")
+print(run.get_file_names())

-    print("Following files are uploaded ")
-    print(run.get_file_names())
-    run.complete()
+# register the model
+# run.log_model(file_name = model_name)
+# print('Registered the model {} to run history {}'.format(model_name, run.history.name))

 run_id = {}
 run_id["run_id"] = run.id
@@ -76,6 +106,4 @@
 with open(output_path, "w") as outfile:
     json.dump(run_id, outfile)

-if __name__ == "__main__":
-    print("Running train.py")
-    experiment_code(data)
+run.complete()
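The training logic in the updated train.py depends on azureml and scikit-learn. As a self-contained sketch of the same idea, namely pick a random ridge `alpha` from a grid, fit, and report MSE, here is the closed-form ridge solution in plain NumPy on synthetic data. This is an illustration under stated assumptions, not the repo's script: the real code trains scikit-learn's `Ridge` on the diabetes dataset and logs `alpha` and `mse` to the Azure ML run.

```python
import numpy as np

# Synthetic regression data standing in for the diabetes dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Randomly pick alpha from a grid, as train.py does.
alphas = np.arange(0.0, 1.0, 0.05)
alpha = alphas[rng.choice(alphas.shape[0])]

# Closed-form ridge regression: w = (X^T X + alpha * I)^{-1} X^T y
n_features = X.shape[1]
w = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

preds = X @ w
mse = float(np.mean((preds - y) ** 2))
print("alpha:", alpha, "mse:", mse)
```

The logged `mse` is exactly the number the downstream "Evaluate Model" step compares against the production model's metric.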

docs/getting_started.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+## Getting Started with this Repo
+
+### 1. Get the source code
+- Either clone the repository to your workspace and create your own repo with the code on GitHub.
+- An easier way is to just fork the project, so you have the repository under your username on GitHub itself.
+
+### 2. Create an Azure DevOps account
+We use Azure DevOps for running our build (CI), retraining trigger, and release (CD) pipelines. If you don't already have an Azure DevOps account, create one by following the instructions [here](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/create-organization?view=azure-devops).
+
+If you already have an Azure DevOps account, create a new project.
+
+**Note:** Make sure you have the right permissions in Azure DevOps to do so.
+
+### 3. Create a Service Principal to log in to Azure and create resources
+
+To create a service principal, register an application entity in Azure Active Directory (Azure AD) and grant it the Contributor or Owner role on the subscription or on the resource group where the web service belongs. See [how to create a service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) and assign permissions to manage Azure resources.
+
+**Note:** You must have sufficient permissions to register an application with your Azure AD tenant and to assign the application to a role in your Azure subscription. Contact your subscription administrator if you don't have the permissions. Normally a subscription admin will create a service principal and provide you its details.
(binary image file, 412 KB — not rendered)

0 commit comments