
Commit d5864f3

dtzareedorenko authored and committed
Upgrade build train CI pipeline to multi-stage (#90)
* add staged pipeline
* remove release id
* remove train in release pipeline
* revert to BASE_NAME vars
* Move train trigger to new stage
* cleanup register comments
* add conditional for triggering train pipe
* update doc steps
* string vs boolean
* var to boolean
* set to false
* try with true
* cleanup images
* Use Coalesce so override works
* add back build artifacts
* address feedback
* include code/scoring path for ci
1 parent fcc6fde commit d5864f3

11 files changed: +68 −151 lines

.pipelines/azdo-ci-build-train.yml

Lines changed: 62 additions & 55 deletions
@@ -3,62 +3,69 @@ trigger:
   branches:
     include:
     - master
-
-pool:
-  vmImage: 'ubuntu-latest'
-
-container: mcr.microsoft.com/mlops/python:latest
-
+  paths:
+    exclude:
+    - docs/
+    - environment_setup/
+    - charts/
+    - ml_service/util/create_scoring_image.py
 
 variables:
 - group: devopsforai-aml-vg
+  # Choose from default, build_train_pipeline_with_r.py, or build_train_pipeline_with_r_on_dbricks.py
+- name: build-train-script
+  value: 'build_train_pipeline.py'
+  # Automatically triggers the train, evaluate, register pipeline after the CI steps.
+  # Uncomment to set to false or add same variable name at queue time with value of false to disable.
+# - name: auto-trigger-training
+#   value: false
 
-
-steps:
-
-- template: azdo-base-pipeline.yml
-
-- bash: |
-   # Invoke the Python building and publishing a training pipeline with Python on ML Compute
-   python3 $(Build.SourcesDirectory)/ml_service/pipelines/build_train_pipeline.py
-  failOnStderr: 'false'
-  env:
-    SP_APP_SECRET: '$(SP_APP_SECRET)'
-  displayName: 'Publish Azure Machine Learning Pipeline. Python on ML'
-  enabled: 'true'
-
-- bash: |
-   # Invoke the Python building and publishing a training pipeline with R on ML Compute
-   python3 $(Build.SourcesDirectory)/ml_service/pipelines/build_train_pipeline_with_r.py
-  failOnStderr: 'false'
-  env:
-    SP_APP_SECRET: '$(SP_APP_SECRET)'
-  displayName: 'Publish Azure Machine Learning Pipeline. R on ML Compute'
-  enabled: 'false'
-
-- bash: |
-   # Invoke the Python building and publishing a training pipeline with R on DataBricks
-   python3 $(Build.SourcesDirectory)/ml_service/pipelines/build_train_pipeline_with_r_on_dbricks.py
-  failOnStderr: 'false'
-  env:
-    SP_APP_SECRET: '$(SP_APP_SECRET)'
-  displayName: 'Publish Azure Machine Learning Pipeline. R on DataBricks'
-  enabled: 'false'
-
-- task: CopyFiles@2
-  displayName: 'Copy Files to: $(Build.ArtifactStagingDirectory)'
-  inputs:
-    SourceFolder: '$(Build.SourcesDirectory)'
-    TargetFolder: '$(Build.ArtifactStagingDirectory)'
-    Contents: |
-      ml_service/pipelines/?(run_train_pipeline.py|*.json)
-      code/scoring/**
-
-
-- task: PublishBuildArtifacts@1
-  displayName: 'Publish Artifact'
-  inputs:
-    ArtifactName: 'mlops-pipelines'
-    publishLocation: 'container'
-    pathtoPublish: '$(Build.ArtifactStagingDirectory)'
-    TargetPath: '$(Build.ArtifactStagingDirectory)'
+stages:
+- stage: 'Model_CI'
+  displayName: 'Model CI'
+  jobs:
+  - job: "Model_CI_Pipeline"
+    displayName: "Model CI Pipeline"
+    pool:
+      vmImage: 'ubuntu-latest'
+    container: mcr.microsoft.com/mlops/python:latest
+    timeoutInMinutes: 0
+    steps:
+    - template: azdo-base-pipeline.yml
+    - script: |
+        # Invoke the Python building and publishing a training pipeline
+        python3 $(Build.SourcesDirectory)/ml_service/pipelines/$(build-train-script)
+      failOnStderr: 'false'
+      env:
+        SP_APP_SECRET: '$(SP_APP_SECRET)'
+      displayName: 'Publish Azure Machine Learning Pipeline'
+- stage: 'Trigger_AML_Pipeline'
+  displayName: 'Train, evaluate, register model via previously published AML pipeline'
+  jobs:
+  - job: "Invoke_Model_Pipeline"
+    condition: and(succeeded(), eq(coalesce(variables['auto-trigger-training'], 'true'), 'true'))
+    displayName: "Invoke Model Pipeline and evaluate results to register"
+    pool:
+      vmImage: 'ubuntu-latest'
+    container: mcr.microsoft.com/mlops/python:latest
+    timeoutInMinutes: 0
+    steps:
+    - script: |
+        python $(Build.SourcesDirectory)/ml_service/pipelines/run_train_pipeline.py
+      displayName: 'Trigger Training Pipeline'
+      env:
+        SP_APP_SECRET: '$(SP_APP_SECRET)'
+    - task: CopyFiles@2
+      displayName: 'Copy Files to: $(Build.ArtifactStagingDirectory)'
+      inputs:
+        SourceFolder: '$(Build.SourcesDirectory)'
+        TargetFolder: '$(Build.ArtifactStagingDirectory)'
+        Contents: |
+          code/scoring/**
+    - task: PublishBuildArtifacts@1
+      displayName: 'Publish Artifact'
+      inputs:
+        ArtifactName: 'mlops-pipelines'
+        publishLocation: 'container'
+        pathtoPublish: '$(Build.ArtifactStagingDirectory)'
+      TargetPath: '$(Build.ArtifactStagingDirectory)'
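The new `Trigger_AML_Pipeline` stage hands off to `ml_service/pipelines/run_train_pipeline.py`, which submits the previously published AML pipeline. As a rough illustration of what such a trigger script does, here is a minimal sketch using the Azure ML SDK. It is not the repository's implementation: `WORKSPACE_NAME` and `RESOURCE_GROUP` are assumed variable names, while the service-principal and experiment variables match those already used by the pipeline and docs.

```python
# Minimal sketch of a trigger script in the spirit of run_train_pipeline.py.
# Not the repository's implementation; see assumptions noted in the comments.
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.pipeline.core import PublishedPipeline


def main():
    # Authenticate with the service principal the CI job exposes via env vars.
    auth = ServicePrincipalAuthentication(
        tenant_id=os.environ["TENANT_ID"],
        service_principal_id=os.environ["SP_APP_ID"],
        service_principal_password=os.environ["SP_APP_SECRET"],
    )
    ws = Workspace.get(
        name=os.environ["WORKSPACE_NAME"],            # assumed variable name
        subscription_id=os.environ["SUBSCRIPTION_ID"],
        resource_group=os.environ["RESOURCE_GROUP"],  # assumed variable name
        auth=auth,
    )

    # Pick a published training pipeline (here simply the first one returned)
    # and submit a run of it under the configured experiment name.
    published = PublishedPipeline.list(ws)[0]
    run = published.submit(ws, os.environ.get("EXPERIMENT_NAME", "mlops-train"))
    print("Submitted pipeline run:", run.id)


if __name__ == "__main__":
    main()
```

Because such a script reads only environment variables, the same invocation works from the containerized CI job or from a developer machine.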

docs/getting_started.md

Lines changed: 5 additions & 73 deletions
@@ -145,9 +145,7 @@ you can set up the rest of the pipelines necessary for deploying your ML model
 to production. These are the pipelines that you will be setting up:
 
 1. **Build pipeline:** triggered on code change to master branch on GitHub,
-   performs linting, unit testing and publishing a training pipeline.
-1. **Release Trigger pipeline:** runs a published training pipeline to train,
-   evaluate and register a model.
+   performs linting and unit testing, publishes a training pipeline, and then runs the published training pipeline to train, evaluate, and register a model.
 1. **Release Deployment pipeline:** deploys a model to QA (ACI) and Prod (AKS)
    environments.
 
@@ -169,88 +167,25 @@ and checkout a published training pipeline in the **mlops-AML-WS** workspace in
 
 ![training pipeline](./images/training-pipeline.png)
 
-Great, you now have the build pipeline set up which can either be manually
-triggered or automatically triggered every time there's a change in the master
-branch. The pipeline performs linting, unit testing, and builds and publishes an
+Great, you now have the build pipeline set up, which triggers automatically every time there is a change in the master
+branch. The pipeline performs linting and unit testing, then builds, publishes, and runs a
 **ML Training Pipeline** in a **ML Workspace**.
 
 **Note:** The build pipeline contains disabled steps to build and publish ML
 pipelines using R to train a model. Enable these steps if you want to play with
-this approach. For the pipeline training a model with R on Databricks you have
+this approach by changing the `build-train-script` pipeline variable to either `build_train_pipeline_with_r.py` or `build_train_pipeline_with_r_on_dbricks.py`. For the pipeline training a model with R on Databricks you have
 to manually create a Databricks cluster and attach it to the ML Workspace as a
 compute (Values DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be
 specified).
 
-### Set up a Release Trigger Pipeline to Train the Model
-
-The next step is to invoke the training pipeline created in the previous step.
-It can be done with a **Release Pipeline**. Click on the Pipelines/Releases
-menu, and then **New pipeline**, and then click on "Empty Job" on the
-"Select a template" window that pops to the right:
-
-![invoke training pipeline](./images/invoke-training-pipeline.png)
-
-Next, click on "Add an artifact". We will select the artifact of this pipeline
-to be the result of the build pipeline **ci-build**:
-
-![artifact invoke pipeline](./images/artifact-invoke-pipeline.png)
-
-After that, configure a pipeline to see values from the previously defined
-variable group **devopsforai-aml-vg**. Click on the "Variable groups",
-and to the right, click on "Link variable group". From there, pick the
-**devopsforai-aml-vg** variable group we created in an earlier step, choose
-"Release" as a variable group scope, and click on "Link":
-
-![retrain pipeline vg](./images/retrain-pipeline-vg.png)
-
-Rename the default "Stage 1" to **Invoke Training Pipeline** and make sure that
-the **Agent Specification** is **ubuntu-16.04** under the Agent Job:
-
-![agent specification](./images/agent-specification.png)
-
-Add a **Command Line Script** step, rename it to **Run Training Pipeline** with the following script:
-
-```bash
-docker run -v $(System.DefaultWorkingDirectory)/_ci-build/mlops-pipelines/ml_service/pipelines:/pipelines \
--w=/pipelines -e MODEL_NAME=$MODEL_NAME -e EXPERIMENT_NAME=$EXPERIMENT_NAME \
--e TENANT_ID=$TENANT_ID -e SP_APP_ID=$SP_APP_ID -e SP_APP_SECRET=$(SP_APP_SECRET) \
--e SUBSCRIPTION_ID=$SUBSCRIPTION_ID -e RELEASE_RELEASEID=$RELEASE_RELEASEID \
--e BUILD_BUILDID=$BUILD_BUILDID -e BASE_NAME=$BASE_NAME \
-mcr.microsoft.com/mlops/python:latest python run_train_pipeline.py
-```
-
-as in the screen shot below, leaving all other fields to their default value:
-
-![Run Training Pipeline Task](./images/run_training_pipeline_task.png)
-
-Now, add the automation to trigger a run of this pipeline whenever the
-**ci_build** build is completed, click on the lightning bolt icon on the top
-right of the **\_ci-build** artifact is selected, and enable the automatic
-release:
-
-![automate_invoke_training_pipeline](./images/automate_invoke_training_pipeline.png)
-
-This release pipeline should now be automatically triggered
-(continuous deployment) whenever a new **ML training pipeline** is published by
-the **ci-build builder pipeline**. It can also be triggered manually or
-configured to run on a scheduled basis. Create a new release to trigger the
-pipeline manually by clicking on the "Create release" button on the top right
-of your screen, when selecting this new build pipeline:
-
-![create release](./images/create-release.png)
-
-Leave the fields empty and click on "create". Once the release pipeline is
-completed, check out in the **ML Workspace** that the training pipeline is
-running:
-
 ![running training pipeline](./images/running-training-pipeline.png)
 
 The training pipeline will train, evaluate, and register a new model. Wait until
 it is finished and make sure there is a new model in the **ML Workspace**:
 
 ![trained model](./images/trained-model.png)
 
-Good! Now we have a trained model.
+To disable the automatic trigger of the training pipeline, set the `auto-trigger-training` variable defined in `.pipelines/azdo-ci-build-train.yml` to `false`. The variable can also be overridden when the pipeline run is queued.
 
 ### Set up a Release Deployment Pipeline to Deploy the Model
 
@@ -268,9 +203,6 @@ The pipeline consumes two artifacts:
 1. the result of the **Build Pipeline** as it contains configuration files
 1. the **model** trained and registered by the ML training pipeline
 
-Create a new release pipeline and add the **\_ci-build** artifact using the
-same process as what we did in the previous step.
-
 Install the **Azure Machine Learning** extension to your organization from the
 [marketplace](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml),
 so that you can set up a service connection to your AML workspace.
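With the Release Trigger pipeline and its docker-run step retired, the published training pipeline is now kicked off by the CI stage itself. Anyone who still wants to trigger it outside Azure DevOps can mimic the removed task locally; a hypothetical sketch follows, assuming a checkout of the repository and placeholder values for the same environment variables the old command exported.

```python
# Hypothetical local equivalent of the retired "Run Training Pipeline" release task.
# The environment variable names mirror those passed to the old docker-run command;
# the values shown are placeholders, not real settings.
import os
import subprocess

env = dict(os.environ)
env.update({
    "MODEL_NAME": "<model-name>",
    "EXPERIMENT_NAME": "<experiment-name>",
    "TENANT_ID": "<azure-ad-tenant-id>",
    "SP_APP_ID": "<service-principal-app-id>",
    "SP_APP_SECRET": "<service-principal-secret>",
    "SUBSCRIPTION_ID": "<azure-subscription-id>",
    "BUILD_BUILDID": "local",
    "BASE_NAME": "<base-resource-name>",
})

# Same entry point that both the old release task and the new
# Trigger_AML_Pipeline stage invoke.
subprocess.run(
    ["python", "ml_service/pipelines/run_train_pipeline.py"],
    env=env,
    check=True,
)
```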

docs/images/agent-specification.png
-58.5 KB · Binary file not shown.
-65.2 KB · Binary file not shown.
Binary file not shown.

docs/images/create-release.png
-63.3 KB · Binary file not shown.
-55.4 KB · Binary file not shown.

docs/images/retrain-pipeline-vg.png
-27.3 KB · Binary file not shown.
-186 KB · Binary file not shown.

ml_service/pipelines/build_train_pipeline.py

Lines changed: 0 additions & 21 deletions
@@ -22,7 +22,6 @@ def main():
     sources_directory_train = os.environ.get("SOURCES_DIR_TRAIN")
     train_script_path = os.environ.get("TRAIN_SCRIPT_PATH")
     evaluate_script_path = os.environ.get("EVALUATE_SCRIPT_PATH")
-    # register_script_path = os.environ.get("REGISTER_SCRIPT_PATH")
     vm_size = os.environ.get("AML_COMPUTE_CLUSTER_CPU_SKU")
     compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME")
     model_name = os.environ.get("MODEL_NAME")
@@ -90,27 +89,7 @@ def main():
     )
     print("Step Evaluate created")
 
-    # Currently, the Evaluate step will automatically register
-    # the model if it performs better. This step is based on a
-    # previous version of the repo which utilized JSON files to
-    # track evaluation results.
-
-    # register_model_step = PythonScriptStep(
-    #     name="Register New Trained Model",
-    #     script_name=register_script_path,
-    #     compute_target=aml_compute,
-    #     source_directory=sources_directory_train,
-    #     arguments=[
-    #         "--release_id", release_id,
-    #         "--model_name", model_name,
-    #     ],
-    #     runconfig=run_config,
-    #     allow_reuse=False,
-    # )
-    # print("Step register model created")
-
     evaluate_step.run_after(train_step)
-    # register_model_step.run_after(evaluate_step)
     steps = [evaluate_step]
 
     train_pipeline = Pipeline(workspace=aml_workspace, steps=steps)
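The deleted block documents why the separate register step is gone: the Evaluate step is expected to register the model itself when the new model outperforms the currently registered one. Purely as an illustration of that pattern (not the repository's evaluation script), here is a hedged sketch with the Azure ML SDK, where the model name `mymodel`, the `mse` metric/tag, and the `outputs/model.pkl` path are assumptions.

```python
# Illustrative sketch only: register a candidate model when it beats the
# currently registered one. Assumed names: "mymodel", "mse", "outputs/model.pkl".
from azureml.core import Run
from azureml.core.model import Model

run = Run.get_context()          # run context of the evaluate step
ws = run.experiment.workspace

new_mse = 0.42                   # assume the evaluation above produced this score

try:
    registered = Model(ws, name="mymodel")
    best_mse = float(registered.tags.get("mse", "inf"))
except Exception:
    best_mse = float("inf")      # nothing registered yet

# Register only if the candidate improves on the current model.
if new_mse < best_mse:
    Model.register(
        workspace=ws,
        model_name="mymodel",
        model_path="outputs/model.pkl",  # assumed artifact path
        tags={"mse": str(new_mse)},
    )
```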
