
Commit 962778c

Manage environments in conda YAML files (#158)
* .
* .
* Update code_test.py
* .
* Update Dockerfile
* Do not use conda-merge
* Move all 3 conda files to a single dir
* Do not use conda-merge
* Pin package versions
* PR review fixes
* Update Dockerfile
* PR review fixes
* Update training_dependencies.yml
* Update code_test.py
1 parent 0b4f233 commit 962778c

12 files changed, 98 lines added and 56 lines removed.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -93,6 +93,7 @@ ENV/
 env.bak/
 venv.bak/
 *.vscode
+condaenv.*
 
 # Spyder project settings
 .spyderproject

diabetes_regression/ci_dependencies.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: mlopspython_ci
+
+dependencies:
+
+  # The python interpreter version.
+  - python=3.7.5
+
+  - r=3.6.0
+  - r-essentials=3.6.0
+  - numpy=1.18.1
+  - pandas=1.0.0
+  - scikit-learn=0.22.1
+
+  - pip=20.0.2
+  - pip:
+
+    # dependencies shared with other environment .yml files.
+    - azureml-sdk==1.0.79
+
+    # Additional pip dependencies for the CI environment.
+    - pytest==5.3.1
+    - pytest-cov==2.8.1
+    - requests==2.22.0
+    - python-dotenv==0.10.3
+    - flake8==3.7.9
+    - flake8_formatter_junit_xml==0.0.6
+    - azure-cli==2.0.77
+    - tox==3.14.3
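
All three environment files in this commit pin exact versions ("Pin package versions" in the commit message): conda specs use `name=version`, while pip requirements nested under `- pip:` use `name==version`. As an illustration only, here is a line-based Python sketch that splits such a file into conda pins, pip pins, and unpinned entries. `parse_env` is a hypothetical helper, not part of the repository; it assumes the simple one-item-per-line layout used in these files and is not a general YAML parser.

```python
import re

# Conda match specs such as "numpy=1.18.1" ("==" is accepted as well).
CONDA_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==?([^=\s]+)$")
# pip specifiers such as "pytest==5.3.1"; extras like "[numpy-support]" stay in the name.
PIP_PIN = re.compile(r"^([A-Za-z0-9_.\-]+(?:\[[^\]]*\])?)==([^=\s]+)$")


def parse_env(text):
    """Split a simple conda environment file into conda pins, pip pins and unpinned specs.

    Assumes one "- spec" list item per line, with pip requirements indented
    under a "- pip:" item. Hypothetical helper; not a general YAML parser.
    """
    conda, pip, unpinned = {}, {}, []
    pip_indent = None  # indentation of the "- pip:" item, once seen
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments such as "#latest"
        item = line.lstrip()
        if not item.startswith("- "):
            continue  # skip "name:", "dependencies:" and blank lines
        indent = len(line) - len(item)
        spec = item[2:].strip()
        if spec == "pip:":
            pip_indent = indent
            continue
        # Items indented deeper than "- pip:" belong to the pip section.
        in_pip = pip_indent is not None and indent > pip_indent
        match = (PIP_PIN if in_pip else CONDA_PIN).match(spec)
        if match:
            (pip if in_pip else conda)[match.group(1).lower()] = match.group(2)
        else:
            unpinned.append(spec)
    return conda, pip, unpinned


# Made-up excerpt mixing lines from the files in this diff; the unpinned
# inference-schema line is the form used by the old scoring file.
SAMPLE = """\
name: example
dependencies:
  # The python interpreter version.
  - python=3.7.5
  - psutil=5.6 #latest
  - pip=20.0.2
  - pip:
    - azureml-sdk==1.0.79
    - inference-schema[numpy-support]
"""

conda_pins, pip_pins, unpinned = parse_env(SAMPLE)
print(conda_pins)  # {'python': '3.7.5', 'psutil': '5.6', 'pip': '20.0.2'}
print(pip_pins)    # {'azureml-sdk': '1.0.79'}
print(unpinned)    # ['inference-schema[numpy-support]']
```

A CI step could fail the build whenever `unpinned` is non-empty, which keeps new dependencies from slipping in without a version.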

diabetes_regression/scoring/inference_config.yml

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 entryScript: score.py
 runtime: python
-condaFile: conda_dependencies.yml
+condaFile: ../scoring_dependencies.yml
 extraDockerfileSteps:
 schemaFile:
 sourceDirectory:
 enableGpu: False
 baseImage:
-baseImageRegistry:
+baseImageRegistry:

diabetes_regression/scoring/conda_dependencies.yml renamed to diabetes_regression/scoring_dependencies.yml

Lines changed: 12 additions & 13 deletions
@@ -14,24 +14,23 @@
 # This directive is stored in a comment to preserve the Conda file structure.
 # [AzureMlVersion] = 2
 
-name: project_environment
+name: diabetes_scoring
+
 dependencies:
+
   # The python interpreter version.
-  # Currently Azure ML Workbench only supports 3.5.2 and later.
   - python=3.7.5
+
   # Required by azureml-defaults, installed separately through Conda to
   # get a prebuilt version and not require build tools for the install.
   - psutil=5.6 #latest
 
+  - numpy=1.18.1
+  - pandas=1.0.0
+  - scikit-learn=0.22.1
+
+  - pip=20.0.2
   - pip:
-    # Required packages for AzureML execution, history, and data preparation.
-    - azureml-model-management-sdk==1.0.1b6.post1
-    - azureml-sdk==1.0.74
-    - scipy==1.3.1
-    - scikit-learn==0.22
-    - pandas==0.25.3
-    - numpy==1.17.3
-    - joblib==0.14.0
-    - gunicorn==19.9.0
-    - flask==1.1.1
-    - inference-schema[numpy-support]
+    # You must list azureml-defaults as a pip dependency
+    - azureml-defaults==1.0.85
+    - inference-schema[numpy-support]==1.0.1

diabetes_regression/training_dependencies.yml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+name: diabetes_training
+
+dependencies:
+
+  # The python interpreter version.
+  - python=3.7.5
+
+  - numpy=1.18.1
+  - pandas=1.0.0
+  - scikit-learn=0.22.1
+  #- r-essentials
+  #- tensorflow
+  #- keras
+
+  - pip=20.0.2
+  - pip:
+    - azureml-core==1.0.79

docs/code_description.md

Lines changed: 7 additions & 4 deletions
@@ -2,9 +2,7 @@
 
 ### Environment Setup
 
-- `environment_setup/requirements.txt` : It consists of a list of python packages which are needed by the train.py to run successfully on host agent (locally).
-
-- `environment_setup/install_requirements.sh` : This script prepares the python environment i.e. install the Azure ML SDK and the packages specified in requirements.txt
+- `environment_setup/install_requirements.sh` : This script prepares a local conda environment i.e. install the Azure ML SDK and the packages specified in environment definitions.
 
 - `environment_setup/iac-*.yml, arm-templates` : Infrastructure as Code piplines to create and delete required resources along with corresponding arm-templates.
 
@@ -27,6 +25,12 @@
 - `ml_service/pipelines/diabetes_regression_verify_train_pipeline.py` : determines whether the evaluate_model.py step of the training pipeline registered a new model.
 - `ml_service/util` : contains common utility functions used to build and publish an ML training pipeline.
 
+### Environment Definitions
+
+- `diabetes_regression/training_dependencies.yml` : Conda environment definition for the training environment (Docker image in which train.py is run).
+- `diabetes_regression/scoring_dependencies.yml` : Conda environment definition for the scoring environment (Docker image in which score.py is run).
+- `diabetes_regression/ci_dependencies.yml` : Conda environment definition for the CI environment.
+
 ### Code
 
 - `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
@@ -39,5 +43,4 @@
 
 ### Scoring
 - `diabetes_regression/scoring/score.py` : a scoring script which is about to be packed into a Docker Image along with a model while being deployed to QA/Prod environment.
-- `diabetes_regression/scoring/conda_dependencies.yml` : contains a list of dependencies required by score.py to be installed in a deployable Docker Image
 - `diabetes_regression/scoring/inference_config.yml`, deployment_config_aci.yml, deployment_config_aks.yml : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.

docs/getting_started.md

Lines changed: 8 additions & 1 deletion
@@ -171,7 +171,13 @@ Great, you now have the build pipeline set up which automatically triggers every
 
 **Note:** The build pipeline also supports building and publishing ML
 pipelines using R to train a model. This is enabled
-by changing the `build-train-script` pipeline variable to either `diabetes_regression_build_train_pipeline_with_r.py`, or `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py`. For pipeline training a model with R on Databricks you'll need
+by changing the `build-train-script` pipeline variable to either of:
+* `diabetes_regression_build_train_pipeline_with_r.py` to train a model
+with R on Azure ML Compute. You will also need to add the
+`r-essentials` Conda packages into `diabetes_regression/scoring_dependencies.yml`
+and `diabetes_regression/training_dependencies.yml`.
+* `diabetes_regression_build_train_pipeline_with_r_on_dbricks.py`
+to train a model with R on Databricks. You will need
 to manually create a Databricks cluster and attach it to the ML Workspace as a
 compute (Values DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be
 specified).
@@ -243,6 +249,7 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta
 * You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
 * The sample pipeline generates a random value for a model hyperparameter (ridge regression [*alpha*](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)) to generate 'interesting' charts when testing the sample. In a real application you should use fixed hyperparameter values. You can [tune hyperparameter values using Azure ML](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters), and manage their values in Azure DevOps Variable Groups.
 * You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
+* You can install additional Conda or pip packages by modifying the YAML environment configurations under the `diabetes_regression` directory. Make sure to use fixed version numbers for all packages to ensure reproducibility, and use the same versions across environments.
 * You can explore aspects of model observability in the solution, such as:
   * **Logging**: navigate to the Application Insights instance linked to the Azure ML Portal,
     then to the Logs (Analytics) pane. The following sample query correlates HTTP requests with custom logs
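
The new getting-started bullet asks for fixed versions that are identical across the training, scoring, and CI environments. A hedged Python sketch of that check: given per-environment pins (for example, as produced by a line-based parser of the YAML files), it reports packages pinned to more than one version. `find_version_conflicts` is a hypothetical helper; it compares by package name only, so a package published under different names on conda and PyPI would not be matched. The sample pins are copied from this diff.

```python
from collections import defaultdict


def find_version_conflicts(envs):
    """Report packages pinned to different versions in different environments.

    `envs` maps an environment name to {package_name: pinned_version}.
    Returns {package_name: {environment_name: version}} for every package
    that has more than one distinct pinned version.
    """
    by_package = defaultdict(dict)
    for env_name, pins in envs.items():
        for package, version in pins.items():
            # Package names are compared case-insensitively.
            by_package[package.lower()][env_name] = version
    return {
        package: versions
        for package, versions in by_package.items()
        if len(set(versions.values())) > 1
    }


# Conda pins from the three environment files added in this commit.
CURRENT_PINS = {
    "training": {"python": "3.7.5", "numpy": "1.18.1", "pandas": "1.0.0", "scikit-learn": "0.22.1"},
    "scoring": {"python": "3.7.5", "numpy": "1.18.1", "pandas": "1.0.0", "scikit-learn": "0.22.1"},
    "ci": {"python": "3.7.5", "numpy": "1.18.1", "pandas": "1.0.0", "scikit-learn": "0.22.1"},
}
print(find_version_conflicts(CURRENT_PINS))  # {}

# The pip pins removed from the old scoring file, compared with the new training file.
BEFORE_AND_AFTER = {
    "scoring (before)": {"numpy": "1.17.3", "pandas": "0.25.3", "scikit-learn": "0.22"},
    "training": {"numpy": "1.18.1", "pandas": "1.0.0", "scikit-learn": "0.22.1"},
}
print(sorted(find_version_conflicts(BEFORE_AND_AFTER)))  # ['numpy', 'pandas', 'scikit-learn']
```

Returning the per-environment versions, rather than just the package names, makes it clear which file needs to change.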

environment_setup/Dockerfile

Lines changed: 11 additions & 6 deletions
@@ -4,11 +4,16 @@ LABEL org.label-schema.vendor = "Microsoft" \
 org.label-schema.url = "https://hub.docker.com/r/microsoft/mlopspython" \
 org.label-schema.vcs-url = "https://github.com/microsoft/MLOpsPython"
 
-
+COPY diabetes_regression/ci_dependencies.yml /setup/
 
-COPY environment_setup/requirements.txt /setup/
-
-RUN apt-get update && apt-get install gcc -y && pip install --upgrade -r /setup/requirements.txt && \
-conda install -c r r-essentials
+RUN conda env create -f /setup/ci_dependencies.yml
 
-CMD ["python"]
+# activate environment
+ENV PATH /usr/local/envs/mlopspython_ci/bin:$PATH
+RUN /bin/bash -c "source activate mlopspython_ci"
+
+# Verify conda installation.
+# This serves as workaround for https://github.com/conda/conda/issues/8537 (conda env create doesn't fail
+# if pip installation fails, for example due to a wrong package version).
+# The `az` command is not available if pip has not run (and installed azure-cli).
+RUN az --version

environment_setup/install_requirements.sh

File mode changed from 100644 to 100755 (now executable).
Lines changed: 4 additions & 2 deletions
@@ -24,6 +24,8 @@
 # ARISING IN ANY WAY OUT OF THE USE OF THE SOFTWARE CODE, EVEN IF ADVISED OF THE
 # POSSIBILITY OF SUCH DAMAGE.
 
+set -eux
 
-python --version
-pip install -r requirements.txt
+conda env create -f diabetes_regression/ci_dependencies.yml
+
+conda activate mlopspython_ci

environment_setup/requirements.txt

Lines changed: 0 additions & 12 deletions
This file was deleted.
