articles/machine-learning/how-to-troubleshoot-environments.md
Lines changed: 51 additions & 15 deletions
@@ -54,9 +54,7 @@ Multiple environments with the same definition may result in the same cached ima
54
54
55
55
Running a training script remotely requires the creation of a Docker image.
56
56
57
-
### Reproducibility and vulnerabilities
58
-
59
-
#### *Vulnerabilities*
57
+
## Vulnerabilities in AzureML Environments
60
58
61
59
You can address vulnerabilities by upgrading to a newer version of a dependency (base image, Python package, etc.) or by migrating to a different dependency that satisfies security
62
60
requirements. Mitigating vulnerabilities is time consuming and costly since it can require refactoring of code and infrastructure. With the prevalence
@@ -68,38 +66,76 @@ There are some ways to decrease the impact of vulnerabilities:
68
66
- Compartmentalize your environment so you can scope and fix issues in one place.
69
67
- Understand flagged vulnerabilities and their relevance to your scenario.
70
68
71
-
#### *Vulnerabilities vs Reproducibility*
69
+
### Vulnerabilities vs Reproducibility
72
70
73
71
Reproducibility is one of the foundations of software development. When you're developing production code, a repeated operation must guarantee the same
74
72
result. Mitigating vulnerabilities can disrupt reproducibility by changing dependencies.
75
73
76
74
Azure Machine Learning's primary focus is to guarantee reproducibility. Environments fall under three categories: curated,
77
75
user-managed, and system-managed.
78
76
79
-
**Curated environments** are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned.
77
+
### *Curated Environments*
80
78
81
-
They contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is.
82
-
These pre-created environments also allow for faster deployment time.
79
+
Curated environments are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned. New versions are released by AzureML to address vulnerabilities. Whether you use the latest image may be a tradeoff between reproducibility and vulnerability management.
80
+
81
+
Curated Environments contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is. These pre-created environments also allow for faster deployment time.
82
+
83
+
### *User-managed Environments*
83
84
84
-
In **user-managed environments**, you're responsible for setting up your environment and installing every package that your training script needs on the
85
+
In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the
85
86
compute target and for model deployment. These types of environments have two subtypes:
86
87
87
88
- BYOC (bring your own container): the user provides a Docker image to Azure Machine Learning
88
89
- Docker build context: Azure Machine Learning materializes the image from the user provided content
89
90
90
-
Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability
91
-
management becomes your responsibility.
91
+
Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability management becomes your responsibility.
92
92
93
-
You use **system-managed environments** when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
93
+
### *System-managed Environments*
94
+
95
+
You use system-managed environments when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
94
96
latest image may be a tradeoff between reproducibility and vulnerability management. So, it's your responsibility to choose the environment version used
95
97
for your jobs or model deployments while using system-managed environments.
96
98
99
+
## Scan for Vulnerabilities
100
+
101
+
You can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md) to help scan images for vulnerabilities.
102
+
103
+
To automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
104
+
105
+
## Vulnerabilities: Common Issues
106
+
107
+
### Vulnerabilities in Base Docker Images
108
+
109
+
System-managed environments can inherit vulnerabilities from their base image. For example, vulnerabilities marked as "Ubuntu", "Debian", etc. usually come from the system level of the environment, that is, the base Docker image. If the base image is from a third-party issuer, check whether the latest version has fixes for the flagged vulnerabilities. The most common sources for base images in AzureML are:
110
+
111
+
- Microsoft Artifact Registry (MAR), also known as Microsoft Container Registry (mcr.microsoft.com). You can list images from the MAR homepage, by calling the `_catalog` API, or through [/tags/list](https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/tags/list).
112
+
- NVIDIA (nvcr.io, or NVIDIA's profile on Docker Hub)
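
To check whether a newer base image tag exists, the `/tags/list` endpoint mentioned above can be queried programmatically. The following is a minimal sketch, not an official AzureML tool. It assumes the endpoint returns the standard container registry response shape (`{"name": ..., "tags": [...]}`). The helper names `tags_list_url`, `parse_tags`, and `fetch_tags` are hypothetical.

```python
import json
from urllib.request import urlopen

# Registry host taken from the article; the helpers below are illustrative.
MCR_HOST = "mcr.microsoft.com"


def tags_list_url(repository: str, host: str = MCR_HOST) -> str:
    """Build the tags/list URL for a repository such as
    'azureml/openmpi4.1.0-ubuntu20.04'."""
    return f"https://{host}/v2/{repository.strip('/')}/tags/list"


def parse_tags(body: str) -> list[str]:
    """Extract tag names from a tags/list JSON response.

    Assumes the response is an object with a 'tags' array, which may be
    null or missing when the repository has no tags.
    """
    payload = json.loads(body)
    return sorted(payload.get("tags") or [])


def fetch_tags(repository: str, timeout: float = 10.0) -> list[str]:
    """Download and parse the tag list (requires network access)."""
    with urlopen(tags_list_url(repository), timeout=timeout) as response:
        return parse_tags(response.read().decode("utf-8"))
```

Calling `fetch_tags("azureml/openmpi4.1.0-ubuntu20.04")` would return the tags for the repository linked above. The tags are sorted as plain strings, so you still need to decide which one is the latest.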
113
+
114
+
If the latest version of your base image doesn't resolve your vulnerabilities, you can address them by installing the versions recommended by a vulnerability scan:
115
+
116
+
```
117
+
apt-get install -y library_name
118
+
```
119
+
120
+
### Vulnerabilities in Python Packages
121
+
122
+
Vulnerabilities can also come from Python packages installed on top of the system-managed base image. Resolve these Python-related vulnerabilities by updating your Python dependencies. Python (pip) vulnerabilities in the image usually come from user-defined dependencies.
123
+
124
+
To search for known Python vulnerabilities and solutions, see the GitHub Advisory Database. To address Python vulnerabilities, update the package to a version that has fixes for the flagged issue:
125
+
126
+
```
127
+
pip install --upgrade my_package=={good.version}
128
+
```
129
+
130
+
or if you're using a conda environment, update the reference in the conda dependencies file.
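
After updating, you may want to confirm that the version installed in the environment is at least the fixed version reported by the advisory. The sketch below is one way to do that with the standard library. It assumes plain dotted numeric versions such as `2.31.0`; pre-release or local version strings would need a full version parser. The helper names are hypothetical.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '2.31.0' into (2, 31, 0).

    Raises ValueError for versions with non-numeric parts (for example '1.0rc1').
    """
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True if 'installed' is the same as or newer than 'minimum'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter version with zeros so '1.0' compares equal to '1.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> str:
    """Describe whether the installed distribution meets the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if is_at_least(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: upgrade to >= {minimum}"
```

For example, `check_package("my_package", "1.2.3")` reports whether the placeholder package `my_package` meets a fixed version of `1.2.3`. Both values are illustrative and should come from your vulnerability scan.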
131
+
132
+
In some cases, Python packages are automatically installed during conda's setup of your environment on top of a base Docker image. Mitigation steps for those are the same as for user-introduced packages. Conda installs necessary dependencies for every environment it materializes. Packages like cryptography, setuptools, and wheel are automatically installed from conda's default channels. There's a known issue with the default anaconda channel missing the latest package versions, so we recommend prioritizing the community-maintained conda-forge channel. Otherwise, explicitly specify packages and versions, even if you don't reference them in the code you plan to execute on that environment.
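
As an illustration of the advice above, the following sketch renders a conda specification that lists conda-forge as the only channel and pins the implicitly installed packages. The function name `render_conda_spec` and all version numbers are assumptions for illustration, not recommendations. Take the actual versions from your vulnerability scan.

```python
def render_conda_spec(name, python_version, conda_packages, pip_packages=None,
                      channels=("conda-forge",)):
    """Render a conda environment file as YAML text.

    conda_packages and pip_packages map package names to pinned versions.
    Channels are written in priority order, conda-forge first by default.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines += ["dependencies:", f"  - python={python_version}"]
    # Pin packages that conda would otherwise pull in implicitly.
    lines += [f"  - {pkg}={ver}" for pkg, ver in conda_packages.items()]
    if pip_packages:
        lines += ["  - pip", "  - pip:"]
        lines += [f"      - {pkg}=={ver}" for pkg, ver in pip_packages.items()]
    return "\n".join(lines) + "\n"


# Illustrative versions only; replace them with the versions your scan recommends.
spec = render_conda_spec(
    name="pinned-env",
    python_version="3.10",
    conda_packages={"cryptography": "42.0.5", "setuptools": "69.5.1", "wheel": "0.43.0"},
    pip_packages={"my_package": "1.2.3"},
)
print(spec)
```

The printed text can be saved as the conda dependencies file referenced by your environment definition.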
133
+
134
+
### Cache issues
135
+
97
136
Associated with your Azure Machine Learning workspace is an Azure Container Registry instance that's a cache for container images. Any image
98
137
materialized is pushed to the container registry and used if you trigger experimentation or deployment for the corresponding environment. Azure
99
-
Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time. You
100
-
can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md)
101
-
to help scan images for vulnerabilities. To
102
-
automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
138
+
Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time.