# Troubleshooting environment image builds using troubleshooting log error messages
16
+
# Troubleshooting environment issues
17
17
18
-
In this article, learn how to troubleshoot common problems you may encounter with environment image builds.
18
+
In this article, learn how to troubleshoot common problems you may encounter with environment image builds and learn about AzureML environment vulnerabilities.
19
19
20
20
We are actively seeking your feedback! If you navigated to this page via your Environment Definition or Build Failure Analysis logs, we'd like to know if the feature was helpful to you, or if you'd like to report a failure scenario that isn't yet covered by our analysis. You can also leave feedback on this documentation. Leave your thoughts [here](https://aka.ms/azureml/environment/log-analysis-feedback).
21
21
@@ -54,9 +54,7 @@ Multiple environments with the same definition may result in the same cached ima
54
54
55
55
Running a training script remotely requires the creation of a Docker image.
56
56
57
-
### Reproducibility and vulnerabilities
58
-
59
-
#### *Vulnerabilities*
57
+
## Vulnerabilities in AzureML Environments
60
58
61
59
You can address vulnerabilities by upgrading to a newer version of a dependency (base image, Python package, etc.) or by migrating to a different dependency that satisfies security
62
60
requirements. Mitigating vulnerabilities is time-consuming and costly since it can require refactoring of code and infrastructure. With the prevalence
@@ -68,38 +66,82 @@ There are some ways to decrease the impact of vulnerabilities:
68
66
- Compartmentalize your environment so you can scope and fix issues in one place.
69
67
- Understand flagged vulnerabilities and their relevance to your scenario.
70
68
71
-
#### *Vulnerabilities vs Reproducibility*
69
+
### Scan for Vulnerabilities
70
+
71
+
You can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md) to help scan images for vulnerabilities.
72
+
73
+
To automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
74
+
75
+
### Vulnerabilities vs Reproducibility
72
76
73
77
Reproducibility is one of the foundations of software development. When you're developing production code, a repeated operation must guarantee the same
74
78
result. Mitigating vulnerabilities can disrupt reproducibility by changing dependencies.
75
79
76
80
Azure Machine Learning's primary focus is to guarantee reproducibility. Environments fall under three categories: curated,
77
81
user-managed, and system-managed.
78
82
79
-
**Curated environments** are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned.
83
+
### *Curated Environments*
80
84
81
-
They contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is.
82
-
These pre-created environments also allow for faster deployment time.
85
+
Curated environments are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned. New versions are released by Azure Machine Learning to address vulnerabilities. Whether you use the latest image may be a tradeoff between reproducibility and vulnerability management.
86
+
87
+
Curated environments contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is. These pre-created environments also allow for faster deployment time.
88
+
89
+
### *User-managed Environments*
83
90
84
-
In **user-managed environments**, you're responsible for setting up your environment and installing every package that your training script needs on the
91
+
In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the
85
92
compute target and for model deployment. These types of environments have two subtypes:
86
93
87
94
- BYOC (bring your own container): the user provides a Docker image to Azure Machine Learning
88
95
- Docker build context: Azure Machine Learning materializes the image from the user-provided content
89
96
90
-
Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability
91
-
management becomes your responsibility.
97
+
Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability management becomes your responsibility.
92
98
93
-
You use **system-managed environments** when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
99
+
### *System-managed Environments*
100
+
101
+
You use system-managed environments when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
94
102
latest image may be a tradeoff between reproducibility and vulnerability management. So, it's your responsibility to choose the environment version used
95
103
for your jobs or model deployments while using system-managed environments.
96
104
105
+
### Vulnerabilities: Common Issues
106
+
107
+
### *Vulnerabilities in Base Docker Images*
108
+
109
+
System vulnerabilities in an environment are usually introduced from the base image. For example, vulnerabilities marked as "Ubuntu" or "Debian" come from the system level of the environment, that is, the base Docker image. If the base image is from a third-party issuer, check whether the latest version has fixes for the flagged vulnerabilities. The most common sources for base images in Azure Machine Learning are:
110
+
111
+
- Microsoft Artifact Registry (MAR) aka Microsoft Container Registry (mcr.microsoft.com).
112
+
- Images can be listed from the MAR homepage, by calling the `_catalog` API, or via [/tags/list](https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/tags/list)
113
+
- Source and release notes for training base images from AzureML can be found in [Azure/AzureML-Containers](https://github.com/Azure/AzureML-Containers)
114
+
- NVIDIA (nvcr.io, or [NVIDIA's Docker Hub profile](https://hub.docker.com/u/nvidia/#!))
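
The `/tags/list` endpoint linked in the list above returns a small JSON document. As a minimal sketch, assuming the response follows the Docker Registry HTTP API V2 shape (`{"name": ..., "tags": [...]}`), the tags can be read with the Python standard library. The helper names `parse_tags` and `list_tags` are illustrative and not part of any Azure SDK.

```python
import json
import urllib.request

# Repository taken from the /tags/list link above; swap in the base image you use.
TAGS_URL = "https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/tags/list"


def parse_tags(payload: str) -> list[str]:
    """Return the sorted tag names from a registry tags/list JSON response."""
    data = json.loads(payload)
    # "tags" can be null for a repository without tags.
    return sorted(data.get("tags") or [])


def list_tags(url: str = TAGS_URL, timeout: float = 10.0) -> list[str]:
    """Fetch and parse the tag list (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_tags(response.read().decode("utf-8"))
```

Calling `list_tags()` performs the HTTP request; `parse_tags` can be used on its own against a saved response.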
115
+
116
+
If the latest version of your base image doesn't resolve the flagged vulnerabilities, you can address them by installing the package versions recommended by your vulnerability scan:
117
+
118
+
```
119
+
apt-get update && apt-get install -y library_name
120
+
```
121
+
122
+
### *Vulnerabilities in Python Packages*
123
+
124
+
Vulnerabilities can also come from Python packages installed on top of the system-managed base image. Resolve these Python-related vulnerabilities by updating your Python dependencies. Python (pip) vulnerabilities in the image usually come from user-defined dependencies.
125
+
126
+
To search for known Python vulnerabilities and solutions, see the [GitHub Advisory Database](https://github.com/advisories). To address Python vulnerabilities, update the package to the version that has fixes for the flagged issue:
127
+
128
+
```
129
+
pip install -U my_package=={good.version}
130
+
```
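
After upgrading, it can help to confirm that the environment really contains the fixed version. The following is a minimal sketch, assuming plain dotted numeric versions such as `41.0.6`; pre-release or local version suffixes are not handled, and a full PEP 440 comparison would need the third-party `packaging` library. The function names are illustrative.

```python
from importlib import metadata


def parse_simple_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '41.0.6'."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same as or newer than `minimum`."""
    a = parse_simple_version(installed)
    b = parse_simple_version(minimum)
    # Pad the shorter version with zeros so '1.2' compares equal to '1.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> bool:
    """Compare the installed version of `name` against the fixed version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False  # package is not installed in this environment
    return is_at_least(installed, minimum)
```

For example, `check_package("my_package", "1.2.3")` returns `True` only when `my_package` is installed at version 1.2.3 or later; both the package name and the version here are placeholders.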
131
+
132
+
If you're using a conda environment, update the reference in the conda dependencies file.
133
+
134
+
In some cases, Python packages are installed automatically when conda sets up your environment on top of a base Docker image. Mitigation steps for those packages are the same as for user-introduced packages. Conda installs the necessary dependencies for every environment it materializes, so packages such as cryptography, setuptools, and wheel are installed automatically from conda's default channels. There's a known issue with the default anaconda channel missing the latest package versions, so it's recommended to prioritize the community-maintained conda-forge channel. Otherwise, explicitly specify packages and versions, even if you don't reference them in the code you plan to execute on that environment.
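
Conda gives priority to channels listed earlier in a specification, so prioritizing conda-forge means placing it first. The sketch below shows that reordering on a dictionary that mirrors the `name`, `channels`, and `dependencies` fields of a conda environment file. The spec contents are hypothetical placeholders, not recommended versions, and writing the result back to YAML is left out.

```python
import copy


def prioritize_conda_forge(spec: dict) -> dict:
    """Return a copy of a conda spec with conda-forge as the first channel."""
    updated = copy.deepcopy(spec)
    # Drop any existing conda-forge entry, then put it at the front.
    others = [c for c in updated.get("channels", []) if c != "conda-forge"]
    updated["channels"] = ["conda-forge"] + others
    return updated


# Hypothetical spec: the name and package pins are placeholders only.
example_spec = {
    "name": "example-env",
    "channels": ["defaults"],
    "dependencies": ["python=3.10", "pip", "setuptools", "wheel"],
}

prioritized = prioritize_conda_forge(example_spec)
```

The original dictionary is left unchanged, so the two versions can be compared before the environment definition is updated.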
135
+
136
+
### *Cache issues*
137
+
97
138
Associated with your Azure Machine Learning workspace is an Azure Container Registry instance that's a cache for container images. Any image
98
139
materialized is pushed to the container registry and used if you trigger experimentation or deployment for the corresponding environment. Azure
99
-
Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time. You
100
-
can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md)
101
-
to help scan images for vulnerabilities. To
102
-
automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
140
+
Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time.
141
+
142
+
## Troubleshooting environment image builds
143
+
144
+
Learn how to troubleshoot issues with environment image builds and package installations.