Merge pull request #107802 from gfitzgerald42/patch-2

Jak-MS · web-flow · commit fe41c38f46db · 2023-04-11T15:20:23.000-07:00
Provide additional information on vulnerabilities
diff --git a/articles/machine-learning/how-to-troubleshoot-environments.md b/articles/machine-learning/how-to-troubleshoot-environments.md
@@ -13,9 +13,9 @@ ms.topic: troubleshooting
 ms.custom: devx-track-python, event-tier1-build-2022, ignite-2022
 ---
 
-# Troubleshooting environment image builds using troubleshooting log error messages
+# Troubleshooting environment issues
 
-In this article, learn how to troubleshoot common problems you may encounter with environment image builds.
+In this article, learn how to troubleshoot common problems you may encounter with environment image builds and learn about AzureML environment vulnerabilities.
 
 We are actively seeking your feedback! If you navigated to this page via your Environment Definition or Build Failure Analysis logs, we'd like to know if the feature was helpful to you, or if you'd like to report a failure scenario that isn't yet covered by our analysis. You can also leave feedback on this documentation. Leave your thoughts [here](https://aka.ms/azureml/environment/log-analysis-feedback). 
 
@@ -54,9 +54,7 @@ Multiple environments with the same definition may result in the same cached ima
 
 Running a training script remotely requires the creation of a Docker image.
 
-### Reproducibility and vulnerabilities
-
-#### *Vulnerabilities*
+## Vulnerabilities in AzureML Environments
 
 You can address vulnerabilities by upgrading to a newer version of a dependency (base image, Python package, etc.) or by migrating to a different dependency that satisfies security
 requirements. Mitigating vulnerabilities is time consuming and costly since it can require refactoring of code and infrastructure. With the prevalence
@@ -68,38 +66,82 @@ There are some ways to decrease the impact of vulnerabilities:
 - Compartmentalize your environment so you can scope and fix issues in one place.
 - Understand flagged vulnerabilities and their relevance to your scenario.
 
-#### *Vulnerabilities vs Reproducibility*
+### Scan for Vulnerabilities 
+
+You can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md) to help scan images for vulnerabilities. 
+
+To automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
+
+### Vulnerabilities vs Reproducibility
 
 Reproducibility is one of the foundations of software development. When you're developing production code, a repeated operation must guarantee the same
 result. Mitigating vulnerabilities can disrupt reproducibility by changing dependencies.
 
 Azure Machine Learning's primary focus is to guarantee reproducibility. Environments fall under three categories: curated,
 user-managed, and system-managed.
 
-**Curated environments** are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned.
+### *Curated Environments*
 
-They contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is.
-These pre-created environments also allow for faster deployment time.
+Curated environments are pre-created environments that Azure Machine Learning manages and are available by default in every Azure Machine Learning workspace provisioned. New versions are released by Azure Machine Learning to address vulnerabilities. Whether you use the latest image may be a tradeoff between reproducibility and vulnerability management. 
+
+Curated Environments contain collections of Python packages and settings to help you get started with various machine learning frameworks. You're meant to use them as is. These pre-created environments also allow for faster deployment time.
+
+### *User-managed Environments*
 
-In **user-managed environments**, you're responsible for setting up your environment and installing every package that your training script needs on the
+In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the
 compute target and for model deployment. These types of environments have two subtypes:
 
 - BYOC (bring your own container): the user provides a Docker image to Azure Machine Learning
 - Docker build context: Azure Machine Learning materializes the image from the user provided content
 
-Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability
-management becomes your responsibility.
+Once you install more dependencies on top of a Microsoft-provided image, or bring your own base image, vulnerability management becomes your responsibility.
 
-You use **system-managed environments** when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
+### *System-managed Environments* 
+
+You use system-managed environments when you want conda to manage the Python environment for you. Azure Machine Learning creates a new isolated conda environment by materializing your conda specification on top of a base Docker image. While Azure Machine Learning patches base images with each release, whether you use the
 latest image may be a tradeoff between reproducibility and vulnerability management. So, it's your responsibility to choose the environment version used
 for your jobs or model deployments while using system-managed environments.
 
+### Vulnerabilities: Common Issues 
+
+### *Vulnerabilities in Base Docker Images* 
+
+System vulnerabilities in an environment are usually introduced from the base image. For example, vulnerabilities marked as "Ubuntu" or "Debian" are from the system level of the environment–the base Docker image. If the base image is from a third-party issuer, please check if the latest version has fixes for the flagged vulnerabilities. Most common sources for the base images in Azure Machine Learning are:
+
+- Microsoft Artifact Registry (MAR) aka Microsoft Container Registry (mcr.microsoft.com). 
+	- Images can be listed from MAR homepage, calling _catalog API, or [/tags/list](https://mcr.microsoft.com/v2/azureml/openmpi4.1.0-ubuntu20.04/tags/list)_
+	- Source and release notes for training base images from AzureML can be found in [Azure/AzureML-Containers](https://github.com/Azure/AzureML-Containers)
+- Nvidia (nvcr.io, or [nvidia's Profile](https://hub.docker.com/u/nvidia/#!))
+
+If the latest version of your base image does not resolve your vulnerabilities, base image vulnerabilities can be addressed by installing versions recommended by a vulnerability scan:
+
+```
+apt-get install -y library_name
+```
+
+### *Vulnerabilities in Python Packages* 
+
+Vulnerabilities can also be from installed Python packages on top of the system-managed base image. These Python-related vulnerabilities should be resolved by updating your Python dependencies. Python (pip) vulnerabilities in the image usually come from user-defined dependencies.
+
+To search for known Python vulnerabilities and solutions please see [GitHub Advisory Database](https://github.com/advisories). To address Python vulnerabilities, update the package to the version that has fixes for the flagged issue:
+
+```
+pip install -u my_package=={good.version}
+```
+
+If you're using a conda environment, update the reference in the conda dependencies file.
+
+In some cases, Python packages will be automatically installed during conda's setup of your environment on top of a base Docker image. Mitigation steps for those are the same as those for user-introduced packages. Conda installs necessary dependencies for every environment it materializes. Packages like cryptography, setuptools, wheel, etc. will be automatically installed from conda's default channels. There's a known issue with the default anaconda channel missing latest package versions, so it's recommended to prioritize the community-maintained conda-forge channel. Otherwise, please explicitly specify packages and versions, even if you don't reference them in the code you plan to execute on that environment.
+
+### *Cache issues* 
+
 Associated to your Azure Machine Learning workspace is an Azure Container Registry instance that's a cache for container images. Any image
 materialized is pushed to the container registry and used if you trigger experimentation or deployment for the corresponding environment. Azure
-Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time. You
-can monitor and maintain environment hygiene with [Microsoft Defender for Container Registry](../defender-for-cloud/defender-for-containers-vulnerability-assessment-azure.md)
-to help scan images for vulnerabilities. To
-automate this process based on triggers from Microsoft Defender, see [Automate responses to Microsoft Defender for Cloud triggers](../defender-for-cloud/workflow-automation.md).
+Machine Learning doesn't delete images from your container registry, and it's your responsibility to evaluate which images you need to maintain over time. 
+
+## Troubleshooting environment image builds
+
+Learn how to troubleshoot issues with environment image builds and package installations.
 
 ## **Environment definition problems**