Commit 20cac27

Merge pull request #177604 from vizhur/patch-6
Update concept-environments.md
2 parents ad1abd0 + ba199c0 commit 20cac27

1 file changed, +35 -31 lines changed

articles/machine-learning/concept-environments.md

Lines changed: 35 additions & 31 deletions
@@ -33,42 +33,41 @@ Environments can broadly be divided into three categories: *curated*, *user-mana
3333

3434
Curated environments are provided by Azure Machine Learning and are available in your workspace by default. Intended to be used as is, they contain collections of Python packages and settings to help you get started with various machine learning frameworks. These pre-created environments also allow for faster deployment time. For a full list, see the [curated environments article](resource-curated-environments.md).
3535

36-
In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the compute target. Conda doesn't check your environment or install anything for you. If you're defining your own environment, you must list `azureml-defaults` with version `>= 1.0.45` as a pip dependency. This package contains the functionality that's needed to host the model as a web service.
36+
In user-managed environments, you're responsible for setting up your environment and installing every package that your training script needs on the compute target. Also be sure to include any dependencies needed for model deployment.
3737

38-
You use system-managed environments when you want [Conda](https://conda.io/docs/) to manage the Python environment and the script dependencies for you. A new conda environment is built based on the conda dependencies object. The Azure Machine Learning service assumes this type of environment by default, because of its usefulness on remote compute targets that aren't manually configurable.
38+
You use system-managed environments when you want [conda](https://conda.io/docs/) to manage the Python environment for you. A new conda environment is materialized from your conda specification on top of a base docker image.
3939

4040
## Create and manage environments
4141

42-
You can create environments by:
42+
You can create environments from clients like the AzureML Python SDK, Azure Machine Learning CLI, Environments page in Azure Machine Learning studio, and [VS Code extension](how-to-manage-resources-vscode.md#create-environment). Every client allows you to customize the base image, Dockerfile, and Python layer if needed.
4343

44-
* Defining new `Environment` objects, either by using a curated environment or by defining your own dependencies.
45-
* Using existing `Environment` objects from your workspace. This approach allows for consistency and reproducibility with your dependencies.
46-
* Importing from an existing Anaconda environment definition.
47-
* Using the Azure Machine Learning CLI
48-
* [Using the VS Code extension](how-to-manage-resources-vscode.md#create-environment)
44+
For specific code samples, see the "Create an environment" section of [How to use environments](how-to-use-environments.md#create-an-environment).
4945

50-
For specific code samples, see the "Create an environment" section of [How to use environments](how-to-use-environments.md#create-an-environment). Environments are also easily managed through your workspace. They include the following functionality:
46+
Environments are also easily managed through your workspace, which allows you to:
5147

52-
* Environments are automatically registered to your workspace when you submit an experiment. They can also be manually registered.
53-
* You can fetch environments from your workspace to use for training or deployment, or to make edits to the environment definition.
54-
* With versioning, you can see changes to your environments over time, which ensures reproducibility.
55-
* You can build Docker images automatically from your environments.
48+
* Register environments.
49+
* Fetch environments from your workspace to use for training or deployment.
50+
* Create a new instance of an environment by editing an existing one.
51+
* View changes to your environments over time, which ensures reproducibility.
52+
* Build Docker images automatically from your environments.
53+
54+
"Anonymous" environments are automatically registered in your workspace when you submit an experiment. They will not be listed but may be retrieved by version.
5655

5756
For code samples, see the "Manage environments" section of [How to use environments](how-to-use-environments.md#manage-environments).
5857

5958
## Environment building, caching, and reuse
6059

61-
The Azure Machine Learning service builds environment definitions into Docker images and conda environments. It also caches the environments so they can be reused in subsequent training runs and service endpoint deployments. Running a training script remotely requires the creation of a Docker image whereas, a local run can use a Conda environment directly.
60+
Azure Machine Learning builds environment definitions into Docker images and conda environments. It also caches the environments so they can be reused in subsequent training runs and service endpoint deployments. Running a training script remotely requires the creation of a Docker image, but a local run can use a conda environment directly.
6261

6362
### Submitting a run using an environment
6463

6564
When you first submit a remote run using an environment, the Azure Machine Learning service invokes an [ACR Build Task](../container-registry/container-registry-tasks-overview.md) on the Azure Container Registry (ACR) associated with the Workspace. The built Docker image is then cached on the Workspace ACR. Curated environments are backed by Docker images that are cached in Global ACR. At the start of the run execution, the image is retrieved by the compute target from the relevant ACR.
6665

67-
For local runs, a Docker or Conda environment is created based on the environment definition. The scripts are then executed on the target compute - a local runtime environment or local Docker engine.
66+
For local runs, a Docker or conda environment is created based on the environment definition. The scripts are then executed on the target compute - a local runtime environment or local Docker engine.
6867

6968
### Building environments as Docker images
7069

71-
If the environment definition doesn't already exist in the workspace ACR, a new image will be built. The image build consists of two steps:
70+
If the image for a particular environment definition doesn't already exist in the workspace ACR, a new image will be built. The image build consists of two steps:
7271

7372
1. Downloading a base image, and executing any Docker steps
7473
2. Building a conda environment according to conda dependencies specified in the environment definition.
@@ -77,34 +76,39 @@ The second step is omitted if you specify [user-managed dependencies](/python/ap
7776

7877
### Image caching and reuse
7978

80-
If you use the same environment definition for another run, the Azure Machine Learning service reuses the cached image from the Workspace ACR.
79+
If you use the same environment definition for another run, Azure Machine Learning reuses the cached image from the Workspace ACR to save time.
8180

82-
To view the details of a cached image, use [Environment.get_image_details](/python/api/azureml-core/azureml.core.environment.environment#get-image-details-workspace-) method.
81+
To view the details of a cached image, check the Environments page in Azure Machine Learning studio or use the [`Environment.get_image_details`](/python/api/azureml-core/azureml.core.environment.environment#get-image-details-workspace-) method.
8382

84-
To determine whether to reuse a cached image or build a new one, the service computes [a hash value](https://en.wikipedia.org/wiki/Hash_table) from the environment definition and compares it to the hashes of existing environments. The hash is based on:
83+
To determine whether to reuse a cached image or build a new one, AzureML computes a [hash value](https://en.wikipedia.org/wiki/Hash_table) from the environment definition and compares it to the hashes of existing environments. The hash is based on the environment definition's:
8584

86-
* Base image property value
87-
* Custom docker steps property value
88-
* List of Python packages in Conda definition
89-
* List of packages in Spark definition
85+
* Base image
86+
* Custom docker steps
87+
* Python packages
88+
* Spark packages
89+
90+
The hash isn't affected by the environment name or version. If you rename your environment or create a new one with the same settings and packages as another environment, then the hash value will remain the same. However, environment definition changes like adding or removing a Python package or changing a package version will cause the resulting hash value to change. Changing the order of dependencies or channels in an environment will also change the hash and require a new image build. Similarly, any change to a curated environment will result in the creation of a new "non-curated" environment.
91+
92+
> [!NOTE]
93+
> You will not be able to submit any local changes to a curated environment without changing the name of the environment. The prefixes "AzureML-" and "Microsoft" are reserved exclusively for curated environments, and your job submission will fail if the name starts with either of them.
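
The reserved-prefix rule in the note above can be checked on the client before submitting a job. The following is only a sketch: the helper name `is_reserved_name` is hypothetical, and it assumes a case-sensitive prefix match, which the text does not specify. The service's actual validation may differ.

```python
# Prefixes the note describes as reserved for curated environments.
RESERVED_PREFIXES = ("AzureML-", "Microsoft")


def is_reserved_name(name: str) -> bool:
    """Return True if the name starts with a reserved prefix.

    Assumption: the match is case-sensitive; the source does not say.
    """
    return name.startswith(RESERVED_PREFIXES)


# Hypothetical environment names used only for illustration.
for candidate in ("AzureML-example", "Microsoft-example", "my-training-env"):
    status = "reserved" if is_reserved_name(candidate) else "allowed"
    print(f"{candidate}: {status}")
```

A check like this would let a script rename a modified curated environment before submission instead of waiting for the job to fail.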
9094
91-
The hash doesn't depend on environment name or version - if you rename your environment or create a new environment with the exact properties and packages of an existing one, then the hash value remains the same. However, environment definition changes, such as adding or removing a Python package or changing the package version, cause the hash value to change. Changing the order of dependencies or channels in an environment will result in a new environment and thus require a new image build. It is important to note that any change to a curated environment will invalidate the hash and result in a new "non-curated" environment.
95+
The environment's computed hash value is compared with those in the Workspace and global ACR, or on the compute target (local runs only). If there is a match, the cached image is pulled and used; otherwise, an image build is triggered.
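
The hash-and-lookup behavior described above can be illustrated with a small, self-contained sketch. This is not the service's algorithm: the functions `definition_hash` and `resolve_image`, the use of SHA-256, and the JSON serialization are all assumptions chosen for illustration. The sketch only reproduces the stated properties: name and version are excluded from the hash, and the order of dependencies matters.

```python
import hashlib
import json


def definition_hash(base_image, docker_steps, python_packages, spark_packages):
    """Hash only the fields listed in the text.

    Environment name and version are deliberately not inputs.
    SHA-256 over JSON is an assumption; the real algorithm is not documented here.
    """
    payload = json.dumps(
        {
            "base_image": base_image,
            "docker_steps": docker_steps,
            "python_packages": python_packages,  # list order is preserved
            "spark_packages": spark_packages,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def resolve_image(env_hash, cached_images):
    """Return ("reuse", image) on a cache hit, otherwise ("build", None)."""
    if env_hash in cached_images:
        return "reuse", cached_images[env_hash]
    return "build", None


base = "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04"
hash_a = definition_hash(base, "", ["numpy", "pandas"], [])
hash_b = definition_hash(base, "", ["numpy", "pandas"], [])  # same settings, different name
hash_c = definition_hash(base, "", ["pandas", "numpy"], [])  # same packages, reordered

# Simulated cache keyed by hash, standing in for the workspace ACR.
cache = {hash_a: "azureml/azureml_" + hash_a}

print(resolve_image(hash_b, cache)[0])  # cache hit
print(resolve_image(hash_c, cache)[0])  # cache miss, so a build would be triggered
```

Two definitions with identical settings resolve to the same cached image regardless of their names, while reordering the package list produces a different hash and therefore a new build.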
9296

93-
The computed hash value is compared to those in the Workspace and Global ACR (or on the compute target for local runs). If there is a match then the cached image is pulled, otherwise an image build is triggered. The duration to pull a cached image includes the download time whereas the duration to pull a newly built image includes both the build time and the download time.
97+
The following diagram shows three environment definitions. Two of them have different names and versions but identical base images and Python packages, which results in the same hash and corresponding cached image. The third environment has different Python packages and versions, leading to a different hash and cached image.
9498

95-
The following diagram shows three environment definitions. Two of them have different names and versions, but identical base image and Python packages. But they have the same hash and thus correspond to the same cached image. The third environment has different Python packages and versions, and therefore corresponds to a different cached image.
99+
![Diagram of environment caching and Docker images](./media/concept-environments/environment-caching.png)
96100

97-
![Diagram of environment caching as Docker images](./media/concept-environments/environment-caching.png)
101+
Actual cached images in your workspace ACR will have names like `azureml/azureml_e9607b2514b066c851012848913ba19f` with the hash appearing at the end.
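
As a rough illustration of that naming pattern, the hash can be pulled out of an image name with plain string handling. The helper `hash_from_image_name` is hypothetical, and the sketch assumes the name carries no tag or digest suffix and that the hash is everything after the final underscore.

```python
def hash_from_image_name(image_name: str) -> str:
    """Extract the trailing hash from a cached image name.

    Assumes the form "<repository path>_<hash>" with no ":tag" suffix.
    """
    last_segment = image_name.rsplit("/", 1)[-1]  # e.g. "azureml_<hash>"
    return last_segment.rsplit("_", 1)[-1]


name = "azureml/azureml_e9607b2514b066c851012848913ba19f"
print(hash_from_image_name(name))  # e9607b2514b066c851012848913ba19f
```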
98102

99103
>[!IMPORTANT]
100-
> * If you create an environment with an unpinned package dependency, for example, `numpy`, the environment uses the package version that was *installed when the environment was created*. Also, any future environment that uses a matching definition will use the original version.
104+
> * If you create an environment with an unpinned package dependency (for example, `numpy`), the environment uses the package version that was *available when the environment was created*. Any future environment that uses a matching definition will use the original version.
101105
>
102-
> To update the package, specify a version number to force image rebuild, for example, `numpy==1.18.1`. New dependencies, including nested ones, will be installed, and they might break a previously working scenario.
106+
> To update the package, specify a version number to force an image rebuild. An example of this would be changing `numpy` to `numpy==1.18.1`. New dependencies--including nested ones--will be installed, and they might break a previously working scenario.
103107
>
104-
> * Using an unpinned base image like `mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04` in your environment definition results in rebuilding the environment every time the latest tag is updated. It's assumed that you want to keep up to date with the latest version for various reasons, like for vulnerabilities, system updates, and patches.
108+
> * Using an unpinned base image like `mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04` in your environment definition results in rebuilding the image every time the `latest` tag is updated. This helps the image receive the latest patches and system updates.
105109
106110
> [!WARNING]
107-
> The [Environment.build](/python/api/azureml-core/azureml.core.environment.environment#build-workspace--image-build-compute-none-) method will rebuild the cached image, with possible side-effect of updating unpinned packages and breaking reproducibility for all environment definitions corresponding to that cached image.
111+
> The [`Environment.build`](/python/api/azureml-core/azureml.core.environment.environment#build-workspace--image-build-compute-none-) method will rebuild the cached image, with the possible side-effect of updating unpinned packages and breaking reproducibility for all environment definitions corresponding to that cached image.
108112
109113
## Next steps
110114
