
Commit dcac92b

Author: Larry O'Brien
Added notes and crosslinks on environment caching
1 parent e05c51b

File tree

3 files changed: +12 -6 lines changed


articles/machine-learning/concept-environments.md

Lines changed: 5 additions & 2 deletions

````diff
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: conceptual
 ms.author: trbye
 author: trevorbye
-ms.date: 01/06/2020
+ms.date: 03/18/2020
 ---

 # What are Azure Machine Learning environments?
@@ -87,7 +87,10 @@ See the following diagram that shows three environment definitions. Two of them

 ![Diagram of environment caching as Docker images](./media/concept-environments/environment-caching.png)

-If you create an environment with unpinned package dependency, for example ```numpy```, that environment will keep using the package version installed at the time of environment creation. Also, any future environment with matching definition will keep using the old version. To update the package, specify a version number to force image rebuild, for example ```numpy==1.18.1```. Note that new dependencies, including nested ones will be installed that might break a previously working scenario
+>[!IMPORTANT]
+> If you create an environment with an unpinned package dependency, for example ```numpy```, that environment will keep using the package version installed _at the time of environment creation_. Also, any future environment with a matching definition will keep using the old version.
+
+To update the package, specify a version number to force an image rebuild, for example ```numpy==1.18.1```. Note that new dependencies, including nested ones, will be installed and might break a previously working scenario.

 > [!WARNING]
 > The [Environment.build](https://docs.microsoft.com/python/api/azureml-core/azureml.core.environment.environment?view=azure-ml-py#build-workspace-) method will rebuild the cached image, with the possible side effect of updating unpinned packages and breaking reproducibility for all environment definitions corresponding to that cached image.
````
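The caching rule described in the note above (matching definitions reuse the cached image; pinning a version changes the definition and forces a rebuild) can be sketched as a simplified, hypothetical cache-key computation. This is an illustration only, not the actual Azure Machine Learning implementation, which hashes the full environment definition:

```python
import hashlib
import json

def environment_cache_key(dependencies):
    """Simplified sketch of environment-image caching: the cache key is a
    hash of the normalized dependency list, so any definition that matches
    an existing one reuses the cached image instead of rebuilding."""
    normalized = json.dumps(sorted(dependencies))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# An unpinned spec like "numpy" hashes the same no matter which numpy
# version was actually installed when the image was first built, so the
# old cached image keeps being reused.
unpinned = environment_cache_key(["numpy", "scikit-learn==0.21.3"])

# Pinning the version changes the definition, and therefore the key,
# which forces an image rebuild with the requested version.
pinned = environment_cache_key(["numpy==1.18.1", "scikit-learn==0.21.3"])

print(unpinned != pinned)
```

The same idea explains why rebuilding a cached image (as `Environment.build` does) can silently update every unpinned package: the key does not change, but the image contents do.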

articles/machine-learning/how-to-debug-pipelines.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: conceptual
 author: likebupt
 ms.author: keli19
-ms.date: 12/12/2019
+ms.date: 03/18/2020
 ---

 # Debug and troubleshoot machine learning pipelines
@@ -28,7 +28,7 @@ The following sections provide an overview of common pitfalls when building pipe

 One of the most common failures in a pipeline is that an attached script (data cleansing script, scoring script, etc.) is not running as intended, or contains runtime errors in the remote compute context that are difficult to debug in your workspace in the Azure Machine Learning studio.

-Pipelines themselves cannot be run locally, but running the scripts in isolation on your local machine allows you to debug faster because you dont have to wait for the compute and environment build process. Some development work is required to do this:
+Pipelines themselves cannot be run locally, but running the scripts in isolation on your local machine allows you to debug faster because you don't have to wait for the compute and environment build process. Some development work is required to do this:

 * If your data is in a cloud datastore, you will need to download data and make it available to your script. Using a small sample of your data is a good way to cut down on runtime and quickly get feedback on script behavior
 * If you are attempting to simulate an intermediate pipeline step, you may need to manually build the object types that the particular script is expecting from the prior step
@@ -76,7 +76,7 @@ The following table contains common problems during pipeline development, with p
 | Problem | Possible solution |
 |--|--|
 | Unable to pass data to `PipelineData` directory | Ensure you have created a directory in the script that corresponds to where your pipeline expects the step output data. In most cases, an input argument will define the output directory, and then you create the directory explicitly. Use `os.makedirs(args.output_dir, exist_ok=True)` to create the output directory. See the [tutorial](tutorial-pipeline-batch-scoring-classification.md#write-a-scoring-script) for a scoring script example that shows this design pattern. |
-| Dependency bugs | If you have developed and tested scripts locally but find dependency issues when running on a remote compute in the pipeline, ensure your compute environment dependencies and versions match your test environment. |
+| Dependency bugs | If you have developed and tested scripts locally but find dependency issues when running on a remote compute in the pipeline, ensure your compute environment dependencies and versions match your test environment. (See [Environment building, caching, and reuse](https://docs.microsoft.com/azure/machine-learning/concept-environments#environment-building-caching-and-reuse).) |
 | Ambiguous errors with compute targets | Deleting and re-creating compute targets can solve certain issues with compute targets. |
 | Pipeline not reusing steps | Step reuse is enabled by default, but ensure you haven't disabled it in a pipeline step. If reuse is disabled, the `allow_reuse` parameter in the step will be set to `False`. |
 | Pipeline is rerunning unnecessarily | To ensure that steps only rerun when their underlying data or scripts change, decouple your directories for each step. If you use the same source directory for multiple steps, you may experience unnecessary reruns. Use the `source_directory` parameter on a pipeline step object to point to your isolated directory for that step, and ensure you aren't using the same `source_directory` path for multiple steps. |
````
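The `os.makedirs(args.output_dir, exist_ok=True)` pattern from the table above can be sketched as a minimal, self-contained step script. The `--output_dir` argument name and the paths are illustrative; in a real pipeline the step definition supplies whatever argument carries the `PipelineData` output path:

```python
import argparse
import os

def prepare_output_dir(output_dir):
    """Create the directory the pipeline expects the step to write into;
    exist_ok=True makes the call safe when the step reruns."""
    os.makedirs(output_dir, exist_ok=True)
    return output_dir

def parse_step_args(argv=None):
    # In a pipeline step, the PipelineData output path arrives as a
    # command-line argument defined by the pipeline.
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", type=str, required=True)
    return parser.parse_args(argv)

# Local debugging example: pass the arguments explicitly instead of
# waiting for the remote compute and environment build.
args = parse_step_args(["--output_dir", "outputs/step1"])
out = prepare_output_dir(args.output_dir)

# ...run the step's real work, then write results under `out`...
with open(os.path.join(out, "results.txt"), "w") as f:
    f.write("step output\n")
```

Running the script this way on your local machine exercises the same argument parsing and directory handling the remote step will use, which is the fast feedback loop the article recommends.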

articles/machine-learning/how-to-use-environments.md

Lines changed: 4 additions & 1 deletion

````diff
@@ -9,7 +9,7 @@ ms.reviewer: nibaccam
 ms.service: machine-learning
 ms.subservice: core
 ms.topic: conceptual
-ms.date: 02/27/2020
+ms.date: 03/18/2020

 ## As a developer, I need to configure my experiment context with the necessary software packages so my machine learning models can be trained and deployed on different compute targets.

@@ -155,6 +155,9 @@ conda_dep.add_conda_package("scikit-learn==0.21.3")
 myenv.python.conda_dependencies=conda_dep
 ```

+>[!IMPORTANT]
+> If you use the same environment definition for another run, the Azure Machine Learning service reuses the cached image of your environment. If you create an environment with an unpinned package dependency, for example ```numpy```, that environment will keep using the package version installed _at the time of environment creation_. Also, any future environment with a matching definition will keep using the old version. For more information, see [Environment building, caching, and reuse](https://docs.microsoft.com/azure/machine-learning/concept-environments#environment-building-caching-and-reuse).
+
 ### Private wheel files

 You can use private pip wheel files by first uploading them to your workspace storage. You upload them by using the static [`add_private_pip_wheel()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.environment.environment?view=azure-ml-py#add-private-pip-wheel-workspace--file-path--exist-ok-false-) method. Then you capture the storage URL and pass the URL to the `add_pip_package()` method.
````
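Since image reuse hinges on pinned versions, a small helper like the following (an illustration only, not part of the azureml SDK) can flag unpinned specs in a dependency list before you register an environment:

```python
import re

def is_pinned(spec):
    """Return True if a pip/conda-style dependency spec pins an exact
    version, e.g. 'numpy==1.18.1'; 'numpy' or 'pandas>=1.0' do not."""
    return re.search(r"==\d", spec) is not None

def unpinned_dependencies(specs):
    """List every spec that would silently keep reusing whatever version
    was cached when the environment image was first built."""
    return [s for s in specs if not is_pinned(s)]

deps = ["numpy", "scikit-learn==0.21.3", "pandas>=1.0"]
print(unpinned_dependencies(deps))  # ['numpy', 'pandas>=1.0']
```

A check like this in your experiment setup makes the caching behavior described in the note above explicit, instead of discovering it when a rebuild pulls in unexpected versions.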
