Commit 69bf7a8

committed
move to v1 folder
1 parent d3e037a commit 69bf7a8

9 files changed, +190 -7 lines changed

articles/machine-learning/.openpublishing.redirection.machine-learning.json

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 {
     "redirections": [
+        {
+            "source_path_from_root": "/articles/machine-learning/how-to-train-with-custom-image.md",
+            "redirect_url": "/azure/machine-learning/v1/how-to-train-with-custom-image",
+            "redirect_document_id": true
+        },
         {
             "source_path_from_root": "/articles/machine-learning/how-to-monitor-tensorboard.md",
             "redirect_url": "/azure/machine-learning/v1/how-to-monitor-tensorboard",
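
Reviewer sketch, not part of the commit: the hunk above adds one entry that maps the old article path to its new `v1` URL. A minimal way to sanity-check such an entry is to load the JSON and build a lookup from source path to redirect URL. The sample string below copies only the entry that is fully visible in the hunk. The helper name `build_redirect_map` and the duplicate-source check are assumptions made for illustration, not part of any publishing tooling.

```python
import json

# Sample data copied from the entry added in this hunk.
SAMPLE_REDIRECTIONS = """
{
    "redirections": [
        {
            "source_path_from_root": "/articles/machine-learning/how-to-train-with-custom-image.md",
            "redirect_url": "/azure/machine-learning/v1/how-to-train-with-custom-image",
            "redirect_document_id": true
        }
    ]
}
"""


def build_redirect_map(raw_json: str) -> dict:
    """Map each source path to its redirect URL, rejecting duplicate sources.

    Hypothetical helper for illustration only.
    """
    redirect_map = {}
    for entry in json.loads(raw_json)["redirections"]:
        source = entry["source_path_from_root"]
        if source in redirect_map:
            raise ValueError(f"Duplicate redirect source: {source}")
        redirect_map[source] = entry["redirect_url"]
    return redirect_map


redirects = build_redirect_map(SAMPLE_REDIRECTIONS)
print(redirects["/articles/machine-learning/how-to-train-with-custom-image.md"])
```

Running the same helper over the full redirection file would be one way to catch an accidental second entry for the same source path.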

articles/machine-learning/concept-environments.md

Lines changed: 0 additions & 1 deletion
@@ -117,7 +117,6 @@ If you provide your own images, you are responsible for updating them.
 For more information on the base images, see the following links:

 * [Azure Machine Learning base images](https://github.com/Azure/AzureML-Containers) GitHub repository.
-* [Train a model using a custom image](how-to-train-with-custom-image.md).
 * [Deploy a TensorFlow model using a custom container](how-to-deploy-custom-container.md)

 ## Next steps

articles/machine-learning/how-to-secure-workspace-vnet.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ When your Azure Machine Learning workspace is configured with a private endpoint
 When ACR is behind a virtual network, Azure Machine Learning can’t use it to directly build Docker images. Instead, the compute cluster is used to build the images.

 > [!IMPORTANT]
-> The compute cluster used to build Docker images needs to be able to access the package repositories that are used to train and deploy your models. You may need to add network security rules that allow access to public repos, [use private Python packages](how-to-use-private-python-packages.md), or use [custom Docker images](how-to-train-with-custom-image.md) that already include the packages.
+> The compute cluster used to build Docker images needs to be able to access the package repositories that are used to train and deploy your models. You may need to add network security rules that allow access to public repos, [use private Python packages](how-to-use-private-python-packages.md), or use [custom Docker images](v1/how-to-train-with-custom-image.md) that already include the packages.

 > [!WARNING]
 > If your Azure Container Registry uses a private endpoint or service endpoint to communicate with the virtual network, you cannot use a managed identity with an Azure Machine Learning compute cluster.

articles/machine-learning/v1/how-to-migrate-from-estimators-to-scriptrunconfig.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ myenv.environment_variables = {"MESSAGE":"Hello from Azure Machine Learning"}
 For information on configuring and managing Azure ML environments, see:
 * [How to use environments](how-to-use-environments.md)
 * [Curated environments](../resource-curated-environments.md)
-* [Train with a custom Docker image](../how-to-train-with-custom-image.md)
+* [Train with a custom Docker image](how-to-train-with-custom-image.md)

 ## Using data for training
 ### Datasets

articles/machine-learning/v1/how-to-secure-workspace-vnet.md

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ When your Azure Machine Learning workspace is configured with a private endpoint
 When ACR is behind a virtual network, Azure Machine Learning can’t use it to directly build Docker images. Instead, the compute cluster is used to build the images.

 > [!IMPORTANT]
-> The compute cluster used to build Docker images needs to be able to access the package repositories that are used to train and deploy your models. You may need to add network security rules that allow access to public repos, [use private Python packages](how-to-use-private-python-packages.md), or use [custom Docker images](../how-to-train-with-custom-image.md) that already include the packages.
+> The compute cluster used to build Docker images needs to be able to access the package repositories that are used to train and deploy your models. You may need to add network security rules that allow access to public repos, [use private Python packages](how-to-use-private-python-packages.md), or use [custom Docker images](how-to-train-with-custom-image.md) that already include the packages.

 > [!WARNING]
 > If your Azure Container Registry uses a private endpoint or service endpoint to communicate with the virtual network, you cannot use a managed identity with an Azure Machine Learning compute cluster.
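
Reviewer sketch, not part of the commit: most hunks in this change rewrite a relative link so that it still points at the article after it lands in the `v1` folder. The snippet below resolves each link against the file that contains it, using only the paths shown in the two `how-to-secure-workspace-vnet.md` hunks. The helper name `resolve_link` is an assumption, and the check treats links as plain POSIX paths; it does not model how the docs build system actually resolves them.

```python
import posixpath


def resolve_link(source_file: str, relative_link: str) -> str:
    """Resolve a relative Markdown link against the file that contains it."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(source_file), relative_link)
    )


ROOT = "articles/machine-learning"
TARGET = f"{ROOT}/v1/how-to-train-with-custom-image.md"

# New link in the top-level article.
from_root = resolve_link(
    f"{ROOT}/how-to-secure-workspace-vnet.md",
    "v1/how-to-train-with-custom-image.md",
)

# New link in the v1 copy of the article.
from_v1 = resolve_link(
    f"{ROOT}/v1/how-to-secure-workspace-vnet.md",
    "how-to-train-with-custom-image.md",
)

# Old link in the v1 copy, which pointed one folder up.
old_from_v1 = resolve_link(
    f"{ROOT}/v1/how-to-secure-workspace-vnet.md",
    "../how-to-train-with-custom-image.md",
)

print(from_root == TARGET, from_v1 == TARGET, old_from_v1 == TARGET)
```

Both new links resolve to the same file under `v1`, while the old `../` form resolves to the top-level location that the redirection entry now covers.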

articles/machine-learning/v1/how-to-train-pytorch.md

Lines changed: 1 addition & 1 deletion
@@ -190,7 +190,7 @@ pytorch_env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1
 ```

 > [!TIP]
-> Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your environment from that. For more information, see [Train with custom image](../how-to-train-with-custom-image.md).
+> Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your environment from that. For more information, see [Train with custom image](how-to-train-with-custom-image.md).

 For more information on creating and using environments, see [Create and use software environments in Azure Machine Learning](how-to-use-environments.md).

articles/machine-learning/v1/how-to-train-tensorflow.md

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ tf_env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudn
 ```

 > [!TIP]
-> Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your environment from that. For more information, see [Train with custom image](../how-to-train-with-custom-image.md).
+> Optionally, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your environment from that. For more information, see [Train with custom image](how-to-train-with-custom-image.md).

 For more information on creating and using environments, see [Create and use software environments in Azure Machine Learning](how-to-use-environments.md).

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
+---
+title: Train a model by using a custom Docker image
+titleSuffix: Azure Machine Learning
+description: Learn how to use your own Docker images, or curated ones from Microsoft, to train models in Azure Machine Learning.
+services: machine-learning
+ms.service: machine-learning
+ms.subservice: core
+ms.author: sagopal
+author: saachigopal
+ms.reviewer: ssalgado
+ms.date: 08/11/2021
+ms.topic: how-to
+ms.custom: sdkv1, event-tier1-build-2022
+---
+
+# Train a model by using a custom Docker image
+
+[!INCLUDE [sdk v1](../../../includes/machine-learning-sdk-v1.md)]
+
+In this article, learn how to use a custom Docker image when you're training models with Azure Machine Learning. You'll use the example scripts in this article to classify pet images by creating a convolutional neural network.
+
+Azure Machine Learning provides a default Docker base image. You can also use Azure Machine Learning environments to specify a different base image, such as one of the maintained [Azure Machine Learning base images](https://github.com/Azure/AzureML-Containers) or your own [custom image](../how-to-deploy-custom-container.md). Custom base images allow you to closely manage your dependencies and maintain tighter control over component versions when running training jobs.
+
+## Prerequisites
+
+Run the code on either of these environments:
+
+* Azure Machine Learning compute instance (no downloads or installation necessary):
+  * Complete the [Quickstart: Get started with Azure Machine Learning](../quickstart-create-resources.md) tutorial to create a dedicated notebook server preloaded with the SDK and the sample repository.
+* Your own Jupyter Notebook server:
+  * Create a [workspace configuration file](../how-to-configure-environment.md#local-and-dsvm-only-create-a-workspace-configuration-file).
+  * Install the [Azure Machine Learning SDK](/python/api/overview/azure/ml/install).
+  * Create an [Azure container registry](container-registry/index.yml) or other Docker registry that's available on the internet.
+
+## Set up a training experiment
+
+In this section, you set up your training experiment by initializing a workspace, defining your environment, and configuring a compute target.
+
+### Initialize a workspace
+
+The [Azure Machine Learning workspace](../concept-workspace.md) is the top-level resource for the service. It gives you a centralized place to work with all the artifacts that you create. In the Python SDK, you can access the workspace artifacts by creating a [`Workspace`](/python/api/azureml-core/azureml.core.workspace.workspace) object.
+
+Create a `Workspace` object from the config.json file that you created as a [prerequisite](#prerequisites).
+
+```Python
+from azureml.core import Workspace
+
+ws = Workspace.from_config()
+```
+
+### Define your environment
+
+Create an `Environment` object.
+
+```python
+from azureml.core import Environment
+
+fastai_env = Environment("fastai2")
+```
+
+The specified base image in the following code supports the fast.ai library, which allows for distributed deep-learning capabilities. For more information, see the [fast.ai Docker Hub repository](https://hub.docker.com/u/fastdotai).
+
+When you're using your custom Docker image, you might already have your Python environment properly set up. In that case, set the `user_managed_dependencies` flag to `True` to use your custom image's built-in Python environment. By default, Azure Machine Learning builds a Conda environment with dependencies that you specified. The service runs the script in that environment instead of using any Python libraries that you installed on the base image.
+
+```python
+fastai_env.docker.base_image = "fastdotai/fastai2:latest"
+fastai_env.python.user_managed_dependencies = True
+```
+
+#### Use a private container registry (optional)
+
+To use an image from a private container registry that isn't in your workspace, use `docker.base_image_registry` to specify the address of the repository and a username and password:
+
+```python
+# Set the container registry information.
+fastai_env.docker.base_image_registry.address = "myregistry.azurecr.io"
+fastai_env.docker.base_image_registry.username = "username"
+fastai_env.docker.base_image_registry.password = "password"
+```
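
Reviewer sketch, not part of the commit: the registry snippet above uses literal placeholder credentials. One possible pattern, shown here as an assumption and not as the article's recommendation, is to read the values from environment variables and fail early when one is missing. The variable names `ACR_ADDRESS`, `ACR_USERNAME`, and `ACR_PASSWORD` and the helper `registry_settings_from_env` are hypothetical.

```python
import os


def registry_settings_from_env(env=None) -> dict:
    """Collect registry settings from environment variables.

    The variable names are hypothetical; pick names that fit your setup.
    """
    env = os.environ if env is None else env
    required = ("ACR_ADDRESS", "ACR_USERNAME", "ACR_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["ACR_ADDRESS"],
        "username": env["ACR_USERNAME"],
        "password": env["ACR_PASSWORD"],
    }


# Example with an explicit mapping so the snippet runs without real credentials.
example = registry_settings_from_env(
    {
        "ACR_ADDRESS": "myregistry.azurecr.io",
        "ACR_USERNAME": "username",
        "ACR_PASSWORD": "password",
    }
)
print(example["address"])

# With the Environment object from the article, the values would be assigned as:
# fastai_env.docker.base_image_registry.address = example["address"]
# fastai_env.docker.base_image_registry.username = example["username"]
# fastai_env.docker.base_image_registry.password = example["password"]
```

Calling `registry_settings_from_env()` with no argument reads from the real process environment, which keeps the password out of the notebook or script.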
+
+#### Use a custom Dockerfile (optional)
+
+It's also possible to use a custom Dockerfile. Use this approach if you need to install non-Python packages as dependencies. Remember to set the base image to `None`.
+
+```python
+# Specify Docker steps as a string.
+dockerfile = r"""
+FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210615.v1
+RUN echo "Hello from custom container!"
+"""
+
+# Set the base image to None, because the image is defined by Dockerfile.
+fastai_env.docker.base_image = None
+fastai_env.docker.base_dockerfile = dockerfile
+
+# Alternatively, load the string from a file.
+fastai_env.docker.base_image = None
+fastai_env.docker.base_dockerfile = "./Dockerfile"
+```
+
+>[!IMPORTANT]
+> Azure Machine Learning only supports Docker images that provide the following software:
+> * Ubuntu 18.04 or greater.
+> * Conda 4.7.# or greater.
+> * Python 3.7+.
+> * A POSIX compliant shell available at /bin/sh is required in any container image used for training.
+
+For more information about creating and managing Azure Machine Learning environments, see [Create and use software environments](../how-to-use-environments.md).
+
+### Create or attach a compute target
+
+You need to create a [compute target](concept-azure-machine-learning-architecture.md#compute-targets) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.
+
+Creation of `AmlCompute` takes a few minutes. If the `AmlCompute` resource is already in your workspace, this code skips the creation process.
+
+As with other Azure services, there are limits on certain resources (for example, `AmlCompute`) associated with the Azure Machine Learning service. For more information, see [Default limits and how to request a higher quota](how-to-manage-quotas.md).
+
+```python
+from azureml.core.compute import ComputeTarget, AmlCompute
+from azureml.core.compute_target import ComputeTargetException
+
+# Choose a name for your cluster.
+cluster_name = "gpu-cluster"
+
+try:
+    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
+    print('Found existing compute target.')
+except ComputeTargetException:
+    print('Creating a new compute target...')
+    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
+                                                           max_nodes=4)
+
+    # Create the cluster.
+    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
+
+    compute_target.wait_for_completion(show_output=True)
+
+# Use get_status() to get a detailed status for the current AmlCompute.
+print(compute_target.get_status().serialize())
+```
+
+
+>[!IMPORTANT]
+>Use CPU SKUs for any image build on compute.
+
+
+## Configure your training job
+
+For this tutorial, use the training script *train.py* on [GitHub](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/workflows/train/fastai/pets/src/train.py). In practice, you can take any custom training script and run it, as is, with Azure Machine Learning.
+
+Create a `ScriptRunConfig` resource to configure your job for running on the desired [compute target](how-to-set-up-training-targets.md).
+
+```python
+from azureml.core import ScriptRunConfig
+
+src = ScriptRunConfig(source_directory='fastai-example',
+                      script='train.py',
+                      compute_target=compute_target,
+                      environment=fastai_env)
+```
+
+## Submit your training job
+
+When you submit a training run by using a `ScriptRunConfig` object, the `submit` method returns an object of type `ScriptRun`. The returned `ScriptRun` object gives you programmatic access to information about the training run.
+
+```python
+from azureml.core import Experiment
+
+run = Experiment(ws,'Tutorial-fastai').submit(src)
+run.wait_for_completion(show_output=True)
+```
+
+> [!WARNING]
+> Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you don't want to upload, use an [.ignore file](../concept-train-machine-learning-model.md#understand-what-happens-when-you-submit-a-training-job) or don't include it in the source directory. Instead, access your data by using a [datastore](/python/api/azureml-core/azureml.data).
+
+## Next steps
+In this article, you trained a model by using a custom Docker image. See these other articles to learn more about Azure Machine Learning:
+* [Track run metrics](../how-to-log-view-metrics.md) during training.
+* [Deploy a model](../how-to-deploy-custom-container.md) by using a custom Docker image.

articles/machine-learning/v1/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -241,7 +241,7 @@
 - name: PyTorch
   href: how-to-train-pytorch.md
 - name: Train with custom Docker image
-  href: ../how-to-train-with-custom-image.md
+  href: how-to-train-with-custom-image.md
 - name: Migrate from Estimators to ScriptRunConfig
   href: how-to-migrate-from-estimators-to-scriptrunconfig.md
 - name: Use Key Vault when training
