Commit 154c2c9

Merge pull request #98266 from nibaccam/mounting
Data | Mounting clarification
2 parents f070c19 + 58393f2 commit 154c2c9

4 files changed (+44, -23 lines changed)

articles/machine-learning/service/how-to-access-data.md

Lines changed: 20 additions & 13 deletions
@@ -19,17 +19,17 @@ ms.custom: seodec18
# Access data in Azure storage services
[!INCLUDE [aml-applies-to-basic-enterprise-sku](../../../includes/aml-applies-to-basic-enterprise-sku.md)]

-In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization. Using datastores allows you to access your storage without having to hard code connection information in your scripts. You can create datastores from these [Azure storage solutions](#matrix). For unsupported storage solutions, to save data egress cost during machine learning experiments, we recommend you move your data to our supported Azure storage solutions. [Learn how to move your data](#move).
+In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization. Using datastores allows you to access your storage without having to hard code connection information in your scripts. You can create datastores from these [Azure storage solutions](#matrix). For unsupported storage solutions, and to save data egress cost during machine learning experiments, we recommend you move your data to our supported Azure storage solutions. [Learn how to move your data](#move).

This how-to shows examples of the following tasks:
-* [Register datastores](#access)
-* [Get datastores from workspace](#get)
-* [Upload and download data using datastores](#up-and-down)
-* [Access data during training](#train)
-* [Move data to Azure](#move)
+* Register datastores
+* Get datastores from workspace
+* Upload and download data using datastores
+* Access data during training
+* Move data to an Azure storage service

## Prerequisites
-
+You'll need
- An Azure subscription. If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://aka.ms/AMLFree) today.

- An Azure storage account with an [Azure Blob Container](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-overview) or [Azure File Share](https://docs.microsoft.com/azure/storage/files/storage-files-introduction).
@@ -56,7 +56,14 @@ When you register an Azure storage solution as a datastore, you automatically cr

All the register methods are on the [`Datastore`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) class and have the form register_azure_*.

-The information you need to populate the register() method can be found via [Azure portal](https://portal.azure.com). Select **Storage Accounts** on the left pane and choose the storage account you want to register. The **Overview** page provides information such as, the account name and container or file share name. For authentication information, like account key or SAS token, navigate to **Account Keys** under the **Settings** pane on the left.
+The information you need to populate the register() method can be found via the [Azure Machine Learning studio](https://ml.azure.com) and these steps:
+
+1. Select **Storage Accounts** on the left pane and choose the storage account you want to register.
+2. The **Overview** page provides information such as the account name and container or file share name.
+3. For authentication information, like account key or SAS token, navigate to **Account Keys** under the **Settings** pane on the left.
+
+> [!IMPORTANT]
+> If your storage account is in a VNET, only Azure blob datastore creation is supported. Set the parameter `grant_workspace_access` to `True` to grant your workspace access to your storage account.

The following examples show you how to register an Azure Blob Container or an Azure File Share as a datastore.
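For a storage account that sits behind a VNET, the registration call described above might look roughly like this sketch (the workspace, datastore, container, and account names are placeholders rather than values from this commit):

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Hypothetical names; substitute your own container and account details.
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='vnet_blob_datastore',
    container_name='my-container',
    account_name='mystorageaccount',
    account_key='your storage account key',
    grant_workspace_access=True)  # grants the workspace access to the VNET-protected account
```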

@@ -72,7 +79,6 @@ The following examples show you how to register an Azure Blob Container or an Azure
account_key='your storage account key',
create_if_not_exists=True)
```
-If your storage account is in a VNET, only Azure blob datastore creation is supported. Set the parameter, `grant_workspace_access` to `True` to grant your workspace access to your storage account.

+ For an **Azure File Share Datastore**, use [`register_azure_file_share()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py#register-azure-file-share-workspace--datastore-name--file-share-name--account-name--sas-token-none--account-key-none--protocol-none--endpoint-none--overwrite-false--create-if-not-exists-false--skip-validation-false-).
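As a rough companion sketch for the file share variant mentioned above (all names below are illustrative placeholders):

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Hypothetical names; substitute your own file share and account details.
file_datastore = Datastore.register_azure_file_share(
    workspace=ws,
    datastore_name='my_file_datastore',
    file_share_name='my-fileshare',
    account_name='mystorageaccount',
    account_key='your storage account key',
    create_if_not_exists=True)
```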

@@ -102,7 +108,7 @@ Create a new datastore in a few steps in Azure Machine Learning studio.

The information you need to populate the form can be found via [Azure portal](https://portal.azure.com). Select **Storage Accounts** on the left pane and choose the storage account you want to register. The **Overview** page provides information such as the account name and container or file share name. For authentication items, like account key or SAS token, navigate to **Account Keys** under the **Settings** pane on the left.

-The following example demonstrates what the form would look like for creating an Azure blob datastore.
+The following example demonstrates what the form looks like for Azure blob datastore creation.

![New datastore](media/how-to-access-data/new-datastore-form.png)

@@ -126,7 +132,7 @@ for name, datastore in datastores.items():
    print(name, datastore.datastore_type)
```

-When you create a workspace, an Azure Blob Container and an Azure File Share are registered to the workspace named `workspaceblobstore` and `workspacefilestore` respectively. They store the connection information of the Blob Container and the File Share that is provisioned in the storage account attached to the workspace. The `workspaceblobstore` is set as the default datastore.
+When you create a workspace, an Azure Blob Container and an Azure File Share are automatically registered to the workspace named `workspaceblobstore` and `workspacefilestore` respectively. These store the connection information of the Blob Container and the File Share that is provisioned in the storage account attached to the workspace. The `workspaceblobstore` is set as the default datastore.

To get the workspace's default datastore:
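A minimal sketch of those retrieval calls (assumes a workspace object `ws`; `workspaceblobstore` is the built-in default described above):

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# The default datastore (workspaceblobstore unless you change it)
default_datastore = ws.get_default_datastore()

# Or fetch any registered datastore by name
blob_datastore = Datastore.get(ws, datastore_name='workspaceblobstore')
print(default_datastore.name, blob_datastore.datastore_type)
```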

@@ -187,7 +193,7 @@ The following table lists the methods that tell the compute target how to use th

Way|Method|Description|
----|-----|--------
-Mount| [`as_mount()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-mount--)| Use to mount the datastore on the compute target.
+Mount| [`as_mount()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-mount--)| Use to mount the datastore on the compute target. When mounted, all files of your datastore are made accessible to your compute target.
Download|[`as_download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-download-path-on-compute-none-)|Use to download the contents of your datastore to the location specified by `path_on_compute`. <br><br> This download happens before the run.
Upload|[`as_upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-upload-path-on-compute-none-)| Use to upload a file from the location specified by `path_on_compute` to your datastore. <br><br> This upload happens after your run.
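A short sketch of how the three methods from the table might be used (assumes `datastore` is a registered datastore; the `./bar` and `outputs` paths are placeholders):

```python
# Mount the entire datastore on the compute target at run time
data_mount = datastore.as_mount()

# Copy the contents of ./bar in the datastore to the compute target before the run
data_download = datastore.path('./bar').as_download(path_on_compute='data')

# Copy files from the 'outputs' folder on the compute target to ./outputs in the datastore after the run
data_upload = datastore.path('./outputs').as_upload(path_on_compute='outputs')
```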

@@ -205,13 +211,14 @@ datastore.path('./bar').as_download()

### Examples

The following code examples are specific to the [`Estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class for accessing data during training.

`script_params` is a dictionary containing parameters to the entry_script. Use it to pass in a datastore and describe how data is made available on the compute target. Learn more from our end-to-end [tutorial](tutorial-train-models-with-aml.md).

```Python
from azureml.train.estimator import Estimator

+# notice the leading '/', which indicates an absolute path
script_params = {
    '--data_dir': datastore.path('/bar').as_mount()
}
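# A hypothetical continuation of the snippet above: hand script_params to an
# Estimator. The entry script and compute target names are placeholders, not
# values from this commit.
estimator = Estimator(source_directory='.',
                      entry_script='train.py',
                      script_params=script_params,
                      compute_target='cpu-cluster')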

articles/machine-learning/service/how-to-set-up-training-targets.md

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ Learn more about [submitting experiments](#submit) at the end of this article.

## What's an estimator?

-To facilitate model training using popular frameworks, the Azure Machine Learning Python SDK provides an alternative higher-level abstraction, the estimator class. This class allows you to easily construct run configurations. You can create and use a generic [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py) to submit training scripts that use any learning framework you choose (such as scikit-learn).
+To facilitate model training using popular frameworks, the Azure Machine Learning Python SDK provides an alternative higher-level abstraction, the estimator class. We recommend using an estimator for training since the class contains methods that allow you to easily construct and customize run configurations. You can create and use a generic [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py) to submit training scripts that use any learning framework you choose (such as scikit-learn). If you need to make your data files available to your compute target, see [Train with Azure Machine Learning datasets](how-to-train-with-datasets.md).

For PyTorch, TensorFlow, and Chainer tasks, Azure Machine Learning also provides respective [PyTorch](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py), [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py), and [Chainer](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py) estimators to simplify using these frameworks.
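As a rough end-to-end sketch of the generic estimator described above (the experiment name, compute target, and training script are illustrative placeholders):

```python
from azureml.core import Workspace, Experiment
from azureml.train.estimator import Estimator

ws = Workspace.from_config()

# Generic estimator for a hypothetical scikit-learn training script
estimator = Estimator(source_directory='./training',
                      entry_script='train.py',
                      compute_target='cpu-cluster',
                      conda_packages=['scikit-learn'])

run = Experiment(ws, 'sklearn-sample').submit(estimator)
run.wait_for_completion(show_output=True)
```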

@@ -359,7 +359,7 @@ For more information, see [Resource management](reference-azure-machine-learning

## Set up with VS Code

-You can access, create and manage the compute targets that are associated with your workspace using the [VS Code extension](how-to-vscode-tools.md#create-and-manage-compute-targets) for Azure Machine Learning.
+You can access, create, and manage the compute targets that are associated with your workspace using the [VS Code extension](how-to-vscode-tools.md#create-and-manage-compute-targets) for Azure Machine Learning.

## <a id="submit"></a>Submit training run using Azure Machine Learning SDK

articles/machine-learning/service/resource-known-issues.md

Lines changed: 21 additions & 7 deletions
@@ -88,6 +88,20 @@ Binary classification charts (precision-recall, ROC, gain curve etc.) shown in a

These are known issues for Azure Machine Learning Datasets.

+### TypeError: FileNotFound: No such file or directory
+
+This error occurs if the file path you provide isn't where the file is located. You need to make sure the way you refer to the file is consistent with where you mounted your dataset on your compute target. To ensure a deterministic state, we recommend using the absolute path when mounting a dataset to a compute target. For example, in the following code we mount the dataset under the root of the filesystem of the compute target, `/tmp`.
+
+```python
+# Note the leading / in '/tmp/dataset'
+script_params = {
+    '--data-folder': dset.as_named_input('dogscats_train').as_mount('/tmp/dataset'),
+}
+```
+
+If you don't include the leading forward slash, '/', you'll need to prefix the path with the working directory on the compute target, for example `/mnt/batch/.../tmp/dataset`, to indicate where you want the dataset to be mounted.
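One way the entry script might then read from that mount, staying consistent with the `/tmp/dataset` path shown above (the argument and file names are illustrative):

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str)
args = parser.parse_args()

# args.data_folder resolves to /tmp/dataset on the compute target,
# so file references stay consistent with the mount location.
train_file = os.path.join(args.data_folder, 'train.csv')  # placeholder file name
```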
### Fail to read Parquet file from HTTP or ADLS Gen 2

There is a known issue in AzureML DataPrep SDK version 1.1.25 that causes a failure when creating a dataset by reading Parquet files from HTTP or ADLS Gen 2. It will fail with `Cannot seek once reading started.`. To fix this issue, please upgrade `azureml-dataprep` to a version higher than 1.1.26, or downgrade to a version lower than 1.1.24.
@@ -211,9 +225,9 @@ az aks get-credentials -g <rg> -n <aks cluster name>
Updates to Azure Machine Learning components installed in an Azure Kubernetes Service cluster must be manually applied.

> [!WARNING]
-> Before performing the following actions, check the version of your Azure Kubernetes Service cluster. If the cluster version is equal to or greater than 1.14, you will not be able to re-attach your cluster to the Azure Machine Learning workspace.
+> Before performing the following actions, check the version of your Azure Kubernetes Service cluster. If the cluster version is equal to or greater than 1.14, you will not be able to reattach your cluster to the Azure Machine Learning workspace.

-You can apply these updates by detaching the cluster from the Azure Machine Learning workspace, and then re-attaching the cluster to the workspace. If SSL is enabled in the cluster, you will need to supply the SSL certificate and private key when re-attaching the cluster.
+You can apply these updates by detaching the cluster from the Azure Machine Learning workspace, and then reattaching the cluster to the workspace. If SSL is enabled in the cluster, you will need to supply the SSL certificate and private key when reattaching the cluster.

```python
compute_target = ComputeTarget(workspace=ws, name=clusterWorkspaceName)
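# A hypothetical sketch of the detach/reattach sequence that follows (assumes
# `from azureml.core.compute import AksCompute`; resource names are placeholders,
# and the commented SSL line applies only when SSL is enabled on the cluster).
compute_target.detach()

attach_config = AksCompute.attach_configuration(resource_group='<rg>',
                                                cluster_name='<aks cluster name>')
# attach_config.enable_ssl(ssl_cert_pem_file='cert.pem', ssl_key_pem_file='key.pem')
compute_target = ComputeTarget.attach(ws, clusterWorkspaceName, attach_config)
compute_target.wait_for_completion(show_output=True)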
@@ -251,19 +265,19 @@ If you are running into ModuleErrors while submitting experiments in Azure ML, i

If you are using [Estimators](concept-azure-machine-learning-architecture.md#estimators) to submit experiments, you can specify a package name via the `pip_packages` or `conda_packages` parameter in the estimator, based on which source you want to install the package from. You can also specify a yml file with all your dependencies using `conda_dependencies_file`, or list all your pip requirements in a txt file using the `pip_requirements_file` parameter.
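For instance, a dependency specification on an estimator could look roughly like this sketch (package names, paths, and the compute target are placeholders):

```python
from azureml.train.estimator import Estimator

estimator = Estimator(source_directory='./training',
                      entry_script='train.py',
                      compute_target='cpu-cluster',
                      pip_packages=['pandas', 'scikit-learn'],  # installed with pip
                      conda_packages=['numpy'])                 # installed with conda

# Or point at a full dependency file instead, for example:
# conda_dependencies_file='environment.yml' or pip_requirements_file='requirements.txt'
```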

-Azure ML also provides framework specific estimators for Tensorflow, PyTorch, Chainer and SKLearn. Using these estimators will make sure that the framework dependencies are installed on your behalf in the environment used for training. You have the option to specify extra dependencies as described above.
+Azure ML also provides framework-specific estimators for Tensorflow, PyTorch, Chainer and SKLearn. Using these estimators will make sure that the framework dependencies are installed on your behalf in the environment used for training. You have the option to specify extra dependencies as described above.

Azure ML maintained docker images and their contents can be seen in [AzureML Containers](https://github.com/Azure/AzureML-Containers).
-Framework specific dependencies are listed in the respective framework documentation - [Chainer](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py#remarks), [PyTorch](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py#remarks), [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py#remarks), [SKLearn](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py#remarks).
+Framework-specific dependencies are listed in the respective framework documentation - [Chainer](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py#remarks), [PyTorch](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.pytorch?view=azure-ml-py#remarks), [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py#remarks), [SKLearn](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py#remarks).

> [!NOTE]
> If you think a particular package is common enough to be added in Azure ML maintained images and environments please raise a GitHub issue in [AzureML Containers](https://github.com/Azure/AzureML-Containers).

### NameError (Name not defined), AttributeError (Object has no attribute)
This exception should come from your training scripts. You can look at the log files from Azure portal to get more information about the specific name not defined or attribute error. From the SDK, you can use `run.get_details()` to look at the error message. This will also list all the log files generated for your run. Please make sure to take a look at your training script and fix the error before retrying.

-### Horovod is shutdown
-In most cases, this exception means there was an underlying exception in one of the processes that caused horovod to shutdown. Each rank in the MPI job gets it own dedicated log file in Azure ML. These logs are named `70_driver_logs`. In case of distributed training, the log names are suffixed with `_rank` to make it easy to differentiate the logs. To find the exact error that caused horovod shutdown, go through all the log files and look for `Traceback` at the end of the driver_log files. One of these files will give you the actual underlying exception.
+### Horovod is shut down
+In most cases, this exception means there was an underlying exception in one of the processes that caused horovod to shut down. Each rank in the MPI job gets its own dedicated log file in Azure ML. These logs are named `70_driver_logs`. In case of distributed training, the log names are suffixed with `_rank` to make it easy to differentiate the logs. To find the exact error that caused the Horovod shutdown, go through all the log files and look for `Traceback` at the end of the driver_log files. One of these files will give you the actual underlying exception.
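A small sketch of one way to pull those per-rank driver logs down from a run for inspection (assumes `run` is the failed training run; the name filter follows the naming pattern described above):

```python
# Download every driver log attached to the run, then search each file for 'Traceback'.
for log_name in run.get_file_names():
    if '70_driver_log' in log_name:
        run.download_file(log_name, output_file_path=log_name.replace('/', '_'))
```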

## Labeling projects issues

@@ -281,6 +295,6 @@ Manually refresh the page. Initialization should proceed at roughly 20 datapoint

To load all labeled images, choose the **First** button. The **First** button will take you back to the front of the list, but loads all labeled data.

-### Pressing Esc key while labeling for object detection creates a zero size label on the top left corner. Submitting labels in this state fails.
+### Pressing Esc key while labeling for object detection creates a zero size label on the top-left corner. Submitting labels in this state fails.

Delete the label by clicking on the cross mark next to it.

articles/machine-learning/service/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@
- name: Work with data
  items:
  - name: Get data from a datastore
-    displayName: blob, get, fileshare, access storage
+    displayName: blob, get, fileshare, access storage, mount, download
    href: how-to-access-data.md
  - name: Add & register datasets
    displayName: data, dataset
