articles/machine-learning/service/how-to-access-data.md (28 additions & 27 deletions)
@@ -6,18 +6,19 @@ services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
-ms.author: minxia
-author: mx-iao
-ms.reviewer: sgilley
-ms.date: 05/24/2019
+ms.author: sihhu
+author: MayMSFT
+ms.reviewer: nibaccam
+ms.date: 08/2/2019
ms.custom: seodec18
+# Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
---
# Access data in Azure storage services
-In Azure Machine Learning service, we make it easy to access data in Azure storage services via datastores. Datastores are used to store connection information and secret to access your storage. You can use datastores to access your storage services during training instead of hard coding the connection information and secret in your script.
+In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization, to access your storage without having to hard code that information in your scripts.
This how-to shows examples of the following tasks:

* [Register datastores](#access)
@@ -42,8 +43,6 @@ ws = Workspace.from_config()
## Register datastores
-Datastores currently support storing connection information to the following storage services: Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System.
-
All the register methods are on the [`Datastore`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) class and have the form `register_azure_*`.

The following examples show you how to register an Azure Blob Container or an Azure File Share as a datastore.
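The diff elides the article's own registration examples at this point. As a minimal sketch of what a blob container registration typically looks like (the datastore name, account name, container name, and key below are placeholders, not values from the article):

```Python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Register an Azure Blob Container as a datastore.
# All names and credentials below are placeholders.
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='my_blob_datastore',
    container_name='my-container',
    account_name='mystorageaccount',
    account_key='<account-key>')
```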
@@ -70,6 +69,7 @@ The following examples show you to register an Azure Blob Container or an Azure
```

#### Storage guidance
We recommend Azure Blob Container. Both standard and premium storage are available for blobs. Although more expensive, we suggest premium storage due to its faster throughput speeds, which may improve the speed of your training runs, particularly if you train against a large data set. See the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service) for storage account cost information.
<a name="get"></a>
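The body of the get section is elided by the diff. Retrieving a registered datastore typically looks like the following sketch; the datastore name is a placeholder:

```Python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# Retrieve a previously registered datastore by name ('my_datastore' is a placeholder).
datastore = Datastore.get(ws, datastore_name='my_datastore')

# Every workspace also comes with a default datastore you can use right away.
default_ds = ws.get_default_datastore()
```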
@@ -114,7 +114,7 @@ The [`upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data
Upload either a directory or individual files to the datastore using the Python SDK.
-`target_path` specifies the location in the file share (or blob container) to upload. It defaults to `None`, in which case the data gets uploaded to root. `overwrite=True` will overwrite any existing data at `target_path`.
+The `target_path` parameter specifies the location in the file share (or blob container) to upload. It defaults to `None`, in which case the data gets uploaded to root. When `overwrite=True`, any existing data at `target_path` is overwritten.
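A minimal sketch of a directory upload under those defaults; the datastore name and local path are placeholders:

```Python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()
datastore = Datastore.get(ws, datastore_name='my_datastore')  # placeholder name

# Upload everything under ./data to a 'data' folder at the datastore root,
# replacing any files already stored at that path.
datastore.upload(src_dir='./data',
                 target_path='data',
                 overwrite=True,
                 show_progress=True)
```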
-Or upload a list of individual files to the datastore via the datastore's `upload_files()` method.
+Or upload a list of individual files to the datastore via the `upload_files()` method.
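Continuing the sketch above, a hedged example of `upload_files()`; the file paths are hypothetical:

```Python
# Upload two specific files (hypothetical paths) to the same 'data' folder.
datastore.upload_files(files=['./data/train.csv', './data/test.csv'],
                       target_path='data',
                       overwrite=True)
```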
### Download
Similarly, download data from a datastore to your local file system.
-`target_path` is the location of the local directory to download the data to. To specify a path to the folder in the file share (or blob container) to download, provide that path to `prefix`. If `prefix` is `None`, all the contents of your file share (or blob container) will get downloaded.
+The `target_path` parameter is the location of the local directory to download the data to. To specify a path to the folder in the file share (or blob container) to download, provide that path to `prefix`. If `prefix` is `None`, all the contents of your file share (or blob container) will get downloaded.
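A sketch of the corresponding `download()` call, assuming the same placeholder datastore as above:

```Python
# Download only the datastore's 'data' folder into ./local_data on this machine.
datastore.download(target_path='./local_data',
                   prefix='data',
                   show_progress=True)
```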
Download | [`as_download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-download-path-on-compute-none-) | Use to download the contents of your datastore to the location specified by `path_on_compute`. <br> This download happens before the run.
Upload | [`as_upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-upload-path-on-compute-none-) | Use to upload a file from the location specified by `path_on_compute` to your datastore. <br> This upload happens after your run.
-To reference a specific folder or file in your datastore and make it available on the compute target, use the datastore's [`path()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#path-path-none--data-reference-name-none-) method.
+To reference a specific folder or file in your datastore and make it available on the compute target, use the datastore [`path()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#path-path-none--data-reference-name-none-) method.
```Python
# To mount the full contents in your storage to the compute target
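datastore.as_mount()

# The diff truncates this code block here; the lines above and below are a
# plausible completion based on the path() method described earlier, an
# assumption rather than the article's verbatim code.
datastore.path('./bar').as_download()
```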
@@ -197,22 +198,9 @@ est = Estimator(source_directory='your code directory',
-The Azure Machine Learning service provides several ways to use your models for scoring. Some of these methods do not provide access to datastores. Use the following table to understand which methods allow you to access datastores during scoring:
-
-| Method | Datastore access | Description |
-| ----- | :-----: | ----- |
-| [Batch prediction](how-to-run-batch-predictions.md) | ✔ | Make predictions on large quantities of data asynchronously. |
-| [Web service](how-to-deploy-and-where.md) | | Deploy model(s) as a web service. |
-| [IoT Edge module](how-to-deploy-and-where.md) | | Deploy model(s) to IoT Edge devices. |
-
-For situations where the SDK does not provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, using the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) to access data stored in blobs.
-
-## Compute and datastore matrix
-
-The following matrix displays the available data access functionalities for the different compute targets and datastore scenarios. Learn more about the [compute targets for Azure Machine Learning](how-to-set-up-training-targets.md#compute-targets-for-training).
+Datastores currently support storing connection information to the storage services listed in the following matrix. This matrix displays the available data access functionalities for the different compute targets and datastore scenarios. Learn more about the [compute targets for Azure Machine Learning](how-to-set-up-training-targets.md#compute-targets-for-training).
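The hunk above picks up just after the article's Estimator example, which the diff elides. A sketch of how a mounted datastore is typically passed to an Estimator for training; the `--data_dir` parameter and `train.py` script are hypothetical:

```Python
from azureml.train.estimator import Estimator

# compute_target is assumed to be an existing ComputeTarget object, and
# datastore a registered datastore retrieved earlier in this article.
# The mounted datastore path reaches train.py as a command-line argument.
est = Estimator(source_directory='your code directory',
                entry_script='train.py',
                script_params={'--data_dir': datastore.as_mount()},
                compute_target=compute_target)
```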
@@ -228,6 +216,19 @@ The following matrix displays the available data access functionalities for the
> [!NOTE]
> There may be scenarios in which highly iterative, large data processes run faster using `as_download()` instead of `as_mount()`; this can be validated experimentally.
+## Access data during scoring
+
+The Azure Machine Learning service provides several ways to use your models for scoring. Some of these methods do not provide access to datastores. Use the following table to understand which methods allow you to access datastores during scoring:
+
+| Method | Datastore access | Description |
+| ----- | :-----: | ----- |
+| [Batch prediction](how-to-run-batch-predictions.md) | ✔ | Make predictions on large quantities of data asynchronously. |
+| [Web service](how-to-deploy-and-where.md) | | Deploy model(s) as a web service. |
+| [IoT Edge module](how-to-deploy-and-where.md) | | Deploy model(s) to IoT Edge devices. |
+
+For situations where the SDK does not provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, using the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) to access data stored in blobs.
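For example, a sketch of that approach using the legacy azure-storage package from that repository; the account, container, and blob names are placeholders:

```Python
from azure.storage.blob import BlockBlobService

# Authenticate against the storage account directly (placeholder credentials).
blob_service = BlockBlobService(account_name='mystorageaccount',
                                account_key='<account-key>')

# Pull a blob down to local disk from inside the scoring code.
blob_service.get_blob_to_path(container_name='my-container',
                              blob_name='data/scoring-input.csv',
                              file_path='scoring-input.csv')
```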