Commit b3d2fb2

Merge pull request #84163 from nibaccam/access

Access data| intro rework and table placement

2 parents 235e467 + 3a81adc

File tree

1 file changed: +28 −27 lines changed


articles/machine-learning/service/how-to-access-data.md

Lines changed: 28 additions & 27 deletions
@@ -6,18 +6,19 @@ services: machine-learning
 ms.service: machine-learning
 ms.subservice: core
 ms.topic: conceptual
-ms.author: minxia
-author: mx-iao
-ms.reviewer: sgilley
-ms.date: 05/24/2019
+ms.author: sihhu
+author: MayMSFT
+ms.reviewer: nibaccam
+ms.date: 08/2/2019
 ms.custom: seodec18
 
+# Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
 
 ---
 
 # Access data in Azure storage services
 
-In Azure Machine Learning service, we make it easy to access data in Azure storage services via datastores. Datastores are used to store connection information and secret to access your storage. You can use datastores to access your storage services during training instead of hard coding the connection information and secret in your script.
+In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization, to access your storage without having to hard code that information in your scripts.
 
 This how-to shows examples of the following tasks:
 * [Register datastores](#access)
@@ -42,8 +43,6 @@ ws = Workspace.from_config()
 
 ## Register datastores
 
-Datastores currently support storing connection information to the following storage services: Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System.
-
 All the register methods are on the [`Datastore`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) class and have the form register_azure_*.
 
 The following examples show you how to register an Azure Blob Container or an Azure File Share as a datastore.
@@ -70,6 +69,7 @@ The following examples show you to register an Azure Blob Container or an Azure
 ```
 
 #### Storage guidance
+
 We recommend Azure Blob Container. Both standard and premium storage are available for blobs. Although more expensive, we suggest premium storage due to faster throughput speeds that may improve the speed of your training runs, particularly if you train against a large data set. See the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service) for storage account cost information.
 
 <a name="get"></a>
@@ -114,7 +114,7 @@ The [`upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data
 
 Upload either a directory or individual files to the datastore using the Python SDK.
 
-To upload a directory to a datastore `ds`:
+To upload a directory to a datastore `datastore`:
 
 ```Python
 import azureml.data
@@ -126,11 +126,12 @@ datastore.upload(src_dir='your source directory',
 show_progress=True)
 ```
 
-`target_path` specifies the location in the file share (or blob container) to upload. It defaults to `None`, in which case the data gets uploaded to root. `overwrite=True` will overwrite any existing data at `target_path`.
+The `target_path` parameter specifies the location in the file share (or blob container) to upload. It defaults to `None`, in which case the data gets uploaded to root. When `overwrite=True` any existing data at `target_path` is overwritten.
 
-Or upload a list of individual files to the datastore via the datastore's `upload_files()` method.
+Or upload a list of individual files to the datastore via the `upload_files()` method.
 
 ### Download
+
 Similarly, download data from a datastore to your local file system.
 
 ```Python
@@ -139,7 +140,7 @@ datastore.download(target_path='your target path',
 show_progress=True)
 ```
 
-`target_path` is the location of the local directory to download the data to. To specify a path to the folder in the file share (or blob container) to download, provide that path to `prefix`. If `prefix` is `None`, all the contents of your file share (or blob container) will get downloaded.
+The `target_path` parameter is the location of the local directory to download the data to. To specify a path to the folder in the file share (or blob container) to download, provide that path to `prefix`. If `prefix` is `None`, all the contents of your file share (or blob container) will get downloaded.
 
 <a name="train"></a>
 ## Access your data during training
@@ -154,7 +155,7 @@ Mount| [`as_mount()`](https://docs.microsoft.com/python/api/azureml-core/azureml
 Download|[`as_download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-download-path-on-compute-none-)|Use to download the contents of your datastore to the location specified by `path_on_compute`. <br> This download happens before the run.
 Upload|[`as_upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-upload-path-on-compute-none-)| Use to upload a file from the location specified by `path_on_compute` to your datastore. <br> This upload happens after your run.
 
-To reference a specific folder or file in your datastore and make it available on the compute target, use the datastore's [`path()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#path-path-none--data-reference-name-none-) method.
+To reference a specific folder or file in your datastore and make it available on the compute target, use the datastore [`path()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#path-path-none--data-reference-name-none-) method.
 
 ```Python
 #to mount the full contents in your storage to the compute target
@@ -197,22 +198,9 @@ est = Estimator(source_directory='your code directory',
 entry_script='train.py',
 inputs=[datastore1.as_download(), datastore2.path('./foo').as_download(), datastore3.as_upload(path_on_compute='./bar.pkl')])
 ```
+### Compute and datastore matrix
 
-## Access data during scoring
-
-The Azure Machine Learning service provides several ways to use your models for scoring. Some of these methods do not provide access to datastores. Use the following table to understand which methods allow you to access datastores during scoring:
-
-| Method | Datastore access | Description |
-| ----- | :-----: | ----- |
-| [Batch prediction](how-to-run-batch-predictions.md) || Make predictions on large quantities of data asynchronously. |
-| [Web service](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) as a web service. |
-| [IoT Edge module](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) to IoT Edge devices. |
-
-For situations where the SDK does not provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, using the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) to access data stored in blobs.
-
-## Compute and datastore matrix
-
-The following matrix displays the available data access functionalities for the different compute targets and datastore scenarios. Learn more about the [compute targets for Azure Machine Learning](how-to-set-up-training-targets.md#compute-targets-for-training).
+Datastores currently support storing connection information to the storage services listed in the following matrix. This matrix displays the available data access functionalities for the different compute targets and datastore scenarios. Learn more about the [compute targets for Azure Machine Learning](how-to-set-up-training-targets.md#compute-targets-for-training).
 
 |Compute|[AzureBlobDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py) |[AzureFileDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azurefiledatastore?view=azure-ml-py) |[AzureDataLakeDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_data_lake_datastore.azuredatalakedatastore?view=azure-ml-py) |[AzureDataLakeGen2Datastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_data_lake_datastore.azuredatalakegen2datastore?view=azure-ml-py) [AzurePostgreSqlDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_postgre_sql_datastore.azurepostgresqldatastore?view=azure-ml-py) [AzureSqlDatabaseDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_sql_database_datastore.azuresqldatabasedatastore?view=azure-ml-py) |
 |--------------------------------|----------------------------------------------------------|----------------------------------------------------------|------------------------|----------------------------------------------------------------------------------------|
@@ -228,6 +216,19 @@ The following matrix displays the available data access functionalities for the
 > [!NOTE]
 > There may be scenarios in which highly iterative, large data processes run faster using `as_download()` instead of `as_mount()`; this can be validated experimentally.
 
+## Access data during scoring
+
+The Azure Machine Learning service provides several ways to use your models for scoring. Some of these methods do not provide access to datastores. Use the following table to understand which methods allow you to access datastores during scoring:
+
+| Method | Datastore access | Description |
+| ----- | :-----: | ----- |
+| [Batch prediction](how-to-run-batch-predictions.md) || Make predictions on large quantities of data asynchronously. |
+| [Web service](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) as a web service. |
+| [IoT Edge module](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) to IoT Edge devices. |
+
+For situations where the SDK does not provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, using the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) to access data stored in blobs.
+
+
 ## Next steps
 
 * [Train a model](how-to-train-ml-models.md)
