Skip to content

Commit 1ba21e3

Browse files
authored
Merge pull request #89402 from vmagelo/azureml-data-work
Changes inspired by work on azureml.data sdk
2 parents 8ff4aab + 07e3d75 commit 1ba21e3

File tree

2 files changed

+13
-12
lines changed

2 files changed

+13
-12
lines changed

articles/machine-learning/service/how-to-access-data.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ ms.custom: seodec18
1818

1919
# Access data in Azure storage services
2020

21-
In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization, to access your storage without having to hard code that information in your scripts.
21+
In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores are used to store connection information, like your subscription ID and token authorization. Using datastores allows you to access your storage without having to hard code connection information in your scripts.
2222

2323
This how-to shows examples of the following tasks:
2424
* [Register datastores](#access)
@@ -47,7 +47,7 @@ All the register methods are on the [`Datastore`](https://docs.microsoft.com/pyt
4747

4848
The following examples show you to register an Azure Blob Container or an Azure File Share as a datastore.
4949

50-
+ For an **Azure Blob Container Datastore**, use [`register_azure_blob-container()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py)
50+
+ For an **Azure Blob Container Datastore**, use [`register_azure_blob-container()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py#register-azure-blob-container-workspace--datastore-name--container-name--account-name--sas-token-none--account-key-none--protocol-none--endpoint-none--overwrite-false--create-if-not-exists-false--skip-validation-false--blob-cache-timeout-none--grant-workspace-access-false--subscription-id-none--resource-group-none-)
5151

5252
```Python
5353
datastore = Datastore.register_azure_blob_container(workspace=ws,
@@ -82,7 +82,7 @@ To get a specific datastore registered in the current workspace, use the [`get()
8282
#get named datastore from current workspace
8383
datastore = Datastore.get(ws, datastore_name='your datastore name')
8484
```
85-
To get the list of datastores registered with a given workspace, you can use the `datastores` property on a workspace object:
85+
To get the list of datastores registered with a given workspace, you can use the [`datastores`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace%28class%29?view=azure-ml-py#datastores) property on a workspace object:
8686

8787
```Python
8888
#list all datastores registered in current workspace
@@ -108,7 +108,7 @@ ws.set_default_datastore('your datastore name')
108108

109109
<a name="up-and-down"></a>
110110
## Upload & download data
111-
The [`upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py#download-target-path--prefix-none--overwrite-false--show-progress-true-) and [`download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py#download-target-path--prefix-none--overwrite-false--show-progress-true-) methods described in the following examples are specific to and operate identically for the [AzureBlobDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py) and [AzureFileDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azurefiledatastore?view=azure-ml-py) classes.
111+
The [`upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py#upload-src-dir--target-path-none--overwrite-false--show-progress-true-) and [`download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py#download-target-path--prefix-none--overwrite-false--show-progress-true-) methods described in the following examples are specific to and operate identically for the [AzureBlobDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azureblobdatastore?view=azure-ml-py) and [AzureFileDatastore](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.azurefiledatastore?view=azure-ml-py) classes.
112112

113113
### Upload
114114

@@ -153,8 +153,8 @@ The following table lists the methods that tell the compute target how to use th
153153
Way|Method|Description|
154154
----|-----|--------
155155
Mount| [`as_mount()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-mount--)| Use to mount the datastore on the compute target.
156-
Download|[`as_download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-download-path-on-compute-none-)|Use to download the contents of your datastore to the location specified by `path_on_compute`. <br> This download happens before the run.
157-
Upload|[`as_upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-upload-path-on-compute-none-)| Use to upload a file from the location specified by `path_on_compute` to your datastore. <br> This upload happens after your run.
156+
Download|[`as_download()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-download-path-on-compute-none-)|Use to download the contents of your datastore to the location specified by `path_on_compute`. <br><br> This download happens before the run.
157+
Upload|[`as_upload()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#as-upload-path-on-compute-none-)| Use to upload a file from the location specified by `path_on_compute` to your datastore. <br><br> This upload happens after your run.
158158

159159
To reference a specific folder or file in your datastore and make it available on the compute target, use the datastore [`path()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.azure_storage_datastore.abstractazurestoragedatastore?view=azure-ml-py#path-path-none--data-reference-name-none-) method.
160160

@@ -166,13 +166,13 @@ datastore.as_mount()
166166
datastore.path('./bar').as_download()
167167
```
168168
> [!NOTE]
169-
> Any `datastore` or `datastore.path` object resolves to an environment variable name of the format `"$AZUREML_DATAREFERENCE_XXXX"`, whose value represents the mount/download path on the target compute. The datastore path on the target compute might not be the same as the execution path for the training script.
169+
> Any specified `datastore` or `datastore.path` object resolves to an environment variable name of the format `"$AZUREML_DATAREFERENCE_XXXX"`, whose value represents the mount/download path on the target compute. The datastore path on the target compute might not be the same as the execution path for the training script.
170170
171171
### Examples
172172

173173
The following code examples are specific to the [`Estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class for accessing data during training.
174174

175-
`script_params` is a dictionary containing parameters to the entry_script. You can use it to pass in a datastore and describe how data should be made available on compute target. Learn more from our end-to-end [tutorial](tutorial-train-models-with-aml.md).
175+
`script_params` is a dictionary containing parameters to the entry_script. Use it to pass in a datastore and describe how data is made available on the compute target. Learn more from our end-to-end [tutorial](tutorial-train-models-with-aml.md).
176176

177177
```Python
178178
from azureml.train.estimator import Estimator
@@ -239,7 +239,7 @@ Azure Machine Learning provides several ways to use your models for scoring. Som
239239
| [Web service](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) as a web service. |
240240
| [IoT Edge module](how-to-deploy-and-where.md) | &nbsp; | Deploy model(s) to IoT Edge devices. |
241241

242-
For situations where the SDK does not provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, using the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) to access data stored in blobs.
242+
For situations where the SDK doesn't provide access to datastores, you may be able to create custom code using the relevant Azure SDK to access the data. For example, the [Azure Storage SDK for Python](https://github.com/Azure/azure-storage-python) is a client library that you can use to access data stored in blobs or files.
243243

244244

245245
## Next steps

articles/machine-learning/service/how-to-create-register-datasets.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -168,9 +168,9 @@ mnist_ds = Dataset.File.from_files(path=web_paths)
168168

169169
## Register datasets
170170

171-
To complete the creation process, register your datasets with workspace:
171+
To complete the creation process, register your datasets with a workspace.
172172

173-
Use the [`register()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#register-workspace--name--description-none--tags-none--visible-true--exist-ok-false--update-if-exist-false-) method to register datasets to your workspace so they can be shared with others and reused across various experiments.
173+
Use the [`register()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#register-workspace--name--description-none--tags-none--visible-true--exist-ok-false--update-if-exist-false-) method to register datasets with your workspace so they can be shared with others and reused across various experiments.
174174

175175
```Python
176176
titanic_ds = titanic_ds.register(workspace = workspace,
@@ -183,7 +183,8 @@ titanic_ds = titanic_ds.register(workspace = workspace,
183183
184184
## Version datasets
185185

186-
You can register a new dataset under the same name by creating a new version. Dataset version is a way to bookmark the state of your data, so you can apply a specific version of the dataset for experimentation or future reproduction. Typical scenarios to consider versioning:
186+
You can register a new dataset under the same name by creating a new version. Dataset version is a way to bookmark the state of your data, so you can apply a specific version of the dataset for experimentation or future reproduction. Typical scenarios to consider for versioning are:
187+
187188
* When new data is available for retraining.
188189
* When you are applying different data preparation or feature engineering approaches.
189190

0 commit comments

Comments
 (0)