Skip to content

Commit 252c712

Browse files
authored
Merge pull request #108653 from MayMSFT/patch-37
Update how-to-access-data.md
2 parents 5461302 + 7045ebb commit 252c712

File tree

1 file changed

+13
-18
lines changed

1 file changed

+13
-18
lines changed

articles/machine-learning/how-to-access-data.md

Lines changed: 13 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Access data in Azure storage services
2+
title: Connect to Azure storage services
33
titleSuffix: Azure Machine Learning
44
description: Learn how to use datastores to securely connect to Azure storage services during training with Azure Machine Learning
55
services: machine-learning
@@ -9,16 +9,16 @@ ms.topic: conceptual
99
ms.author: sihhu
1010
author: MayMSFT
1111
ms.reviewer: nibaccam
12-
ms.date: 02/27/2020
12+
ms.date: 03/24/2020
1313
ms.custom: seodec18
1414

1515
# Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.
1616
---
1717

18-
# Access data in Azure storage services
18+
# Connect to Azure storage services
1919
[!INCLUDE [aml-applies-to-basic-enterprise-sku](../../includes/aml-applies-to-basic-enterprise-sku.md)]
2020

21-
In this article, learn how to easily access your data in Azure storage services via Azure Machine Learning datastores. Datastores store connection information, like your subscription ID and token authorization, so you can access your storage without having to hard code them in your scripts.
21+
In this article, learn how to connect to Azure storage services via Azure Machine Learning datastores. Datastores store connection information, like your subscription ID and token authorization in your [Key Vault](https://azure.microsoft.com/services/key-vault/) associated with the workspace, so you can securely access your storage without having to hard code them in your scripts.
2222

2323
You can create datastores from [these Azure storage solutions](#matrix). For unsupported storage solutions, and to save data egress cost during machine learning experiments, we recommend that you [move your data](#move) to supported Azure storage solutions.
2424

@@ -63,9 +63,15 @@ Datastores currently support storing connection information to the storage servi
6363

6464
### Storage guidance
6565

66-
We recommend creating a datastore for an Azure blob container. Both standard and premium storage are available for blobs. Although premium storage is more expensive, its faster throughput speeds might improve the speed of your training runs, particularly if you train against a large dataset. For information about the cost of storage accounts, see the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service).
66+
We recommend creating a datastore for an [Azure Blob container](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-introduction). Both standard and premium storage are available for blobs. Although premium storage is more expensive, its faster throughput speeds might improve the speed of your training runs, particularly if you train against a large dataset. For information about the cost of storage accounts, see the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=machine-learning-service).
6767

68-
When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace. They're named `workspaceblobstore` and `workspacefilestore`, respectively. They store the connection information for the blob container and the file share that are provisioned in the storage account attached to the workspace. The `workspaceblobstore` container is set as the default datastore.
68+
[Azure Data Lake Storage Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction?toc=/azure/storage/blobs/toc.json) is built on top of Azure Blob storage and designed for enterprise big data analytics. A fundamental part of Data Lake Storage Gen2 is the addition of a [hierarchical namespace](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-namespace) to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
69+
70+
When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace. They're named `workspaceblobstore` and `workspacefilestore`, respectively. `workspaceblobstore` is used to store workspace artifacts and your machine learning experiment logs. `workspacefilestore` is used to store notebooks and R scripts authorized via [compute instance](https://docs.microsoft.com/azure/machine-learning/concept-compute-instance#accessing-files). The `workspaceblobstore` container is set as the default datastore.
71+
72+
> [!IMPORTANT]
73+
> Azure Machine Learning designer (preview) will create a datastore named **azureml_globaldatasets** automatically when you open a sample in the designer homepage. This datastore only contains sample datasets. Please **do not** use this datastore for any confidential data access!
74+
> ![Auto-created datastore for designer sample datasets](media/how-to-access-data/datastore-designer-sample.png)
6975
7076
<a name="access"></a>
7177

@@ -134,7 +140,7 @@ file_datastore = Datastore.register_azure_file_share(workspace=ws,
134140

135141
#### Azure Data Lake Storage Generation 2
136142

137-
For an Azure Data Lake Storage Generation 2 (ADLS Gen 2) datastore, use [register_azure_data_lake_gen2()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#register-azure-data-lake-gen2-workspace--datastore-name--filesystem--account-name--tenant-id--client-id--client-secret--resource-url-none--authority-url-none--protocol-none--endpoint-none--overwrite-false-) to register a credential datastore connected to an Azure DataLake Gen 2 storage with [service principal permissions](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal). In order to utilize your service principal you need to [register your application](https://docs.microsoft.com/azure/active-directory/develop/app-objects-and-service-principals) and set role assignments to Reader and Data access. Learn more about [access control set up for ADLS Gen 2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control).
143+
For an Azure Data Lake Storage Generation 2 (ADLS Gen 2) datastore, use [register_azure_data_lake_gen2()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#register-azure-data-lake-gen2-workspace--datastore-name--filesystem--account-name--tenant-id--client-id--client-secret--resource-url-none--authority-url-none--protocol-none--endpoint-none--overwrite-false-) to register a credential datastore connected to an Azure DataLake Gen 2 storage with [service principal permissions](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal). In order to utilize your service principal you need to [register your application](https://docs.microsoft.com/azure/active-directory/develop/app-objects-and-service-principals) and grant the service principal with the right data access. Learn more about [access control set up for ADLS Gen 2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control).
138144

139145
The following code creates and registers the `adlsgen2_datastore_name` datastore to the `ws` workspace. This datastore accesses the file system `test` on the `account_name` storage account, by using the provided service principal credentials.
140146

@@ -185,10 +191,6 @@ The following example demonstrates what the form looks like when you create an A
185191

186192
## Get datastores from your workspace
187193

188-
> [!IMPORTANT]
189-
> Azure Machine Learning designer (preview) will create a datastore named **azureml_globaldatasets** automatically when you open a sample in the designer homepage. This datastore only contains sample datasets. Please **do not** use this datastore for any confidential data access!
190-
> ![Auto-created datastore for designer sample datasets](media/how-to-access-data/datastore-designer-sample.png)
191-
192194
To get a specific datastore registered in the current workspace, use the [`get()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py#get-workspace--datastore-name-) static method on the `Datastore` class:
193195

194196
```Python
@@ -210,13 +212,6 @@ To get the workspace's default datastore, use this line:
210212
datastore = ws.get_default_datastore()
211213
```
212214

213-
To define a different default datastore for the current workspace, use the [`set_default_datastore()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#set-default-datastore-name-) method on the workspace object:
214-
215-
```Python
216-
# Define the default datastore for the current workspace
217-
ws.set_default_datastore('your datastore name')
218-
```
219-
220215
<a name="up-and-down"></a>
221216
## Upload and download data
222217

0 commit comments

Comments
 (0)