articles/machine-learning/v1/how-to-access-data.md

ms.topic: how-to
ms.author: yogipandey
author: ynpandey
ms.reviewer: nibaccam
ms.date: 03/13/2025
ms.custom: UpdateFrequency5, data4ml
#Customer intent: As an experienced Python developer, I need to make my data in Azure storage available to my remote compute to train my machine learning models.

In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores and the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/intro).

A datastore securely connects to your storage service on Azure, and it avoids risk to your authentication credentials or the integrity of your original data store. A datastore stores connection information - for example, your subscription ID or token authorization - in the [Key Vault](https://azure.microsoft.com/services/key-vault/) associated with the workspace. With a datastore, you can securely access your storage because you can avoid hard-coding connection information in your scripts. You can create datastores that connect to [these Azure storage solutions](#supported-data-storage-service-types).

For more information about how datastores fit with the overall Azure Machine Learning data access workflow, visit the [Securely access data](concept-data.md#data-workflow) article.

To learn how to connect to a data storage resource with a UI, visit [Connect to data storage with the studio UI](how-to-connect-data-ui.md#create-datastores).
>[!TIP]
> This article assumes that you want to connect to your storage service with credential-based authentication - for example, a service principal or a shared access signature (SAS) token. If credentials are registered with datastores, all users with the workspace *Reader* role can retrieve those credentials. For more information, visit [Manage roles in your workspace](../how-to-assign-roles.md#default-roles).
>
> For more information about identity-based data access, visit [Identity-based data access to storage services (v1)](../how-to-identity-based-data-access.md).
- An Azure Machine Learning workspace.

[Create an Azure Machine Learning workspace](../quickstart-create-resources.md), or use an existing workspace via the Python SDK.

Import the `Workspace` and `Datastore` classes, and load your subscription information from the `config.json` file with the `from_config()` function. By default, the function looks for the JSON file in the current directory, but you can also specify a path parameter to point to the file with `from_config(path="your/file/path")`:

```python
import azureml.core
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()
```

Workspace creation automatically registers an Azure blob container and an Azure file share, as datastores, to the workspace. They're named `workspaceblobstore` and `workspacefilestore`, respectively. The `workspaceblobstore` stores workspace artifacts and your machine learning experiment logs. It serves as the **default datastore** and can't be deleted from the workspace. The `workspacefilestore` stores notebooks and R scripts authorized via a [compute instance](../concept-compute-instance.md#accessing-files).

> [!NOTE]
> Azure Machine Learning designer automatically creates a datastore named **azureml_globaldatasets** when you open a sample in the designer homepage. This datastore only contains sample datasets. **Do not** use this datastore for any confidential data access.
## Supported data storage service types

Datastores currently support storage of connection information to the storage services listed in this matrix:

> [!TIP]
> **For unsupported storage solutions** (those not listed in the following table), you might encounter issues as you connect and work with your data. We suggest that you [move your data](#move-data-to-supported-azure-storage-solutions) to a supported Azure storage solution. This can also help with other scenarios - for example, reduction of data egress cost during ML experiments.
### Access validation
> [!WARNING]
> Cross-tenant access to storage accounts isn't supported. If your scenario needs cross-tenant access, reach out to the [Azure Machine Learning Data Support team](mailto:[email protected]) for assistance with a custom code solution.
**As part of the initial datastore creation and registration process**, Azure Machine Learning automatically validates that the underlying storage service exists and that the user-provided principal (username, service principal, or SAS token) can access the specified storage.

You can find account key, SAS token, and service principal information in the [Azure portal](https://portal.azure.com):

* To use an account key or SAS token for authentication, select **Storage Accounts** on the left pane, and choose the storage account that you want to register
* The **Overview** page provides information about the account name, file share name, container name, and so on
* For account keys, go to **Access keys** on the **Settings** pane
* For SAS tokens, go to **Shared access signatures** on the **Settings** pane
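
The registration samples later in this article read these values from environment variables with `os.getenv`, which keeps secrets out of source code. The following is a minimal sketch of that pattern; the helper function and the `BLOB_ACCOUNT_KEY` variable name are illustrative, not part of the SDK:

```python
import os

def read_credential(var_name: str, placeholder: str) -> str:
    """Return the value of an environment variable, or a placeholder if unset.

    This mirrors the os.getenv(name, default) pattern used by the
    registration samples in this article; the placeholder makes a missing
    value obvious instead of silently hard-coding a secret.
    """
    return os.getenv(var_name, placeholder)

# The variable name BLOB_ACCOUNT_KEY is illustrative; substitute the
# credential that your datastore registration needs.
account_key = read_credential("BLOB_ACCOUNT_KEY", "<my-account-key>")
```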
## Create and register datastores

Registration of an Azure storage solution as a datastore automatically creates and registers that datastore to a specific workspace. Review the [storage access & permissions](#storage-access-and-permissions) section in this document for guidance about virtual network scenarios, and where to find required authentication credentials.

This section offers examples that describe how to create and register a datastore via the Python SDK for these storage types. The parameters shown in these examples are the **required parameters** to create and register a datastore:

To learn how to connect to a data storage resource with a UI, visit [Connect to data with Azure Machine Learning studio](how-to-connect-data-ui.md).
>[!IMPORTANT]
> If you unregister and re-register a datastore with the same name, and the re-registration fails, the Azure Key Vault for your workspace might not have soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace. However, it might not be enabled if you used an existing key vault, or if you have a workspace created before October 2020. For more information about how to enable soft-delete, visit [Turn on Soft Delete for an existing key vault](/azure/key-vault/general/soft-delete-change#turn-on-soft-delete-for-an-existing-key-vault).
> [!NOTE]
> A datastore name should only contain lowercase letters, digits, and underscores.
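
As a quick sanity check before you call a `register_*` method, you can validate a candidate name against this rule. The following is a minimal sketch; the helper function is illustrative, not part of the SDK:

```python
import re

# Matches the naming rule: lowercase letters, digits, and underscores only.
_DATASTORE_NAME = re.compile(r"^[a-z0-9_]+$")

def is_valid_datastore_name(name: str) -> bool:
    """Return True when name contains only lowercase letters, digits, and underscores."""
    return bool(_DATASTORE_NAME.match(name))

print(is_valid_datastore_name("azblobsdk"))      # True
print(is_valid_datastore_name("My-Datastore"))   # False: uppercase letters and a hyphen
```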
### Azure blob container
To register an Azure blob container as a datastore, use the [`register_azure_blob_container()`](/python/api/azureml-core/azureml.core.datastore%28class%29#azureml-core-datastore-register-azure-blob-container) method.

The following code sample creates and registers the `blob_datastore_name` datastore to the `ws` workspace. The datastore uses the provided account access key to access the `my-container-name` blob container on the `my-account-name` storage account. Review the [storage access & permissions](#storage-access-and-permissions) section for guidance about virtual network scenarios, and where to find required authentication credentials.

```python
blob_datastore_name='azblobsdk' # Name of the datastore to workspace
container_name=os.getenv("BLOB_CONTAINER", "<my-container-name>") # Name of Azure blob container
account_name=os.getenv("BLOB_ACCOUNTNAME", "<my-account-name>") # Storage account name
account_key=os.getenv("BLOB_ACCOUNT_KEY", "<my-account-key>") # Storage account access key

blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws, datastore_name=blob_datastore_name, container_name=container_name,
    account_name=account_name, account_key=account_key)
```

### Azure file share

To register an Azure file share as a datastore, use the [`register_azure_file_share()`](/python/api/azureml-core/azureml.core.datastore%28class%29#azureml-core-datastore-register-azure-file-share) method.

This code sample creates and registers the `file_datastore_name` datastore to the `ws` workspace. The datastore uses the `my-fileshare-name` file share on the `my-account-name` storage account, with the provided account access key. Review the [storage access & permissions](#storage-access-and-permissions) section for guidance about virtual network scenarios, and where to find required authentication credentials.

```python
file_datastore_name='azfilesharesdk' # Name of the datastore to workspace
file_share_name=os.getenv("FILE_SHARE_CONTAINER", "<my-fileshare-name>") # Name of Azure file share container
account_name=os.getenv("FILE_SHARE_ACCOUNTNAME", "<my-account-name>") # Storage account name
account_key=os.getenv("FILE_SHARE_ACCOUNT_KEY", "<my-account-key>") # Storage account access key

file_datastore = Datastore.register_azure_file_share(
    workspace=ws, datastore_name=file_datastore_name, file_share_name=file_share_name,
    account_name=account_name, account_key=account_key)
```

### Azure Data Lake Storage Generation 2

For an Azure Data Lake Storage Generation 2 (ADLS Gen 2) datastore, use the [register_azure_data_lake_gen2()](/python/api/azureml-core/azureml.core.datastore%28class%29#azureml-core-datastore-register-azure-data-lake-gen2) method to register a credential datastore connected to an Azure Data Lake Gen 2 storage with [service principal permissions](/azure/active-directory/develop/howto-create-service-principal-portal).
To use your service principal, you must [register your application](/azure/active-directory/develop/app-objects-and-service-principals) and grant the service principal data access via either Azure role-based access control (Azure RBAC) or access control lists (ACL). For more information, visit [access control set up for ADLS Gen 2](/azure/storage/blobs/data-lake-storage-access-control-model).
This code creates and registers the `adlsgen2_datastore_name` datastore to the `ws` workspace. This datastore accesses the file system `test` in the `account_name` storage account, through use of the provided service principal credentials. Review the [storage access & permissions](#storage-access-and-permissions) section for guidance on virtual network scenarios, and where to find required authentication credentials.

```python
adlsgen2_datastore_name = 'adlsgen2datastore'

subscription_id=os.getenv("ADL_SUBSCRIPTION", "<my_subscription_id>") # subscription id of ADLS account
resource_group=os.getenv("ADL_RESOURCE_GROUP", "<my_resource_group>") # resource group of ADLS account
account_name=os.getenv("ADLSGEN2_ACCOUNTNAME", "<my_account_name>") # ADLS Gen2 account name
tenant_id=os.getenv("ADLSGEN2_TENANT", "<my_tenant_id>") # tenant id of the service principal
client_id=os.getenv("ADLSGEN2_CLIENTID", "<my_client_id>") # client id of the service principal
client_secret=os.getenv("ADLSGEN2_CLIENT_SECRET", "<my_client_secret>") # secret of the service principal

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws, datastore_name=adlsgen2_datastore_name, account_name=account_name,
    filesystem='test', tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
```
To get a specific datastore registered in the current workspace, use the [`get()`](/python/api/azureml-core/azureml.core.datastore%28class%29#get-workspace--datastore-name-) static method on the `Datastore` class:

```python
# Get a named datastore from the current workspace
datastore = Datastore.get(ws, datastore_name='your datastore name')
```

To get the list of datastores registered with a given workspace, use the [`datastores`](/python/api/azureml-core/azureml.core.workspace%28class%29#datastores) property on a workspace object:

```python
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)
```

This code sample shows how to get the default datastore of the workspace:

```python
datastore = ws.get_default_datastore()
```

You can also change the default datastore with the following code sample. Only the SDK supports this ability:

```python
ws.set_default_datastore(new_default_datastore)
```
## Access data during scoring

Azure Machine Learning provides several ways to use your models for scoring. Some of these methods provide no access to datastores. The following table describes which methods allow access to datastores during scoring:

---

articles/machine-learning/v1/how-to-connect-data-ui.md

ms.topic: how-to
ms.author: yogipandey
author: ynpandey
ms.reviewer: fsolomon
ms.date: 03/13/2025
ms.custom: UpdateFrequency5, data4ml
#Customer intent: As a low-code experience data scientist, I need to make my data in storage on Azure available to my remote compute to train my ML models.