articles/machine-learning/how-to-connect-data-ui.md
+26 -4 (26 additions & 4 deletions)
@@ -9,7 +9,7 @@ ms.topic: how-to
9
9
ms.author: yogipandey
10
10
author: ynpandey
11
11
ms.reviewer: nibaccam
12
-
ms.date: 10/21/2021
12
+
ms.date: 01/18/2021
13
13
ms.custom: data4ml
14
14
15
15
# Customer intent: As a low-code experience data scientist, I need to make my data in storage on Azure available to my remote compute to train my ML models.
@@ -47,6 +47,10 @@ For a code first experience, see the following articles to use the [Azure Machin
47
47
48
48
You can create datastores from [these Azure storage solutions](how-to-access-data.md#matrix). **For unsupported storage solutions**, and to save data egress cost during ML experiments, you must [move your data](how-to-access-data.md#move) to a supported Azure storage solution. [Learn more about datastores](how-to-access-data.md).
49
49
50
+
You can create datastores with credential-based access or identity-based access.
51
+
52
+
# [Credential-based](#tab/credential)
53
+
50
54
Create a new datastore in a few steps with the Azure Machine Learning studio.
51
55
52
56
> [!IMPORTANT]
@@ -61,13 +65,31 @@ The following example demonstrates what the form looks like when you create an *
61
65
62
66

63
67
68
+
# [Identity-based](#tab/identity)
69
+
70
+
Create a new datastore in a few steps with the Azure Machine Learning studio. Learn more about [identity-based data access](how-to-identity-based-data-access.md).
71
+
72
+
> [!IMPORTANT]
73
+
> If your data storage account is in a virtual network, additional configuration steps are required to ensure the studio has access to your data. See [Network isolation & privacy](how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied.
74
+
75
+
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
76
+
1. Select **Datastores** on the left pane under **Manage**.
77
+
1. Select **+ New datastore**.
78
+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. See [which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
79
+
1. For **Save credentials with the datastore for data access**, select **No**.
80
+
81
+
The following example demonstrates what the form looks like when you create an **Azure blob datastore**:
82
+
83
+

84
+
85
+
---
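
For readers who prefer code, the identity-based steps above can be approximated with the Azure Machine Learning SDK for Python (v1). The following is a minimal sketch, not the studio's own implementation: the helper names `blob_datastore_args` and `register_blob_datastore` are hypothetical, and the sketch assumes the `azureml-core` package is installed and that you already have a `Workspace` object. The point it illustrates is that omitting the account key is what makes the datastore identity-based.

```python
from typing import Any, Dict, Optional


def blob_datastore_args(
    datastore_name: str,
    container_name: str,
    account_name: str,
    account_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for registering an Azure blob datastore.

    Hypothetical helper. Leaving account_key as None passes no secret,
    so no credential is saved with the datastore (identity-based access).
    """
    args: Dict[str, Any] = {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
    }
    if account_key is not None:
        # Credential-based access: the key is saved with the datastore.
        args["account_key"] = account_key
    return args


def register_blob_datastore(workspace: Any, **args: Any) -> Any:
    """Register the datastore in the given workspace (needs azureml-core)."""
    # Imported lazily so the helper above can be used without the SDK.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(workspace=workspace, **args)
```

A call might look like `register_blob_datastore(ws, **blob_datastore_args("credentialless_blob", "<container>", "<account>"))`, where `ws` is your workspace and the datastore, container, and account names are placeholders.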
86
+
64
87
## Create datasets
65
88
66
89
After you create a datastore, create a dataset to interact with your data. Datasets package your data into a lazily evaluated consumable object for machine learning tasks, like training. [Learn more about datasets](how-to-create-register-datasets.md).
67
90
68
91
There are two types of datasets, FileDataset and TabularDataset.
69
-
[FileDatasets](how-to-create-register-datasets.md#filedataset) create references to single or multiple files or public URLs. Whereas,
70
-
[TabularDatasets](how-to-create-register-datasets.md#tabulardataset) represent your data in a tabular format. You can create TabularDatasets from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
92
+
[FileDatasets](how-to-create-register-datasets.md#filedataset) create references to single or multiple files or public URLs, whereas [TabularDatasets](how-to-create-register-datasets.md#tabulardataset) represent your data in a tabular format. You can create TabularDatasets from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
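
To make the distinction between the two dataset types concrete, the following sketch picks a dataset type from a file's extension by using the Azure Machine Learning SDK for Python (v1). It is an illustration under stated assumptions, not how the studio decides: the helper names `tabular_factory_name` and `create_dataset` are hypothetical, the extension mapping only covers the file formats listed above, and `create_dataset` assumes `azureml-core` is installed and that `datastore` is a registered datastore.

```python
from pathlib import PurePosixPath
from typing import Any, Optional

# File extensions listed above, mapped to Dataset.Tabular factory methods.
_TABULAR_FACTORIES = {
    ".csv": "from_delimited_files",
    ".tsv": "from_delimited_files",
    ".parquet": "from_parquet_files",
    ".jsonl": "from_json_lines_files",
}


def tabular_factory_name(path: str) -> Optional[str]:
    """Return the tabular factory method name for a path, or None."""
    suffix = PurePosixPath(path).suffix.lower()
    return _TABULAR_FACTORIES.get(suffix)


def create_dataset(datastore: Any, path: str) -> Any:
    """Create a TabularDataset when the format allows, else a FileDataset."""
    # Imported lazily so tabular_factory_name works without the SDK.
    from azureml.core import Dataset

    factory = tabular_factory_name(path)
    if factory is None:
        # Any other file type is only referenced, not parsed.
        return Dataset.File.from_files(path=(datastore, path))

    kwargs = {}
    if path.lower().endswith(".tsv"):
        # Tab-separated files need an explicit separator.
        kwargs["separator"] = "\t"
    return getattr(Dataset.Tabular, factory)(path=(datastore, path), **kwargs)
```

For example, `create_dataset(datastore, "weather/2018/11.csv")` would yield a TabularDataset, while a path to image files would yield a FileDataset. Datasets built from SQL query results use `Dataset.Tabular.from_sql_query` and are outside this sketch.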
71
93
72
94
The following steps and animation show how to create a dataset in [Azure Machine Learning studio](https://ml.azure.com).
73
95
@@ -87,7 +109,7 @@ To create a dataset in the studio:
87
109
1. Select **Next** to populate the **Settings and preview** and **Schema** forms; they are intelligently populated based on file type and you can further configure your dataset prior to creation on these forms.
88
110
1. On the Settings and preview form, you can indicate if your data contains multi-line data.
89
111
1. On the Schema form, you can specify that your TabularDataset has a time component by selecting type: **Timestamp** for your date or time column.
90
-
1. If your data is formatted into subsets, for example time windows, and you want to use those subsets for training, select type **Partition timestamp**. Doing so enables timeseries operations on your dataset. Learn more about how to [leverage partitions in your dataset for training](how-to-monitor-datasets.md?tabs=azure-studio#create-target-dataset).
112
+
1. If your data is formatted into subsets, for example time windows, and you want to use those subsets for training, select type **Partition timestamp**. Doing so enables time series operations on your dataset. Learn more about how to [leverage partitions in your dataset for training](how-to-monitor-datasets.md?tabs=azure-studio#create-target-dataset).
91
113
1. Select **Next** to review the **Confirm details** form. Check your selections and create an optional data profile for your dataset. Learn more about [data profiling](#profile).
92
114
1. Select **Create** to complete your dataset creation.
# Customer intent: As an experienced Python developer, I need to make my data in Azure Storage available to my compute to train my machine learning models.
14
+
# Customer intent: As an experienced Python developer, I need to make my data in Azure Storage available to my compute for training my machine learning models.
15
15
---
16
16
17
17
# Connect to storage by using identity-based data access
18
18
19
19
In this article, you learn how to connect to storage services on Azure by using identity-based data access and Azure Machine Learning datastores via the [Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/intro).
20
20
21
-
Typically, datastores use credential-based data access to confirm you have permission to access the storage service. They keep connection information, like your subscription ID and token authorization, in the [key vault](https://azure.microsoft.com/services/key-vault/) that's associated with the workspace. When you create a datastore that uses identity-based data access, your Azure account ([Azure Active Directory token](../active-directory/fundamentals/active-directory-whatis.md)) is used to confirm you have permission to access the storage service. In this scenario, no authentication credentials are saved. Only the storage account information is stored in the datastore.
21
+
Typically, datastores use **credential-based authentication** to confirm you have permission to access the storage service. They keep connection information, like your subscription ID and token authorization, in the [key vault](https://azure.microsoft.com/services/key-vault/) that's associated with the workspace. When you create a datastore that uses **identity-based data access**, your Azure account ([Azure Active Directory token](../active-directory/fundamentals/active-directory-whatis.md)) is used to confirm you have permission to access the storage service. In the **identity-based data access** scenario, no authentication credentials are saved. Only the storage account information is stored in the datastore.
22
+
23
+
To create datastores with **identity-based** data access via the Azure Machine Learning studio UI, see [Connect to data with the Azure Machine Learning studio](how-to-connect-data-ui.md#create-datastores).
22
24
23
-
To create datastores that use credential-based authentication, like access keys or service principals, see [Connect to storage services on Azure](how-to-access-data.md).
25
+
To create datastores that use **credential-based** authentication, like access keys or service principals, see [Connect to storage services on Azure](how-to-access-data.md).
24
26
25
27
## Identity-based data access in Azure Machine Learning
26
28
27
29
There are two scenarios in which you can apply identity-based data access in Azure Machine Learning. These scenarios are a good fit for identity-based access when you're working with confidential data and need more granular data access management:
28
-
> [!IMPORTANT]
30
+
31
+
> [!WARNING]
29
32
> Identity-based data access is not supported for [automated ML experiments](how-to-configure-auto-train.md).
30
33
31
34
- Accessing storage services
@@ -49,8 +52,7 @@ The same behavior applies when you:
49
52
50
53
### Model training on private data
51
54
52
-
Certain machine learning scenarios involve training models with private data. In such cases, data scientists need to run training workflows without being exposed to the confidential input data. In this scenario, a managed identity of the training compute is used for data access authentication. This approach allows storage admins to grant Storage Blob Data Reader access to the managed identity that the training compute uses to run the training job. The individual data scientists don't need to be granted access. For more information, see [Set up managed identity on a compute cluster](how-to-create-attach-compute-cluster.md#managed-identity).
53
-
55
+
Certain machine learning scenarios involve training models with private data. In such cases, data scientists need to run training workflows without being exposed to the confidential input data. In this scenario, a [managed identity](how-to-use-managed-identities.md) of the training compute is used for data access authentication. This approach allows storage admins to grant Storage Blob Data Reader access to the managed identity that the training compute uses to run the training job. The individual data scientists don't need to be granted access. For more information, see [Set up managed identity on a compute cluster](how-to-create-attach-compute-cluster.md#managed-identity).
54
56
55
57
## Prerequisites
56
58
@@ -72,7 +74,7 @@ Certain machine learning scenarios involve training models with private data. In
72
74
73
75
To help ensure that you securely connect to your storage service on Azure, Azure Machine Learning requires that you have permission to access the corresponding data storage.
74
76
75
-
Identity-based data access supports connections to only the following storage services:
77
+
Identity-based data access supports connections to **only** the following storage services.