articles/machine-learning/v1/how-to-connect-data-ui.md
+36 -32 (36 additions & 32 deletions)
@@ -23,11 +23,12 @@ This table defines and summarizes the benefits of datastores and datasets.
23
23
|Object|Description| Benefits|
24
24
|---|---|---|
25
25
|Datastores| To securely connect to your storage service on Azure, store your connection information (subscription ID, token authorization, etc.) in the [Key Vault](https://azure.microsoft.com/services/key-vault/) associated with the workspace. | Because your information is securely stored, you don't put authentication credentials or original data sources at risk, and you no longer need to hard code these values in your scripts.|
26
-
|Datasets| Dataset creation also creates a reference to the data source location, along with a copy of its metadata. With datasets you can access data during model training, share data and collaborate with other users, and use open-source libraries, like pandas, for data exploration. | Since datasets are lazily evaluated, and the data remains in its existing location, you keep a single copy of data in your storage. Additionally, you incur no extra storage cost, you avoid unintentional changes to your original data sources, and improve ML workflow performance speeds.|
26
+
|Datasets| Dataset creation also creates a reference to the data source location, along with a copy of its metadata. With datasets you can access data during model training, share data, collaborate with other users, and use open-source libraries, like pandas, for data exploration. | Since datasets are lazily evaluated, and the data remains in its existing location, you keep a single copy of data in your storage. Additionally, you incur no extra storage cost, you avoid unintentional changes to your original data sources, and your ML workflow performance speeds improve.|
27
27
28
-
To learn where datastores and datasets fit in the overall Azure Machine Learning data access workflow, visit [Securely access data](concept-data.md#data-workflow).
28
+
For more information about where datastores and datasets fit in the overall Azure Machine Learning data access workflow, visit [Securely access data](concept-data.md#data-workflow).
29
+
30
+
For more information about the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/) and a code-first experience, visit:
29
31
30
-
For more information about the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/) and a code-first experience, see:
31
32
* [Connect to Azure storage services with datastores](how-to-access-data.md)
@@ -39,7 +40,7 @@ For more information about the [Azure Machine Learning Python SDK](/python/api/o
39
40
40
41
- An Azure Machine Learning workspace. [Create workspace resources](../quickstart-create-resources.md)
41
42
42
-
- When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace as datastores. They're named `workspaceblobstore` and `workspacefilestore`, respectively. For sufficient blob storage resources, the `workspaceblobstore` is set as the default datastore, already configured for use. If you require more blob storage resources, you need an Azure storage account, with a [supported storage type](how-to-access-data.md#supported-data-storage-service-types).
43
+
- When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace as datastores. They're named `workspaceblobstore` and `workspacefilestore`, respectively. If blob storage meets your needs, the `workspaceblobstore` is set as the default datastore and is already configured for use. If you need more blob storage resources, you need an Azure storage account with a [supported storage type](how-to-access-data.md#supported-data-storage-service-types).
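
As a rough illustration, the two automatically registered datastores can also be looked up in code. The sketch below assumes the v1 Python SDK (`azureml-core`), where a `Workspace` object offers `get_default_datastore()` and a `datastores` dictionary. The helper name `describe_builtin_datastores` is hypothetical, and the workspace is passed in as an argument so that nothing here depends on a particular configuration file.

```python
# Names that the workspace registers automatically, per the text above.
BLOB_STORE = "workspaceblobstore"
FILE_STORE = "workspacefilestore"


def describe_builtin_datastores(ws):
    """Summarize the built-in datastores of a workspace.

    `ws` is assumed to behave like azureml.core.Workspace: it offers
    get_default_datastore() and a `datastores` mapping of name -> datastore.
    """
    registered = ws.datastores
    return {
        "default": ws.get_default_datastore().name,
        "has_blob_store": BLOB_STORE in registered,
        "has_file_store": FILE_STORE in registered,
    }
```

With the SDK installed, `ws` would typically come from `Workspace.from_config()`.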
43
44
44
45
## Create datastores
45
46
@@ -52,15 +53,15 @@ You can create datastores with credential-based access or identity-based access.
52
53
Create a new datastore with the Azure Machine Learning studio.
53
54
54
55
> [!IMPORTANT]
55
-
> If your data storage account is located in a virtual network, additional configuration steps are required to ensure that the studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) for more information about the appropriate configuration steps.
56
+
> If your data storage account is located in a virtual network, extra configuration steps are required to ensure that the studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) for more information about the appropriate configuration steps.
56
57
57
58
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
58
59
1. Select **Data** on the left pane under **Assets**.
59
60
1. At the top, select **Datastores**.
60
61
1. Select **+Create**.
61
-
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type and authentication type. For more information about where to find the authentication credentials needed to populate this form, visit the [storage access and permissions section](#access-validation).
62
+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type and authentication type. For more information about where to find the authentication credentials needed to populate this form, visit the [storage access and permissions section](#access-validation) of this document.
62
63
63
-
This screenshot shows the **Azure blob datastore** creation panel:
64
+
The following screenshot shows the **Azure blob datastore** creation panel:
@@ -69,25 +70,23 @@ This screenshot shows the **Azure blob datastore** creation panel:
69
70
For more information about new datastore creation with the Azure Machine Learning studio, visit [identity-based data access](how-to-identity-based-data-access.md).
70
71
71
72
> [!IMPORTANT]
72
-
> If your data storage account resides in a virtual network, additional configuration steps are required to ensure that Studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) to ensure that the appropriate configuration steps are applied.
73
+
> If your data storage account resides in a virtual network, extra configuration steps are required to ensure that Studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) to ensure that the appropriate configuration steps are applied.
73
74
74
75
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
75
76
1. Select **Data** on the left pane under **Assets**.
76
77
1. At the top, select **Datastores**.
77
78
1. Select **+Create**.
78
-
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. See[which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
79
-
1. Customers need to choose the storage acct and container name they want to use
79
+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. For more information, visit [which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
80
+
1. Choose the storage account and container name that you want to use.
80
81
81
-
Blob reader role (for ADLS Gen 2 and Blob storage) is required; whoever is creating needs permissions to see the contents of the storage
82
+
The blob reader role (for ADLS Gen 2 and Blob storage) is required; whoever creates the datastore needs permissions to see the contents of the storage
82
83
Reader role of the subscription and resource group
83
84
1. Select **No** for **Save credentials with the datastore for data access**.
84
85
85
-
This screenshot shows the **Azure blob datastore** creation panel:
86
+
The following screenshot shows the **Azure blob datastore** creation panel:

90
-
91
90
---
92
91
93
92
## Create data assets
@@ -108,22 +107,24 @@ The following steps describe how to create a dataset in [Azure Machine Learning
108
107
109
108
1. Navigate to [Azure Machine Learning studio](https://ml.azure.com)
110
109
111
-
1. Under __Assets__ in the left navigation, select __Data__. On the Data assets tab, select Create
110
+
1. Under __Assets__ in the left navigation, select __Data__. On the Data assets tab, select Create, as shown in the following screenshot:
111
+
112
112
:::image type="content" source="media\how-to-connect-data-ui\data-assets-create.png" lightbox="media/how-to-connect-data-ui/new-id-based-datastore-form.png" alt-text="Screenshot showing Create in the Data assets tab.":::
113
113
114
-
1. Give the data asset a name and optional description. Then, under **Type**, select a Dataset type, either **File** or **Tabular**.
114
+
1. Give the data asset a name and optional description. Then, under **Type**, select a Dataset type, either **File** or **Tabular**, as shown in the following screenshot:
115
+
115
116
:::image type="content" source="media\how-to-connect-data-ui\create-data-asset-name-type.png" lightbox="media\how-to-connect-data-ui\create-data-asset-name-type.png" alt-text="Screenshot showing the setting of the name, description, and type of the data asset.":::
116
117
117
-
1. The **Data source** pane opens next, as shown in this screenshot:
118
+
1. The **Data source** pane opens next, as shown in the following screenshot:
118
119
119
120
:::image type="content" source="media\how-to-connect-data-ui\data-assets-source.png" lightbox="media\how-to-connect-data-ui\data-assets-source.png" alt-text="Screenshot showing the data source selection pane.":::
120
121
121
122
You have different options for your data source. For data already stored in Azure, choose "From Azure storage." To upload data from your local drive, choose "From local files." For data stored at a public web location, choose "From web files." You can also create a data asset from a SQL database, or from [Azure Open Datasets](../../open-datasets/how-to-create-azure-machine-learning-dataset-from-open-dataset.md).
122
123
123
124
1. At the file selection step, select the location where Azure should store your data, and the data files you want to use.
124
-
1. Enable skip validation if your data is in a virtual network. Learn more about [virtual network isolation and privacy](../how-to-enable-studio-virtual-network.md).
125
+
1. Enable skip validation if your data is in a virtual network. For more information, visit [virtual network isolation and privacy](../how-to-enable-studio-virtual-network.md).
125
126
126
-
1. Follow the steps to set the data parsing settings and schema for your data asset. The settings prepopulate based on file type, and you can further configure your settings before data asset creation.
127
+
1. Follow the steps to set the data parsing settings and schema for your data asset. The settings prepopulate based on file type, and you can further configure your settings before the creation of the data asset.
127
128
128
129
1. Once you reach the Review step, select Create on the last page
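
The studio steps above have a rough code-first counterpart. The following is a minimal sketch, assuming the v1 Python SDK (`azureml-core`) and its `Dataset.Tabular.from_delimited_files`, `Dataset.File.from_files`, and `register` calls. The function name `create_data_asset` and every argument value are hypothetical placeholders. The `kind` argument mirrors the **File** or **Tabular** choice in the form, and `validate=False` plays the role of the skip validation option.

```python
def create_data_asset(ws, datastore_name, relative_path, name,
                      kind="tabular", validate=True):
    """Create and register a dataset from files in a registered datastore.

    Assumes the v1 SDK (azureml-core). All argument values are placeholders
    chosen by the caller; nothing here is specific to one workspace.
    """
    if kind not in ("file", "tabular"):
        raise ValueError("kind must be 'file' or 'tabular'")

    # Imported lazily so the argument check above works without the SDK.
    from azureml.core import Dataset, Datastore

    datastore = Datastore.get(ws, datastore_name)
    path = [(datastore, relative_path)]
    if kind == "tabular":
        # Delimited text such as CSV; parsing options keep their defaults.
        dataset = Dataset.Tabular.from_delimited_files(path=path, validate=validate)
    else:
        dataset = Dataset.File.from_files(path=path, validate=validate)

    # Registering makes the dataset visible under Data in the studio.
    return dataset.register(workspace=ws, name=name, create_new_version=True)
```

A call such as `create_data_asset(ws, "workspaceblobstore", "data/*.csv", "my-dataset")` would then correspond to choosing "From Azure storage" with a Tabular type, assuming files exist at that path.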
129
130
@@ -132,16 +133,21 @@ You have different options for your data source. For data already stored in Azur
132
133
After you create your dataset, verify that you can view the preview and profile in the studio:
133
134
134
135
1. Sign in to the [Azure Machine Learning studio](https://ml.azure.com/)
135
-
1. Under __Assets__ in the left navigation, select __Data__.
136
+
1. Under __Assets__ in the left navigation, select __Data__ as shown in the following screenshot:
137
+
136
138
:::image type="content" source="media\how-to-connect-data-ui\data-data-assets.png" alt-text="Screenshot highlights Create in the Data assets tab.":::
137
-
1. Select the name of the dataset you want to view.
139
+
140
+
1. Select the name of the dataset you want to view.
138
141
1. Select the **Explore** tab.
139
-
1. Select the **Preview** tab.
142
+
1. Select the **Preview** tab, as shown in the following screenshot:
143
+
140
144
:::image type="content" source="media\how-to-connect-data-ui\explore-preview-dataset.png" alt-text="Screenshot shows a preview of a dataset.":::
141
-
1. Select the **Profile** tab.
145
+
146
+
1. Select the **Profile** tab, as shown in the following screenshot:
147
+
142
148
:::image type="content" source="media\how-to-connect-data-ui\explore-generate-profile.png" alt-text="Screenshot shows dataset column metadata in the Profile tab.":::
143
149
144
-
You can use summary statistics across your data set to verify whether your data set is ML-ready. For non-numeric columns, these statistics include only basic statistics - for example, min, max, and error count. Numeric columns offer statistical moments and estimated quantiles.
150
+
To verify whether your dataset is ML-ready, you can use summary statistics across your dataset. For non-numeric columns, these statistics include only basic statistical measures, for example min, max, and error count. Numeric columns offer statistical moments and estimated quantiles.
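
To make these statistics concrete, here is a small sketch that uses only the Python standard library. It is an illustration of the kind of measures described above, not the algorithm that the studio uses. The function name `profile_column` and the rule that unparseable values count as errors are assumptions made for this example.

```python
import statistics


def profile_column(values):
    """Return a small profile of one column.

    None counts as missing; anything that cannot be read as a number
    counts as an error. Illustrative only, not the studio's algorithm.
    """
    numbers, missing, errors = [], 0, 0
    for value in values:
        if value is None:
            missing += 1
            continue
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            errors += 1

    profile = {"count": len(values), "missing": missing, "errors": errors}
    if numbers:
        profile.update(
            min=min(numbers),
            max=max(numbers),
            mean=statistics.fmean(numbers),
        )
    if len(numbers) >= 2:
        # Sample standard deviation and the three quartile cut points.
        profile["stdev"] = statistics.stdev(numbers)
        profile["quartiles"] = statistics.quantiles(numbers, n=4)
    return profile
```

A column with many missing or error entries relative to `count` is a sign that the data needs cleaning before training.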
145
151
146
152
The Azure Machine Learning dataset data profile includes:
147
153
@@ -171,12 +177,12 @@ To ensure that you securely connect to your Azure storage service, Azure Machine
171
177
172
178
### Virtual network
173
179
174
-
If your data storage account is in a **virtual network**, extra configuration steps are required to ensure that Azure Machine Learning has access to your data. See[Use Azure Machine Learning studio in a virtual network](../how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied when you create and register your datastore.
180
+
If your data storage account is in a **virtual network**, extra configuration steps are required to ensure that Azure Machine Learning has access to your data. Visit [Use Azure Machine Learning studio in a virtual network](../how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied when you create and register your datastore.
175
181
176
182
### Access validation
177
183
178
184
> [!WARNING]
179
-
> Cross-tenant access to storage accounts is not supported. If your scenario needs cross-tenant access, please reach out to the Azure Machine Learning Data Support team alias at [email protected] for assistance with a custom code solution.
185
+
> Cross-tenant access to storage accounts isn't supported. If your scenario needs cross-tenant access, reach out to the [Azure Machine Learning Data Support team](mailto:[email protected]) for assistance with a custom code solution.
180
186
181
187
**As part of the initial datastore creation and registration process**, Azure Machine Learning automatically validates that the underlying storage service exists and that the user-provided principal (username, service principal, or SAS token) has access to the specified storage.
182
188
@@ -187,7 +193,7 @@ To authenticate your access to the underlying storage service, provide either yo
187
193
You can find account key, SAS token, and service principal information at your [Azure portal](https://portal.azure.com).
188
194
189
195
* To obtain an account key for authentication, select **Storage Accounts** in the left pane, and choose the storage account that you want to register
190
-
* The **Overview** page provides information such as the account name, container, and file share name.
196
+
* The **Overview** page provides information such as the account name, container, and file share name
191
197
* Expand the **Security + networking** node in the left nav
192
198
* Select **Access keys**
193
199
* The available key values serve as **Account key** values
@@ -196,12 +202,12 @@ You can find account key, SAS token, and service principal information at your [
196
202
* Select **Shared access signature**
197
203
* Complete the process to generate the SAS value
198
204
199
-
* To use a [service principal](/azure/active-directory/develop/howto-create-service-principal-portal) for authentication, go to your **App registrations** and select which app you want to use.
200
-
* Its corresponding **Overview** page contains required information like tenant ID and client ID.
205
+
* To use a [service principal](/azure/active-directory/develop/howto-create-service-principal-portal) for authentication, go to your **App registrations** and select which app you want to use
206
+
* Its corresponding **Overview** page contains required information like tenant ID and client ID
201
207
202
208
> [!IMPORTANT]
203
209
> * To change your access keys for an Azure Storage account (account key or SAS token), be sure to sync the new credentials with both your workspace and the datastores connected to it. For more information, visit [sync your updated credentials](../how-to-change-storage-access-key.md).
204
-
> * If you unregister and then re-register a datastore with the same name, and that re-registration fails, the Azure Key Vault for your workspace may not have soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace, but it may not be enabled if you used an existing key vault or have a workspace created prior to October 2020. For more information about how to enable soft-delete, visit [Turn on Soft Delete for an existing key vault](/azure/key-vault/general/soft-delete-change#turn-on-soft-delete-for-an-existing-key-vault).
210
+
> * If you unregister and then re-register a datastore with the same name, and that re-registration fails, the Azure Key Vault for your workspace might not have soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace. However, it might not be enabled if you used an existing key vault or have a workspace created before October 2020. For more information about how to enable soft-delete, visit [Turn on Soft Delete for an existing key vault](/azure/key-vault/general/soft-delete-change#turn-on-soft-delete-for-an-existing-key-vault).
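
Once an account key or SAS token is at hand, the same registration can be sketched in code. The example below assumes the v1 Python SDK (`azureml-core`) and its `Datastore.register_azure_blob_container` method. The helper names `blob_datastore_args` and `register_blob_datastore` are hypothetical, as are all argument values. Treating a call with no credential as identity-based access is an assumption of this sketch. Service principal authentication is not covered here.

```python
def blob_datastore_args(name, container, account,
                        account_key=None, sas_token=None):
    """Build keyword arguments for registering a blob datastore.

    At most one credential may be given. With none, the datastore is
    assumed to rely on identity-based access.
    """
    if account_key and sas_token:
        raise ValueError("provide an account key or a SAS token, not both")
    args = {
        "datastore_name": name,
        "container_name": container,
        "account_name": account,
    }
    if account_key:
        args["account_key"] = account_key
    if sas_token:
        args["sas_token"] = sas_token
    return args


def register_blob_datastore(ws, **kwargs):
    """Register the datastore in workspace `ws` (requires azureml-core)."""
    # Imported lazily so the helper above can be used without the SDK.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=ws, **blob_datastore_args(**kwargs)
    )
```

Keeping the argument check separate from the SDK call means a mistaken combination of credentials fails early, before any request reaches the storage service.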
205
211
206
212
### Permissions
207
213
@@ -217,7 +223,5 @@ Use your datasets in your machine learning experiments for training ML models. [
217
223
## Next steps
218
224
219
225
* [A step-by-step example of training with TabularDatasets and automated machine learning](../tutorial-first-experiment-automated-ml.md)
220
-
221
226
* [Train a model](how-to-set-up-training-targets.md)
222
-
223
227
* For more dataset training examples, see the [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/work-with-data/)