Commit ecf8ac4

Freshness update for how-to-connect-data-ui.md . . .
1 parent ea3444c commit ecf8ac4

1 file changed

+36
-32
lines changed

articles/machine-learning/v1/how-to-connect-data-ui.md

Lines changed: 36 additions & 32 deletions
@@ -23,11 +23,12 @@ This table defines and summarizes the benefits of datastores and datasets.
2323
|Object|Description| Benefits|
2424
|---|---|---|
2525
|Datastores| To securely connect to your storage service on Azure, store your connection information (subscription ID, token authorization, etc.) in the [Key Vault](https://azure.microsoft.com/services/key-vault/) associated with the workspace | Because your information is securely stored, you don't put authentication credentials or original data sources at risk, and you no longer need to hard code these values in your scripts
26-
|Datasets| Dataset creation also creates a reference to the data source location, along with a copy of its metadata. With datasets you can access data during model training, share data and collaborate with other users, and use open-source libraries, like pandas, for data exploration. | Since datasets are lazily evaluated, and the data remains in its existing location, you keep a single copy of data in your storage. Additionally, you incur no extra storage cost, you avoid unintentional changes to your original data sources, and improve ML workflow performance speeds.|
26+
|Datasets| Dataset creation also creates a reference to the data source location, along with a copy of its metadata. With datasets you can access data during model training, share data, collaborate with other users, and use open-source libraries, like pandas, for data exploration. | Since datasets are lazily evaluated, and the data remains in its existing location, you keep a single copy of data in your storage. Additionally, you incur no extra storage cost, you avoid unintentional changes to your original data sources, and your ML workflow performance speeds improve.|
2727

28-
To learn where datastores and datasets fit in the overall Azure Machine Learning data access workflow, visit [Securely access data](concept-data.md#data-workflow).
28+
For more information about where datastores and datasets fit in the overall Azure Machine Learning data access workflow, visit [Securely access data](concept-data.md#data-workflow).
29+
30+
For more information about the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/) and a code-first experience, visit
2931

30-
For more information about the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/) and a code-first experience, see:
3132
* [Connect to Azure storage services with datastores](how-to-access-data.md)
3233
* [Create Azure Machine Learning datasets](how-to-create-register-datasets.md)
3334

@@ -39,7 +40,7 @@ For more information about the [Azure Machine Learning Python SDK](/python/api/o
3940

4041
- An Azure Machine Learning workspace. [Create workspace resources](../quickstart-create-resources.md)
4142

42-
- When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace as datastores. They're named `workspaceblobstore` and `workspacefilestore`, respectively. For sufficient blob storage resources, the `workspaceblobstore` is set as the default datastore, already configured for use. If you require more blob storage resources, you need an Azure storage account, with a [supported storage type](how-to-access-data.md#supported-data-storage-service-types).
43+
- When you create a workspace, an Azure blob container and an Azure file share are automatically registered to the workspace as datastores. They're named `workspaceblobstore` and `workspacefilestore`, respectively. For sufficient blob storage resources, the `workspaceblobstore` is set as the default datastore, already configured for use. For more blob storage resources, you need an Azure storage account, with a [supported storage type](how-to-access-data.md#supported-data-storage-service-types).
4344

4445
## Create datastores
4546

@@ -52,15 +53,15 @@ You can create datastores with credential-based access or identity-based access.
5253
Create a new datastore with the Azure Machine Learning studio.
5354

5455
> [!IMPORTANT]
55-
> If your data storage account is located in a virtual network, additional configuration steps are required to ensure that the studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) for more information about the appropriate configuration steps.
56+
> If your data storage account is located in a virtual network, extra configuration steps are required to ensure that the studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) for more information about the appropriate configuration steps.
5657
5758
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
5859
1. Select **Data** on the left pane under **Assets**.
5960
1. At the top, select **Datastores**.
6061
1. Select **+Create**.
61-
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type and authentication type. For more information about where to find the authentication credentials needed to populate this form, visit the [storage access and permissions section](#access-validation).
62+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type and authentication type. For more information about where to find the authentication credentials needed to populate this form, visit the [storage access and permissions section](#access-validation) of this document.
6263

63-
This screenshot shows the **Azure blob datastore** creation panel:
64+
The following screenshot shows the **Azure blob datastore** creation panel:
6465

6566
:::image type="content" source="media/how-to-connect-data-ui/new-datastore-form.png" lightbox="media/how-to-connect-data-ui/new-datastore-form.png" alt-text="Screenshot showing the Azure blob datastore creation panel.":::
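For the code-first experience mentioned earlier, the studio form above roughly corresponds to a single registration call in the v1 Python SDK (`azureml-core`). The following is a minimal sketch, not the article's own code. The helper names `blob_datastore_args` and `register_blob_datastore` are hypothetical, and the rule that rejects an account key and a SAS token together is an assumption of this sketch rather than documented SDK behavior. Omitting both credentials corresponds to not saving credentials with the datastore.

```python
def blob_datastore_args(name, container, account, account_key=None, sas_token=None):
    """Collect keyword arguments that mirror the fields of the studio form.

    Hypothetical helper: it only builds a dictionary, so it runs without
    the SDK installed.
    """
    # Assumption of this sketch: treat the two credential types as exclusive.
    if account_key and sas_token:
        raise ValueError("Provide an account key or a SAS token, not both.")
    args = {
        "datastore_name": name,
        "container_name": container,
        "account_name": account,
    }
    if account_key:
        args["account_key"] = account_key
    elif sas_token:
        args["sas_token"] = sas_token
    # With neither credential set, no secret is passed to the SDK call.
    return args


def register_blob_datastore(workspace, **form_fields):
    """Register an Azure blob container as a datastore in the workspace."""
    # Imported lazily so the helper above stays usable without azureml-core.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace, **blob_datastore_args(**form_fields)
    )
```

A call such as `register_blob_datastore(ws, name="my_blob_store", container="data", account="mystorageaccount", account_key="<key>")` would need an authenticated `Workspace` object `ws`; all of the values shown are placeholders.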
6667

@@ -69,25 +70,23 @@ This screenshot shows the **Azure blob datastore** creation panel:
6970
For more information about new datastore creation with the Azure Machine Learning studio, visit [identity-based data access](how-to-identity-based-data-access.md).
7071

7172
> [!IMPORTANT]
72-
> If your data storage account resides in a virtual network, additional configuration steps are required to ensure that Studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) to ensure that the appropriate configuration steps are applied.
73+
> If your data storage account resides in a virtual network, extra configuration steps are required to ensure that Studio can access your data. Visit [Network isolation & privacy](../how-to-enable-studio-virtual-network.md) to ensure that the appropriate configuration steps are applied.
7374
7475
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
7576
1. Select **Data** on the left pane under **Assets**.
7677
1. At the top, select **Datastores**.
7778
1. Select **+Create**.
78-
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. See [which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
79-
1. Customers need to choose the storage acct and container name they want to use
79+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. For more information, visit [which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
80+
1. Customers need to choose the storage account and container name they want to use
8081

81-
Blob reader role (for ADLS Gen 2 and Blob storage) is required; whoever is creating needs permissions to see the contents of the storage
82+
The blob reader role (for ADLS Gen 2 and Blob storage) is required; whoever creates it needs permissions to see the contents of the storage
8283
Reader role of the subscription and resource group
8384
1. Select **No** to not **Save credentials with the datastore for data access**.
8485

85-
This screenshot shows the **Azure blob datastore** creation panel:
86+
The following screenshot shows the **Azure blob datastore** creation panel:
8687

8788
:::image type="content" source="media/how-to-connect-data-ui/new-id-based-datastore-form.png" lightbox="media/how-to-connect-data-ui/new-id-based-datastore-form.png" alt-text="Screenshot showing the Azure blob datastore creation panel.":::
8889

89-
![Form for a new datastore](media/how-to-connect-data-ui/new-id-based-datastore-form.png)
90-
9190
---
9291

9392
## Create data assets
@@ -108,22 +107,24 @@ The following steps describe how to create a dataset in [Azure Machine Learning
108107
109108
1. Navigate to [Azure Machine Learning studio](https://ml.azure.com)
110109

111-
1. Under __Assets__ in the left navigation, select __Data__. On the Data assets tab, select Create
110+
1. Under __Assets__ in the left navigation, select __Data__. On the Data assets tab, select Create, as shown in the following screenshot:
111+
112112
:::image type="content" source="media\how-to-connect-data-ui\data-assets-create.png" lightbox="media/how-to-connect-data-ui/new-id-based-datastore-form.png" alt-text="Screenshot showing Create in the Data assets tab.":::
113113

114-
1. Give the data asset a name and optional description. Then, under **Type**, select a Dataset type, either **File** or **Tabular**.
114+
1. Give the data asset a name and optional description. Then, under **Type**, select a Dataset type, either **File** or **Tabular**, as shown in the following screenshot:
115+
115116
:::image type="content" source="media\how-to-connect-data-ui\create-data-asset-name-type.png" lightbox="media\how-to-connect-data-ui\create-data-asset-name-type.png" alt-text="Screenshot showing the setting of the name, description, and type of the data asset.":::
116117

117-
1. The **Data source** pane opens next, as shown in this screenshot:
118+
1. The **Data source** pane opens next, as shown in the following screenshot:
118119

119120
:::image type="content" source="media\how-to-connect-data-ui\data-assets-source.png" lightbox="media\how-to-connect-data-ui\data-assets-source.png" alt-text="This screenshot showing the data source selection pane.":::
120121

121122
You have different options for your data source. For data already stored in Azure, choose "From Azure storage." To upload data from your local drive, choose "From local files." For data stored at a public web location, choose "From web files." You can also create a data asset from a SQL database, or from [Azure Open Datasets](../../open-datasets/how-to-create-azure-machine-learning-dataset-from-open-dataset.md).
122123

123124
1. At the file selection step, select the location where Azure should store your data, and the data files you want to use.
124-
1. Enable skip validation if your data is in a virtual network. Learn more about [virtual network isolation and privacy](../how-to-enable-studio-virtual-network.md).
125+
1. Enable skip validation if your data is in a virtual network. For more information about virtual network isolation and privacy, visit [this](../how-to-enable-studio-virtual-network.md) resource.
125126

126-
1. Follow the steps to set the data parsing settings and schema for your data asset. The settings prepopulate based on file type, and you can further configure your settings before data asset creation.
127+
1. Follow the steps to set the data parsing settings and schema for your data asset. The settings prepopulate based on file type, and you can further configure your settings before the creation of the data asset.
127128

128129
1. Once you reach the Review step, select Create on the last page
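The studio steps above can also be approximated with the v1 Python SDK (`azureml-core`). The following is a minimal sketch under stated assumptions: the function name `create_data_asset` and its parameters are hypothetical, the data is assumed to already sit in a registered datastore, and tabular data is assumed to be in delimited files such as CSV. The `skip_validation` flag is this sketch's stand-in for the skip validation option in the form.

```python
SUPPORTED_TYPES = ("File", "Tabular")


def create_data_asset(workspace, datastore, relative_path, name,
                      dataset_type="Tabular", description=None,
                      skip_validation=False):
    """Create and register a File or Tabular dataset from a datastore path."""
    # Check the type first so a bad value fails before the SDK is needed.
    if dataset_type not in SUPPORTED_TYPES:
        raise ValueError(f"dataset_type must be one of {SUPPORTED_TYPES}")

    # Imported lazily so the type check works without azureml-core installed.
    from azureml.core import Dataset

    path = [(datastore, relative_path)]
    validate = not skip_validation
    if dataset_type == "Tabular":
        # Assumes delimited text files (for example, CSV).
        dataset = Dataset.Tabular.from_delimited_files(path=path, validate=validate)
    else:
        dataset = Dataset.File.from_files(path=path, validate=validate)

    return dataset.register(workspace=workspace, name=name, description=description)
```

Passing `skip_validation=True` may be useful when the storage account is in a virtual network, for the same reason the studio form offers the option.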
129130

@@ -132,16 +133,21 @@ You have different options for your data source. For data already stored in Azur
132133
After you create your dataset, verify that you can view the preview and profile in the studio:
133134

134135
1. Sign in to the [Azure Machine Learning studio](https://ml.azure.com/)
135-
1. Under __Assets__ in the left navigation, select __Data__.
136+
1. Under __Assets__ in the left navigation, select __Data__ as shown in the following screenshot:
137+
136138
:::image type="content" source="media\how-to-connect-data-ui\data-data-assets.png" alt-text="Screenshot highlights Create in the Data assets tab.":::
137-
1. Select the name of the dataset you want to view.
139+
140+
1. Select the name of the dataset you want to view.
138141
1. Select the **Explore** tab.
139-
1. Select the **Preview** tab.
142+
1. Select the **Preview** tab, as shown in the following screenshot:
143+
140144
:::image type="content" source="media\how-to-connect-data-ui\explore-preview-dataset.png" alt-text="Screenshot shows a preview of a dataset.":::
141-
1. Select the **Profile** tab.
145+
146+
1. Select the **Profile** tab, as shown in the following screenshot:
147+
142148
:::image type="content" source="media\how-to-connect-data-ui\explore-generate-profile.png" alt-text="Screenshot shows dataset column metadata in the Profile tab.":::
143149

144-
You can use summary statistics across your data set to verify whether your data set is ML-ready. For non-numeric columns, these statistics include only basic statistics - for example, min, max, and error count. Numeric columns offer statistical moments and estimated quantiles.
150+
To verify whether your data set is ML-ready, you can use summary statistics across your data set. For non-numeric columns, these statistics include only basic statistical measures - for example, min, max, and error count. Numeric columns offer statistical moments and estimated quantiles.
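To make the kinds of statistics named above concrete, the following sketch computes a few of them for one column with the Python standard library. It only illustrates the idea and is not how the studio builds its profile. The function name `profile_column` is hypothetical, and counting values that fail numeric conversion as errors is an assumption of this sketch.

```python
import statistics


def profile_column(raw_values):
    """Summarize one column: min, max, error count, moments, and quartiles.

    Needs at least two parseable values, because the standard deviation
    and the quartiles are undefined for fewer.
    """
    numbers = []
    error_count = 0
    for value in raw_values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            # Assumption: anything that is not parseable as a number is an error.
            error_count += 1

    q1, median, q3 = statistics.quantiles(numbers, n=4)
    return {
        "min": min(numbers),
        "max": max(numbers),
        "error_count": error_count,
        "mean": statistics.fmean(numbers),
        "std_dev": statistics.stdev(numbers),
        "quartiles": (q1, median, q3),
    }
```

A high error count relative to the column length, or a minimum and maximum far outside the expected range, is the kind of signal that suggests the data needs cleaning before training.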
145151

146152
The Azure Machine Learning dataset data profile includes:
147153

@@ -171,12 +177,12 @@ To ensure that you securely connect to your Azure storage service, Azure Machine
171177

172178
### Virtual network
173179

174-
If your data storage account is in a **virtual network**, extra configuration steps are required to ensure that Azure Machine Learning has access to your data. See [Use Azure Machine Learning studio in a virtual network](../how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied when you create and register your datastore.
180+
If your data storage account is in a **virtual network**, extra configuration steps are required to ensure that Azure Machine Learning has access to your data. Visit [Use Azure Machine Learning studio in a virtual network](../how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied when you create and register your datastore.
175181

176182
### Access validation
177183

178184
> [!WARNING]
179-
> Cross-tenant access to storage accounts is not supported. If your scenario needs cross-tenant access, please reach out to the Azure Machine Learning Data Support team alias at [email protected] for assistance with a custom code solution.
185+
> Cross-tenant access to storage accounts isn't supported. If your scenario needs cross-tenant access, reach out to the ([Azure Machine Learning Data Support team](mailto:[email protected])) for assistance with a custom code solution.
180186
181187
**As part of the initial datastore creation and registration process**, Azure Machine Learning automatically validates that the underlying storage service exists and that the user-provided principal (username, service principal, or SAS token) has access to the specified storage.
182188

@@ -187,7 +193,7 @@ To authenticate your access to the underlying storage service, provide either yo
187193
You can find account key, SAS token, and service principal information at your [Azure portal](https://portal.azure.com).
188194

189195
* To obtain an account key for authentication, select **Storage Accounts** in the left pane, and choose the storage account that you want to register
190-
* The **Overview** page provides information such as the account name, container, and file share name.
196+
* The **Overview** page provides information such as the account name, container, and file share name
191197
* Expand the **Security + networking** node in the left nav
192198
* Select **Access keys**
193199
* The available key values serve as **Account key** values
@@ -196,12 +202,12 @@ You can find account key, SAS token, and service principal information at your [
196202
* Select **Shared access signature**
197203
* Complete the process to generate the SAS value
198204

199-
* To use a [service principal](/azure/active-directory/develop/howto-create-service-principal-portal) for authentication, go to your **App registrations** and select which app you want to use.
200-
* Its corresponding **Overview** page contains required information like tenant ID and client ID.
205+
* To use a [service principal](/azure/active-directory/develop/howto-create-service-principal-portal) for authentication, go to your **App registrations** and select which app you want to use
206+
* Its corresponding **Overview** page contains required information like tenant ID and client ID
201207

202208
> [!IMPORTANT]
203209
> * To change your access keys for an Azure Storage account (account key or SAS token), be sure to sync the new credentials with both your workspace and the datastores connected to it. For more information, visit [sync your updated credentials](../how-to-change-storage-access-key.md).
204-
> * If you unregister and then re-register a datastore with the same name, and that re-registration fails, the Azure Key Vault for your workspace may not have soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace, but it may not be enabled if you used an existing key vault or have a workspace created prior to October 2020. For more information about how to enable soft-delete, visit [Turn on Soft Delete for an existing key vault](/azure/key-vault/general/soft-delete-change#turn-on-soft-delete-for-an-existing-key-vault).
210+
> * If you unregister and then re-register a datastore with the same name, and that re-registration fails, the Azure Key Vault for your workspace might not have soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace. However, it might not be enabled if you used an existing key vault or have a workspace created before October 2020. For more information about how to enable soft-delete, visit [Turn on Soft Delete for an existing key vault](/azure/key-vault/general/soft-delete-change#turn-on-soft-delete-for-an-existing-key-vault).
205211
206212
### Permissions
207213

@@ -217,7 +223,5 @@ Use your datasets in your machine learning experiments for training ML models. [
217223
## Next steps
218224

219225
* [A step-by-step example of training with TabularDatasets and automated machine learning](../tutorial-first-experiment-automated-ml.md)
220-
221226
* [Train a model](how-to-set-up-training-targets.md)
222-
223227
* For more dataset training examples, see the [sample notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/work-with-data/)
