Commit 71f3945

Merge pull request #185167 from nibaccam/id-access
Data | Id access datastores via UI
2 parents 1c16f06 + d88f958 commit 71f3945

3 files changed, +36 -12 lines changed
articles/machine-learning/how-to-connect-data-ui.md

Lines changed: 26 additions & 4 deletions
@@ -9,7 +9,7 @@ ms.topic: how-to
99
ms.author: yogipandey
1010
author: ynpandey
1111
ms.reviewer: nibaccam
12-
ms.date: 10/21/2021
12+
ms.date: 01/18/2021
1313
ms.custom: data4ml
1414

1515
# Customer intent: As low code experience data scientist, I need to make my data in storage on Azure available to my remote compute to train my ML models.
@@ -47,6 +47,10 @@ For a code first experience, see the following articles to use the [Azure Machin
4747

4848
You can create datastores from [these Azure storage solutions](how-to-access-data.md#matrix). **For unsupported storage solutions**, and to save data egress cost during ML experiments, you must [move your data](how-to-access-data.md#move) to a supported Azure storage solution. [Learn more about datastores](how-to-access-data.md).
4949

50+
You can create datastores with credential-based access or identity-based access.
51+
52+
# [Credential-based](#tab/credential)
53+
5054
Create a new datastore in a few steps with the Azure Machine Learning studio.
5155

5256
> [!IMPORTANT]
@@ -61,13 +65,31 @@ The following example demonstrates what the form looks like when you create an *
6165

6266
![Form for a new datastore](media/how-to-connect-data-ui/new-datastore-form.png)
6367

68+
# [Identity-based](#tab/identity)
69+
70+
Create a new datastore in a few steps with the Azure Machine Learning studio. Learn more about [identity-based data access](how-to-identity-based-data-access.md).
71+
72+
> [!IMPORTANT]
73+
> If your data storage account is in a virtual network, additional configuration steps are required to ensure the studio has access to your data. See [Network isolation & privacy](how-to-enable-studio-virtual-network.md) to ensure the appropriate configuration steps are applied.
74+
75+
1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
76+
1. Select **Datastores** on the left pane under **Manage**.
77+
1. Select **+ New datastore**.
78+
1. Complete the form to create and register a new datastore. The form intelligently updates itself based on your selections for Azure storage type. See [which storage types support identity-based](how-to-identity-based-data-access.md#storage-access-permissions) data access.
79+
1. Select **No** to not **Save credentials with the datastore for data access**.
80+
81+
The following example demonstrates what the form looks like when you create an **Azure blob datastore**:
82+
83+
![Form for a new datastore](media/how-to-connect-data-ui/new-id-based-datastore-form.png)
84+
85+
---
86+
6487
## Create datasets
6588

6689
After you create a datastore, create a dataset to interact with your data. Datasets package your data into a lazily evaluated consumable object for machine learning tasks, like training. [Learn more about datasets](how-to-create-register-datasets.md).
6790

6891
There are two types of datasets, FileDataset and TabularDataset.
69-
[FileDatasets](how-to-create-register-datasets.md#filedataset) create references to single or multiple files or public URLs. Whereas,
70-
[TabularDatasets](how-to-create-register-datasets.md#tabulardataset) represent your data in a tabular format. You can create TabularDatasets from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
92+
[FileDatasets](how-to-create-register-datasets.md#filedataset) create references to single or multiple files or public URLs. Whereas [TabularDatasets](how-to-create-register-datasets.md#tabulardataset) represent your data in a tabular format. You can create TabularDatasets from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
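
As a rough illustration of the FileDataset/TabularDataset split described in this hunk, the Python sketch below maps a file name to the dataset type that would usually fit it. The helper `suggest_dataset_type` and the constant `TABULAR_EXTENSIONS` are hypothetical names and are not part of the Azure Machine Learning SDK. The extension list is copied from the sentence above, and SQL query results are out of scope for this sketch.

```python
from pathlib import PurePosixPath

# Extensions the text lists as valid TabularDataset sources.
TABULAR_EXTENSIONS = {".csv", ".tsv", ".parquet", ".jsonl"}


def suggest_dataset_type(file_name: str) -> str:
    """Return 'TabularDataset' for the listed tabular formats, else 'FileDataset'."""
    # Compare case-insensitively so 'DATA.CSV' is treated like 'data.csv'.
    suffix = PurePosixPath(file_name).suffix.lower()
    return "TabularDataset" if suffix in TABULAR_EXTENSIONS else "FileDataset"
```

This is only a naming heuristic. Any file can still be referenced through a FileDataset, so the suggestion is a default, not a constraint.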
7193

7294
The following steps and animation show how to create a dataset in [Azure Machine Learning studio](https://ml.azure.com).
7395

@@ -87,7 +109,7 @@ To create a dataset in the studio:
87109
1. Select **Next** to populate the **Settings and preview** and **Schema** forms; they are intelligently populated based on file type and you can further configure your dataset prior to creation on these forms.
88110
1. On the Settings and preview form, you can indicate if your data contains multi-line data.
89111
1. On the Schema form, you can specify that your TabularDataset has a time component by selecting type: **Timestamp** for your date or time column.
90-
1. If your data is formatted into subsets, for example time windows, and you want to use those subsets for training, select type **Partition timestamp**. Doing so enables timeseries operations on your dataset. Learn more about how to [leverage partitions in your dataset for training](how-to-monitor-datasets.md?tabs=azure-studio#create-target-dataset).
112+
1. If your data is formatted into subsets, for example time windows, and you want to use those subsets for training, select type **Partition timestamp**. Doing so enables time series operations on your dataset. Learn more about how to [leverage partitions in your dataset for training](how-to-monitor-datasets.md?tabs=azure-studio#create-target-dataset).
91113
1. Select **Next** to review the **Confirm details** form. Check your selections and create an optional data profile for your dataset. Learn more about [data profiling](#profile).
92114
1. Select **Create** to complete your dataset creation.
93115

articles/machine-learning/how-to-identity-based-data-access.md

Lines changed: 10 additions & 8 deletions
@@ -8,24 +8,27 @@ ms.topic: how-to
88
ms.author: yogipandey
99
author: ynpandey
1010
ms.reviewer: nibaccam
11-
ms.date: 10/21/2021
11+
ms.date: 01/18/2021
1212
ms.custom: contperf-fy21q1, devx-track-python, data4ml
1313

14-
# Customer intent: As an experienced Python developer, I need to make my data in Azure Storage available to my compute to train my machine learning models.
14+
# Customer intent: As an experienced Python developer, I need to make my data in Azure Storage available to my compute for training my machine learning models.
1515
---
1616

1717
# Connect to storage by using identity-based data access
1818

1919
In this article, you learn how to connect to storage services on Azure by using identity-based data access and Azure Machine Learning datastores via the [Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/intro).
2020

21-
Typically, datastores use credential-based data access to confirm you have permission to access the storage service. They keep connection information, like your subscription ID and token authorization, in the [key vault](https://azure.microsoft.com/services/key-vault/) that's associated with the workspace. When you create a datastore that uses identity-based data access, your Azure account ([Azure Active Directory token](../active-directory/fundamentals/active-directory-whatis.md)) is used to confirm you have permission to access the storage service. In this scenario, no authentication credentials are saved. Only the storage account information is stored in the datastore.
21+
Typically, datastores use **credential-based authentication** to confirm you have permission to access the storage service. They keep connection information, like your subscription ID and token authorization, in the [key vault](https://azure.microsoft.com/services/key-vault/) that's associated with the workspace. When you create a datastore that uses **identity-based data access**, your Azure account ([Azure Active Directory token](../active-directory/fundamentals/active-directory-whatis.md)) is used to confirm you have permission to access the storage service. In the **identity-based data access** scenario, no authentication credentials are saved. Only the storage account information is stored in the datastore.
22+
23+
To create datastores with **identity-based** data access via the Azure Machine Learning studio UI, see [Connect to data with the Azure Machine Learning studio](how-to-connect-data-ui.md#create-datastores).
2224

23-
To create datastores that use credential-based authentication, like access keys or service principals, see [Connect to storage services on Azure](how-to-access-data.md).
25+
To create datastores that use **credential-based** authentication, like access keys or service principals, see [Connect to storage services on Azure](how-to-access-data.md).
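
For readers who want to see the credential-based versus identity-based distinction in SDK terms, here is a minimal sketch. It assumes the `azureml-core` (v1) package, where `Datastore.register_azure_blob_container` accepts optional `account_key` and `sas_token` arguments. The two wrapper functions are hypothetical helpers written for this illustration. The point is only that registering without either credential leaves the datastore relying on identity-based access.

```python
def identity_based_blob_args(datastore_name, container_name, account_name):
    """Build registration arguments that deliberately omit any credential."""
    # No account_key or sas_token: leaving both out is what makes the
    # datastore identity-based rather than credential-based.
    return {
        "datastore_name": datastore_name,
        "container_name": container_name,
        "account_name": account_name,
    }


def register_identity_based_blob(workspace, **names):
    """Register a blob datastore without saved credentials (needs azureml-core)."""
    # Imported lazily so the helper above works without the SDK installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace, **identity_based_blob_args(**names)
    )
```

A call might look like `register_identity_based_blob(ws, datastore_name="credentialless_blob", container_name="my_container_name", account_name="my_account_name")`, where `ws` is an existing `Workspace` object and all three names are placeholders.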
2426

2527
## Identity-based data access in Azure Machine Learning
2628

2729
There are two scenarios in which you can apply identity-based data access in Azure Machine Learning. These scenarios are a good fit for identity-based access when you're working with confidential data and need more granular data access management:
28-
> [!IMPORTANT]
30+
31+
> [!WARNING]
2932
> Identity-based data access is not supported for [automated ML experiments](how-to-configure-auto-train.md).
3033
3134
- Accessing storage services
@@ -49,8 +52,7 @@ The same behavior applies when you:
4952
5053
### Model training on private data
5154

52-
Certain machine learning scenarios involve training models with private data. In such cases, data scientists need to run training workflows without being exposed to the confidential input data. In this scenario, a managed identity of the training compute is used for data access authentication. This approach allows storage admins to grant Storage Blob Data Reader access to the managed identity that the training compute uses to run the training job. The individual data scientists don't need to be granted access. For more information, see [Set up managed identity on a compute cluster](how-to-create-attach-compute-cluster.md#managed-identity).
53-
55+
Certain machine learning scenarios involve training models with private data. In such cases, data scientists need to run training workflows without being exposed to the confidential input data. In this scenario, a [managed identity](how-to-use-managed-identities.md) of the training compute is used for data access authentication. This approach allows storage admins to grant Storage Blob Data Reader access to the managed identity that the training compute uses to run the training job. The individual data scientists don't need to be granted access. For more information, see [Set up managed identity on a compute cluster](how-to-create-attach-compute-cluster.md#managed-identity).
5456

5557
## Prerequisites
5658

@@ -72,7 +74,7 @@ Certain machine learning scenarios involve training models with private data. In
7274

7375
To help ensure that you securely connect to your storage service on Azure, Azure Machine Learning requires that you have permission to access the corresponding data storage.
7476

75-
Identity-based data access supports connections to only the following storage services:
77+
Identity-based data access supports connections to **only** the following storage services.
7678

7779
* Azure Blob Storage
7880
* Azure Data Lake Storage Gen1