Commit 8e2c722

Merge pull request #273160 from SturgeonMi/patch-31
Update how-to-administrate-data-authentication.md
2 parents f89647d + 4d37bad commit 8e2c722

1 file changed

+29
-18
lines changed

articles/machine-learning/how-to-administrate-data-authentication.md

Lines changed: 29 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -24,16 +24,20 @@ Learn how to manage data access and how to authenticate in Azure Machine Learnin
2424
> This article is intended for Azure administrators who want to create the required infrastructure for an Azure Machine Learning solution.
2525
2626
## Credential-based data authentication
27-
In general, credential-based data authentication from studio involves these checks:
28-
* Does the user who is accessing data from the credential-based datastore have been assigned a RBAC role containing `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`?
27+
In general, credential-based data authentication involves these checks:
28+
* Has the user who is accessing data from the credential-based datastore been assigned an RBAC role containing `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`?
2929
- This permission is required to retrieve credentials from the datastore on behalf of the user.
30+
- Built-in roles that already contain this permission are the [Contributor](../role-based-access-control/built-in-roles/general.md#contributor), Azure AI Developer, and [AML Data Scientist](../role-based-access-control/built-in-roles/ai-machine-learning.md#azureml-data-scientist) roles. Alternatively, if a custom role is applied, ensure that this permission is added to that custom role.
31+
- You must know *which* specific identity is trying to access the data. It can be a real user with a user identity, or a compute with a compute MSI, for example. See the section [Scenarios and authentication options](#scenarios-and-authentication-options) to identify the identity that needs the permission.
32+
3033
* Does the stored credential (service principal, account key, or SAS token) have access to the data resource?
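
The first check above can be approximated offline once you have a role definition in hand. The following is a minimal sketch, assuming a role definition is represented as a plain dictionary with `actions` and `notActions` lists and that wildcards behave like simple glob patterns. The function name `role_grants` and both sample roles are hypothetical and only serve the illustration; they are not part of any Azure SDK.

```python
from fnmatch import fnmatchcase

# Permission named in the article for credential-based datastores.
LIST_SECRETS = (
    "Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action"
)


def role_grants(role: dict, action: str) -> bool:
    """Return True if `action` matches an entry in the role's `actions`
    and is not excluded by an entry in `notActions`.

    Assumption: `*` wildcards are treated as glob patterns and matching
    is case-insensitive.
    """
    wanted = action.lower()

    def matches(patterns):
        return any(fnmatchcase(wanted, p.lower()) for p in patterns)

    return matches(role.get("actions", [])) and not matches(
        role.get("notActions", [])
    )


# Hypothetical custom roles, for illustration only.
custom_reader = {
    "actions": ["Microsoft.MachineLearningServices/workspaces/datastores/read"],
    "notActions": [],
}
custom_data_user = {
    "actions": ["Microsoft.MachineLearningServices/workspaces/datastores/*"],
    "notActions": [],
}

print(role_grants(custom_reader, LIST_SECRETS))     # False
print(role_grants(custom_data_user, LIST_SECRETS))  # True
```

A role that fails this check is a candidate for the custom-role fix described above: add the `listsecrets/action` permission to it.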
3134

35+
3236
## Identity-based data authentication
33-
In general, identity-based data authentication from studio involves these checks:
37+
In general, identity-based data authentication involves these checks:
3438

3539
* Which user wants to access the resources?
36-
- Depending on the conext the data is being accessed, different types of authentication are available, for example
40+
- Depending on the context when the data is being accessed, different types of authentication are available, for example
3741
- user identity
3842
- compute managed identity
3943
- workspace managed identity
@@ -49,15 +53,16 @@ In general, identity-based data authentication from studio involves these checks
4953
- The storage account [Reader](../role-based-access-control/built-in-roles.md#reader) reads the storage metadata.
5054
- The [Storage Blob Data Contributor](../role-based-access-control/built-in-roles.md#storage-blob-data-contributor) reads, writes, and deletes Azure Storage containers and blobs.
5155
- For more roles, see [Azure built-in roles for storage](../role-based-access-control/built-in-roles/storage.md).
52-
53-
## Other general checks for authetication
56+
57+
58+
## Other general checks for authentication
5459
* Where does the access come from?
5560
- User: Is the client IP address in the VNet/subnet range?
5661
- Workspace: Is the workspace public, or does it have a private endpoint in a VNet/subnet?
5762
- Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
5863
* What operation will be performed?
5964
- Azure Machine Learning handles create, read, update, and delete (CRUD) operations on a data store/dataset.
60-
- Archive operations on data assets in the Studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
65+
- Archive operations on data assets in the studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
6166
- Data Access calls (for example, preview or schema) go to the underlying storage, and need extra permissions.
6267
* Will this operation run in your Azure subscription compute resources, or resources hosted in a Microsoft subscription?
6368
- All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
@@ -67,25 +72,31 @@ This diagram shows the general flow of a data access call. Here, a user tries to
6772

6873
:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::
6974

70-
## Scenarios and identities
75+
## Scenarios and authentication options
7176

7277
This table lists the identities to use for specific scenarios:
7378

74-
| Scenario | Use workspace</br>Managed Service Identity (MSI) | Identity to use |
75-
|--|--|--|
76-
| Access from UI | Yes | Workspace MSI |
77-
| Access from UI | No | User's Identity |
78-
| Access from Job | Yes/No | Compute MSI |
79-
| Access from Notebook | Yes/No | User's identity |
79+
| Configuration | SDK Local/Notebook VM | Job | Dataset Preview | Datastore Browse |
80+
| -- | -- | -- | -- | -- |
81+
| Credential + Workspace MSI | Credential | Credential | Workspace MSI | Credential (Only Account key and SAS token) |
82+
| No Credential + Workspace MSI | Compute MSI/User identity | Compute MSI/User identity | Workspace MSI | User identity |
83+
| Credential + No Workspace MSI | Credential | Credential | Credential (Not supported for Dataset Preview under private network) | Credential (Only Account key and SAS token) |
84+
| No Credential + No Workspace MSI | Compute MSI/User identity | Compute MSI/User identity | User identity | User identity |
85+
86+
For SDK v1, data authentication in a job always uses the compute MSI. For SDK v2, data authentication in a job depends on the job setting: it can use either the user identity or the compute MSI.
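
When troubleshooting, it can help to treat the table above as a lookup keyed by the datastore configuration and the access scenario. The following is a minimal sketch of that idea in plain Python. The names `IDENTITY_TABLE` and `identity_for`, and the scenario keys, are hypothetical and chosen for this illustration; the cell values are copied from the table.

```python
# Keys: (datastore has a stored credential, workspace MSI is enabled).
# Values: identity used per scenario, copied from the table above.
IDENTITY_TABLE = {
    (True, True): {
        "sdk_local_or_notebook_vm": "Credential",
        "job": "Credential",
        "dataset_preview": "Workspace MSI",
        "datastore_browse": "Credential (only account key and SAS token)",
    },
    (False, True): {
        "sdk_local_or_notebook_vm": "Compute MSI/User identity",
        "job": "Compute MSI/User identity",
        "dataset_preview": "Workspace MSI",
        "datastore_browse": "User identity",
    },
    (True, False): {
        "sdk_local_or_notebook_vm": "Credential",
        "job": "Credential",
        "dataset_preview": "Credential (not supported under private network)",
        "datastore_browse": "Credential (only account key and SAS token)",
    },
    (False, False): {
        "sdk_local_or_notebook_vm": "Compute MSI/User identity",
        "job": "Compute MSI/User identity",
        "dataset_preview": "User identity",
        "datastore_browse": "User identity",
    },
}


def identity_for(has_credential: bool, uses_workspace_msi: bool, scenario: str) -> str:
    """Look up which identity needs access to the data for a given scenario."""
    return IDENTITY_TABLE[(has_credential, uses_workspace_msi)][scenario]


# Example: an identity-based datastore with workspace MSI enabled.
print(identity_for(False, True, "dataset_preview"))   # Workspace MSI
print(identity_for(False, True, "datastore_browse"))  # User identity
```

The identity returned by the lookup is the one to check for RBAC role assignments on the storage resource.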
8087

81-
Data access is complex and it involves many pieces. For example, data access from Azure Machine Learning studio is different compared to use of the SDK for data access. When you use the SDK in your local development environment, you directly access data in the cloud. When you use studio, you don't always directly access the data store from your client. Studio relies on the workspace to access data on your behalf.
8288

8389
> [!TIP]
8490
> To access data from outside Azure Machine Learning, for example with Azure Storage Explorer, that access probably relies on the *user* identity. For specific information, review the documentation for the tool or service you're using. For more information about how Azure Machine Learning works with data, see [Setup authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
8591
86-
## Azure Storage Account
8792

88-
When you use an Azure Storage Account from Azure Machine Learning studio, you must add the managed identity of the workspace to these Azure RBAC roles for the storage account:
93+
## VNET specific requirements
94+
95+
The following sections help you set up data authentication to access data behind a VNet from an Azure Machine Learning workspace.
96+
97+
### Add permissions of Azure Storage Account to Azure Machine Learning workspace managed identity
98+
99+
When you use an Azure Storage Account from Azure Machine Learning studio and want to see Dataset Preview, you must enable "Use workspace managed identity for data preview and profiling in Azure Machine Learning studio" in the datastore settings, and assign these Azure RBAC roles for the storage account to the workspace managed identity:
89100

90101
* [Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader)
91102
* If the storage account uses a private endpoint to connect to the VNet, you must grant the [Reader](../role-based-access-control/built-in-roles.md#reader) role for the storage account private endpoint to the managed identity.
@@ -100,7 +111,7 @@ To secure communication between Azure Machine Learning and Azure Storage Account
100111

101112
### Azure Storage firewall
102113

103-
When an Azure Storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when using studio, your client doesn't connect to the storage account. The Azure Machine Learning service that makes the request connects to the storage account. The IP address of the service isn't documented, and it changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.
114+
When an Azure Storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when using studio, your client doesn't connect to the storage account. The Azure Machine Learning service that makes the request connect to the storage account. The IP address of the service isn't documented, and it changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.
104115

105116
### Azure Storage endpoint type
106117
