articles/machine-learning/how-to-administrate-data-authentication.md
29 additions & 18 deletions
@@ -24,16 +24,20 @@ Learn how to manage data access and how to authenticate in Azure Machine Learnin
 > This article is intended for Azure administrators who want to create the required infrastructure for an Azure Machine Learning solution.

 ## Credential-based data authentication
-In general, credential-based data authentication from studio involves these checks:
-* Does the user who is accessing data from the credential-based datastore have been assigned a RBAC role containing `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`?
+In general, credential-based data authentication involves these checks:
+* Has the user who is accessing data from the credential-based datastore been assigned an RBAC role containing `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`?
 - This permission is required to retrieve credentials from the datastore on behalf of the user.
+- Built-in roles that already contain this permission are the [Contributor](../role-based-access-control/built-in-roles/general.md#contributor), Azure AI Developer, and [AML Data Scientist](../role-based-access-control/built-in-roles/ai-machine-learning.md#azureml-data-scientist) roles. If you apply a custom role instead, make sure that this permission is added to that custom role.
+- You must know *which* specific identity is trying to access the data. It can be a real user with a user identity, or a compute resource with a compute MSI. See [Scenarios and authentication options](#scenarios-and-authentication-options) to identify the identity that needs the permission.
+
 * Does the stored credential (service principal, account key, or SAS token) have access to the data resource?

+
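Reviewer note: the first check above (does a role contain the `listsecrets/action` permission?) can be illustrated with a small Python sketch. It is a simplified model, not how Azure evaluates access: it assumes a role is described only by `Actions` and `NotActions` pattern lists with `*` wildcards, and it ignores scope, data actions, and deny assignments. The function name `role_grants` and the sample role definitions are hypothetical.

```python
from fnmatch import fnmatchcase

# Permission named in the article for retrieving datastore credentials.
LIST_SECRETS = (
    "Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action"
)


def role_grants(action, actions, not_actions=()):
    """Return True if `action` matches an Actions pattern and no NotActions pattern.

    Simplified assumption: patterns are compared case-insensitively and
    `*` matches any run of characters, including `/`.
    """
    target = action.lower()

    def matches(patterns):
        return any(fnmatchcase(target, pattern.lower()) for pattern in patterns)

    return matches(actions) and not matches(not_actions)


# Hypothetical role definitions, for illustration only.
broad_role = {
    "actions": ["*"],
    "not_actions": ["Microsoft.Authorization/*/Delete"],
}
read_only_role = {
    "actions": ["Microsoft.MachineLearningServices/workspaces/datastores/read"],
    "not_actions": [],
}

for name, role in (("broad_role", broad_role), ("read_only_role", read_only_role)):
    allowed = role_grants(LIST_SECRETS, role["actions"], role["not_actions"])
    print(f"{name}: listsecrets allowed = {allowed}")
```

A custom role that fails this kind of check would need the `listsecrets/action` permission added before a user can read data through a credential-based datastore.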
 ## Identity-based data authentication
-In general, identity-based data authentication from studio involves these checks:
+In general, identity-based data authentication involves these checks:

 * Which user wants to access the resources?
-- Depending on the conext the data is being accessed, different types of authentication are available, for example
+- Depending on the context in which the data is accessed, different types of authentication are available, for example:
 - user identity
 - compute managed identity
 - workspace managed identity
@@ -49,15 +53,16 @@ In general, identity-based data authentication from studio involves these checks
 - The storage account [Reader](../role-based-access-control/built-in-roles.md#reader) reads the storage metadata.
 - The [Storage Blob Data Contributor](../role-based-access-control/built-in-roles.md#storage-blob-data-contributor) reads, writes, and deletes Azure Storage containers and blobs.
 - Please find more [Azure built-in roles for storage here](../role-based-access-control/built-in-roles/storage.md).
-
-## Other general checks for authetication
+
+
+## Other general checks for authentication
 * Where does the access come from?
 - User: Is the client IP address in the VNet/subnet range?
 - Workspace: Is the workspace public, or does it have a private endpoint in a VNet/subnet?
 - Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
 * What operation will be performed?
 - Azure Machine Learning handles create, read, update, and delete (CRUD) operations on a data store/dataset.
-- Archive operations on data assets in the Studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
+- Archive operations on data assets in the studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
 - Data Access calls (for example, preview or schema) go to the underlying storage, and need extra permissions.
 * Will this operation run in your Azure subscription compute resources, or resources hosted in a Microsoft subscription?
 - All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
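Reviewer note: the "Is the client IP address in the VNet/subnet range?" check in this hunk can be done locally with Python's standard `ipaddress` module. This is a minimal sketch. The helper name `ip_in_subnet` and the sample addresses are illustrative assumptions, and the real subnet range has to come from your own VNet configuration.

```python
import ipaddress


def ip_in_subnet(client_ip: str, subnet_cidr: str) -> bool:
    """Return True if client_ip falls inside the subnet given in CIDR notation."""
    address = ipaddress.ip_address(client_ip)
    network = ipaddress.ip_network(subnet_cidr)
    return address in network


# Illustrative values only: a private subnet and two candidate client addresses.
subnet = "10.0.1.0/24"
for candidate in ("10.0.1.25", "203.0.113.7"):
    print(candidate, "in", subnet, "->", ip_in_subnet(candidate, subnet))
```

A client address outside the range is one hint that the request reaches the workspace or storage over a public path and not through the VNet.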
@@ -67,25 +72,31 @@ This diagram shows the general flow of a data access call. Here, a user tries to
 :::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::

-## Scenarios and identities
+## Scenarios and authentication options

 This table lists the identities to use for specific scenarios:

-| Scenario | Use workspace</br>Managed Service Identity (MSI) | Identity to use |
-|--|--|--|
-| Access from UI | Yes | Workspace MSI |
-| Access from UI | No | User's Identity |
-| Access from Job | Yes/No | Compute MSI |
-| Access from Notebook | Yes/No | User's identity |
+| Credential + No Workspace MSI | Credential | Credential | Credential (not supported for Dataset Preview under private network) | Credential (only account key and SAS token) |
+| No Credential + No Workspace MSI | Compute MSI/User Identity | Compute MSI/User identity | User Identity | User Identity |
+
+For SDK V1, data authentication in a job always uses the compute MSI. For SDK V2, data authentication in a job depends on the job setting: it can use either the user identity or the compute MSI.
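Reviewer note: the SDK rule in the added sentence can be written as a small decision function. This is only a sketch of the rule as stated in the diff. The function name `job_data_identity` and the setting values `"user_identity"` and `"compute_msi"` are hypothetical labels, not SDK parameters, and the sketch does not assume a default for SDK V2 because the text does not state one.

```python
from typing import Optional


def job_data_identity(sdk_version: int, job_setting: Optional[str] = None) -> str:
    """Return the identity used for data authentication in a job.

    sdk_version: 1 or 2.
    job_setting: hypothetical label for the SDK V2 job setting,
                 either "user_identity" or "compute_msi".
    """
    if sdk_version == 1:
        # SDK V1 jobs always authenticate with the compute MSI.
        return "compute MSI"
    if sdk_version == 2:
        # SDK V2 jobs follow the job setting.
        if job_setting == "user_identity":
            return "user identity"
        if job_setting == "compute_msi":
            return "compute MSI"
        raise ValueError("SDK V2 needs job_setting 'user_identity' or 'compute_msi'")
    raise ValueError(f"unsupported SDK version: {sdk_version}")


print(job_data_identity(1))
print(job_data_identity(2, "user_identity"))
```

Whichever identity the function returns is the one that needs the storage permissions described in the earlier sections.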
-Data access is complex and it involves many pieces. For example, data access from Azure Machine Learning studio is different compared to use of the SDK for data access. When you use the SDK in your local development environment, you directly access data in the cloud. When you use studio, you don't always directly access the data store from your client. Studio relies on the workspace to access data on your behalf.

 > [!TIP]
 > To access data from outside Azure Machine Learning, for example with Azure Storage Explorer, that access probably relies on the *user* identity. For specific information, review the documentation for the tool or service you're using. For more information about how Azure Machine Learning works with data, see [Setup authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).

-## Azure Storage Account

-When you use an Azure Storage Account from Azure Machine Learning studio, you must add the managed identity of the workspace to these Azure RBAC roles for the storage account:
+## VNET specific requirements
+
+The following sections help you set up data authentication to access data behind a VNet from an Azure Machine Learning workspace.
+
+### Add permissions of Azure Storage Account to Azure Machine Learning workspace managed identity
+
+When you use an Azure Storage Account from Azure Machine Learning studio and want to see Dataset Preview, you must enable "Use workspace managed identity for data preview and profiling in Azure Machine Learning studio" in the datastore settings, and assign these Azure RBAC roles on the storage account to the workspace managed identity:

 * [Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader)
 * If the storage account uses a private endpoint to connect to the VNet, you must grant the [Reader](../role-based-access-control/built-in-roles.md#reader) role for the storage account private endpoint to the managed identity.
@@ -100,7 +111,7 @@ To secure communication between Azure Machine Learning and Azure Storage Account
 ### Azure Storage firewall

-When an Azure Storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when using studio, your client doesn't connect to the storage account. The Azure Machine Learning service that makes the request connects to the storage account. The IP address of the service isn't documented, and it changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.
+When an Azure Storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when using studio, your client doesn't connect to the storage account. The Azure Machine Learning service that makes the request connect to the storage account. The IP address of the service isn't documented, and it changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.