Commit e29947b

Merge pull request #170 from fbsolo-ms1/document-freshness-maintenance
Freshness update for how-to-administrate-data-authentication.md . . .
2 parents ac790c4 + b4bc0f9 commit e29947b

2 files changed (+1183, -30 lines changed)

articles/machine-learning/how-to-administrate-data-authentication.md

Lines changed: 42 additions & 30 deletions
@@ -1,15 +1,15 @@
 ---
 title: Administer data authentication
 titleSuffix: Azure Machine Learning
-description: Learn how to manage data access and how to authenticate in Azure Machine Learning.
+description: Learn how to manage data access and how to handle authentication operations in Azure Machine Learning.
 services: machine-learning
 ms.service: azure-machine-learning
 ms.subservice: enterprise-readiness
 ms.topic: how-to
 ms.author: franksolomon
 author: fbsolo-ms1
 ms.reviewer: xunwan
-ms.date: 09/26/2023
+ms.date: 09/06/2024
 ms.custom: engagement-fy23
 
 # Customer intent: As an administrator, I need to administer data access and set up authentication methods for data scientists.
@@ -26,10 +26,17 @@ Learn how to manage data access and how to authenticate in Azure Machine Learnin
 ## Credential-based data authentication
 
 In general, credential-based data authentication involves these checks:
-* Has the user who is accessing data from the credential-based datastore been assigned a role with role-based access control (RBAC) that contains `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`?
+* Check that the user who accesses data from the credential-based datastore has an assigned role with role-based access control (RBAC) that contains `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`
+
 - This permission is required to retrieve credentials from the datastore for the user.
-- Built-in roles that contain this permission already are [Contributor](/azure/role-based-access-control/built-in-roles/general#contributor), Azure AI Developer, or [Azure Machine Learning Data Scientist](/azure/role-based-access-control/built-in-roles/ai-machine-learning#azureml-data-scientist). Alternatively, if a custom role is applied, you need to ensure that this permission is added to that custom role.
-- You must know *which* specific user is trying to access the data. It can be a real user with a user identity or a computer with compute managed identity (MSI). See the section [Scenarios and authentication options](#scenarios-and-authentication-options) to identify the identity for which you need to add permission.
+- Built-in roles that already contain this permission:
+
+- [Contributor](/azure/role-based-access-control/built-in-roles/general#contributor)
+- Azure AI Developer
+- [Azure Machine Learning Data Scientist](/azure/role-based-access-control/built-in-roles/ai-machine-learning#azureml-data-scientist)
+- Alternatively, if a custom role is applied, this permission must be added to that custom role
+
+- You must know *which* specific user wants to access the data. A specific user can be a real user with a user identity. It can also be a computer with compute managed identity (MSI). For more information, visit the [Scenarios and authentication options](#scenarios-and-authentication-options) section to determine the identity that needs the added permission.
 
 * Does the stored credential (service principal, account key, or shared access signature token) have access to the data resource?
 
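The first credential-based check above comes down to one question: does any role assigned to the accessing identity (a user identity or a compute managed identity) grant `Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action`? The following Python sketch models that question under simplifying assumptions. The `RoleDefinition` class, the helper functions, and the two custom roles are hypothetical. The sketch considers only control-plane `Actions` and `NotActions`, treats `*` as a wildcard, and compares operation strings case-insensitively. It doesn't call any Azure API, and it doesn't reproduce the exact definitions of built-in roles such as Contributor.

```python
import re
from dataclasses import dataclass, field

# Permission named in the credential-based data authentication checks.
LIST_SECRETS = "Microsoft.MachineLearningServices/workspaces/datastores/listsecrets/action"


@dataclass
class RoleDefinition:
    """Hypothetical, simplified role: only control-plane Actions and NotActions."""
    name: str
    actions: list[str] = field(default_factory=list)
    not_actions: list[str] = field(default_factory=list)


def _matches(pattern: str, operation: str) -> bool:
    # Assumption: "*" matches any run of characters, and comparison ignores case.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*") + "$"
    return re.match(regex, operation, flags=re.IGNORECASE) is not None


def grants(role: RoleDefinition, operation: str) -> bool:
    # Granted if some Action matches and no NotAction of the same role excludes it.
    allowed = any(_matches(p, operation) for p in role.actions)
    excluded = any(_matches(p, operation) for p in role.not_actions)
    return allowed and not excluded


def can_retrieve_datastore_credentials(assigned_roles: list[RoleDefinition]) -> bool:
    # Role assignments are additive: one role that grants the action is enough.
    return any(grants(role, LIST_SECRETS) for role in assigned_roles)


# Hypothetical custom roles for illustration only.
datastore_reader = RoleDefinition(
    "custom-datastore-reader",
    actions=["Microsoft.MachineLearningServices/workspaces/datastores/read"],
)
datastore_user = RoleDefinition(
    "custom-datastore-user",
    actions=["Microsoft.MachineLearningServices/workspaces/datastores/*"],
)

print(can_retrieve_datastore_credentials([datastore_reader]))                  # False
print(can_retrieve_datastore_credentials([datastore_reader, datastore_user]))  # True
```

The sketch evaluates each role separately and then takes `any`. This matches the additive behavior of Azure role assignments: a `NotActions` entry removes an action only from the role that declares it, not from the caller's other roles.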
@@ -38,44 +45,49 @@ In general, credential-based data authentication involves these checks:
 In general, identity-based data authentication involves these checks:
 
 * Which user wants to access the resources?
-- Depending on the context when the data is being accessed, different types of authentication are available, for example:
+- Different types of authentication are available, depending on the context at the time the data is accessed. For example:
 - User identity
 - Compute managed identity
 - Workspace managed identity
-- Jobs, including the dataset `Generate Profile` option, run on a compute resource in *your subscription*, and access the data from that location. The compute managed identity needs permission to the storage resource, instead of the identity of the user who submitted the job.
-- For authentication based on a user identity, you must know *which* specific user tried to access the storage resource. For more information about *user* authentication, see [Authentication for Azure Machine Learning](how-to-setup-authentication.md). For more information about service-level authentication, see [Authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
-* Does this user have permission for reading?
+- Jobs, including the dataset `Generate Profile` option, run on a compute resource in *your subscription*, and access the data from that location. The compute managed identity needs permission to access the storage resource, instead of the identity of the user who submitted the job.
+- For authentication based on a user identity, you must know *which* specific user tried to access the storage resource. For more information about *user* authentication, visit [Authentication for Azure Machine Learning](how-to-setup-authentication.md). For more information about service-level authentication, visit [Authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
+* Does this user have read permission for the resource?
 - Does the user identity or the compute managed identity have the necessary permissions for that storage resource? Permissions are granted by using Azure RBAC.
 - The storage account [Reader](/azure/role-based-access-control/built-in-roles#reader) reads the storage metadata.
 - The [Storage Blob Data Reader](/azure/role-based-access-control/built-in-roles#storage-blob-data-reader) reads and lists storage containers and blobs.
-- For more information, see [Azure built-in roles for storage](/azure/role-based-access-control/built-in-roles/storage).
-* Does this user have permission for writing?
+- For more information, visit [Azure built-in roles for storage](/azure/role-based-access-control/built-in-roles/storage).
+* Does this user have write permission for the resource?
 - Does the user identity or the compute managed identity have the necessary permissions for that storage resource? Permissions are granted by using Azure RBAC.
 - The storage account [Reader](/azure/role-based-access-control/built-in-roles#reader) reads the storage metadata.
 - The [Storage Blob Data Contributor](/azure/role-based-access-control/built-in-roles#storage-blob-data-contributor) reads, writes, and deletes Azure Storage containers and blobs.
-- For more information, see [Azure built-in roles for storage](/azure/role-based-access-control/built-in-roles/storage).
+- For more information, visit [Azure built-in roles for storage](/azure/role-based-access-control/built-in-roles/storage).
 
 ## Other general checks for authentication
 
-* Where does the access come from?
+* What exactly will access the resource?
 - **User**: Is the client IP address in the virtual network/subnet range?
 - **Workspace**: Is the workspace public, or does it have a private endpoint in a virtual network/subnet?
 - **Storage**: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
-* What operation will be performed?
-- Azure Machine Learning handles create, read, update, and delete (CRUD) operations on a data store/dataset.
+* What is the planned operation?
+- Azure Machine Learning handles
+- **C**reate
+- **R**ead
+- **U**pdate
+- **D**elete
+(CRUD) operations on a data store/dataset.
 - Archive operations on data assets in Azure Machine Learning studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
-- Data access calls (for example, preview or schema) go to the underlying storage and need extra permissions.
-* Will this operation run in your Azure subscription compute resources or resources hosted in a Microsoft subscription?
+- Data access calls (for example, preview or schema) go to the underlying storage and require extra permissions.
+* Will this operation run in an Azure subscription compute resources, or resources hosted in a Microsoft subscription?
 - All calls to dataset and datastore services (except the `Generate Profile` option) use resources hosted in a *Microsoft subscription* to run the operations.
 - Jobs, including the dataset `Generate Profile` option, run on a compute resource in *your subscription* and access the data from that location. The compute identity needs permission to the storage resource, instead of the identity of the user who submitted the job.
 
-This diagram shows the general flow of a data access call. Here, a user tries to make a data access call through a Machine Learning workspace, without using a compute resource.
+This diagram shows the general flow of a data access call. Here, a user tries to make a data access call through a Machine Learning workspace, without use of a compute resource.
 
-:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram that shows the logic flow when accessing data.":::
+:::image type="content" source="./media/how-to-administrate-data-authentication/data-access-flow.svg" alt-text="Diagram that shows the logic flow when accessing data.":::
 
 ## Scenarios and authentication options
 
-This table lists the identities to use for specific scenarios.
+This table lists the identities to use for specific scenarios:
 
 | Configuration | SDK local/notebook virtual machine | Job | Dataset Preview | Datastore browse |
 | -- | -- | -- | -- | -- |
@@ -84,51 +96,51 @@ This table lists the identities to use for specific scenarios.
 | Credential + No Workspace MSI | Credential | Credential | Credential (not supported for Dataset Preview under private network) | Credential (only account key and shared access signature token) |
 | No Credential + No Workspace MSI | Compute MSI/User identity | Compute MSI/User identity | User identity | User identity |
 
-For SDK V1, data authentication in a job always uses compute MSI. For SDK V2, data authentication in a job depends on the job setting. It can be user identity or compute MSI based on your setting.
+For SDK V1, data authentication in a job always uses compute MSI. For SDK V2, data authentication in a job depends on your job setting. It can be user identity or compute MSI, based on that job setting.
 
 > [!TIP]
-> To access data from outside Machine Learning, for example, with Azure Storage Explorer, that access probably relies on the *user* identity. For specific information, review the documentation for the tool or service you're using. For more information about how Machine Learning works with data, see [Set up authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
+> To access data from outside Machine Learning - for example, with Azure Storage Explorer - that access probably relies on the *user* identity. For specific information, review the documentation for the tool or service you plan to use. For more information about how Machine Learning works with data, visit [Set up authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
 
 ## Virtual network specific requirements
 
-The following information helps you set up data authentication to access data behind a virtual network from a Machine Learning workspace.
+This information helps you set up data authentication from a Machine Learning workspace, to access data behind a virtual network.
 
 ### Add permissions of a storage account to a Machine Learning workspace managed identity
 
-When you use a storage account from the studio, if you want to see Dataset Preview, you must enable **Use workspace managed identity for data preview and profiling in Azure Machine Learning studio** in the datastore setting. Then add the following Azure RBAC roles of the storage account to the workspace managed identity:
+When you use a storage account from the studio, if you want to see Dataset Preview, you must enable **Use workspace managed identity for data preview and profiling in Azure Machine Learning studio** in the datastore setting. Then add these storage account Azure RBAC roles to the workspace managed identity:
 
 * [Blob Data Reader](/azure/role-based-access-control/built-in-roles#storage-blob-data-reader)
 * If the storage account uses a private endpoint to connect to the virtual network, you must grant the [Reader](/azure/role-based-access-control/built-in-roles#reader) role for the storage account private endpoint to the managed identity.
 
-For more information, see [Use Azure Machine Learning studio in an Azure virtual network](how-to-enable-studio-virtual-network.md).
+For more information, visit [Use Azure Machine Learning studio in an Azure virtual network](how-to-enable-studio-virtual-network.md).
 
-The following sections explain the limitations of using a storage account, with your workspace, in a virtual network.
+These sections explain the limitations of using a storage account, with your workspace, in a virtual network.
 
 ### Secure communication with a storage account
 
 To secure communication between Machine Learning and storage accounts, configure the storage to [grant access to trusted Azure services](/azure/storage/common/storage-network-security#grant-access-to-trusted-azure-services).
 
 ### Azure Storage firewall
 
-When a storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when you use the studio, your client doesn't connect to the storage account. The Machine Learning service that makes the request connects to the storage account. The IP address of the service isn't documented, and it changes frequently. Enabling the storage firewall won't allow the studio to access the storage account in a virtual network configuration.
+For a storage account located behind a virtual network, the storage firewall can normally allow your client to directly connect over the internet. However, when you use the studio, your client doesn't connect to the storage account. The Machine Learning service that makes the request connects to the storage account. The IP address of the service isn't documented, and it changes frequently. Enabling the storage firewall doesn't allow the studio to access the storage account in a virtual network configuration.
 
 ### Azure Storage endpoint type
 
-When the workspace uses a private endpoint, and the storage account is also in the virtual network, extra validation requirements arise when you use the studio:
+When the workspace uses a private endpoint, and the storage account is also in the virtual network, extra validation requirements arise when you use the studio.
 
 * If the storage account uses a *service endpoint*, the workspace private endpoint and storage service endpoint must be located in the same subnet of the virtual network.
 * If the storage account uses a *private endpoint*, the workspace private endpoint and storage private endpoint must be in located in the same virtual network. In this case, they can be in different subnets.
 
 ## Azure Data Lake Storage Gen1
 
-When you use Azure Data Lake Storage Gen1 as a datastore, you can only use POSIX-style access control lists. You can assign the workspace's managed identity access to resources, like any other security principal. For more information, see [Access control in Azure Data Lake Storage Gen1](/azure/data-lake-store/data-lake-store-access-control).
+When you use Azure Data Lake Storage Gen1 as a datastore, you can only use POSIX-style access control lists. You can assign the workspace's managed identity access to resources, like any other security principal. For more information, visit [Access control in Azure Data Lake Storage Gen1](/azure/data-lake-store/data-lake-store-access-control).
 
 ## Azure Data Lake Storage Gen2
 
 When you use Azure Data Lake Storage Gen2 as a datastore, you can use both Azure RBAC and POSIX-style access control lists (ACLs) to control data access inside a virtual network.
 
 - **To use Azure RBAC**: Follow the steps described in [Datastore: Azure Storage account](how-to-enable-studio-virtual-network.md#datastore-azure-storage-account). Data Lake Storage Gen2 is based on Azure Storage, so the same steps apply when you use Azure RBAC.
-- **To use ACLs**: The managed identity of the workspace can be assigned access like any other security principal. For more information, see [Access control lists on files and directories](/azure/storage/blobs/data-lake-storage-access-control#access-control-lists-on-files-and-directories).
+- **To use ACLs**: The managed identity of the workspace can be assigned access like any other security principal. For more information, visit [Access control lists on files and directories](/azure/storage/blobs/data-lake-storage-access-control#access-control-lists-on-files-and-directories).
 
 ## Next steps
 
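The *Azure Storage endpoint type* rules in the diff above can be expressed as a small placement check. The following Python sketch is hypothetical. `Endpoint`, its `vnet` and `subnet` fields, `studio_endpoint_placement_ok`, and the resource names are illustrative, not part of any Azure SDK. The sketch assumes the situation those rules describe: the workspace uses a private endpoint, and the storage account is in the same virtual network.

```python
from dataclasses import dataclass
from typing import Literal

StorageEndpointType = Literal["service", "private"]


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of where an endpoint is placed."""
    vnet: str
    subnet: str


def studio_endpoint_placement_ok(
    workspace_private_endpoint: Endpoint,
    storage_endpoint: Endpoint,
    storage_endpoint_type: StorageEndpointType,
) -> bool:
    same_vnet = workspace_private_endpoint.vnet == storage_endpoint.vnet
    if storage_endpoint_type == "service":
        # Service endpoint: must share the subnet of the workspace private endpoint.
        # Subnet names are unique only within a virtual network, so compare both.
        return same_vnet and workspace_private_endpoint.subnet == storage_endpoint.subnet
    if storage_endpoint_type == "private":
        # Private endpoint: same virtual network required, subnets may differ.
        return same_vnet
    raise ValueError(f"unknown storage endpoint type: {storage_endpoint_type!r}")


# Hypothetical placements for illustration only.
workspace_pe = Endpoint(vnet="vnet-ml", subnet="snet-workspace")
storage_ep = Endpoint(vnet="vnet-ml", subnet="snet-storage")

print(studio_endpoint_placement_ok(workspace_pe, storage_ep, "private"))  # True
print(studio_endpoint_placement_ok(workspace_pe, storage_ep, "service"))  # False
```

If the check compared only the subnet, it would wrongly accept two subnets that share a name but sit in different virtual networks. That is why the service-endpoint branch also compares the virtual network.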
0 commit comments