Commit 43bb71d

Merge pull request #252881 from fbsolo-ms1/release-branch-Data-Administration-article
Release branch data administration article
2 parents ea6aed7 + 64e856f commit 43bb71d

1 file changed

+44
-42
lines changed

articles/machine-learning/how-to-administrate-data-authentication.md

Lines changed: 44 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -8,52 +8,56 @@ ms.subservice: enterprise-readiness
88
ms.topic: how-to
99
ms.author: xunwan
1010
author: SturgeonMi
11-
ms.reviewer: larryfr
12-
ms.date: 01/20/2023
11+
ms.reviewer: franksolomon
12+
ms.date: 09/26/2023
1313
ms.custom: engagement-fy23
1414

1515
# Customer intent: As an administrator, I need to administrate data access and set up authentication method for data scientists.
1616
---
1717

1818
# Data administration
1919

20-
2120
Learn how to manage data access and how to authenticate in Azure Machine Learning.
2221
[!INCLUDE [sdk/cli v2](includes/machine-learning-dev-v2.md)]
2322

2423
> [!IMPORTANT]
25-
> The information in this article is intended for Azure administrators who are creating the infrastructure required for an Azure Machine Learning solution.
26-
27-
In general, data access from studio involves the following checks:
28-
29-
* Who is accessing?
30-
- There are multiple different types of authentication depending on the storage type. For example, account key, token, service principal, managed identity, and user identity.
31-
- If authentication is made using a user identity, then it's important to know *which* user is trying to access storage. For more information on authenticating a _user_, see [authentication for Azure Machine Learning](how-to-setup-authentication.md). For more information on service-level authentication, see [authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
32-
* Do they have permission?
33-
- Are the credentials correct? If so, does the service principal, managed identity, etc., have the necessary permissions on the storage? Permissions are granted using Azure role-based access controls (Azure RBAC).
34-
- [Reader](../role-based-access-control/built-in-roles.md#reader) of the storage account reads metadata of the storage.
35-
- [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) reads data within a blob container.
36-
- [Contributor](../role-based-access-control/built-in-roles.md#contributor) allows write access to a storage account.
37-
- More roles may be required depending on the type of storage.
38-
* Where is access from?
24+
> This article is intended for Azure administrators who want to create the required infrastructure for an Azure Machine Learning solution.
25+
26+
In general, data access from studio involves these checks:
27+
28+
* Which user wants to access the resources?
29+
- Depending on the storage type, different types of authentication are available, for example:
30+
- account key
31+
- token
32+
- service principal
33+
- managed identity
34+
- user identity
35+
- For authentication based on a user identity, you must know *which* specific user tried to access the storage resource. For more information about _user_ authentication, see [authentication for Azure Machine Learning](how-to-setup-authentication.md). For more information about service-level authentication, see [authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
36+
* Does this user have permission?
37+
- Does the user have the correct credentials? If yes, does the service principal, managed identity, etc., have the necessary permissions for that storage resource? Permissions are granted using Azure role-based access controls (Azure RBAC).
38+
- The storage account [Reader](../role-based-access-control/built-in-roles.md#reader) reads the storage metadata.
39+
- The [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) reads data within a blob container.
40+
- The [Contributor](../role-based-access-control/built-in-roles.md#contributor) allows write access to a storage account.
41+
- More roles may be required, depending on the type of storage.
42+
* Where does the access come from?
3943
- User: Is the client IP address in the VNet/subnet range?
40-
- Workspace: Is the workspace public or does it have a private endpoint in a VNet/subnet?
44+
- Workspace: Is the workspace public, or does it have a private endpoint in a VNet/subnet?
4145
- Storage: Does the storage allow public access, or does it restrict access through a service endpoint or a private endpoint?
42-
* What operation is being performed?
43-
- Create, read, update, and delete (CRUD) operations on a data store/dataset are handled by Azure Machine Learning.
44-
- Archive operation on data assets in the Studio requires the following RBAC operation: Microsoft.MachineLearningServices/workspaces/datasets/registered/delete
45-
- Data Access calls (such as preview or schema) go to the underlying storage and need extra permissions.
46-
* Where is this operation being run; compute resources in your Azure subscription or resources hosted in a Microsoft subscription?
46+
* What operation will be performed?
47+
- Azure Machine Learning handles create, read, update, and delete (CRUD) operations on a data store/dataset.
48+
- Archive operations on data assets in the Studio require this RBAC operation: `Microsoft.MachineLearningServices/workspaces/datasets/registered/delete`
49+
- Data Access calls (for example, preview or schema) go to the underlying storage, and need extra permissions.
50+
* Will this operation run on compute resources in your Azure subscription, or on resources hosted in a Microsoft subscription?
4751
- All calls to dataset and datastore services (except the "Generate Profile" option) use resources hosted in a __Microsoft subscription__ to run the operations.
48-
- Jobs, including the "Generate Profile" option for datasets, run on a compute resource in __your subscription__, and access the data from there. So the compute identity needs permission to the storage rather than the identity of the user submitting the job.
52+
- Jobs, including the dataset "Generate Profile" option, run on a compute resource in __your subscription__, and access the data from that location. The compute identity, rather than the identity of the user who submitted the job, needs permission to the storage resource.
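
The "Where does the access come from?" check for a user asks whether the client IP address falls inside the VNet/subnet range. A minimal sketch of that membership test follows, using only the Python standard library. The subnet range and client addresses are hypothetical values chosen for illustration. They do not come from the article or from any real deployment, and this is not how Azure itself evaluates the rule.

```python
import ipaddress


def client_in_subnet(client_ip: str, subnet_cidr: str) -> bool:
    """Return True when client_ip falls inside the subnet_cidr range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(subnet_cidr)


# Hypothetical subnet range and client addresses (assumptions for illustration).
SUBNET = "10.0.1.0/24"

print(client_in_subnet("10.0.1.25", SUBNET))  # address inside the range
print(client_in_subnet("10.0.2.25", SUBNET))  # address outside the range
```

A check like this can help an administrator reason about which client addresses a subnet rule would cover before changing network settings.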
4953

50-
The following diagram shows the general flow of a data access call. In this example, a user is trying to make a data access call through a machine learning workspace, without using any compute resource.
54+
This diagram shows the general flow of a data access call. Here, a user tries to make a data access call through a machine learning workspace, without using a compute resource.
5155

5256
:::image type="content" source="./media/concept-network-data-access/data-access-flow.svg" alt-text="Diagram of the logic flow when accessing data.":::
5357

5458
## Scenarios and identities
5559

56-
The following table lists what identities should be used for specific scenarios:
60+
This table lists the identities to use for specific scenarios:
5761

5862
| Scenario | Use workspace</br>Managed Service Identity (MSI) | Identity to use |
5963
|--|--|--|
@@ -62,51 +66,49 @@ The following table lists what identities should be used for specific scenarios:
6266
| Access from Job | Yes/No | Compute MSI |
6367
| Access from Notebook | Yes/No | User's identity |
6468

65-
66-
Data access is complex and it's important to recognize that there are many pieces to it. For example, accessing data from Azure Machine Learning studio is different than using the SDK. When using the SDK on your local development environment, you're directly accessing data in the cloud. When using studio, you aren't always directly accessing the data store from your client. Studio relies on the workspace to access data on your behalf.
69+
Data access is complex, and it involves many pieces. For example, data access from Azure Machine Learning studio differs from data access with the SDK. When you use the SDK in your local development environment, you directly access data in the cloud. When you use studio, you don't always directly access the data store from your client. Studio relies on the workspace to access data on your behalf.
6770

6871
> [!TIP]
69-
> If you need to access data from outside Azure Machine Learning, such as using Azure Storage Explorer, *user* identity is probably what is used. Consult the documentation for the tool or service you are using for specific information. For more information on how Azure Machine Learning works with data, see [Setup authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
72+
> Access to data from outside Azure Machine Learning, for example with Azure Storage Explorer, probably relies on the *user* identity. For specific information, review the documentation for the tool or service you're using. For more information about how Azure Machine Learning works with data, see [Setup authentication between Azure Machine Learning and other services](how-to-identity-based-service-authentication.md).
7073
7174
## Azure Storage Account
7275

73-
When using an Azure Storage Account from Azure Machine Learning studio, you must add the managed identity of the workspace to the following Azure RBAC roles for the storage account:
76+
When you use an Azure Storage Account from Azure Machine Learning studio, you must add the managed identity of the workspace to these Azure RBAC roles for the storage account:
7477

7578
* [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader)
76-
* If the storage account uses a private endpoint to connect to the VNet, you must grant the managed identity the [Reader](../role-based-access-control/built-in-roles.md#reader) role for the storage account private endpoint.
79+
* If the storage account uses a private endpoint to connect to the VNet, you must grant the [Reader](../role-based-access-control/built-in-roles.md#reader) role for the storage account private endpoint to the managed identity.
7780

7881
For more information, see [Use Azure Machine Learning studio in an Azure Virtual Network](how-to-enable-studio-virtual-network.md).
7982

80-
See the following sections for information on limitations when using Azure Storage Account with your workspace in a VNet.
83+
The following sections explain the limitations that apply when you use an Azure Storage Account with your workspace in a VNet.
8184

82-
### Secure communication with Azure Storage Account
85+
### Secure communication with Azure Storage Account
8386

84-
To secure communication between Azure Machine Learning and Azure Storage Accounts, configure storage to [Grant access to trusted Azure services](../storage/common/storage-network-security.md#grant-access-to-trusted-azure-services).
87+
To secure communication between Azure Machine Learning and Azure Storage Accounts, configure the storage to [Grant access to trusted Azure services](../storage/common/storage-network-security.md#grant-access-to-trusted-azure-services).
8588

8689
### Azure Storage firewall
8790

88-
When an Azure Storage account is behind a virtual network, the storage firewall can normally be used to allow your client to directly connect over the internet. However, when using studio it isn't your client that connects to the storage account; it's the Azure Machine Learning service that makes the request. The IP address of the service isn't documented and changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.
91+
When an Azure Storage account is located behind a virtual network, the storage firewall can normally be used to allow your client to connect directly over the internet. However, when you use studio, your client doesn't connect to the storage account. Instead, the Azure Machine Learning service makes the request. The IP address of the service isn't documented, and it changes frequently. __Enabling the storage firewall will not allow studio to access the storage account in a VNet configuration__.
8992

9093
### Azure Storage endpoint type
9194

92-
When the workspace uses a private endpoint and the storage account is also in the VNet, there are extra validation requirements when using studio:
95+
When the workspace uses a private endpoint, and the storage account is also in the VNet, extra validation requirements arise when using studio:
9396

94-
* If the storage account uses a __service endpoint__, the workspace private endpoint and storage service endpoint must be in the same subnet of the VNet.
95-
* If the storage account uses a __private endpoint__, the workspace private endpoint and storage private endpoint must be in the same VNet. In this case, they can be in different subnets.
97+
* If the storage account uses a __service endpoint__, the workspace private endpoint and storage service endpoint must be located in the same subnet of the VNet.
98+
* If the storage account uses a __private endpoint__, the workspace private endpoint and storage private endpoint must be located in the same VNet. In this case, they can be in different subnets.
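
The two placement rules above can be sketched as a small check. This is an illustrative model only: the `Endpoint` type, the `studio_placement_ok` function, and the VNet and subnet names are hypothetical and are not part of any Azure SDK. The sketch assumes the workspace already uses a private endpoint, as the scenario states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of where an endpoint lives."""
    kind: str    # "private" or "service"
    vnet: str    # name of the virtual network
    subnet: str  # name of the subnet inside that VNet


def studio_placement_ok(workspace_pe: Endpoint, storage_ep: Endpoint) -> bool:
    """Apply the studio validation rules for a workspace private endpoint."""
    if workspace_pe.kind != "private":
        raise ValueError("this check assumes a workspace private endpoint")
    if storage_ep.kind == "service":
        # Service endpoint: same VNet and same subnet are required.
        return (workspace_pe.vnet, workspace_pe.subnet) == (
            storage_ep.vnet,
            storage_ep.subnet,
        )
    if storage_ep.kind == "private":
        # Private endpoint: same VNet is enough; subnets may differ.
        return workspace_pe.vnet == storage_ep.vnet
    raise ValueError(f"unknown storage endpoint kind: {storage_ep.kind!r}")


# Hypothetical names, for illustration only.
workspace = Endpoint("private", "vnet-ml", "subnet-a")
print(studio_placement_ok(workspace, Endpoint("service", "vnet-ml", "subnet-b")))
print(studio_placement_ok(workspace, Endpoint("private", "vnet-ml", "subnet-b")))
```

With these inputs, the service endpoint in a different subnet fails the check, while the private endpoint in a different subnet of the same VNet passes.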
9699

97100
## Azure Data Lake Storage Gen1
98101

99-
When using Azure Data Lake Storage Gen1 as a datastore, you can only use POSIX-style access control lists. You can assign the workspace's managed identity access to resources just like any other security principal. For more information, see [Access control in Azure Data Lake Storage Gen1](../data-lake-store/data-lake-store-access-control.md).
102+
When using Azure Data Lake Storage Gen1 as a datastore, you can only use POSIX-style access control lists. You can assign the workspace's managed identity access to resources, just like any other security principal. For more information, see [Access control in Azure Data Lake Storage Gen1](../data-lake-store/data-lake-store-access-control.md).
100103

101104
## Azure Data Lake Storage Gen2
102105

103106
When using Azure Data Lake Storage Gen2 as a datastore, you can use both Azure RBAC and POSIX-style access control lists (ACLs) to control data access inside of a virtual network.
104107

105-
__To use Azure RBAC__, follow the steps in the [Datastore: Azure Storage Account](how-to-enable-studio-virtual-network.md#datastore-azure-storage-account) section of the 'Use Azure Machine Learning studio in an Azure Virtual Network' article. Data Lake Storage Gen2 is based on Azure Storage, so the same steps apply when using Azure RBAC.
108+
__To use Azure RBAC__, follow the steps described in this [Datastore: Azure Storage Account](how-to-enable-studio-virtual-network.md#datastore-azure-storage-account) article section. Data Lake Storage Gen2 is based on Azure Storage, so the same steps apply when using Azure RBAC.
106109

107110
__To use ACLs__, the managed identity of the workspace can be assigned access just like any other security principal. For more information, see [Access control lists on files and directories](../storage/blobs/data-lake-storage-access-control.md#access-control-lists-on-files-and-directories).
108111

109-
110112
## Next steps
111113

112-
For information on enabling studio in a network, see [Use Azure Machine Learning studio in an Azure Virtual Network](how-to-enable-studio-virtual-network.md).
114+
For information about enabling studio in a network, see [Use Azure Machine Learning studio in an Azure Virtual Network](how-to-enable-studio-virtual-network.md).
