articles/machine-learning/how-to-identity-based-service-authentication.md
Lines changed: 59 additions & 55 deletions
@@ -175,15 +175,69 @@ Identity-based data access supports connections to **only** the following storag

 To access these storage services, you must have at least [Storage Blob Data Reader](../role-based-access-control/built-in-roles.md#storage-blob-data-reader) access to the storage account. Only storage account owners can [change your access level via the Azure portal](../storage/blobs/assign-azure-role-data-access.md).

-If you prefer to not use your user identity (Azure Active Directory), you can also grant a workspace managed-system identity (MSI) permission to create the datastore. To do so, you must have Owner permissions to the storage account and [specify the MSI credentials when creating the datastore](how-to-datastore.md?tabs=cli-identity-based-access%2Ccli-adls-sp%2Ccli-azfiles-account-key%2Ccli-adlsgen1-sp).
+### Access data for training jobs on compute using managed identity

-If you're training a model on a remote compute target and want to access the data for training, the compute identity must be granted at least the Storage Blob Data Reader role from the storage service. Learn how to [set up managed identity on a compute cluster](#compute-cluster).
+Certain machine learning scenarios involve working with private data. In such cases, data scientists may not have direct access to data as Azure AD users. Instead, the managed identity of a compute can be used for data access authentication, and the data can only be accessed from a compute instance or a machine learning compute cluster executing a training job. With this approach, the admin grants the compute instance or compute cluster managed identity Storage Blob Data Reader permissions on the storage. The individual data scientists don't need to be granted access.

-### Working with private data
+To enable authentication with compute managed identity:

-Certain machine learning scenarios involve working with private data. In such cases, data scientists may not have direct access to data as Azure AD users. In this scenario, the managed identity of a compute can be used for data access authentication. In this scenario, the data can only be accessed from a compute instance or a machine learning compute cluster executing a training job.
+* Create a compute with managed identity enabled. See the [compute cluster](#compute-cluster) section or, for a compute instance, the [Assign managed identity (preview)](how-to-create-manage-compute-instance.md) section.
+* Grant the compute managed identity at least the Storage Blob Data Reader role on the storage account.
+* Create any datastores with identity-based authentication enabled. See [Create datastores](how-to-datastore.md).

-With this approach, the admin grants the compute instance or compute cluster managed identity Storage Blob Data Reader permissions on the storage. The individual data scientists don't need to be granted access. For more information on configuring the managed identity for the compute cluster, see the [compute cluster](#compute-cluster) section. For information on using configuring Azure RBAC for the storage, see [role-based access controls](../storage/blobs/assign-azure-role-data-access.md).
+Once identity-based authentication is enabled, the compute managed identity is used by default when accessing data within your training jobs. Optionally, you can authenticate with user identity by following the steps described in the next section.
+
+For information on configuring Azure RBAC for the storage, see [role-based access controls](../storage/blobs/assign-azure-role-data-access.md).
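
Granting the compute managed identity the Storage Blob Data Reader role can be scripted with the Azure CLI. The following Python sketch only assembles an `az role assignment create` command line; it doesn't run anything. The helper names `storage_account_scope` and `role_assignment_command` are hypothetical, and the subscription, resource group, storage account, and principal ID values are placeholders you would replace with your own. Check the flags against the CLI version you have installed before using the output.

```python
import shlex
from typing import List


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the Azure resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(
    principal_id: str,
    scope: str,
    role: str = "Storage Blob Data Reader",
) -> List[str]:
    """Return the argument list for an `az role assignment create` call.

    principal_id is assumed to be the object ID of the compute's managed identity.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own identifiers.
    scope = storage_account_scope("<subscription-id>", "<resource-group>", "<storage-account>")
    command = role_assignment_command("<principal-id>", scope)
    # Print a shell-safe command line for review instead of executing it.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch free of side effects, so an admin can review the scope and role before granting access.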
+
+### Access data for training jobs on compute clusters using user identity (preview)
+When training on [Azure Machine Learning compute clusters](how-to-create-attach-compute-cluster.md#what-is-a-compute-cluster), you can authenticate to storage with your user Azure Active Directory token.
+
+This authentication mode allows you to:
+* Set up fine-grained permissions, where different workspace users can have access to different storage accounts or folders within storage accounts.
+* Let data scientists reuse existing permissions on storage systems.
+* Audit storage access because the storage logs show which identities were used to access data.
+
+> [!IMPORTANT]
+> This functionality has the following limitations:
+> * The feature is only supported for experiments submitted via the [Azure Machine Learning CLI](how-to-configure-cli.md).
+> * Only CommandJobs, and PipelineJobs with CommandSteps and AutoMLSteps, are supported.
+> * User identity and compute managed identity can't be used for authentication within the same job.
+
+> [!WARNING]
+> This feature is in __public preview__ and is __not secure for production workloads__. Ensure that only trusted users have permissions to access your workspace and storage accounts.
+>
+> Preview features are provided without a service-level agreement, and are not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
+>
+> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
+
+The following steps outline how to set up identity-based data access for training jobs on compute clusters.
+
+1. Grant the user identity access to storage resources. For example, grant Storage Blob Data Reader access to the specific storage account you want to use, or grant ACL-based permission to specific folders or files in Azure Data Lake Storage Gen2.
+
+1. Create an Azure Machine Learning datastore without cached credentials for the storage account. If a datastore has cached credentials, such as a storage account key, those credentials are used instead of the user identity.
+
+1. Submit a training job with the **identity** property set to **type: user_identity**, as shown in the following job specification. During the training job, authentication to storage happens via the identity of the user who submits the job.
+
+> [!NOTE]
+> If the **identity** property is left unspecified and the datastore doesn't have cached credentials, the compute managed identity becomes the fallback option.
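
The steps and note above imply an order of precedence for which credential is used to reach storage. The following Python sketch models that order as plain logic. It isn't Azure Machine Learning code: the function name `resolve_data_access_identity` and the returned labels are hypothetical, and it covers only the cases this section describes.

```python
from typing import Optional


def resolve_data_access_identity(
    datastore_has_cached_credentials: bool,
    job_identity_type: Optional[str],
) -> str:
    """Return a label for the credential used to access storage in a training job.

    Models only the rules stated in this section:
    1. Cached datastore credentials (for example, an account key) take precedence.
    2. Otherwise, a job with identity type `user_identity` uses the submitting user.
    3. Otherwise, with no identity specified, the compute managed identity is used.
    """
    if datastore_has_cached_credentials:
        return "datastore_credentials"
    if job_identity_type == "user_identity":
        return "user_identity"
    if job_identity_type is None:
        return "compute_managed_identity"
    # Other identity types aren't described in this section, so they aren't modeled.
    raise ValueError(f"Identity type not modeled in this sketch: {job_identity_type!r}")


if __name__ == "__main__":
    # Walk through the three documented cases.
    for cached, identity in [(True, "user_identity"), (False, "user_identity"), (False, None)]:
        print(cached, identity, "->", resolve_data_access_identity(cached, identity))
```

The first case is why step 2 asks for a datastore without cached credentials: if an account key is stored on the datastore, setting `user_identity` on the job has no effect on how storage is accessed.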