When you submit a training job that consumes a dataset created with identity-based data access, the managed identity of the training compute is used for data access authentication. Your Azure Active Directory token isn't used. For this scenario, ensure that the managed identity of the compute is granted at least the Storage Blob Data Reader role on the storage service. For more information, see [Set up managed identity on compute clusters](how-to-create-attach-compute-cluster.md#managed-identity).
## Access data for training jobs on compute clusters (preview)
When training on [Azure Machine Learning compute clusters](how-to-create-attach-compute-cluster.md#what-is-a-compute-cluster), you can authenticate to storage with your Azure Active Directory token.

This authentication mode allows you to:
* Set up fine-grained permissions, where different workspace users can have access to different storage accounts or folders within storage accounts.
* Audit storage access because the storage logs show which identities were used to access data.

> [!WARNING]
> This functionality has the following limitations:
> * The feature is supported only for experiments submitted via the [Azure Machine Learning CLI v2 (preview)](how-to-configure-cli.md).
> * Only CommandJobs, and PipelineJobs with CommandSteps and AutoMLSteps, are supported.
> * User identity and compute managed identity can't be used for authentication within the same job.

The following steps outline how to set up identity-based data access for training jobs on compute clusters.

1. Grant the user identity access to storage resources. For example, grant the Storage Blob Data Reader role on the specific storage account you want to use, or grant ACL-based permissions to specific folders or files in Azure Data Lake Storage Gen2.
1. Create an Azure Machine Learning datastore without cached credentials for the storage account. If a datastore has cached credentials, such as a storage account key, those credentials are used instead of the user identity.
1. Submit a training job with the **identity** property set to **type: user_identity**, as shown in the following job specification. During the training job, authentication to storage uses the identity of the user who submitted the job.
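
As a minimal sketch of the job specification described in step 3, a CLI v2 command job might look like the following. The script name, source folder, datastore name, environment, and compute cluster name are placeholders rather than values from this article, and the exact schema can differ between CLI v2 preview versions.

```yaml
# Hypothetical command job: all names below are illustrative placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --training-data ${{inputs.training_data}}
code: src                      # local folder assumed to contain train.py
inputs:
  training_data:
    type: uri_file
    # Path on a datastore that has no cached credentials (see step 2)
    path: azureml://datastores/my_credentialless_datastore/paths/data/train.csv
environment: azureml:my-training-env@latest   # placeholder environment name
compute: azureml:cpu-cluster                  # placeholder compute cluster name
# Authenticate to storage as the user who submits the job
identity:
  type: user_identity
```

With a specification like this saved to a file, the job would typically be submitted through the CLI v2 job creation command, and data access would then run under the submitting user's identity instead of the compute's managed identity.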

> [!NOTE]
> If the **identity** property is left unspecified and the datastore doesn't have cached credentials, the compute managed identity becomes the fallback option.
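
For step 2, a datastore without cached credentials can be sketched as a CLI v2 datastore definition that simply omits any credentials section. The datastore, storage account, and container names below are assumptions for illustration only.

```yaml
# Hypothetical credential-less blob datastore: names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: my_credentialless_datastore
type: azure_blob
description: Datastore with no cached credentials, for identity-based data access
account_name: mystorageaccount     # placeholder storage account name
container_name: training-data      # placeholder blob container name
# No "credentials" block: access is resolved from the user identity or,
# as a fallback, the compute managed identity at job run time.
```

Because no account key or SAS token is stored, a job that reads from this datastore succeeds only if the identity in effect has been granted access to the storage account, as described in step 1.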