Azure Machine Learning is composed of multiple Azure services. There are multiple ways that authentication can happen between Azure Machine Learning and the services it relies on.
21
21
22
22
* The Azure Machine Learning workspace uses a __managed identity__ to communicate with other services. By default, this is a system-assigned managed identity. You can also use a user-assigned managed identity instead.
23
-
* Azure Machine Learning uses Azure Container Registry (ACR) to store Docker images used to train and deploy models. If you allow Azure Machine Learning to automatically create ACR, it will enable the __admin account__.
23
+
* Azure Machine Learning uses Azure Container Registry (ACR) to store Docker images used to train and deploy models. If you allow Azure Machine Learning to automatically create ACR, it enables the __admin account__.
24
24
* The Azure Machine Learning compute cluster uses a __managed identity__ to retrieve connection information for datastores from Azure Key Vault and to pull Docker images from ACR. You can also configure identity-based access to datastores, which will instead use the managed identity of the compute cluster.
25
25
* Data access can happen along multiple paths depending on the data storage service and your configuration. For example, authentication to the datastore may use an account key, token, security principal, managed identity, or user identity.
26
26
* Managed online endpoints can use a managed identity to access Azure resources when performing inference. For more information, see [Access Azure resources from an online endpoint](how-to-access-resources-from-endpoints-managed-identities.md).
@@ -201,7 +201,7 @@ During a run there are two applications of an identity:
201
201
202
202
1. The system uses an identity to set up the user's storage mounts, container registry, and datastores.
203
203
204
-
* In this case, the system will use the default-managed identity.
204
+
* In this case, the system uses the default-managed identity.
205
205
206
206
1. You apply an identity to access resources from within the code for a submitted job:
207
207
@@ -288,7 +288,7 @@ During a run, there are two applications of an identity:
288
288
289
289
- The system uses an identity to set up the user's storage mounts, container registry, and datastores.
290
290
291
-
* In this case, the system will use the default-managed identity.
291
+
* In this case, the system uses the default-managed identity.
292
292
293
293
- You apply an identity to access resources from within the code for a submitted job:
294
294
@@ -304,7 +304,7 @@ During a run, there are two applications of an identity:
304
304
To configure a kubernetes cluster compute, make sure that it has the [necessary AML extension deployed in it](how-to-deploy-kubernetes-extension.md?view=azureml-api-2&preserve-view=true&tabs=deploy-extension-with-cli) and follow the documentation on [how to attach the kubernetes cluster compute to your AML workspace](how-to-attach-kubernetes-to-workspace.md?view=azureml-api-2&preserve-view=true&tabs=cli).
305
305
306
306
> [!IMPORTANT]
307
-
> For Training purposes (Machine Learning Jobs), the identity that is used is the one assigned to the Kubernetes Cluster Compute. However, in the case of inferencing (Managed Online Endpoints), the identity that is used is the one assigned to the endpoint. For more information see [How to Access Azure Resources from an Online Endpoint](how-to-access-resources-from-endpoints-managed-identities.md?view=azureml-api-2&preserve-view=true&tabs=system-identity-cli).
307
+
> For Training purposes (Machine Learning Jobs), the identity that is used is the one assigned to the Kubernetes Cluster Compute. However, in the case of inferencing (Managed Online Endpoints), the identity that is used is the one assigned to the endpoint. For more information, see [How to Access Azure Resources from an Online Endpoint](how-to-access-resources-from-endpoints-managed-identities.md?view=azureml-api-2&preserve-view=true&tabs=system-identity-cli).
308
308
309
309
---
310
310
@@ -337,7 +337,7 @@ The same behavior applies when you work with data interactively via a Jupyter No
337
337
To help ensure that you securely connect to your storage service on Azure, Azure Machine Learning requires that you have permission to access the corresponding data storage.
338
338
339
339
> [!WARNING]
340
-
> Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, please reach out to the Azure Machine Learning Data Support team alias at[email protected] for assistance with a custom code solution.
340
+
> Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, contact the Azure Machine Learning Data Support team alias at [email protected] for assistance with a custom code solution.
341
341
342
342
Identity-based data access supports connections to **only** the following storage services.
343
343
@@ -382,8 +382,8 @@ This authentication mode allows you to:
382
382
> [!IMPORTANT]
383
383
> This functionality has the following limitations
384
384
> * Feature is supported for experiments submitted via the [Azure Machine Learning CLI and Python SDK V2](concept-v2.md), but not via ML Studio.
385
-
> * User identity and compute managed identity cannot be used for authentication within same job.
386
-
> * For pipeline jobs, we recommend setting user identity at the individual step level that will be executed on a compute, rather than at the root pipeline level. (While identity setting is supported at both root pipeline and step levels, the step level setting takes precedence if both are set. However, for pipelines containing pipeline components, identity must be set on individual steps that will be executed. Identity set at the root pipeline or pipeline component level will not function. Therefore, we suggest setting identity at the individual step level for simplicity.)
385
+
> * User identity and compute managed identity can't be used for authentication within the same job.
386
+
> * For pipeline jobs, we recommend setting user identity at the individual step level that will be executed on a compute, rather than at the root pipeline level. (While identity setting is supported at both root pipeline and step levels, the step level setting takes precedence if both are set. However, for pipelines containing pipeline components, identity must be set on individual steps that will be executed. Identity set at the root pipeline or pipeline component level won't function. Therefore, we suggest setting identity at the individual step level for simplicity.)
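As a minimal sketch of the step-level recommendation above, a pipeline job specification might set the identity on the individual step, as follows. The step name `train_step`, the script `train.py`, the `src` folder, and the angle-bracket placeholders are hypothetical and don't come from the original article. Verify the fields against the pipeline job YAML schema before use.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  # Hypothetical step name. The identity is set here, on the step that runs on a compute,
  # rather than at the root pipeline level.
  train_step:
    type: command
    command: python train.py --data ${{inputs.training_data}}
    code: src
    inputs:
      training_data:
        type: uri_folder
        path: azureml://datastores/<datastore-name>/paths/<data-path>
    environment: azureml:<environment-name>@latest
    compute: azureml:<compute-cluster-name>
    # Authenticate to storage as the submitting user for this step.
    identity:
      type: user_identity
```

Setting the identity per step keeps the behavior the same whether or not the pipeline later includes pipeline components.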
387
387
388
388
The following steps outline how to set up data access with user identity for training jobs on compute clusters from CLI.
389
389
@@ -394,7 +394,7 @@ The following steps outline how to set up data access with user identity for tra
394
394
1. Submit a training job with property **identity** set to **type: user_identity**, as shown in following job specification. During the training job, the authentication to storage happens via the identity of the user that submits the job.
395
395
396
396
> [!NOTE]
397
-
> If the **identity** property is left unspecified and datastore does not have cached credentials, then compute managed identity becomes the fallback option.
397
+
> If the **identity** property is left unspecified and the datastore doesn't have cached credentials, then compute managed identity becomes the fallback option.
398
398
399
399
```yaml
400
400
command: |
@@ -452,13 +452,13 @@ By default, Azure Machine Learning can't communicate with a storage account that
452
452
453
453
You can configure storage accounts to allow access only from within specific virtual networks. This configuration requires extra steps to ensure data isn't leaked outside of the network. This behavior is the same for credential-based data access. For more information, see [How to prevent data exfiltration](how-to-prevent-data-loss-exfiltration.md).
454
454
455
-
If your storage account has virtual network settings, that dictates what identity type and permissions access is needed. For example for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.
455
+
If your storage account has virtual network settings, those settings dictate what identity type and access permissions are needed. For example, for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.
456
456
457
457
* In scenarios where only certain IPs and subnets are allowed to access the storage, then Azure Machine Learning uses the workspace MSI to accomplish data previews and profiles.
458
458
459
459
* If your storage is ADLS Gen 2 or Blob and has virtual network settings, customers can use either user identity or workspace MSI depending on the datastore settings defined during creation.
460
460
461
-
* If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account", then Workspace MSI is used.
461
+
* If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account," then Workspace MSI is used.
462
462
463
463
## Scenario: Azure Container Registry without admin user
464
464
@@ -581,50 +581,56 @@ Once you've configured ACR without admin user as described earlier, you can acce
581
581
582
582
By default, Azure Machine Learning uses Docker base images that come from a public repository managed by Microsoft. It then builds your training or inference environment on those images. For more information, see [What are ML environments?](concept-environments.md).
583
583
584
-
To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR. There are two use cases:
584
+
To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR.
585
585
586
-
* Use base image for training as is.
587
-
* Build Azure Machine Learning managed image with custom image as a base.
586
+
1. Create a machine learning compute cluster with system-assigned managed identity enabled as described earlier. Then, determine the principal ID of the managed identity.
588
587
589
-
### Pull Docker base image to machine learning compute cluster for training as is
590
-
591
-
Create machine learning compute cluster with system-assigned managed identity enabled as described earlier. Then, determine the principal ID of the managed identity.
az ml compute update --name <cluster name> --user-assigned-identities <my-identity-id>
600
+
```
602
601
603
-
```azurecli-interactive
604
-
az ml compute update --name <cluster name> --user-assigned-identities <my-identity-id>
605
-
```
602
+
2. To allow the compute cluster to pull the base images, grant the managed service identity (for the workspace or compute) the ACRPull role on the private ACR.
606
603
607
-
To allow the compute cluster to pull the base images, grant the managed service identity ACRPull role on the private ACR
3. Create an environment and specify the base image location in the [environment YAML file](reference-yaml-environment.md). The following YAML file demonstrates how to define an environment that references the private ACR. Replace the `<acr-url>` with the URL of your private ACR, such as `myregistry.azurecr.io`. Replace the `<image-path>` with the path to your image in the private ACR, such as `pytorch/pytorch:latest`:
616
613
617
-
Finally, create an environment and specify the base image location in the [environment YAML file](reference-yaml-environment.md).
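A minimal sketch of such an environment YAML file follows. It assumes `<acr-url>` and `<image-path>` are placeholders for your private registry URL and the image path within it. The environment name and description are illustrative only.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
# Illustrative name; choose one that fits your workspace conventions.
name: private-acr-example
# Base image pulled from the private ACR by using the managed identity.
image: <acr-url>/<image-path>
description: Environment created from a base image in a private ACR.
```

A file like this would typically be registered with `az ml environment create --file <file-name>.yml`, where the file name is your own.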