Commit dffb986

Merge pull request #5455 from Blackmist/370350-uuf
updates per customer feedback
2 parents 2d456de + b556f1f commit dffb986

1 file changed, +48 -41 lines changed

articles/machine-learning/how-to-identity-based-service-authentication.md

Lines changed: 48 additions & 41 deletions
@@ -8,7 +8,7 @@ ms.author: larryfr
 ms.reviewer: meyetman
 ms.service: azure-machine-learning
 ms.subservice: enterprise-readiness
-ms.date: 07/26/2024
+ms.date: 06/10/2025
 ms.topic: how-to
 ms.custom: has-adal-ref, subject-rbac-steps, cliv2, sdkv2, devx-track-azurecli
 ---
@@ -20,7 +20,7 @@ ms.custom: has-adal-ref, subject-rbac-steps, cliv2, sdkv2, devx-track-azurecli
 Azure Machine Learning is composed of multiple Azure services. There are multiple ways that authentication can happen between Azure Machine Learning and the services it relies on.
 
 * The Azure Machine Learning workspace uses a __managed identity__ to communicate with other services. By default, this is a system-assigned managed identity. You can also use a user-assigned managed identity instead.
-* Azure Machine Learning uses Azure Container Registry (ACR) to store Docker images used to train and deploy models. If you allow Azure Machine Learning to automatically create ACR, it will enable the __admin account__.
+* Azure Machine Learning uses Azure Container Registry (ACR) to store Docker images used to train and deploy models. If you allow Azure Machine Learning to automatically create ACR, it enables the __admin account__.
 * The Azure Machine Learning compute cluster uses a __managed identity__ to retrieve connection information for datastores from Azure Key Vault and to pull Docker images from ACR. You can also configure identity-based access to datastores, which will instead use the managed identity of the compute cluster.
 * Data access can happen along multiple paths depending on the data storage service and your configuration. For example, authentication to the datastore may use an account key, token, security principal, managed identity, or user identity.
 * Managed online endpoints can use a managed identity to access Azure resources when performing inference. For more information, see [Access Azure resources from an online endpoint](how-to-access-resources-from-endpoints-managed-identities.md).
@@ -201,7 +201,7 @@ During a run there are two applications of an identity:
 
 1. The system uses an identity to set up the user's storage mounts, container registry, and datastores.
 
-* In this case, the system will use the default-managed identity.
+* In this case, the system uses the default-managed identity.
 
 1. You apply an identity to access resources from within the code for a submitted job:
 
@@ -288,7 +288,7 @@ During a run, there are two applications of an identity:
 
 - The system uses an identity to set up the user's storage mounts, container registry, and datastores.
 
-* In this case, the system will use the default-managed identity.
+* In this case, the system uses the default-managed identity.
 
 - You apply an identity to access resources from within the code for a submitted job:
 
@@ -304,7 +304,7 @@ During a run, there are two applications of an identity:
 To configure a kubernetes cluster compute, make sure that it has the [necessary AML extension deployed in it](how-to-deploy-kubernetes-extension.md?view=azureml-api-2&preserve-view=true&tabs=deploy-extension-with-cli) and follow the documentation on [how to attach the kubernetes cluster compute to your AML workspace](how-to-attach-kubernetes-to-workspace.md?view=azureml-api-2&preserve-view=true&tabs=cli).
 
 > [!IMPORTANT]
-> For Training purposes (Machine Learning Jobs), the identity that is used is the one assigned to the Kubernetes Cluster Compute. However, in the case of inferencing (Managed Online Endpoints), the identity that is used is the one assigned to the endpoint. For more information see [How to Access Azure Resources from an Online Endpoint](how-to-access-resources-from-endpoints-managed-identities.md?view=azureml-api-2&preserve-view=true&tabs=system-identity-cli).
+> For Training purposes (Machine Learning Jobs), the identity that is used is the one assigned to the Kubernetes Cluster Compute. However, in the case of inferencing (Managed Online Endpoints), the identity that is used is the one assigned to the endpoint. For more information, see [How to Access Azure Resources from an Online Endpoint](how-to-access-resources-from-endpoints-managed-identities.md?view=azureml-api-2&preserve-view=true&tabs=system-identity-cli).
 
 ---
 
@@ -337,7 +337,7 @@ The same behavior applies when you work with data interactively via a Jupyter No
 To help ensure that you securely connect to your storage service on Azure, Azure Machine Learning requires that you have permission to access the corresponding data storage.
 
 > [!WARNING]
-> Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, please reach out to the Azure Machine Learning Data Support team alias at [email protected] for assistance with a custom code solution.
+> Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, Contact the Azure Machine Learning Data Support team alias at [email protected] for assistance with a custom code solution.
 
 Identity-based data access supports connections to **only** the following storage services.
 
@@ -382,8 +382,8 @@ This authentication mode allows you to:
 > [!IMPORTANT]
 > This functionality has the following limitations
 > * Feature is supported for experiments submitted via the [Azure Machine Learning CLI and Python SDK V2](concept-v2.md), but not via ML Studio.
-> * User identity and compute managed identity cannot be used for authentication within same job.
-> * For pipeline jobs, we recommend setting user identity at the individual step level that will be executed on a compute, rather than at the root pipeline level. ( While identity setting is supported at both root pipeline and step levels, the step level setting takes precedence if both are set. However, for pipelines containing pipeline components, identity must be set on individual steps that will be executed. Identity set at the root pipeline or pipeline component level will not function. Therefore, we suggest setting identity at the individual step level for simplicity.)
+> * User identity and compute managed identity can't be used for authentication within same job.
+> * For pipeline jobs, we recommend setting user identity at the individual step level that will be executed on a compute, rather than at the root pipeline level. (While identity setting is supported at both root pipeline and step levels, the step level setting takes precedence if both are set. However, for pipelines containing pipeline components, identity must be set on individual steps that will be executed. Identity set at the root pipeline or pipeline component level won't function. Therefore, we suggest setting identity at the individual step level for simplicity.)
 
 The following steps outline how to set up data access with user identity for training jobs on compute clusters from CLI.
 
@@ -394,7 +394,7 @@ The following steps outline how to set up data access with user identity for tra
 1. Submit a training job with property **identity** set to **type: user_identity**, as shown in following job specification. During the training job, the authentication to storage happens via the identity of the user that submits the job.
 
 > [!NOTE]
-> If the **identity** property is left unspecified and datastore does not have cached credentials, then compute managed identity becomes the fallback option.
+> If the **identity** property is left unspecified and datastore doesn't have cached credentials, then compute managed identity becomes the fallback option.
 
 ```yaml
 command: |
@@ -452,13 +452,13 @@ By default, Azure Machine Learning can't communicate with a storage account that
 
 You can configure storage accounts to allow access only from within specific virtual networks. This configuration requires extra steps to ensure data isn't leaked outside of the network. This behavior is the same for credential-based data access. For more information, see [How to prevent data exfiltration](how-to-prevent-data-loss-exfiltration.md).
 
-If your storage account has virtual network settings, that dictates what identity type and permissions access is needed. For example for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.
+If your storage account has virtual network settings, those settings dictate what identity type and permissions access is needed. For example for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.
 
 * In scenarios where only certain IPs and subnets are allowed to access the storage, then Azure Machine Learning uses the workspace MSI to accomplish data previews and profiles.
 
 * If your storage is ADLS Gen 2 or Blob and has virtual network settings, customers can use either user identity or workspace MSI depending on the datastore settings defined during creation.
 
-* If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account", then Workspace MSI is used.
+* If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account," then Workspace MSI is used.
 
 ## Scenario: Azure Container Registry without admin user
 
@@ -581,50 +581,56 @@ Once you've configured ACR without admin user as described earlier, you can acce
 
 By default, Azure Machine Learning uses Docker base images that come from a public repository managed by Microsoft. It then builds your training or inference environment on those images. For more information, see [What are ML environments?](concept-environments.md).
 
-To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR. There are two use cases:
+To use a custom base image internal to your enterprise, you can use managed identities to access your private ACR.
 
-* Use base image for training as is.
-* Build Azure Machine Learning managed image with custom image as a base.
+1. Create a machine learning compute cluster with system-assigned managed identity enabled as described earlier. Then, determine the principal ID of the managed identity.
 
-### Pull Docker base image to machine learning compute cluster for training as is
-
-Create machine learning compute cluster with system-assigned managed identity enabled as described earlier. Then, determine the principal ID of the managed identity.
+[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
 
-[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
+```azurecli-interactive
+az ml compute show --name <cluster name> -n <workspace> -g <resource group>
+```
 
-```azurecli-interactive
-az ml compute show --name <cluster name> -n <workspace> -g <resource group>
-```
+Optionally, you can update the compute cluster to assign a user-assigned managed identity:
 
-Optionally, you can update the compute cluster to assign a user-assigned managed identity:
+[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
 
-[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
+```azurecli-interactive
+az ml compute update --name <cluster name> --user-assigned-identities <my-identity-id>
+```
 
-```azurecli-interactive
-az ml compute update --name <cluster name> --user-assigned-identities <my-identity-id>
-```
+2. To allow the compute cluster to pull the base images, grant the managed service identity (for the workspace or compute) ACRPull role on the private ACR
 
-To allow the compute cluster to pull the base images, grant the managed service identity ACRPull role on the private ACR
+[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
 
-[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
+```azurecli-interactive
+az role assignment create --assignee <principal ID> \
+--role acrpull \
+--scope "/subscriptions/<subscription ID>/resourceGroups/<private ACR resource group>/providers/Microsoft.ContainerRegistry/registries/<private ACR name>"
+```
 
-```azurecli-interactive
-az role assignment create --assignee <principal ID> \
---role acrpull \
---scope "/subscriptions/<subscription ID>/resourceGroups/<private ACR resource group>/providers/Microsoft.ContainerRegistry/registries/<private ACR name>"
-```
+3. Create an environment and specify the base image location in the [environment YAML file](reference-yaml-environment.md). The following YAML file demonstrates how to define an environment that references the private ACR. Replace the `<acr-url>` with the URL of your private ACR, such as `myregistry.azurecr.io`. Replace the `<image-path>` with the path to your image in the private ACR, such as `pytorch/pytorch:latest`:
 
-Finally, create an environment and specify the base image location in the [environment YAML file](reference-yaml-environment.md).
+[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
 
-[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
+<!-- :::code language="yaml" source="~/azureml-examples-main/cli/assets/environment/docker-image.yml"::: -->
+
+```yml
+$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
+name: docker-image-example
+image: <acr-url>/<image-path>:latest
+description: Environment created from a Docker image.
+```
+
+4. The following command demonstrates how to create the environment from the YAML file. Replace `<yaml file>` with the path to your YAML file:
 
-:::code language="yaml" source="~/azureml-examples-main/cli/assets/environment/docker-image.yml":::
+[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
 
-```azurecli
-az ml environment create --file <yaml file>
-```
+```azurecli
+az ml environment create --file <yaml file>
+```
 
-You can now use the environment in a [training job](how-to-train-cli.md).
+You can now use the environment in a [training job](how-to-train-cli.md).
 
 <!-- 20240725: this commented block will be restored at a later date TBD . . .
 
@@ -674,7 +680,8 @@ In this scenario, Azure Machine Learning service builds the training or inferenc
 description: Environment created from private ACR.
 ```
 -->
-## Next steps
+
+## Related articles
 
 * Learn more about [enterprise security in Azure Machine Learning](concept-enterprise-security.md)
 * Learn about [data administration](how-to-administrate-data-authentication.md)
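
For anyone reviewing the placeholder substitution in the ACR steps of this diff, the following is a minimal Python sketch, not part of the commit. It only builds strings: the `--scope` resource ID and the `az role assignment create` argument list from step 2, and the `image:` value from step 3. The function names (`acr_scope`, `role_assignment_argv`, `image_reference`) and all sample values are hypothetical. One assumption worth flagging: the YAML template appends `:latest` after `<image-path>`, while the example path `pytorch/pytorch:latest` already carries a tag, so this sketch appends `:latest` only when the path has no tag or digest.

```python
import shlex
from typing import List


def acr_scope(subscription_id: str, resource_group: str, registry_name: str) -> str:
    """Build the resource ID passed as --scope in the role assignment step."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ContainerRegistry/registries/{registry_name}"
    )


def role_assignment_argv(principal_id: str, scope: str) -> List[str]:
    """Return the argument list for the command shown in the diff.

    Nothing is executed here; the caller decides whether to run it.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,
        "--role", "acrpull",
        "--scope", scope,
    ]


def image_reference(acr_url: str, image_path: str) -> str:
    """Join registry host and image path for the environment YAML `image:` field.

    Assumption (not from the commit): add ':latest' only if the last path
    segment has no tag or digest, to avoid a doubled tag.
    """
    last_segment = image_path.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        image_path = f"{image_path}:latest"
    return f"{acr_url.rstrip('/')}/{image_path}"


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    scope = acr_scope("00000000-0000-0000-0000-000000000000", "my-rg", "myregistry")
    print(shlex.join(role_assignment_argv("11111111-1111-1111-1111-111111111111", scope)))
    print(image_reference("myregistry.azurecr.io", "pytorch/pytorch:latest"))
```

Printing the argument list with `shlex.join` gives a shell-quoted command line that can be compared against the `azurecli-interactive` block in the diff before running anything.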
