Update how-to-identity-based-service-authentication.md

J-Silvestre · web-flow · commit 25051b2694cc · 2025-02-04T14:39:22.000Z
Added section for kubernetes cluster
diff --git a/articles/machine-learning/how-to-identity-based-service-authentication.md b/articles/machine-learning/how-to-identity-based-service-authentication.md
@@ -413,6 +413,55 @@ The following steps outline how to set up data access with user identity for tra
 > [!IMPORTANT] 
 > During job submission with authentication with user identity enabled, the code snapshots are protected against tampering by checksum validation. If you have existing pipeline components and intend to use them with authentication with user identity enabled, you might need to re-upload them. Otherwise the job may fail during checksum validation. 
 
+### Access data for training jobs on AKS clusters using managed identity
+When you train on Azure Kubernetes Service (AKS) clusters, authentication to dependent Azure resources works differently.
+The following steps outline how to set up data access with a managed identity for training jobs on AKS clusters:
+
+1. Create an AKS cluster and [attach it to your Azure Machine Learning workspace](https://learn.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-to-workspace?view=azureml-api-2&tabs=sdk#how-to-attach-a-kubernetes-cluster-to-azure-machine-learning-workspace).
+
+1. Ensure that the Kubernetes cluster has an [assigned managed identity](https://learn.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-to-workspace?view=azureml-api-2&tabs=sdk#assign-managed-identity) and that the identity has the necessary [Azure roles assigned to it](https://learn.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-to-workspace?view=azureml-api-2&tabs=sdk#assign-azure-roles-to-managed-identity).
+
+1. When you submit the job, provide the managed identity of the compute **without specifying the `client_id`** in the parameters:
+
+    ```yaml
+    command: |
+      echo "--census-csv: ${{inputs.census_csv}}"
+      python hello-census.py --census-csv ${{inputs.census_csv}}
+    code: src
+    inputs:
+      census_csv:
+        type: uri_file
+        path: azureml://datastores/mydata/paths/census.csv
+    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
+    compute: azureml:kubernetes-cluster
+    ```
+
+    ```python
+    from azure.ai.ml import Input, ManagedIdentityConfiguration, command
+    from azure.ai.ml.constants import AssetTypes
+
+    # `ml_client` is assumed to be an authenticated MLClient for your workspace.
+
+    # Specify the data location
+    my_job_inputs = {
+        "input_data": Input(type=AssetTypes.URI_FILE, path="<path-to-my-data>")
+    }
+
+    # Define the job
+    job = command(
+        code="<my-local-code-location>",
+        command="python <my-script>.py --input_data ${{inputs.input_data}}",
+        inputs=my_job_inputs,
+        environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:9",
+        compute="<my-kubernetes-cluster-name>",
+        # No client_id is passed, so the managed identity of the compute is used
+        identity=ManagedIdentityConfiguration(),
+    )
+    # Submit the job
+    returned_job = ml_client.jobs.create_or_update(job)
+    ```
+In the YAML example, you can leave the `identity` property unspecified, because it defaults to the managed identity of the Kubernetes cluster.
+
 ### Work with virtual networks
 
 By default, Azure Machine Learning can't communicate with a storage account that's behind a firewall or in a virtual network.