description: Learn how Azure Machine Learning integrates with a local Git repository to track repository, branch, and current commit information as part of a training job.
5
5
services: machine-learning
@@ -9,107 +9,101 @@ ms.topic: conceptual
9
9
author: ositanachi
10
10
ms.author: osiotugo
11
11
ms.reviewer: larryfr
12
-
ms.date: 06/02/2023
12
+
ms.date: 06/11/2024
13
13
ms.custom: sdkv2, build-2023
14
14
---
15
15
# Git integration for Azure Machine Learning
16
16
17
-
[Git](https://git-scm.com/) is a popular version control system that allows you to share and collaborate on your projects.
17
+
[Git](https://git-scm.com/) is a popular version control system that allows you to share and collaborate on your projects. This article explains how Azure Machine Learning can integrate with a local Git repository to track repository, branch, and current commit information as part of a training job.
18
18
19
-
Azure Machine Learning fully supports Git repositories for tracking work - you can clone repositories directly onto your shared workspace file system, use Git on your local workstation, or use Git from a CI/CD pipeline.
19
+
Azure Machine Learning fully supports Git repositories for tracking work. You can clone repositories directly onto your shared workspace file system, use Git on your local workstation, or use Git from a continuous integration and continuous deployment (CI/CD) pipeline.
20
20
21
-
When submitting a job to Azure Machine Learning, if source files are stored in a local git repository then information about the repo is tracked as part of the training process.
22
-
23
-
Since Azure Machine Learning tracks information from a local git repo, it isn't tied to any specific central repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other git-compatible service.
21
+
When you submit a job to Azure Machine Learning, if source files are stored in a local git repository, information about the repo is tracked as part of the training process. Because Azure Machine Learning tracks the information from the local git repo, it isn't tied to any specific central repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other Git-compatible service.
24
22
25
23
> [!TIP]
26
-
> Use Visual Studio Code to interact with Git through a graphical user interface. To connect to an Azure Machine Learning remote compute instance using Visual Studio Code, see [Launch Visual Studio Code integrated with Azure Machine Learning (preview)](how-to-launch-vs-code-remote.md)
27
-
>
28
-
> For more information on Visual Studio Code version control features, see [Using Version Control in VS Code](https://code.visualstudio.com/docs/editor/versioncontrol) and [Working with GitHub in VS Code](https://code.visualstudio.com/docs/editor/github).
24
+
> You can use Visual Studio Code to interact with Git through a graphical user interface. To connect to an Azure Machine Learning remote compute instance by using Visual Studio Code, see [Launch Visual Studio Code integrated with Azure Machine Learning (preview)](how-to-launch-vs-code-remote.md).
25
+
>
26
+
> For more information on Visual Studio Code version control features, see [Use Version Control in VS Code](https://code.visualstudio.com/docs/editor/versioncontrol) and [Work with GitHub in VS Code](https://code.visualstudio.com/docs/editor/github).
29
27
30
-
## Clone Git repositories into your workspace file system
31
-
Azure Machine Learning provides a shared file system for all users in the workspace.
32
-
To clone a Git repository into this file share, we recommend that you create a compute instance & [open a terminal](how-to-access-terminal.md).
33
-
Once the terminal is opened, you have access to a full Git client and can clone and work with Git via the Git CLI experience.
28
+
## Clone Git repositories in a workspace file system
34
29
35
-
We recommend that you clone the repository into your user directory so that others will not make collisions directly on your working branch.
30
+
Azure Machine Learning provides a shared file system for all users in a workspace. To clone a Git repository into this file share, you can create a compute instance and open a terminal. Once you open the terminal, you have access to a full Git client and can clone and work with Git via the Git CLI experience.
36
31
37
-
> [!TIP]
38
-
> There is a performance difference between cloning to the local file system of the compute instance or cloning to the mounted filesystem (mounted as the `~/cloudfiles/code` directory). In general, cloning to the local filesystem will have better performance than to the mounted filesystem. However, the local filesystem is lost if you delete and recreate the compute instance. The mounted filesystem is kept if you delete and recreate the compute instance.
32
+
You can clone any Git repository you can authenticate to, such as a GitHub, Azure Repos, or Bitbucket repo. It's best to clone the repository into your user directory, so that other users don't collide directly on your working branch.
39
33
40
-
You can clone any Git repository you can authenticate to (GitHub, Azure Repos, BitBucket, etc.)
34
+
There's a performance difference between cloning to the local file system of the compute instance and cloning to the file system mounted as the *~/cloudfiles/code* directory. In general, cloning to the local file system provides better performance than cloning to the mounted file system. However, if you delete and recreate the compute instance, the local file system is lost, whereas the mounted file system is kept.
41
35
42
-
For more information about cloning, see the guide on [how to use Git CLI](https://guides.github.com/introduction/git-handbook/).
36
+
For more information about the Git CLI, see [Git CLI](https://git-scm.com/docs/gitcli).
43
37
44
-
## Authenticate your Git Account with SSH
45
-
### Generate a new SSH key
46
-
1)[Open the terminal window](./how-to-access-terminal.md) in the Azure Machine Learning Notebook Tab.
38
+
## Clone Git repositories with SSH
47
39
48
-
2) Paste the text below, substituting in your email address.
40
+
You can clone a repo by using HTTPS or SSH. The following sections describe how to clone a repo by using SSH. To use SSH, you need to authenticate your Git account with SSH by using an SSH key.
This creates a new ssh key, using the provided email as a label.
44
+
To generate a new SSH key:
55
45
56
-
```
57
-
> Generating public/private rsa key pair.
58
-
```
46
+
1. In the Azure Machine Learning studio **Notebook** page, [open a terminal window](./how-to-access-terminal.md) and run the following command, substituting your email address.
59
47
60
-
3) When you're prompted to "Enter a file in which to save the key" press Enter. This accepts the default file location.
4) Verify that the default location is '/home/azureuser/.ssh' and press enter. Otherwise specify the location '/home/azureuser/.ssh'.
52
+
The command returns the output `Generating public/private rsa key pair.` and generates a new SSH key with the provided email as a label.
63
53
64
-
> [!TIP]
65
-
> Make sure the SSH key is saved in '/home/azureuser/.ssh'. This file is saved on the compute instance is only accessible by the owner of the Compute Instance
54
+
1. At the following prompt, make sure the default location is `/home/azureuser/.ssh` or specify that location, and then press Enter.
66
55
67
-
```
68
-
> Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
69
-
```
56
+
```bash
57
+
Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
58
+
```
70
59
71
-
5) At the prompt, type a secure passphrase. We recommend you add a passphrase to your SSH key for added security
60
+
The key file saves on the compute instance, and is accessible only to the compute instance owner.
72
61
73
-
```
62
+
1. It's best to add a passphrase to your SSH key for added security. At the following prompt, enter a secure passphrase.
63
+
64
+
```bash
74
65
> Enter passphrase (empty for no passphrase): [Type a passphrase]
75
66
> Enter same passphrase again: [Type passphrase again]
76
67
```
77
68
78
-
### Add the public key to Git Account
79
-
1) In your terminal window, copy the contents of your public key file. If you renamed the key, replace id_rsa.pub with the public key file name.
69
+
### Add the public key to your Git account
70
+
71
+
1. In your terminal window, copy the contents of your public key file. If you renamed the key, replace `id_rsa.pub` with the public key file name.
80
72
81
73
```bash
82
74
cat ~/.ssh/id_rsa.pub
83
75
```
84
-
> [!TIP]
85
-
> **Copy and Paste in Terminal**
86
-
> * Windows: `Ctrl-Insert` to copy and use `Ctrl-Shift-v` or `Shift-Insert` to paste.
87
-
> * Mac OS: `Cmd-c` to copy and `Cmd-v` to paste.
88
-
> * FireFox and Internet Explorer may not support clipboard permissions properly.
89
76
90
-
2) Select and copy the SSH key output to your clipboard.
91
-
3) Next, follow the steps to add the SSH key to your preferred account type:
77
+
1. To add the SSH key to your Git account, refer to the following instructions depending on your Git service:
SSH might display the server's SSH fingerprint and ask you to verify it, as in the following example.
113
107
114
108
```bash
115
109
The authenticity of host 'example.com (192.30.255.112)' can't be established.
@@ -118,51 +112,54 @@ Are you sure you want to continue connecting (yes/no)? yes
118
112
Warning: Permanently added 'github.com,192.30.255.112' (RSA) to the list of known hosts.
119
113
```
120
114
121
-
SSH may display the server's SSH fingerprint and ask you to verify it. You should verify that the displayed fingerprint matches one of the fingerprints in the SSH public keys page.
115
+
SSH displays this fingerprint when it connects to an unknown host to protect you from [man-in-the-middle attacks](/previous-versions/windows/it-pro/windows-2000-server/cc959354(v=technet.10)). You should verify that the displayed fingerprint matches one of the fingerprints in the SSH public keys page.
122
116
123
-
SSH displays this fingerprint when it connects to an unknown host to protect you from [man-in-the-middle attacks](/previous-versions/windows/it-pro/windows-2000-server/cc959354(v=technet.10)). Once you accept the host's fingerprint, SSH will not prompt you again unless the fingerprint changes.
124
-
125
-
3) When you are asked if you want to continue connecting, type `yes`. Git will clone the repo and set up the origin remote to connect with SSH for future Git commands.
117
+
When you're asked if you want to continue connecting, enter *yes*. Once you accept the host's fingerprint, SSH doesn't prompt you again unless the fingerprint changes.
126
118
127
119
## Track code that comes from Git repositories
128
120
129
-
When you submit a training job from the Python SDK or Machine Learning CLI, the files needed to train the model are uploaded to your workspace. If the `git` command is available on your development environment, the upload process uses it to check if the files are stored in a git repository. If so, then information from your git repository is also uploaded as part of the training job. This information is stored in the following properties for the training job:
121
+
When you submit a training job from the Python SDK or Machine Learning CLI, the files needed to train the model are uploaded to your workspace. If the `git` command is available in your development environment, the upload process checks whether the files are stored in a Git repository and, if so, uploads information from the Git repository as part of the training job.
122
+
123
+
The following information is sent for jobs that use an estimator, machine learning pipeline, or script run. The information is stored in the following properties for the training job:
130
124
131
125
| Property | Git command used to get the value | Description |
132
126
| ----- | ----- | ----- |
133
127
|`azureml.git.repository_uri`|`git ls-remote --get-url`| The URI that your repository was cloned from. |
134
-
| `mlflow.source.git.repoURL` | `git ls-remote --get-url` | The URI that your repository was cloned from. |
135
128
|`azureml.git.branch`|`git symbolic-ref --short HEAD`| The active branch when the job was submitted. |
136
-
| `mlflow.source.git.branch` | `git symbolic-ref --short HEAD` | The active branch when the job was submitted. |
137
129
|`azureml.git.commit`|`git rev-parse HEAD`| The commit hash of the code that was submitted for the job. |
130
+
|`azureml.git.dirty`|`git status --porcelain .`|`True` if the branch or commit is dirty, otherwise `false`. |
131
+
|`mlflow.source.git.repoURL`|`git ls-remote --get-url`| The URI that your repository was cloned from. |
132
+
|`mlflow.source.git.branch`|`git symbolic-ref --short HEAD`| The active branch when the job was submitted. |
138
133
|`mlflow.source.git.commit`|`git rev-parse HEAD`| The commit hash of the code that was submitted for the job. |
139
-
| `azureml.git.dirty` | `git status --porcelain .` | `True`, if the branch/commit is dirty; otherwise, `false`. |
140
134
141
-
This information is sent for jobs that use an estimator, machine learning pipeline, or script run.
142
-
143
-
If your training files are not located in a git repository on your development environment, or the `git` command is not available, then no git-related information is tracked.
135
+
If your training files aren't located in a Git repository on your development environment, or the `git` command isn't available, no Git-related information is tracked.
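
As a rough illustration of the table above, the following Python sketch runs the same Git commands locally and collects their output under the `azureml.git.*` property names. It isn't the service's actual upload code. The names `collect_git_properties`, `run_git`, and the injectable `run` parameter are hypothetical and exist only for this example, and storing the dirty flag as a Python boolean is an assumption.

```python
import subprocess

# Property name -> Git command, taken from the table above.
GIT_PROPERTY_COMMANDS = {
    "azureml.git.repository_uri": ["git", "ls-remote", "--get-url"],
    "azureml.git.branch": ["git", "symbolic-ref", "--short", "HEAD"],
    "azureml.git.commit": ["git", "rev-parse", "HEAD"],
    "azureml.git.dirty": ["git", "status", "--porcelain", "."],
}


def run_git(args, cwd="."):
    """Run a Git command in cwd and return its stripped standard output."""
    result = subprocess.run(
        args, cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def collect_git_properties(cwd=".", run=run_git):
    """Hypothetical helper: gather Git details for the repository at cwd."""
    properties = {}
    for name, command in GIT_PROPERTY_COMMANDS.items():
        output = run(command, cwd)
        if name.endswith(".dirty"):
            # Any porcelain output means there are uncommitted changes.
            # Assumption: represent the flag as a Python bool here.
            properties[name] = bool(output)
        else:
            properties[name] = output
    return properties
```

Calling `collect_git_properties()` from inside a cloned repository returns a dictionary of the four values. With the default `run_git`, the call raises an exception if Git isn't installed or the directory isn't a Git repository, which mirrors the case where no Git-related information is tracked.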
144
136
145
137
> [!TIP]
146
-
> To check if the git command is available on your development environment, open a shell session, command prompt, PowerShell or other command line interface and type the following command:
138
+
> To check if the `git` command is available on your development environment, run the following command in a command line interface:
147
139
>
148
140
>```
149
141
> git --version
150
142
>```
151
143
>
152
-
> If installed, and in the path, you receive a response similar to `git version 2.4.1`. For more information on installing git on your development environment, see the [Git website](https://git-scm.com/).
144
+
> If Git is installed and in your path, you receive a response similar to `git version 2.4.1`.
145
+
146
+
For more information on installing Git on your development environment, see the [Git website](https://git-scm.com/).
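
The same availability check can be done from a Python script before you submit a job. This is a minimal sketch that uses only the standard library. The function name `git_version` and its `which` parameter are hypothetical, introduced so the lookup can be swapped out.

```python
import shutil
import subprocess


def git_version(which=shutil.which):
    """Return the output of `git --version`, or None if git isn't on the PATH."""
    if which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

If `git_version()` returns `None`, Git isn't installed or isn't in your path, so no Git-related information would be tracked for jobs submitted from that environment.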
153
147
154
-
## View the logged information
148
+
## View Git information
155
149
156
-
The git information is stored in the properties for a training job. You can view this information using the Azure portal or Python SDK.
150
+
The Git information is stored in the properties for a training job. You can view this information by using the Azure portal, Python SDK, or Azure CLI.
157
151
158
152
### Azure portal
159
153
160
-
1. From the [studio portal](https://ml.azure.com), select your workspace.
161
-
1. Select __Jobs__, and then select one of your experiments.
162
-
1. Select one of the jobs from the __Display name__ column.
163
-
1. Select __Outputs + logs__, and then expand the __logs__ and __azureml__ entries. Select the link that begins with __###\_azure__.
154
+
In your Azure Machine Learning workspace in Azure Machine Learning studio:
164
155
165
-
The logged information contains text similar to the following JSON:
156
+
1. Select the **Jobs** page.
157
+
1. Select an experiment.
158
+
1. Select a job from the **Display name** column.
159
+
1. Select **Outputs + logs** from the top menu, and then expand the **logs** and **azureml** entries.
160
+
1. Select the link that begins with **###_azure**.
161
+
162
+
The logged information contains text similar to the following JSON code:
166
163
167
164
```json
168
165
"properties": {
@@ -181,27 +178,25 @@ The logged information contains text similar to the following JSON:
181
178
}
182
179
```
183
180
184
-
### View properties
185
-
186
-
After submitting a training run, a [Job](/python/api/azure-ai-ml/azure.ai.ml.entities.job) object is returned. The `properties` attribute of this object contains the logged git information. For example, the following code retrieves the commit hash:
After you submit a training run, a [Job](/python/api/azure-ai-ml/azure.ai.ml.entities.job) object is returned. The `properties` attribute of this object contains the logged Git information. For example, the following code retrieves the commit hash:
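
A minimal sketch of that lookup, assuming `job.properties` behaves like a dictionary keyed by the property names listed in the table earlier in this article. The helper name `get_git_commit` is hypothetical, and the fallback to the MLflow key is an assumption based on that table.

```python
def get_git_commit(properties):
    """Return the logged commit hash, or None if no Git information was tracked."""
    return (
        properties.get("azureml.git.commit")
        or properties.get("mlflow.source.git.commit")
    )
```

With a returned job object, the call would be `get_git_commit(job.properties)`.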