Commit 7ea9af4

committed
draft
1 parent cb71d93 commit 7ea9af4

1 file changed

+87
-92
lines changed

@@ -1,5 +1,5 @@
11
---
2-
title: Git integration for Azure Machine Learning
2+
title: Git integration
33
titleSuffix: Azure Machine Learning
44
description: Learn how Azure Machine Learning integrates with a local Git repository to track repository, branch, and current commit information as part of a training job.
55
services: machine-learning
@@ -9,107 +9,101 @@ ms.topic: conceptual
99
author: ositanachi
1010
ms.author: osiotugo
1111
ms.reviewer: larryfr
12-
ms.date: 06/02/2023
12+
ms.date: 06/11/2024
1313
ms.custom: sdkv2, build-2023
1414
---
1515
# Git integration for Azure Machine Learning
1616

17-
[Git](https://git-scm.com/) is a popular version control system that allows you to share and collaborate on your projects.
17+
[Git](https://git-scm.com/) is a popular version control system that allows you to share and collaborate on your projects. This article explains how Azure Machine Learning can integrate with a local Git repository to track repository, branch, and current commit information as part of a training job.
1818

19-
Azure Machine Learning fully supports Git repositories for tracking work - you can clone repositories directly onto your shared workspace file system, use Git on your local workstation, or use Git from a CI/CD pipeline.
19+
Azure Machine Learning fully supports Git repositories for tracking work. You can clone repositories directly onto your shared workspace file system, use Git on your local workstation, or use Git from a continuous integration and continuous deployment (CI/CD) pipeline.
2020

21-
When submitting a job to Azure Machine Learning, if source files are stored in a local git repository then information about the repo is tracked as part of the training process.
22-
23-
Since Azure Machine Learning tracks information from a local git repo, it isn't tied to any specific central repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other git-compatible service.
21+
When you submit a job to Azure Machine Learning, if source files are stored in a local git repository, information about the repo is tracked as part of the training process. Because Azure Machine Learning tracks the information from the local git repo, it isn't tied to any specific central repository. Your repository can be cloned from GitHub, GitLab, Bitbucket, Azure DevOps, or any other Git-compatible service.
2422

2523
> [!TIP]
26-
> Use Visual Studio Code to interact with Git through a graphical user interface. To connect to an Azure Machine Learning remote compute instance using Visual Studio Code, see [Launch Visual Studio Code integrated with Azure Machine Learning (preview)](how-to-launch-vs-code-remote.md)
27-
>
28-
> For more information on Visual Studio Code version control features, see [Using Version Control in VS Code](https://code.visualstudio.com/docs/editor/versioncontrol) and [Working with GitHub in VS Code](https://code.visualstudio.com/docs/editor/github).
24+
> You can use Visual Studio Code to interact with Git through a graphical user interface. To connect to an Azure Machine Learning remote compute instance by using Visual Studio Code, see [Launch Visual Studio Code integrated with Azure Machine Learning (preview)](how-to-launch-vs-code-remote.md).
25+
>
26+
> For more information on Visual Studio Code version control features, see [Use Version Control in VS Code](https://code.visualstudio.com/docs/editor/versioncontrol) and [Work with GitHub in VS Code](https://code.visualstudio.com/docs/editor/github).
2927
30-
## Clone Git repositories into your workspace file system
31-
Azure Machine Learning provides a shared file system for all users in the workspace.
32-
To clone a Git repository into this file share, we recommend that you create a compute instance & [open a terminal](how-to-access-terminal.md).
33-
Once the terminal is opened, you have access to a full Git client and can clone and work with Git via the Git CLI experience.
28+
## Clone Git repositories in a workspace file system
3429

35-
We recommend that you clone the repository into your user directory so that others will not make collisions directly on your working branch.
30+
Azure Machine Learning provides a shared file system for all users in a workspace. To clone a Git repository into this file share, you can create a compute instance and open a terminal. Once you open the terminal, you have access to a full Git client and can clone and work with Git via the Git CLI experience.
3631

37-
> [!TIP]
38-
> There is a performance difference between cloning to the local file system of the compute instance or cloning to the mounted filesystem (mounted as the `~/cloudfiles/code` directory). In general, cloning to the local filesystem will have better performance than to the mounted filesystem. However, the local filesystem is lost if you delete and recreate the compute instance. The mounted filesystem is kept if you delete and recreate the compute instance.
32+
You can clone any Git repository you can authenticate to, such as a GitHub, Azure Repos, or BitBucket repo. It's best to clone the repository into your user directory, so that other users don't collide directly on your working branch.
3933

40-
You can clone any Git repository you can authenticate to (GitHub, Azure Repos, BitBucket, etc.)
34+
There's a performance difference between cloning to the local file system of the compute instance or cloning to the filesystem mounted as the *~/cloudfiles/code* directory. In general, cloning to the local filesystem provides better performance than cloning to the mounted filesystem. However, if you delete and recreate the compute instance, the local filesystem is lost, whereas the mounted filesystem is kept.
4135

42-
For more information about cloning, see the guide on [how to use Git CLI](https://guides.github.com/introduction/git-handbook/).
36+
For more information about the Git CLI, see [Git CLI](https://git-scm.com/docs/gitcli).
4337

44-
## Authenticate your Git Account with SSH
45-
### Generate a new SSH key
46-
1) [Open the terminal window](./how-to-access-terminal.md) in the Azure Machine Learning Notebook Tab.
38+
## Clone Git repositories with SSH
4739

48-
2) Paste the text below, substituting in your email address.
40+
You can clone a repo by using HTTPS or SSH. The following sections describe how to clone a repo by using SSH. To use SSH, you need to authenticate your Git account with SSH by using an SSH key.
4941

50-
```bash
51-
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
52-
```
42+
### Generate and save a new SSH key
5343

54-
This creates a new ssh key, using the provided email as a label.
44+
To generate a new SSH key:
5545

56-
```
57-
> Generating public/private rsa key pair.
58-
```
46+
1. In the Azure Machine Learning studio **Notebook** page, [open a terminal window](./how-to-access-terminal.md) and run the following command, substituting your email address.
5947

60-
3) When you're prompted to "Enter a file in which to save the key" press Enter. This accepts the default file location.
48+
```bash
49+
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
50+
```
6151

62-
4) Verify that the default location is '/home/azureuser/.ssh' and press enter. Otherwise specify the location '/home/azureuser/.ssh'.
52+
The command returns the output `Generating public/private rsa key pair.` and generates a new SSH key with the provided email as a label.
6353

64-
> [!TIP]
65-
> Make sure the SSH key is saved in '/home/azureuser/.ssh'. This file is saved on the compute instance is only accessible by the owner of the Compute Instance
54+
1. At the following prompt, make sure the default location is `/home/azureuser/.ssh` or specify that location, and then press Enter.
6655

67-
```
68-
> Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
69-
```
56+
```bash
57+
Enter a file in which to save the key (/home/azureuser/.ssh/id_rsa): [Press enter]
58+
```
7059

71-
5) At the prompt, type a secure passphrase. We recommend you add a passphrase to your SSH key for added security
60+
The key file saves on the compute instance, and is accessible only to the compute instance owner.
7261

73-
```
62+
1. It's best to add a passphrase to your SSH key for added security. At the following prompt, enter a secure passphrase.
63+
64+
```bash
7465
> Enter passphrase (empty for no passphrase): [Type a passphrase]
7566
> Enter same passphrase again: [Type passphrase again]
7667
```
7768

78-
### Add the public key to Git Account
79-
1) In your terminal window, copy the contents of your public key file. If you renamed the key, replace id_rsa.pub with the public key file name.
69+
### Add the public key to your Git account
70+
71+
1. In your terminal window, copy the contents of your public key file. If you renamed the key, replace `id_rsa.pub` with the public key file name.
8072

8173
```bash
8274
cat ~/.ssh/id_rsa.pub
8375
```
84-
> [!TIP]
85-
> **Copy and Paste in Terminal**
86-
> * Windows: `Ctrl-Insert` to copy and use `Ctrl-Shift-v` or `Shift-Insert` to paste.
87-
> * Mac OS: `Cmd-c` to copy and `Cmd-v` to paste.
88-
> * FireFox and Internet Explorer may not support clipboard permissions properly.
8976

90-
2) Select and copy the SSH key output to your clipboard.
91-
3) Next, follow the steps to add the SSH key to your preferred account type:
77+
1. To add the SSH key to your Git account, refer to the following instructions depending on your Git service:
9278

93-
+ [GitHub](https://docs.github.com/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account)
79+
- [GitHub](https://docs.github.com/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account)
80+
- [GitLab](https://docs.gitlab.com/ee/user/ssh.html#add-an-ssh-key-to-your-gitlab-account)
81+
- [Azure DevOps](/azure/devops/repos/git/use-ssh-keys-to-authenticate#step-2--add-the-public-key-to-azure-devops-servicestfs) Start at **Step 2**.
82+
- [BitBucket](https://support.atlassian.com/bitbucket-cloud/docs/set-up-an-ssh-key/#SetupanSSHkey-ssh2). Follow **Step 4**.
9483

95-
+ [GitLab](https://docs.gitlab.com/ee/user/ssh.html#add-an-ssh-key-to-your-gitlab-account)
84+
> [!TIP]
85+
> To copy and paste in the terminal window, use these keyboard shortcuts depending on your operating system:
86+
>
87+
> - Windows: Ctrl+Insert to copy, Ctrl+Shift+V or Ctrl+Shift+Insert to paste.
88+
> - macOS: Cmd+C to copy and Cmd+V to paste.
89+
>
90+
> Some browsers might not support clipboard permissions properly.
9691
97-
+ [Azure DevOps](/azure/devops/repos/git/use-ssh-keys-to-authenticate#step-2--add-the-public-key-to-azure-devops-servicestfs) Start at **Step 2**.
92+
### Clone the Git repository with SSH
9893

99-
+ [BitBucket](https://support.atlassian.com/bitbucket-cloud/docs/set-up-an-ssh-key/#SetupanSSHkey-ssh2). Follow **Step 4**.
94+
1. Copy the SSH Git clone URL from the Git repo.
10095

101-
### Clone the Git repository with SSH
96+
1. Run the following `git clone` command, using your SSH Git repo URL. For example:
10297

103-
1) Copy the SSH Git clone URL from the Git repo.
98+
```bash
99+
git clone git@example.com:GitUser/azureml-example.git
100+
```
104101

105-
2) Paste the url into the `git clone` command below, to use your SSH Git repo URL. This will look something like:
102+
Git clones the repo and sets up the origin remote to connect with SSH for future Git commands.
106103

107-
```bash
108-
git clone git@example.com:GitUser/azureml-example.git
109-
Cloning into 'azureml-example'...
110-
```
104+
#### Verify fingerprint
111105

112-
You will see a response like:
106+
SSH might display the server's SSH fingerprint and ask you to verify it, as in the following example.
113107

114108
```bash
115109
The authenticity of host 'example.com (192.30.255.112)' can't be established.
@@ -118,51 +112,54 @@ Are you sure you want to continue connecting (yes/no)? yes
118112
Warning: Permanently added 'github.com,192.30.255.112' (RSA) to the list of known hosts.
119113
```
120114
121-
SSH may display the server's SSH fingerprint and ask you to verify it. You should verify that the displayed fingerprint matches one of the fingerprints in the SSH public keys page.
115+
SSH displays this fingerprint when it connects to an unknown host to protect you from [man-in-the-middle attacks](/previous-versions/windows/it-pro/windows-2000-server/cc959354(v=technet.10)). You should verify that the displayed fingerprint matches one of the fingerprints in the SSH public keys page.
122116
123-
SSH displays this fingerprint when it connects to an unknown host to protect you from [man-in-the-middle attacks](/previous-versions/windows/it-pro/windows-2000-server/cc959354(v=technet.10)). Once you accept the host's fingerprint, SSH will not prompt you again unless the fingerprint changes.
124-
125-
3) When you are asked if you want to continue connecting, type `yes`. Git will clone the repo and set up the origin remote to connect with SSH for future Git commands.
117+
When you're asked if you want to continue connecting, enter *yes*. Once you accept the host's fingerprint, SSH doesn't prompt you again unless the fingerprint changes.
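
As a rough illustration of the fingerprint check, the following Python sketch computes the SHA256-style fingerprint that recent OpenSSH versions display: the Base64 encoding, without padding, of the SHA-256 digest of the decoded key blob. It assumes the public key is supplied as a single `<key-type> <base64-blob> [comment]` line and that the Git service publishes its fingerprints in the same `SHA256:` form. The function names are hypothetical and aren't part of any Azure Machine Learning tooling.

```python
import base64
import hashlib
from typing import Iterable


def sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of one public key line."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    # The second field is the Base64-encoded key blob that gets hashed.
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest in Base64 with the trailing '=' removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_published(public_key_line: str, published: Iterable[str]) -> bool:
    """Check the computed fingerprint against fingerprints you trust."""
    return sha256_fingerprint(public_key_line) in set(published)
```

Compare the result only against fingerprints obtained from a trusted source, such as the Git service's documentation page. Comparing against a value fetched over the same unverified connection doesn't protect you.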
126118

127119
## Track code that comes from Git repositories
128120

129-
When you submit a training job from the Python SDK or Machine Learning CLI, the files needed to train the model are uploaded to your workspace. If the `git` command is available on your development environment, the upload process uses it to check if the files are stored in a git repository. If so, then information from your git repository is also uploaded as part of the training job. This information is stored in the following properties for the training job:
121+
When you submit a training job from the Python SDK or Machine Learning CLI, the files needed to train the model are uploaded to your workspace. If the `git` command is available on your development environment, the upload process checks if the files are stored in a Git repository, and uploads information from the Git repository as part of the training job.
122+
123+
The following information is sent for jobs that use an estimator, machine learning pipeline, or script run. The information is stored in the following properties for the training job:
130124

131125
| Property | Git command used to get the value | Description |
132126
| ----- | ----- | ----- |
133127
| `azureml.git.repository_uri` | `git ls-remote --get-url` | The URI that your repository was cloned from. |
134-
| `mlflow.source.git.repoURL` | `git ls-remote --get-url` | The URI that your repository was cloned from. |
135128
| `azureml.git.branch` | `git symbolic-ref --short HEAD` | The active branch when the job was submitted. |
136-
| `mlflow.source.git.branch` | `git symbolic-ref --short HEAD` | The active branch when the job was submitted. |
137129
| `azureml.git.commit` | `git rev-parse HEAD` | The commit hash of the code that was submitted for the job. |
130+
| `azureml.git.dirty` | `git status --porcelain .` | `True` if the branch or commit is dirty, otherwise `false`. |
131+
| `mlflow.source.git.repoURL` | `git ls-remote --get-url` | The URI that your repository was cloned from. |
132+
| `mlflow.source.git.branch` | `git symbolic-ref --short HEAD` | The active branch when the job was submitted. |
138133
| `mlflow.source.git.commit` | `git rev-parse HEAD` | The commit hash of the code that was submitted for the job. |
139-
| `azureml.git.dirty` | `git status --porcelain .` | `True`, if the branch/commit is dirty; otherwise, `false`. |
140134

141-
This information is sent for jobs that use an estimator, machine learning pipeline, or script run.
142-
143-
If your training files are not located in a git repository on your development environment, or the `git` command is not available, then no git-related information is tracked.
135+
If your training files aren't located in a Git repository on your development environment, or the `git` command isn't available, no Git-related information is tracked.
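
To make the table concrete, the following Python sketch runs the same Git commands locally and collects their output under the `azureml.git.*` property names. It's an approximation for inspecting a repository before you submit a job, not the upload process that Azure Machine Learning uses. The helper names, the `runner` and `git_available` parameters, and the `"True"`/`"False"` string format for the dirty flag are assumptions made for illustration.

```python
import shutil
import subprocess
from typing import Callable, Dict, List, Optional

# Property name -> Git command, copied from the table above.
GIT_PROPERTY_COMMANDS: Dict[str, List[str]] = {
    "azureml.git.repository_uri": ["git", "ls-remote", "--get-url"],
    "azureml.git.branch": ["git", "symbolic-ref", "--short", "HEAD"],
    "azureml.git.commit": ["git", "rev-parse", "HEAD"],
    "azureml.git.dirty": ["git", "status", "--porcelain", "."],
}


def run_git(args: List[str], cwd: str) -> Optional[str]:
    """Run one Git command in cwd; return stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            args, cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def collect_git_info(
    cwd: str = ".",
    runner: Callable[[List[str], str], Optional[str]] = run_git,
    git_available: Optional[bool] = None,
) -> Dict[str, str]:
    """Return Git properties for cwd, or {} when nothing can be tracked."""
    if git_available is None:
        git_available = shutil.which("git") is not None
    if not git_available:
        return {}
    # Outside a repository this command fails, so no information is collected.
    if runner(["git", "rev-parse", "--is-inside-work-tree"], cwd) != "true":
        return {}
    info: Dict[str, str] = {}
    for name, command in GIT_PROPERTY_COMMANDS.items():
        output = runner(command, cwd)
        if output is None:
            continue  # For example, symbolic-ref fails on a detached HEAD.
        if name == "azureml.git.dirty":
            # Any porcelain output means there are uncommitted changes.
            output = str(bool(output))
        info[name] = output
    return info
```

Calling `collect_git_info()` in a directory that isn't a Git repository, or on a machine without `git` in the path, returns an empty dictionary, which mirrors the case where no Git-related information is tracked.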
144136

145137
> [!TIP]
146-
> To check if the git command is available on your development environment, open a shell session, command prompt, PowerShell or other command line interface and type the following command:
138+
> To check if the `git` command is available on your development environment, run the following command in a command line interface:
147139
>
148140
> ```
149141
> git --version
150142
> ```
151143
>
152-
> If installed, and in the path, you receive a response similar to `git version 2.4.1`. For more information on installing git on your development environment, see the [Git website](https://git-scm.com/).
144+
> If Git is installed and in your path, you receive a response similar to `git version 2.4.1`.
145+
146+
For more information on installing Git on your development environment, see the [Git website](https://git-scm.com/).
153147

154-
## View the logged information
148+
## View Git information
155149

156-
The git information is stored in the properties for a training job. You can view this information using the Azure portal or Python SDK.
150+
The Git information is stored in the properties for a training job. You can view this information by using the Azure portal, Python SDK, or Azure CLI.
157151

158152
### Azure portal
159153

160-
1. From the [studio portal](https://ml.azure.com), select your workspace.
161-
1. Select __Jobs__, and then select one of your experiments.
162-
1. Select one of the jobs from the __Display name__ column.
163-
1. Select __Outputs + logs__, and then expand the __logs__ and __azureml__ entries. Select the link that begins with __###\_azure__.
154+
In your Azure Machine Learning workspace in Azure Machine Learning studio:
164155

165-
The logged information contains text similar to the following JSON:
156+
1. Select the **Jobs** page.
157+
1. Select an experiment.
158+
1. Select a job from the **Display name** column.
159+
1. Select **Outputs + logs**, from the top menu, and then expand the **logs** and **azureml** entries.
160+
1. Select the link that begins with **###_azure**.
161+
162+
The logged information contains text similar to the following JSON code:
166163

167164
```json
168165
"properties": {
@@ -181,27 +178,25 @@ The logged information contains text similar to the following JSON:
181178
}
182179
```
183180

184-
### View properties
185-
186-
After submitting a training run, a [Job](/python/api/azure-ai-ml/azure.ai.ml.entities.job) object is returned. The `properties` attribute of this object contains the logged git information. For example, the following code retrieves the commit hash:
181+
### Python SDK V2
187182

188-
# [Python SDK](#tab/python)
189-
190-
[!INCLUDE [sdk v2](includes/machine-learning-sdk-v2.md)]
183+
After you submit a training run, a [Job](/python/api/azure-ai-ml/azure.ai.ml.entities.job) object is returned. The `properties` attribute of this object contains the logged Git information. For example, the following code retrieves the commit hash:
191184

192185
```python
193186
job.properties["azureml.git.commit"]
194187
```
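
The same lookup can be extended to the other Git properties. The following sketch assumes that `job.properties` behaves like a dictionary of strings, as the preceding snippet suggests. The helper names are hypothetical, and the code works on any mapping, so you can try it without a submitted job.

```python
from typing import Dict, Mapping

# Prefixes of the Git-related property names listed earlier in this article.
GIT_PROPERTY_PREFIXES = ("azureml.git.", "mlflow.source.git.")


def git_properties(properties: Mapping[str, str]) -> Dict[str, str]:
    """Keep only the Git-related entries of a job's properties."""
    return {
        key: value
        for key, value in properties.items()
        if key.startswith(GIT_PROPERTY_PREFIXES)
    }


def describe_commit(properties: Mapping[str, str]) -> str:
    """Build a short, human-readable summary of the tracked commit."""
    commit = properties.get("azureml.git.commit")
    if commit is None:
        return "no Git information was tracked for this job"
    branch = properties.get("azureml.git.branch", "unknown branch")
    # The dirty flag is compared case-insensitively because its exact
    # casing isn't guaranteed here.
    dirty = str(properties.get("azureml.git.dirty", "")).lower() == "true"
    suffix = " (uncommitted changes)" if dirty else ""
    return f"{commit[:7]} on {branch}{suffix}"
```

For a returned job object, `describe_commit(job.properties)` would produce the summary, under the assumption stated above.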
195188

196-
# [Azure CLI](#tab/cli)
197-
[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
189+
### Azure CLI V2
190+
191+
Run the `az ml job show` command with a `--query` expression to display the commit hash from the job properties. For example:
198192

199193
```azurecli
200194
az ml job show --name my_job_id --query "{GitCommit:properties."""azureml.git.commit"""}"
201195
```
202196

203-
---
204-
205-
## Next steps
197+
## Related content
206198

207-
* [Access a compute instance terminal in your workspace](how-to-access-terminal.md)
199+
- [Access a compute instance terminal in your workspace](how-to-access-terminal.md)
200+
- [Launch Visual Studio Code integrated with Azure Machine Learning (preview)](how-to-launch-vs-code-remote.md)
201+
- [Use Version Control in VS Code](https://code.visualstudio.com/docs/editor/versioncontrol)
202+
- [Work with GitHub in VS Code](https://code.visualstudio.com/docs/editor/github)
