Commit 9e0b44f

Merge pull request #113391 from djpmsft/docUpdates
Updating Source Control doc
2 parents 4c3ae28 + 4a9c4aa commit 9e0b44f

1 file changed, +40 -48 lines changed

articles/data-factory/source-control.md

Lines changed: 40 additions & 48 deletions
@@ -10,30 +10,35 @@ manager: anandsub
 ms.reviewer:
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 01/09/2019
+ms.date: 04/30/2020
 ---
 
 # Source control in Azure Data Factory
 [!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
 
-The Azure Data Factory user interface experience (UX) has two experiences available for visual authoring:
+By default, the Azure Data Factory user interface experience (UX) authors directly against the Data Factory service. This experience has the following limitations:
 
-- Author directly with the Data Factory service
-- Author with Azure Repos Git or GitHub integration
+- The Data Factory service doesn't include a repository for storing the JSON entities for your changes. The only way to save changes is via the **Publish All** button, and all changes are published directly to the Data Factory service.
+- The Data Factory service isn't optimized for collaboration and version control.
 
-> [!NOTE]
-> Only authoring directly with the Data Factory service is supported in the Azure Government Cloud.
-
-## Author directly with the Data Factory service
+To provide a better authoring experience, Azure Data Factory allows you to configure a Git repository with either Azure Repos or GitHub. Git is a version control system that allows for easier change tracking and collaboration. This article outlines how to configure and work in a Git repository, and highlights best practices and a troubleshooting guide.
 
-While authoring directly with the Data Factory service, the only way to save changes is via the **Publish All** button. Once clicked, all changes that you made are published directly to the Data Factory service.
+> [!NOTE]
+> Azure Data Factory Git integration is not available in the Azure Government Cloud.
 
-![Publish mode](media/author-visually/data-factory-publish.png)
+## Advantages of Git integration
 
-Authoring directly with the Data Factory service has the following limitations:
+Git integration provides the following advantages to the authoring experience:
 
-- The Data Factory service doesn't include a repository for storing the JSON entities for your changes.
-- The Data Factory service isn't optimized for collaboration or version control.
+- **Source control:** As your data factory workloads become crucial, you may want to integrate your factory with Git to gain source control benefits such as the following:
+    - Ability to track/audit changes.
+    - Ability to revert changes that introduced bugs.
+- **Partial saves:** When authoring against the Data Factory service, you can't save changes as a draft, and every publish must pass Data Factory validation. Whether your pipelines aren't finished or you simply don't want to lose changes if your computer crashes, Git integration lets you save incremental changes to data factory resources regardless of their state. With a Git repository configured, you can save changes and publish only when you have tested them to your satisfaction.
+- **Collaboration and control:** If you have multiple team members contributing to the same factory, you may want to let your teammates collaborate with each other via a code review process. You can also set up your factory such that not every contributor has equal permissions. Some team members may be allowed to make changes only via Git, while only certain people in the team are allowed to publish the changes to the factory.
+- **Better CI/CD:** If you are deploying to multiple environments with a [continuous delivery process](continuous-integration-deployment.md), Git integration makes certain actions easier. Some of these actions include:
+    - Configure your release pipeline to trigger automatically as soon as there are any changes made to your 'dev' factory.
+    - Customize the properties in your factory that are available as parameters in the Resource Manager template. It can be useful to keep only the required set of properties as parameters, and have everything else hard coded.
+- **Better performance:** An average factory with Git integration loads 10 times faster than one authoring against the Data Factory service, because resources are downloaded via Git.
 
 > [!NOTE]
 > Authoring directly with the Data Factory service is disabled in the Azure Data Factory UX when a Git repository is configured. Changes can be made directly to the service via PowerShell or an SDK.
@@ -73,7 +78,7 @@ The configuration pane shows the following Azure Repos code repository settings:
 | **Azure Repos Organization** | Your Azure Repos organization name. You can locate your Azure Repos organization name at `https://{organization name}.visualstudio.com`. You can [sign in to your Azure Repos organization](https://www.visualstudio.com/team-services/git/) to access your Visual Studio profile and see your repositories and projects. | `<your organization name>` |
 | **ProjectName** | Your Azure Repos project name. You can locate your Azure Repos project name at `https://{organization name}.visualstudio.com/{project name}`. | `<your Azure Repos project name>` |
 | **RepositoryName** | Your Azure Repos code repository name. Azure Repos projects contain Git repositories to manage your source code as your project grows. You can create a new repository or use an existing repository that's already in your project. | `<your Azure Repos code repository name>` |
-| **Collaboration branch** | Your Azure Repos collaboration branch that is used for publishing. By default, it's `master`. Change this setting in case you want to publish resources from another branch. | `<your collaboration branch name>` |
+| **Collaboration branch** | Your Azure Repos collaboration branch that is used for publishing. By default, its `master`. Change this setting in case you want to publish resources from another branch. | `<your collaboration branch name>` |
 | **Root folder** | Your root folder in your Azure Repos collaboration branch. | `<your root folder name>` |
 | **Import existing Data Factory resources to repository** | Specifies whether to import existing data factory resources from the UX **Authoring canvas** into an Azure Repos Git repository. Select the box to import your data factory resources into the associated Git repository in JSON format. This action exports each resource individually (that is, the linked services and datasets are exported into separate JSONs). When this box isn't selected, the existing resources aren't imported. | Selected (default) |
 | **Branch to import resource into** | Specifies into which branch the data factory resources (pipelines, datasets, linked services etc.) are imported. You can import resources into one of the following branches: a. Collaboration b. Create new c. Use Existing | |
@@ -83,7 +88,7 @@ The configuration pane shows the following Azure Repos code repository settings:
 
 ### Use a different Azure Active Directory tenant
 
-You can create an Azure Repos Git repo in a different Azure Active Directory tenant. To specify a different Azure AD tenant, you have to have administrator permissions for the Azure subscription that you're using.
+The Azure Repos Git repo can be in a different Azure Active Directory tenant. To specify a different Azure AD tenant, you must have administrator permissions for the Azure subscription that you're using.
 
 ### Use your personal Microsoft account
 
@@ -140,7 +145,7 @@ The configuration pane shows the following GitHub repository settings:
 | **GitHub Enterprise URL** | The GitHub Enterprise root URL (must be HTTPS for local GitHub Enterprise server). For example: https://github.mydomain.com. Required only if **Use GitHub Enterprise** is selected | `<your GitHub enterprise url>` |
 | **GitHub account** | Your GitHub account name. This name can be found from https:\//github.com/{account name}/{repository name}. Navigating to this page prompts you to enter GitHub OAuth credentials to your GitHub account. | `<your GitHub account name>` |
 | **Repository Name** | Your GitHub code repository name. GitHub accounts contain Git repositories to manage your source code. You can create a new repository or use an existing repository that's already in your account. | `<your repository name>` |
-| **Collaboration branch** | Your GitHub collaboration branch that is used for publishing. By default, it's master. Change this setting in case you want to publish resources from another branch. | `<your collaboration branch>` |
+| **Collaboration branch** | Your GitHub collaboration branch that is used for publishing. By default, its master. Change this setting in case you want to publish resources from another branch. | `<your collaboration branch>` |
 | **Root folder** | Your root folder in your GitHub collaboration branch. |`<your root folder name>` |
 | **Import existing Data Factory resources to repository** | Specifies whether to import existing data factory resources from the UX authoring canvas into a GitHub repository. Select the box to import your data factory resources into the associated Git repository in JSON format. This action exports each resource individually (that is, the linked services and datasets are exported into separate JSONs). When this box isn't selected, the existing resources aren't imported. | Selected (default) |
 | **Branch to import resource into** | Specifies into which branch the data factory resources (pipelines, datasets, linked services etc.) are imported. You can import resources into one of the following branches: a. Collaboration b. Create new c. Use Existing | |
@@ -155,18 +160,6 @@ The configuration pane shows the following GitHub repository settings:
 
 - A maximum of 1,000 entities per resource type (such as pipelines and datasets) can be fetched from a single GitHub branch. If this limit is reached, it's recommended to split your resources into separate factories. Azure DevOps Git does not have this limitation.
 
-## Switch to a different Git repo
-
-To switch to a different Git repo, click the **Git Repo Settings** icon in the upper right corner of the Data Factory overview page. If you can't see the icon, clear your local browser cache. Select the icon to remove the association with the current repo.
-
-![Git icon](media/author-visually/remove-repo.png)
-
-Once the Repository Settings pane appears, select **Remove Git**. Enter your data factory name and click **confirm** to remove the Git repository associated with your data factory.
-
-![Remove the association with the current Git repo](media/author-visually/remove-repo2.png)
-
-After you remove the association with the current repo, you can configure your Git settings to use a different repo and then import existing Data Factory resources to the new repo.
-
 ## Version control
 
 Version control systems (also known as _source control_) let developers collaborate on code and track changes that are made to the code base. Source control is an essential tool for multi-developer projects.
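
The 1,000-entity limit for a single GitHub branch, stated in the context line of this hunk, can be checked in a local clone before it becomes a problem. The following is a minimal Python sketch, not a Data Factory tool: it assumes the repository keeps one JSON file per entity in a folder named after the resource type (for example `pipeline/` or `dataset/`) under the root folder, and the function names `count_entities`, `types_over_limit`, and `scan_root_folder` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path, PurePosixPath
from typing import Iterable

# Per-branch limit for GitHub-backed factories, as stated in the doc.
GITHUB_ENTITY_LIMIT = 1000


def count_entities(json_paths: Iterable[str]) -> Counter:
    """Count entity files per resource type.

    Assumption (hypothetical layout): each path looks like
    "<resource type>/<entity>.json", relative to the root folder.
    """
    return Counter(PurePosixPath(p).parent.name for p in json_paths)


def types_over_limit(counts: Counter, limit: int = GITHUB_ENTITY_LIMIT) -> dict[str, int]:
    """Return the resource types whose entity count exceeds the limit."""
    return {rtype: n for rtype, n in counts.items() if n > limit}


def scan_root_folder(root_folder: str) -> Counter:
    """Count the JSON files exactly one folder level below a local checkout of the root folder."""
    root = Path(root_folder)
    return count_entities(p.relative_to(root).as_posix() for p in root.glob("*/*.json"))
```

For example, `types_over_limit(scan_root_folder("<your root folder name>"))` returns an empty dictionary while every resource type stays within the limit. Keeping the counting logic separate from the filesystem scan makes it easy to run the same check on a file listing produced some other way.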
@@ -183,15 +176,15 @@ When you are ready to merge the changes from your feature branch to your collabo
 
 ### Configure publishing settings
 
-To configure the publish branch - that is, the branch where Resource Manager templates are saved - add a `publish_config.json` file to the root folder in the collaboration branch. Data Factory reads this file, looks for the field `publishBranch`, and creates a new branch (if it doesn't already exist) with the value provided. Then it saves all Resource Manager templates to the specified location. For example:
+By default, Data Factory generates the Resource Manager templates of the published factory and saves them into a branch called `adf_publish`. To configure a custom publish branch, add a `publish_config.json` file to the root folder in the collaboration branch. When publishing, Data Factory reads this file, looks for the field `publishBranch`, and saves all Resource Manager templates to the specified location. If the branch doesn't exist, Data Factory automatically creates it. An example of what this file looks like is below:
 
 ```json
 {
     "publishBranch": "factory/adf_publish"
 }
 ```
 
-When you specify a new publish branch, Data Factory doesn't delete the previous publish branch. If you want to remove the previous publish branch, delete it manually.
+Azure Data Factory can only have one publish branch at a time. When you specify a new publish branch, Data Factory doesn't delete the previous publish branch. If you want to remove the previous publish branch, delete it manually.
 
 > [!NOTE]
 > Data Factory only reads the `publish_config.json` file when it loads the factory. If you already have the factory loaded in the portal, refresh the browser to make your changes take effect.
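
The publish-branch lookup described in this hunk can be sketched in Python as follows. This is a hypothetical re-implementation for illustration, not Data Factory's own code: `publish_branch_from_config` and `publish_branch_for_root` are invented names, and falling back to `adf_publish` when the file or the `publishBranch` field is missing or empty is an assumption of this sketch.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumed default publish branch name when no custom branch is configured.
DEFAULT_PUBLISH_BRANCH = "adf_publish"


def publish_branch_from_config(config_text: str | None) -> str:
    """Resolve the publish branch from the text of publish_config.json.

    Hypothetical logic: use the "publishBranch" field if it is a non-empty
    string, otherwise fall back to the default branch.
    """
    if config_text is None:
        return DEFAULT_PUBLISH_BRANCH
    config = json.loads(config_text)
    branch = config.get("publishBranch") if isinstance(config, dict) else None
    if isinstance(branch, str) and branch:
        return branch
    return DEFAULT_PUBLISH_BRANCH


def publish_branch_for_root(root_folder: str) -> str:
    """Read publish_config.json from a local checkout of the root folder, if present."""
    config_file = Path(root_folder) / "publish_config.json"
    text = config_file.read_text(encoding="utf-8") if config_file.is_file() else None
    return publish_branch_from_config(text)
```

The sketch only models which branch name results from the file's contents. It doesn't model when the file is read; per the note above, Data Factory reads it only when the factory loads.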
@@ -209,17 +202,6 @@ A side pane will open where you confirm that the publish branch and pending chan
 > [!IMPORTANT]
 > The master branch is not representative of what's deployed in the Data Factory service. The master branch *must* be published manually to the Data Factory service.
 
-## Advantages of Git integration
-
-- **Source Control**. As your data factory workloads become crucial, you would want to integrate your factory with Git to leverage several source control benefits like the following:
-    - Ability to track/audit changes.
-    - Ability to revert changes that introduced bugs.
-- **Partial Saves**. As you make a lot of changes in your factory, you will realize that in the regular LIVE mode, you can't save your changes as draft, because you are not ready, or you don't want to lose your changes in case your computer crashes. With Git integration, you can continue saving your changes incrementally, and publish to the factory only when you are ready. Git acts as a staging place for your work, until you have tested your changes to your satisfaction.
-- **Collaboration and Control**. If you have multiple team members participating to the same factory, you may want to let your teammates collaborate with each other via a code review process. You can also set up your factory such that not every contributor to the factory has permission to deploy to the factory. Team members may just be allowed to make changes via Git, but only certain people in the team are allowed to "Publish" the changes to the factory.
-- **Showing diffs**. In Git mode, you get to see a nice diff of the payload that's about to get published to the factory. This diff shows you all resources/entities that got modified/added/deleted since the last time you published to your factory. Based on this diff, you can either continue further with publishing, or go back and check your changes, and then come back later.
-- **Better CI/CD**. If you are using Git mode, you can configure your release pipeline to trigger automatically as soon as there are any changes made in the dev factory. You also get to customize the properties in your factory that are available as parameters in the Resource Manager template. It can be useful to keep only the required set of properties as parameters, and have everything else hard coded.
-- **Better Performance**. An average factory loads ten times faster in Git mode than in regular LIVE mode, because the resources are downloaded via Git.
-
 ## Best practices for Git integration
 
 ### Permissions
@@ -233,9 +215,9 @@ It's recommended to not allow direct check-ins to the collaboration branch. This
 
 ### Using passwords from Azure Key Vault
 
-It's recommended to use Azure Key Vault to store any connection strings or passwords for Data Factory Linked Services. For security reasons, we don't store any such secret information in Git, so any changes to Linked Services are published immediately to the Azure Data Factory service.
+It's recommended to use Azure Key Vault to store any connection strings or passwords, or to use managed identity authentication, for Data Factory linked services. For security reasons, Data Factory doesn't store secrets in Git. Any changes to linked services that contain secrets such as passwords are published immediately to the Azure Data Factory service.
 
-Using Key Vault also makes continuous integration and deployment easier as you will not have to provide these secrets during Resource Manager template deployment.
+Using Key Vault or MSI authentication also makes continuous integration and deployment easier, as you won't have to provide these secrets during Resource Manager template deployment.
 
 ## Troubleshooting Git integration
 
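
Because linked services that hold secrets directly are published immediately, it can help to check which linked-service definitions reference Azure Key Vault instead. The following is a minimal Python sketch, assuming the Data Factory JSON convention that a Key Vault-backed property is an object with `"type": "AzureKeyVaultSecret"`. The function `references_key_vault` is hypothetical, and every name and value in `example_linked_service` (server, database, `AzureKeyVault1`, `sqlPassword`) is a placeholder. The check does not detect managed identity authentication, which stores no secret at all.

```python
from typing import Any

# Data Factory marks a Key Vault-backed property with this "type" value.
AKV_SECRET_TYPE = "AzureKeyVaultSecret"


def references_key_vault(node: Any) -> bool:
    """Return True if any nested object in a linked-service JSON is a Key Vault secret reference."""
    if isinstance(node, dict):
        if node.get("type") == AKV_SECRET_TYPE:
            return True
        return any(references_key_vault(value) for value in node.values())
    if isinstance(node, list):
        return any(references_key_vault(item) for item in node)
    return False


# Illustrative linked service; every name and value here is a placeholder.
example_linked_service = {
    "name": "AzureSqlDatabase1",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net;Database=<database>;",
            "password": {
                "type": AKV_SECRET_TYPE,
                "store": {"referenceName": "AzureKeyVault1", "type": "LinkedServiceReference"},
                "secretName": "sqlPassword",
            },
        },
    },
}
```

The recursive walk is deliberately schema-agnostic, so it works regardless of which property (password, connection string, account key) holds the Key Vault reference.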
@@ -248,15 +230,25 @@ If the publish branch is out of sync with the master branch and contains out-of-
 1. Create a pull request to merge the changes to the collaboration branch
 
 Below are some examples of situations that can cause a stale publish branch:
-- A user has multiple branches. In one feature branch, they deleted a linked service which is not AKV associated (non AKV linked services are published immediately regardless if they are in Git or not) and never merged the feature branch into the collaboration brnach.
+- A user has multiple branches. In one feature branch, they deleted a linked service that isn't AKV associated (non-AKV linked services are published immediately, regardless of whether they are in Git) and never merged the feature branch into the collaboration branch.
 - A user modified the data factory using the SDK or PowerShell.
 - A user moved all resources to a new branch and tried to publish for the first time. Linked services should be created manually when importing resources.
-- A user uploads a non AKV linked service or an Integration Runtime JSON manually. They reference that resource from another resource such as a dataset, linked service, or pipeline. A non-AKV linked service created through the UX is published immediately becausethe credentials need to be encrypted. If you upload a dataset referencing that linked service and try to publish, the UX will allow it because it exists in the git environment. It will be rejected at publish time since it does not exist in the data factory service.
+- A user uploads a non-AKV linked service or an Integration Runtime JSON manually. They reference that resource from another resource such as a dataset, linked service, or pipeline. A non-AKV linked service created through the UX is published immediately because the credentials need to be encrypted. If you upload a dataset referencing that linked service and try to publish, the UX allows it because the linked service exists in the Git environment. The publish is then rejected because the linked service doesn't exist in the Data Factory service.
+
+## Switch to a different Git repository
+
+To switch to a different Git repository, click the **Git Repo Settings** icon in the upper right corner of the Data Factory overview page. If you can't see the icon, clear your local browser cache. Select the icon to remove the association with the current repo.
 
-## Provide feedback
-Select **Feedback** to comment about features or to notify Microsoft about issues with the tool:
+![Git icon](media/author-visually/remove-repo.png)
+
+Once the Repository Settings pane appears, select **Remove Git**. Enter your data factory name and click **confirm** to remove the Git repository associated with your data factory.
 
-![Feedback](media/author-visually/provide-feedback.png)
+![Remove the association with the current Git repo](media/author-visually/remove-repo2.png)
+
+After you remove the association with the current repo, you can configure your Git settings to use a different repo and then import existing Data Factory resources to the new repo.
+
+> [!IMPORTANT]
+> Removing Git configuration from a data factory doesn't delete anything from the repository. The factory will contain all published resources. You can continue to edit the factory directly against the service.
 
 ## Next steps
 
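
The last stale-publish-branch scenario in this hunk (a dataset that references a linked service existing only in the Git environment) can be caught before publishing by comparing dataset references against the linked services that actually exist in the factory. The following is a minimal Python sketch under stated assumptions: datasets are passed as their JSON definitions with the usual `properties.linkedServiceName` reference, the set of live linked-service names is supplied by the caller (for example, collected with PowerShell or an SDK; how it is collected is out of scope here), and `referenced_linked_services` and `missing_in_service` are hypothetical helpers. Pipelines and other resources that reference linked services are not covered.

```python
from __future__ import annotations

from typing import Any, Iterable


def referenced_linked_services(datasets: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Map each dataset name to the linked service named in properties.linkedServiceName."""
    refs: dict[str, str] = {}
    for dataset in datasets:
        ref = dataset.get("properties", {}).get("linkedServiceName", {})
        if ref.get("type") == "LinkedServiceReference":
            refs[dataset["name"]] = ref["referenceName"]
    return refs


def missing_in_service(
    datasets: Iterable[dict[str, Any]], live_linked_services: set[str]
) -> dict[str, str]:
    """Return datasets that reference a linked service missing from the live factory.

    Publishing such a dataset would be rejected, which can leave the publish branch stale.
    """
    return {
        name: linked_service
        for name, linked_service in referenced_linked_services(datasets).items()
        if linked_service not in live_linked_services
    }
```

A non-empty result points to linked services that should be created in the service (or whose references should be fixed) before publishing.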