Commit 65ce003

Merge pull request #252117 from jonburchel/2023-09-19-sync-git-repository-with-airflow
2023 09 19 sync git repository with airflow
2 parents 1281bc3 + 2a11775 commit 65ce003

8 files changed

+291
-0
lines changed

articles/data-factory/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -1330,6 +1330,8 @@ items:
13301330
href: kubernetes-secret-pull-image-from-private-container-registry.md
13311331
- name: Rest APIs for the Airflow integrated runtime
13321332
href: rest-apis-for-airflow-integrated-runtime.md
1333+
- name: Sync a GitHub repository with Airflow
1334+
href: airflow-sync-github-repository.md
13331335
- name: Pricing
13341336
href: airflow-pricing.md
13351337
- name: Reference
Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
1+
---
2+
title: Sync a GitHub repository with Managed Airflow
3+
description: This article provides step-by-step instructions for how to sync a GitHub repository using Managed Airflow in Azure Data Factory.
4+
author: nabhishek
5+
ms.author: abnarain
6+
ms.reviewer: jburchel
7+
ms.service: data-factory
8+
ms.topic: how-to
9+
ms.date: 09/19/2023
10+
---
11+
12+
# Sync a GitHub repository with Managed Airflow in Azure Data Factory
13+
14+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
15+
16+
Although you can manually create and update Directed Acyclic Graph (DAG) files for Azure Managed Apache Airflow by using Azure Storage or the [Azure CLI](/azure/storage/blobs/storage-quickstart-blobs-cli), many organizations prefer to streamline the process with a continuous integration and continuous delivery (CI/CD) approach. In this scenario, each commit to the source code repository triggers an automated workflow that synchronizes the code with the designated DAGs folder in Azure Managed Apache Airflow.
17+
18+
In this guide, you learn how to synchronize your GitHub repository with Managed Airflow in two ways:
19+
20+
- Using the Git Sync feature in the Managed Airflow UI
21+
- Using the REST API
22+
23+
## Prerequisites
24+
25+
- **Azure subscription** - If you don't have an Azure subscription, create a [free Azure account](https://azure.microsoft.com/free/) before you begin. Create or select an existing [Data Factory](https://azure.microsoft.com/products/data-factory#get-started) in a [region where the Managed Airflow preview is supported](concept-managed-airflow.md#region-availability-public-preview).
26+
- **Access to a GitHub repository**
27+
28+
## Using the Managed Airflow UI
29+
30+
The following steps describe how to sync your GitHub repository using the Managed Airflow UI:
31+
32+
1. Ensure your repository contains the necessary folders and files.
33+
- **Dags/** - for Apache Airflow DAGs (required)
34+
- **Plugins/** - for integrating external features into Airflow.
35+
:::image type="content" source="media/airflow-git-sync-repository/airflow-folders.png" alt-text="Screenshot showing the Airflow folders structure in GitHub.":::
36+
37+
1. While creating an Airflow integrated runtime (IR), select **Enable git sync** on the Airflow environment setup dialog.
38+
39+
:::image type="content" source="media/airflow-git-sync-repository/enable-git-sync.png" alt-text="Screenshot showing the Enable git sync checkbox on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
40+
41+
1. Select one of the following supported git service types:
42+
- GitHub
43+
- ADO
44+
- GitLab
45+
- Bitbucket
46+
47+
:::image type="content" source="media/airflow-git-sync-repository/git-service-type.png" alt-text="Screenshot showing the Git service type selection dropdown on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
48+
49+
1. Select credential type:
50+
51+
- **None** (for a public repo)
52+
When you select this option, make sure your repository’s visibility is public. Then fill out the details:
53+
- **Git Repo URL** (required): The clone URL for your desired GitHub repository
54+
- **Git branch** (required): The branch of the repository to sync
55+
- **PAT** (Personal Access Token)
56+
Once you select this option, fill out the remaining fields based on the selected **Git service type**:
57+
- GitHub Personal Access Token
58+
- ADO Personal Access Token
59+
- GitLab Personal Access Token
60+
- Bitbucket Personal Access Token
61+
:::image type="content" source="media/airflow-git-sync-repository/git-pat-credentials.png" alt-text="Screenshot showing the Git PAT credential options on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
62+
- **SPN** ([Service Principal Name](https://devblogs.microsoft.com/devops/introducing-service-principal-and-managed-identity-support-on-azure-devops/) - Only ADO supports this credential type.)
63+
Once you select this option, fill out the remaining fields based on the selected **Git service type**:
64+
- **Git repo URL** (Required): The clone URL to the git repository to sync
65+
- **Git branch** (Required): The branch in the repository to sync
66+
- **Service principal app id** (Required): The service principal app id with access to the ADO repo to sync
67+
- **Service principal secret** (Required): A manually generated secret for the service principal, whose value is used to authenticate to and access the ADO repo
68+
- **Service principal tenant id** (Required): The service principal tenant id
69+
:::image type="content" source="media/airflow-git-sync-repository/git-spn-credentials.png" alt-text="Screenshot showing the Git SPN credential options on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
70+
71+
1. Fill in the rest of the fields with the required information.
72+
1. Select **Create**.
73+
74+
## Using the REST API
75+
76+
The following sections describe how to sync your GitHub repository using the REST APIs:
77+
78+
- **Method**: PUT
79+
- **URL**: ```https://management.azure.com/subscriptions/<subscriptionid>/resourcegroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<datafactoryName>/integrationruntimes/<airflowEnvName>?api-version=2018-06-01```
80+
- **URI parameters**:
81+
82+
|Name |In |Required |Type |Description |
83+
|---------|---------|---------|---------|---------|
84+
|Subscription Id | path | True | string | Subscription identifier |
85+
|ResourceGroup Name | path | True | string | Resource group name (Regex pattern: ```^[-\w\._\(\)]+$```) |
86+
|dataFactoryName | path | True | string | Name of the Azure Data Factory (Regex pattern: ```^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$```) |
87+
|airflowEnvName | path | True | string | Name of the Managed Airflow environment |
88+
|Api-version | query | True | string | The API version |
89+
90+
- **Request body (Airflow configuration)**:
91+
92+
|Name |Type |Description |
93+
|---------|---------|---------|
94+
|name |string |Name of the Airflow environment |
95+
|properties |propertyType |Configuration properties for the environment |
96+
97+
- **Properties type**:
98+
99+
|Name |Type |Description |
100+
|---------|---------|---------|
101+
|Type |string |The resource type (**Airflow** in this scenario) |
102+
|typeProperties |typeProperty |Type-specific properties for the Airflow environment |
103+
104+
- **Type property**
105+
106+
|Name |Type |Description |
107+
|---------|---------|---------|
108+
|computeProperties |computeProperty |Configuration of the compute type used for the environment. |
109+
|airflowProperties |airflowProperty |Configuration of the Airflow properties for the environment. |
110+
111+
- **Compute property**
112+
113+
|Name |Type |Description |
114+
|---------|---------|---------|
115+
|location |string |The Airflow integrated runtime location defaults to the data factory region. To create an integrated runtime in a different region, create a new data factory in the required region. |
116+
| computeSize | string |The size of the compute node you want your Airflow environment to run on. Example: “Large”, “Small”. 3 nodes are allocated initially. |
117+
| extraNodes | integer |Each extra node adds 3 more workers. |
118+
119+
- **Airflow property**
120+
121+
|Name |Type |Description |
122+
|---------|---------|---------|
123+
|airflowVersion | string | Current version of Airflow (Example: 2.4.3) |
124+
|airflowRequirements | Array\<string\> | Python libraries you wish to use. Example: ["flask-bcrypt==0.7.1"]. Can be a comma-delimited list. |
125+
|airflowEnvironmentVariables | Object (Key/Value pair) | Environment variables you wish to use. Example: { “SAMPLE_ENV_NAME”: “test” } |
126+
|gitSyncProperties | gitSyncProperty | Git configuration properties |
127+
|enableAADIntegration | boolean | Enables Azure AD sign-in to Airflow |
128+
|userName | string or null | Username for Basic Authentication |
129+
|password | string or null | Password for Basic Authentication |
130+
131+
- **Git sync property**
132+
133+
|Name |Type |Description |
134+
|---------|---------|---------|
135+
|gitServiceType | string | The Git service your desired repo is located in. Values: GitHub, ADO, GitLab, or BitBucket |
136+
|gitCredentialType | string | Type of Git credential. Values: PAT (for Personal Access Token), SPN (supported only by ADO), None |
137+
|repo | string | Repository link |
138+
|branch | string | Branch to use in the repository |
139+
|username | string | Username for the Git service (for the SPN credential type, the service principal app id) |
140+
|credential | string | Value of the Personal Access Token (for the SPN credential type, the service principal secret value) |
141+
|tenantId | string | The service principal tenant id (supported only by ADO) |
142+
143+
- **Responses**
144+
145+
|Name |Status code |Type |Description |
146+
|---------|---------|---------|----------|
147+
|Accepted | 200 | [Factory](/rest/api/datafactory/factories/get?tabs=HTTP#factory) | OK |
148+
|Unauthorized | 401 | [Cloud Error](/rest/api/datafactory/factories/get?tabs=HTTP#clouderror) | Array with additional error details |
149+
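As a rough illustration of how the URI parameters above fit together, the following Python sketch assembles the request URL and checks the resource group and data factory names against the regex patterns listed in the table. The helper name `build_airflow_ir_url` and the placeholder values are hypothetical and not part of any Azure SDK.

```python
import re

# Regex patterns copied from the URI parameters table.
RESOURCE_GROUP_PATTERN = re.compile(r"^[-\w\._\(\)]+$")
FACTORY_NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")

API_VERSION = "2018-06-01"


def build_airflow_ir_url(subscription_id, resource_group, factory_name, airflow_env_name):
    """Hypothetical helper: assemble the PUT URL for a Managed Airflow environment."""
    if not RESOURCE_GROUP_PATTERN.match(resource_group):
        raise ValueError(f"Invalid resource group name: {resource_group!r}")
    if not FACTORY_NAME_PATTERN.match(factory_name):
        raise ValueError(f"Invalid data factory name: {factory_name!r}")
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory_name}"
        f"/integrationruntimes/{airflow_env_name}"
        f"?api-version={API_VERSION}"
    )


# Placeholder values for illustration only.
url = build_airflow_ir_url(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-df", "sample-2"
)
```

Validating the names locally is optional; it only surfaces malformed values before the request reaches the service.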
150+
### Examples
151+
152+
Sample request:
153+
154+
```rest
156+
PUT https://management.azure.com/subscriptions/222f1459-6ebd-4896-82ab-652d5f6883cf/resourcegroups/abnarain-rg/providers/Microsoft.DataFactory/factories/ambika-df/integrationruntimes/sample-2?api-version=2018-06-01
157+
```
158+
159+
Sample Body:
160+
161+
```rest
162+
{
163+
"name": "sample-2",
164+
"properties": {
165+
"type": "Airflow",
166+
"typeProperties": {
167+
"computeProperties": {
168+
"location": "East US",
169+
"computeSize": "Large",
170+
"extraNodes": 0
171+
},
172+
"airflowProperties": {
173+
"airflowVersion": "2.4.3",
174+
"airflowEnvironmentVariables": {
175+
"AIRFLOW__TEST__TEST": "test"
176+
},
177+
"airflowRequirements": [
178+
"apache-airflow-providers-microsoft-azure"
179+
],
180+
"enableAADIntegration": true,
181+
"userName": null,
182+
"password": null,
183+
"airflowEntityReferences": []
184+
}
185+
}
186+
}
187+
}
188+
```
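One possible way to prepare the sample request and body from a script is sketched below with Python's standard `urllib.request` module. The helper name `build_put_request`, the bearer token placeholder, and the placeholder IDs in the URL are assumptions; acquiring an Azure access token is outside the scope of this sketch, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_put_request(url, body, access_token):
    """Hypothetical helper: wrap the Airflow configuration in an authenticated PUT request."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Assumption: a valid Azure access token has been obtained elsewhere.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Same shape as the sample body above.
body = {
    "name": "sample-2",
    "properties": {
        "type": "Airflow",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "computeSize": "Large",
                "extraNodes": 0,
            },
            "airflowProperties": {
                "airflowVersion": "2.4.3",
                "airflowEnvironmentVariables": {"AIRFLOW__TEST__TEST": "test"},
                "airflowRequirements": ["apache-airflow-providers-microsoft-azure"],
                "enableAADIntegration": True,
                "userName": None,
                "password": None,
                "airflowEntityReferences": [],
            },
        },
    },
}

url = (
    "https://management.azure.com/subscriptions/<subscriptionid>"
    "/resourcegroups/<resourceGroupName>/providers/Microsoft.DataFactory"
    "/factories/<datafactoryName>/integrationruntimes/sample-2"
    "?api-version=2018-06-01"
)
request = build_put_request(url, body, access_token="<access token>")
# To actually send the request: urllib.request.urlopen(request)
```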
189+
190+
Sample Response:
191+
192+
```rest
193+
Status code: 200 OK
194+
```
195+
196+
Response Body:
197+
198+
```rest
199+
{
200+
"id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/your-rg/providers/Microsoft.DataFactory/factories/your-df/integrationruntimes/sample-2",
201+
"name": "sample-2",
202+
"type": "Microsoft.DataFactory/factories/integrationruntimes",
203+
"properties": {
204+
"type": "Airflow",
205+
"typeProperties": {
206+
"computeProperties": {
207+
"location": "East US",
208+
"computeSize": "Large",
209+
"extraNodes": 0
210+
},
211+
"airflowProperties": {
212+
"airflowVersion": "2.4.3",
213+
"pythonVersion": "3.8",
214+
"airflowEnvironmentVariables": {
215+
"AIRFLOW__TEST__TEST": "test"
216+
},
217+
"airflowWebUrl": "https://e57f7409041692.eastus.airflow.svc.datafactory.azure.com/login/",
218+
"airflowRequirements": [
219+
"apache-airflow-providers-microsoft-azure"
220+
],
221+
"airflowEntityReferences": [],
222+
"packageProviderPath": "plugins",
223+
"enableAADIntegration": true,
224+
"enableTriggerers": false
225+
}
226+
},
227+
"state": "Initial"
228+
},
229+
"etag": "3402279e-0000-0100-0000-64ecb1cb0000"
230+
}
231+
```
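To show how a caller might read the response body above, here is a minimal Python sketch that extracts a few fields. The helper name `summarize_airflow_ir` and the trimmed sample string (including its placeholder web URL) are illustrative assumptions; the field names are taken from the sample response.

```python
import json


def summarize_airflow_ir(response_text):
    """Hypothetical helper: pull a few fields out of the PUT response body."""
    payload = json.loads(response_text)
    properties = payload["properties"]
    airflow = properties["typeProperties"]["airflowProperties"]
    return {
        "name": payload["name"],
        "state": properties.get("state"),
        "airflowVersion": airflow.get("airflowVersion"),
        "airflowWebUrl": airflow.get("airflowWebUrl"),
    }


# Trimmed-down response with a placeholder web URL, for illustration only.
sample = json.dumps({
    "name": "sample-2",
    "properties": {
        "type": "Airflow",
        "typeProperties": {
            "airflowProperties": {
                "airflowVersion": "2.4.3",
                "airflowWebUrl": "https://example.invalid/login/",
            }
        },
        "state": "Initial",
    },
})
summary = summarize_airflow_ir(sample)
```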
232+
233+
Here are some API payload examples:
234+
235+
- Git sync properties for GitHub with PAT:
236+
```rest
237+
"gitSyncProperties": {
238+
"gitServiceType": "Github",
239+
"gitCredentialType": "PAT",
240+
"repo": "<repo url>",
241+
"branch": "<repo branch to sync>",
242+
"username": "<username>",
243+
"credential": "<personal access token>"
244+
}
245+
```
246+
247+
- Git sync properties for ADO with PAT:
248+
```rest
249+
"gitSyncProperties": {
250+
"gitServiceType": "ADO",
251+
"gitCredentialType": "PAT",
252+
"repo": "<repo url>",
253+
"branch": "<repo branch to sync>",
254+
"username": "<username>",
255+
"credential": "<personal access token>"
256+
}
```
257+
258+
- Git sync properties for ADO with Service Principal:
259+
```rest
260+
"gitSyncProperties": {
261+
"gitServiceType": "ADO",
262+
"gitCredentialType": "SPN",
263+
"repo": "<repo url>",
264+
"branch": "<repo branch to sync>",
265+
"username": "<service principal app id>",
266+
"credential": "<service principal secret value>",
267+
"tenantId": "<service principal tenant id>"
268+
}
```
269+
270+
- Git sync properties for GitHub public repo:
271+
```rest
272+
"gitSyncProperties": {
273+
"gitServiceType": "Github",
274+
"gitCredentialType": "None",
275+
"repo": "<repo url>",
276+
"branch": "<repo branch to sync>"
277+
}
```
278+
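The payload examples above follow a common pattern, which the following Python sketch tries to capture in one place. The function `build_git_sync_properties` is a hypothetical helper, and the validation rules are inferred from the Git sync property table (PAT and SPN need a username and credential, SPN also needs a tenant id and applies only to ADO). The service type string is passed through unchanged because its exact accepted spelling is not confirmed here.

```python
def build_git_sync_properties(service_type, credential_type, repo, branch,
                              username=None, credential=None, tenant_id=None):
    """Hypothetical helper: build a gitSyncProperties object for the request body."""
    if credential_type not in ("PAT", "SPN", "None"):
        raise ValueError(f"Unsupported credential type: {credential_type!r}")
    if credential_type == "SPN" and service_type != "ADO":
        raise ValueError("SPN credentials are supported only by ADO")

    props = {
        "gitServiceType": service_type,
        "gitCredentialType": credential_type,
        "repo": repo,
        "branch": branch,
    }
    if credential_type in ("PAT", "SPN"):
        if not username or not credential:
            raise ValueError("username and credential are required for PAT and SPN")
        props["username"] = username
        props["credential"] = credential
    if credential_type == "SPN":
        if not tenant_id:
            raise ValueError("tenant_id is required for SPN")
        props["tenantId"] = tenant_id
    return props


# Public repository: no credentials are included in the payload.
public_props = build_git_sync_properties(
    "GitHub", "None", "<repo url>", "<repo branch to sync>"
)
```

The resulting dictionary would be placed under `airflowProperties` as `gitSyncProperties` in the request body.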
279+
## Importing a private package with git-sync (Optional - only applies when using private packages)
280+
281+
Assuming your private package has already been synced automatically via git-sync, add the package as a requirement in the data factory Airflow UI, prefixed with the path _/opt/airflow/git/\<repoName\>/_ if you're connecting to an ADO repo or _/opt/airflow/git/\<repoName\>.git/_ for all other Git services. For example, if your private package is at _/dags/test/private.whl_ in a GitHub repo, add the requirement _/opt/airflow/git/\<repoName\>.git/dags/test/private.whl_ to the Airflow environment.
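
The path rule described here can be expressed as a small Python sketch. The function name `private_package_requirement` and the repository name `my-repo` are hypothetical; only the two path prefixes come from the text.

```python
import posixpath


def private_package_requirement(repo_name, package_path, is_ado):
    """Hypothetical helper: compute the requirement path for a git-synced private package."""
    # ADO repos are synced to <repoName>/, all other Git services to <repoName>.git/.
    repo_dir = repo_name if is_ado else f"{repo_name}.git"
    return posixpath.join("/opt/airflow/git", repo_dir, package_path.lstrip("/"))


github_requirement = private_package_requirement(
    "my-repo", "/dags/test/private.whl", is_ado=False
)
ado_requirement = private_package_requirement(
    "my-repo", "/dags/test/private.whl", is_ado=True
)
```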
282+
283+
:::image type="content" source="media/airflow-git-sync-repository/airflow-private-package.png" alt-text="Screenshot showing the Airflow requirements section on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
284+
285+
## Next steps
286+
287+
- [Run an existing pipeline with Managed Airflow](tutorial-run-existing-pipeline-with-airflow.md)
288+
- [Managed Airflow pricing](airflow-pricing.md)
289+
- [How to change the password for Managed Airflow environments](password-change-airflow.md)