---
title: Sync a GitHub repository with Managed Airflow
description: This article provides step-by-step instructions for how to sync a GitHub repository using Managed Airflow in Azure Data Factory.
author: nabhishek
ms.author: abnarain
ms.reviewer: jburchel
ms.service: data-factory
ms.topic: how-to
ms.date: 09/19/2023
---

# Sync a GitHub repository with Managed Airflow in Azure Data Factory

[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]

Although you can manually create and update directed acyclic graph (DAG) files for Azure Managed Apache Airflow by using Azure Storage or the [Azure CLI](/azure/storage/blobs/storage-quickstart-blobs-cli), many organizations prefer to automate the process with a continuous integration and continuous delivery (CI/CD) approach. In this approach, each commit to the source code repository triggers an automated workflow that synchronizes the code with the designated DAGs folder in Azure Managed Apache Airflow.

This guide describes two ways to sync a GitHub repository with Managed Airflow:

- Using the Git sync feature in the Managed Airflow UI
- Using the REST API

## Prerequisites

- **Azure subscription** - If you don't have an Azure subscription, create a [free Azure account](https://azure.microsoft.com/free/) before you begin. Create or select an existing [Data Factory](https://azure.microsoft.com/products/data-factory#get-started) in a [region where the Managed Airflow preview is supported](concept-managed-airflow.md#region-availability-public-preview).
- **Access to a GitHub repository**

## Using the Managed Airflow UI

The following steps describe how to sync your GitHub repository by using the Managed Airflow UI:

1. Ensure that your repository contains the necessary folders and files:
   - **Dags/** - for Apache Airflow DAGs (required)
   - **Plugins/** - for integrating external features into Airflow (optional)

   :::image type="content" source="media/airflow-git-sync-repository/airflow-folders.png" alt-text="Screenshot showing the Airflow folders structure in GitHub.":::

1. While creating an Airflow integration runtime (IR), select **Enable git sync** on the Airflow environment setup dialog.

   :::image type="content" source="media/airflow-git-sync-repository/enable-git-sync.png" alt-text="Screenshot showing the Enable git sync checkbox on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::

1. Select one of the following supported Git service types:
   - GitHub
   - ADO (Azure DevOps)
   - GitLab
   - Bitbucket

   :::image type="content" source="media/airflow-git-sync-repository/git-service-type.png" alt-text="Screenshot showing the Git service type selection dropdown on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::

1. Select a credential type:

   - **None** (for a public repository)

     When you select this option, make sure that your repository's visibility is public. Then fill out the following details:
     - **Git repo URL** (required): The clone URL of the GitHub repository to sync
     - **Git branch** (required): The branch in the repository to sync

   - **PAT** (personal access token)

     When you select this option, fill out the remaining fields based on the selected Git service type:
     - GitHub personal access token
     - ADO personal access token
     - GitLab personal access token
     - Bitbucket personal access token

     :::image type="content" source="media/airflow-git-sync-repository/git-pat-credentials.png" alt-text="Screenshot showing the Git PAT credential options on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::

   - **SPN** ([service principal](https://devblogs.microsoft.com/devops/introducing-service-principal-and-managed-identity-support-on-azure-devops/)) - Only ADO supports this credential type.

     When you select this option, fill out the remaining fields based on the selected **Git service type**:
     - **Git repo URL** (required): The clone URL of the Git repository to sync
     - **Git branch** (required): The branch in the repository to sync
     - **Service principal app id** (required): The app ID of the service principal that has access to the ADO repository
     - **Service principal secret** (required): A manually generated secret for the service principal, used to authenticate to and access the ADO repository
     - **Service principal tenant id** (required): The tenant ID of the service principal

     :::image type="content" source="media/airflow-git-sync-repository/git-spn-credentials.png" alt-text="Screenshot showing the Git SPN credential options on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::

1. Fill in the rest of the fields with the required information.
1. Select **Create**.
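
Before you enable Git sync, you can check a local clone of the repository for the folder layout described in step 1. The following Python sketch uses a hypothetical helper, `check_repo_layout`, which isn't part of Managed Airflow. It assumes that folder names are matched case-insensitively, checks only the required DAGs folder, and treats any `.py` file under that folder as a candidate DAG file.

```python
from pathlib import Path

# Assumption: only the DAGs folder is required (per step 1); Plugins/ is optional,
# so it isn't checked. Folder names are compared case-insensitively.
REQUIRED_FOLDER = "dags"


def check_repo_layout(repo_root: str) -> list:
    """Return a list of layout problems in a local clone; an empty list means OK."""
    root = Path(repo_root)
    if not root.is_dir():
        return [f"{repo_root} is not a directory"]

    # Map lowercase names of top-level folders to their actual names.
    folders = {p.name.lower(): p.name for p in root.iterdir() if p.is_dir()}
    if REQUIRED_FOLDER not in folders:
        return [f"missing required folder: {REQUIRED_FOLDER}/"]

    dags_dir = root / folders[REQUIRED_FOLDER]
    # Any .py file anywhere under the DAGs folder counts as a candidate DAG file.
    if not any(dags_dir.rglob("*.py")):
        return [f"{dags_dir.name}/ contains no .py files"]
    return []
```

For example, `check_repo_layout(".")` returns an empty list when the current directory contains a `Dags/` folder with at least one Python file.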

## Using the REST API

To enable Git sync through the REST API, send a request that creates or updates the Airflow integration runtime:

- **Method**: PUT
- **URL**: `https://management.azure.com/subscriptions/<subscriptionId>/resourcegroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<dataFactoryName>/integrationruntimes/<airflowEnvName>?api-version=2018-06-01`
- **URI parameters**:

  |Name |In |Required |Type |Description |
  |---------|---------|---------|---------|---------|
  |subscriptionId | path | True | string | Subscription identifier |
  |resourceGroupName | path | True | string | Resource group name (Regex pattern: `^[-\w\._\(\)]+$`) |
  |dataFactoryName | path | True | string | Name of the Azure Data Factory (Regex pattern: `^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$`) |
  |airflowEnvName | path | True | string | Name of the Managed Airflow environment |
  |api-version | query | True | string | The API version |

- **Request body (Airflow configuration)**:

  |Name |Type |Description |
  |---------|---------|---------|
  |name |string |Name of the Airflow environment |
  |properties |propertyType |Configuration properties for the environment |

- **Properties type**:

  |Name |Type |Description |
  |---------|---------|---------|
  |type |string |The resource type (**Airflow** in this scenario) |
  |typeProperties |typeProperty |Airflow-specific configuration properties |

- **Type property**:

  |Name |Type |Description |
  |---------|---------|---------|
  |computeProperties |computeProperty |Configuration of the compute type used for the environment |
  |airflowProperties |airflowProperty |Configuration of the Airflow properties for the environment |

- **Compute property**:

  |Name |Type |Description |
  |---------|---------|---------|
  |location |string |The Airflow integration runtime location, which defaults to the data factory region. To create an integration runtime in a different region, create a new data factory in that region. |
  |computeSize |string |The size of the compute node that the Airflow environment runs on (examples: `Large`, `Small`). Three nodes are allocated initially. |
  |extraNodes |integer |Number of extra nodes. Each extra node adds three more workers. |

- **Airflow property**:

  |Name |Type |Description |
  |---------|---------|---------|
  |airflowVersion |string |The Apache Airflow version (example: `2.4.3`) |
  |airflowRequirements |Array\<string\> |Python libraries to install, given as a list (example: `["flask-bcrypt==0.7.1"]`) |
  |airflowEnvironmentVariables |Object (key/value pairs) |Environment variables to set (example: `{ "SAMPLE_ENV_NAME": "test" }`) |
  |gitSyncProperties |gitSyncProperty |Git configuration properties |
  |enableAADIntegration |boolean |Enables Azure Active Directory (Azure AD) sign-in to the Airflow UI |
  |userName |string or null |Username for basic authentication |
  |password |string or null |Password for basic authentication |

- **Git sync property**:

  |Name |Type |Description |
  |---------|---------|---------|
  |gitServiceType |string |The Git service that hosts the repository. Values: `GitHub`, `ADO`, `GitLab`, or `Bitbucket` |
  |gitCredentialType |string |Type of Git credential. Values: `PAT` (personal access token), `SPN` (supported only by ADO), or `None` |
  |repo |string |Repository clone URL |
  |branch |string |Branch in the repository to sync |
  |username |string |Username for the Git service (for SPN, the service principal app ID) |
  |credential |string |Value of the personal access token (for SPN, the service principal secret value) |
  |tenantId |string |The service principal tenant ID (supported only by ADO) |

- **Responses**:

  |Name |Status code |Type |Description |
  |---------|---------|---------|----------|
  |OK |200 |[Factory](/rest/api/datafactory/factories/get?tabs=HTTP#factory) |OK |
  |Unauthorized |401 |[Cloud Error](/rest/api/datafactory/factories/get?tabs=HTTP#clouderror) |Array with additional error details |
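
The URL and URI parameters above can be assembled in code. The following Python sketch builds the request URL and checks the resource group and data factory names against the regex patterns from the URI parameter table before anything is sent. The function name `build_airflow_ir_url` is hypothetical, and percent-encoding each path segment is a defensive choice of this sketch, not something this article requires.

```python
import re
from urllib.parse import quote

# Regex patterns copied from the URI parameter table.
RESOURCE_GROUP_PATTERN = re.compile(r"^[-\w\._\(\)]+$")
DATA_FACTORY_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")
API_VERSION = "2018-06-01"


def build_airflow_ir_url(subscription_id: str, resource_group: str,
                         factory_name: str, airflow_env_name: str) -> str:
    """Build the PUT URL for an Airflow integration runtime, validating names first."""
    # fullmatch rejects a trailing newline that "$" alone would allow.
    if not RESOURCE_GROUP_PATTERN.fullmatch(resource_group):
        raise ValueError(f"invalid resource group name: {resource_group!r}")
    if not DATA_FACTORY_PATTERN.fullmatch(factory_name):
        raise ValueError(f"invalid data factory name: {factory_name!r}")
    # Percent-encode each caller-supplied path segment (for example,
    # parentheses in resource group names).
    segments = [
        "subscriptions", quote(subscription_id, safe=""),
        "resourcegroups", quote(resource_group, safe=""),
        "providers", "Microsoft.DataFactory",
        "factories", quote(factory_name, safe=""),
        "integrationruntimes", quote(airflow_env_name, safe=""),
    ]
    return "https://management.azure.com/" + "/".join(segments) + f"?api-version={API_VERSION}"
```

A name that fails validation raises `ValueError`, so a deployment script fails early on a malformed resource group or data factory name.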

### Examples

Sample request:

```rest
PUT https://management.azure.com/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/your-rg/providers/Microsoft.DataFactory/factories/your-df/integrationruntimes/sample-2?api-version=2018-06-01
```

Sample body:

```json
{
  "name": "sample-2",
  "properties": {
    "type": "Airflow",
    "typeProperties": {
      "computeProperties": {
        "location": "East US",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.4.3",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure"
        ],
        "enableAADIntegration": true,
        "userName": null,
        "password": null,
        "airflowEntityReferences": []
      }
    }
  }
}
```
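
To send the sample body from a script, attach an Azure Resource Manager bearer token to the PUT request. The following Python sketch uses only the standard library and builds the request without sending it. It assumes you already have an access token, for example from `az account get-access-token --resource https://management.azure.com/`. The helper name `make_put_request` is hypothetical, and `SAMPLE_BODY` is a trimmed copy of the sample body.

```python
import json
import urllib.request


def make_put_request(url: str, access_token: str, body: dict) -> urllib.request.Request:
    """Build (but don't send) a PUT request that creates or updates the Airflow environment."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Trimmed version of the sample body (compute settings and Airflow version only).
SAMPLE_BODY = {
    "name": "sample-2",
    "properties": {
        "type": "Airflow",
        "typeProperties": {
            "computeProperties": {"location": "East US", "computeSize": "Large", "extraNodes": 0},
            "airflowProperties": {"airflowVersion": "2.4.3"},
        },
    },
}
```

To send the request, pass it to `urllib.request.urlopen`. A 4xx response such as 401 Unauthorized is raised as `urllib.error.HTTPError`.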

Sample response:

```rest
Status code: 200 OK
```

Response body:

```json
{
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/your-rg/providers/Microsoft.DataFactory/factories/your-df/integrationruntimes/sample-2",
  "name": "sample-2",
  "type": "Microsoft.DataFactory/factories/integrationruntimes",
  "properties": {
    "type": "Airflow",
    "typeProperties": {
      "computeProperties": {
        "location": "East US",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.4.3",
        "pythonVersion": "3.8",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowWebUrl": "https://e57f7409041692.eastus.airflow.svc.datafactory.azure.com/login/",
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure"
        ],
        "airflowEntityReferences": [],
        "packageProviderPath": "plugins",
        "enableAADIntegration": true,
        "enableTriggerers": false
      }
    },
    "state": "Initial"
  },
  "etag": "3402279e-0000-0100-0000-64ecb1cb0000"
}
```
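
After the call succeeds, a script usually needs only a few fields from the response body. The following Python sketch uses a hypothetical helper, `summarize_ir_response`, to pull out the environment name, `state`, Airflow version, and `airflowWebUrl`. In the sample response, `state` is `Initial`, which likely means the environment is still being set up; checking the state again later is an assumption about your workflow, not a documented requirement.

```python
import json


def summarize_ir_response(response_text: str) -> dict:
    """Pick out the fields a deployment script typically checks from the response body."""
    resource = json.loads(response_text)
    properties = resource["properties"]
    airflow = properties["typeProperties"]["airflowProperties"]
    return {
        "name": resource["name"],
        "state": properties.get("state"),  # "Initial" in the sample response
        "airflowVersion": airflow.get("airflowVersion"),
        "airflowWebUrl": airflow.get("airflowWebUrl"),
    }
```

Using `.get` for the optional fields keeps the helper from failing if a response omits `state` or `airflowWebUrl`.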

The sample body doesn't enable Git sync. To enable it, add a `gitSyncProperties` object to `airflowProperties`. The following examples show the object for each supported configuration:

- Git sync properties for GitHub with PAT:

  ```json
  "gitSyncProperties": {
    "gitServiceType": "Github",
    "gitCredentialType": "PAT",
    "repo": "<repo url>",
    "branch": "<repo branch to sync>",
    "username": "<username>",
    "credential": "<personal access token>"
  }
  ```

- Git sync properties for ADO with PAT:

  ```json
  "gitSyncProperties": {
    "gitServiceType": "ADO",
    "gitCredentialType": "PAT",
    "repo": "<repo url>",
    "branch": "<repo branch to sync>",
    "username": "<username>",
    "credential": "<personal access token>"
  }
  ```

- Git sync properties for ADO with a service principal:

  ```json
  "gitSyncProperties": {
    "gitServiceType": "ADO",
    "gitCredentialType": "SPN",
    "repo": "<repo url>",
    "branch": "<repo branch to sync>",
    "username": "<service principal app id>",
    "credential": "<service principal secret value>",
    "tenantId": "<service principal tenant id>"
  }
  ```

- Git sync properties for a GitHub public repository:

  ```json
  "gitSyncProperties": {
    "gitServiceType": "Github",
    "gitCredentialType": "None",
    "repo": "<repo url>",
    "branch": "<repo branch to sync>"
  }
  ```
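
The rules in the Git sync property table (allowed values, SPN only for ADO, a tenant ID for SPN) can be checked before the object is added to the request body. The following Python sketch builds a `gitSyncProperties` object like the preceding examples and adds it to `airflowProperties`. The helper names are hypothetical. Service types are compared case-insensitively because the examples use `Github` while the table lists `GitHub`; the value you pass is kept unchanged.

```python
import copy

# Allowed values, as listed in the Git sync property table.
GIT_SERVICE_TYPES = {"github", "ado", "gitlab", "bitbucket"}  # compared lowercase
GIT_CREDENTIAL_TYPES = {"PAT", "SPN", "None"}


def build_git_sync_properties(service_type, credential_type, repo, branch,
                              username=None, credential=None, tenant_id=None):
    """Return a gitSyncProperties dict, raising ValueError for unsupported combinations."""
    if service_type.lower() not in GIT_SERVICE_TYPES:
        raise ValueError(f"unsupported gitServiceType: {service_type!r}")
    if credential_type not in GIT_CREDENTIAL_TYPES:
        raise ValueError(f"unsupported gitCredentialType: {credential_type!r}")
    if credential_type == "SPN" and service_type.lower() != "ado":
        raise ValueError("SPN credentials are supported only for ADO")

    props = {
        "gitServiceType": service_type,  # passed through unchanged
        "gitCredentialType": credential_type,
        "repo": repo,
        "branch": branch,
    }
    if credential_type == "PAT":
        if not credential:
            raise ValueError("PAT requires a personal access token in credential")
        if username:
            props["username"] = username
        props["credential"] = credential
    elif credential_type == "SPN":
        if not (username and credential and tenant_id):
            raise ValueError("SPN requires the app ID (username), secret (credential), and tenantId")
        props.update(username=username, credential=credential, tenantId=tenant_id)
    return props


def with_git_sync(body, git_sync_properties):
    """Return a copy of a request body with gitSyncProperties added to airflowProperties."""
    new_body = copy.deepcopy(body)
    airflow = new_body["properties"]["typeProperties"]["airflowProperties"]
    airflow["gitSyncProperties"] = git_sync_properties
    return new_body
```

For example, `with_git_sync(body, build_git_sync_properties("Github", "None", "<repo url>", "main"))` returns a copy of `body` that includes the public-repository object, and leaves the original `body` unchanged.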

## Importing a private package with Git sync (optional)

This section applies only if you use private packages. If your private package is already synced through Git sync, add the package as a requirement in the Airflow environment settings in the Data Factory UI, prefixed with `/opt/airflow/git/<repoName>/` if you connect to an ADO repository, or with `/opt/airflow/git/<repoName>.git/` for all other Git services. For example, if your private package is at `/dags/test/private.whl` in a GitHub repository, add the requirement `/opt/airflow/git/<repoName>.git/dags/test/private.whl` to the Airflow environment.

:::image type="content" source="media/airflow-git-sync-repository/airflow-private-package.png" alt-text="Screenshot showing the Airflow requirements section on the Airflow environment setup dialog that appears during creation of an Airflow IR.":::
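
The path rule for private packages can be scripted. The following Python sketch uses a hypothetical helper, `private_package_requirement`, to turn a package path inside the repository into the requirement entry. As this section describes, it uses `/opt/airflow/git/<repoName>/` for ADO and `/opt/airflow/git/<repoName>.git/` for other Git services.

```python
from pathlib import PurePosixPath

# Root folder that synced repositories appear under, per this section.
GIT_ROOT = "/opt/airflow/git"


def private_package_requirement(repo_name: str, service_type: str, path_in_repo: str) -> str:
    """Map a package path inside the synced repository to its Airflow requirement path."""
    # ADO repositories sync to <repoName>/; all other Git services to <repoName>.git/.
    folder = repo_name if service_type.lower() == "ado" else f"{repo_name}.git"
    relative = path_in_repo.lstrip("/")  # treat the repo path as relative to the repo root
    return str(PurePosixPath(GIT_ROOT, folder, relative))
```

For example, `private_package_requirement("<repoName>", "GitHub", "/dags/test/private.whl")` returns `/opt/airflow/git/<repoName>.git/dags/test/private.whl`, matching the example in this section.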

## Next steps

- [Run an existing pipeline with Managed Airflow](tutorial-run-existing-pipeline-with-airflow.md)
- [Managed Airflow pricing](airflow-pricing.md)
- [How to change the password for Managed Airflow environments](password-change-airflow.md)