
Commit 224603e

authored
Merge pull request #224903 from jonburchel/2023-01-24-airflow-docs-pages-11-16-of-original-word-doc
Airflow docs
2 parents e61c659 + 0bf23f1 commit 224603e

26 files changed

+501
-2
lines changed

articles/data-factory/TOC.yml

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1291,6 +1291,24 @@ items:
12911291
- name: Diagnose connectivity in Azure-SSIS IR
12921292
href: ssis-integration-runtime-diagnose-connectivity-faq.md
12931293
displayName: timeout, troubleshooting
1294+
- name: Managed Airflow
1295+
items:
1296+
- name: Tutorials
1297+
items:
1298+
- name: Run an existing pipeline with Airflow
1299+
href: tutorial-run-existing-pipeline-with-airflow.md
1300+
- name: Refresh a Power BI dataset with Airflow
1301+
href: tutorial-refresh-power-bi-dataset-with-airflow.md
1302+
- name: Concepts
1303+
href: concept-managed-airflow.md
1304+
- name: How-to
1305+
items:
1306+
- name: How does Managed Airflow work
1307+
href: how-does-managed-airflow-work.md
1308+
- name: How to change the Airflow password
1309+
href: password-change-airflow.md
1310+
- name: Pricing
1311+
href: airflow-pricing.md
12941312
- name: Reference
12951313
items:
12961314
- name: Data flow script
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
---
2+
title: Managed Airflow pricing
3+
description: This article describes the pricing for Managed Airflow.
4+
author: nabhishek
5+
ms.service: data-factory
6+
ms.subservice: pricing
7+
ms.topic: conceptual
8+
ms.date: 01/24/2023
9+
ms.author: abnarain
10+
---
11+
12+
# Managed Airflow pricing
13+
14+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
15+
16+
This article describes the pricing for Managed Airflow usage within Azure Data Factory.
17+
18+
## Pricing details
19+
20+
Managed Airflow supports either small (D2v4) or large (D4v4) node sizing. Small can support up to 50 DAGs simultaneously, and large can support up to 1000 DAGs. The following table describes pricing for each option:
21+
22+
:::image type="content" source="media/airflow-pricing/airflow-pricing.png" alt-text="Shows a screenshot of a table of pricing options for Managed Airflow configuration.":::
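As a rough illustration of the sizing guidance above, the following Python sketch picks the smallest node size whose stated DAG capacity covers a given workload. The function name and the dictionary are hypothetical and not part of any Azure SDK; only the capacity figures (50 and 1000 DAGs) come from this article.

```python
# Capacity figures from the pricing paragraph; the labels are illustrative only.
NODE_CAPACITY = {"small": 50, "large": 1000}


def pick_node_size(dag_count: int) -> str:
    """Return the smallest node size whose DAG capacity covers dag_count."""
    if dag_count < 0:
        raise ValueError("dag_count must not be negative")
    # Walk the sizes from the smallest capacity to the largest.
    for size, capacity in sorted(NODE_CAPACITY.items(), key=lambda kv: kv[1]):
        if dag_count <= capacity:
            return size
    raise ValueError(f"{dag_count} DAGs exceeds the largest documented capacity")


if __name__ == "__main__":
    print(pick_node_size(30))
    print(pick_node_size(400))
```

A workload above 1000 DAGs raises an error here because the article doesn't describe a larger option.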
23+
24+
## Next steps
25+
26+
- [Run an existing pipeline with Managed Airflow](tutorial-run-existing-pipeline-with-airflow.md)
27+
- [Refresh a Power BI dataset with Managed Airflow](tutorial-refresh-power-bi-dataset-with-airflow.md)
28+
- [Changing password for Airflow environments](password-change-airflow.md)
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
---
2+
title: What is Managed Airflow?
3+
titleSuffix: Azure Data Factory
4+
description: Learn about when to use Managed Airflow, basic concepts and supported regions.
5+
ms.service: data-factory
6+
ms.topic: conceptual
7+
author: nabhishek
8+
ms.author: abnarain
9+
ms.date: 01/20/2023
10+
ms.custom: references_regions
11+
---
12+
13+
# What is Azure Data Factory Managed Airflow?
14+
15+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
16+
17+
> [!NOTE]
18+
> This feature is in public preview. For questions or feature suggestions, please send an email to mailto:[email protected] with the details.
19+
20+
> [!NOTE]
21+
> Managed Airflow for Azure Data Factory relies on the open source Apache Airflow application. Documentation and more tutorials for Airflow can be found on the Apache Airflow [Documentation](https://airflow.apache.org/docs/) or [Community](https://airflow.apache.org/community/) pages.
22+
23+
Azure Data Factory offers serverless pipelines for data process orchestration, data movement with 100+ managed connectors, and visual transformations with the mapping data flow.
24+
25+
Managed Airflow in Azure Data Factory is a managed orchestration service for [Apache Airflow](https://airflow.apache.org/) that simplifies the creation and management of Airflow environments on which you can operate end-to-end data pipelines at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as "workflows." With Managed Airflow in Azure Data Factory, you can use Airflow and Python to create data workflows without managing the underlying infrastructure for scalability, availability, and security.
26+
27+
:::image type="content" source="media/concept-managed-airflow/data-integration.png" alt-text="Screenshot shows data integration.":::
28+
29+
## When to use Managed Airflow?
30+
31+
Azure Data Factory offers [Pipelines](concepts-pipelines-activities.md) to visually orchestrate data processes (UI-based authoring), while Managed Airflow offers Airflow-based Python DAGs (Python code-centric authoring) for defining the data orchestration process. If you have an Airflow background, or are currently using Apache Airflow, you may prefer Managed Airflow over pipelines. Conversely, if you don't want to write and manage Python-based DAGs for data process orchestration, you may prefer pipelines.
32+
33+
With Managed Airflow, Azure Data Factory now offers multi-orchestration capabilities that span visual, code-centric, and OSS orchestration requirements.
34+
35+
## Features
36+
37+
- **Automatic Airflow setup** – Quickly set up Apache Airflow by choosing an [Apache Airflow version](concept-managed-airflow.md#supported-apache-airflow-versions) when you create a Managed Airflow environment. ADF Managed Airflow sets up Apache Airflow for you using the same Apache Airflow user interface and open-source code you can download on the Internet.
38+
- **Automatic scaling** – Automatically scale Apache Airflow Workers by setting the minimum and maximum number of Workers that run in your environment. ADF Managed Airflow monitors the Workers in your environment. It uses its autoscaling component to add Workers to meet demand until it reaches the maximum number of Workers you defined.
39+
- **Built-in authentication** – Enable Azure Active Directory (Azure AD) role-based authentication and authorization for your Airflow Web server by defining Azure AD RBAC's access control policies.
40+
- **Built-in security** – Metadata is automatically encrypted by Azure-managed keys, so your environment is secure by default. Additionally, it supports double encryption with a Customer-Managed Key (CMK).
41+
- **Streamlined upgrades and patches** – Azure Data Factory Managed Airflow provides new versions of Apache Airflow periodically. The ADF Managed Airflow team automatically updates and patches the minor versions.
42+
- **Workflow monitoring** – View Airflow logs and Airflow metrics in Azure Monitor to identify Airflow task delays or workflow errors without needing additional third-party tools. Managed Airflow automatically sends environment metrics, and if enabled, Airflow logs to Azure Monitor.
43+
- **Azure integration** – Azure Data Factory Managed Airflow supports open-source integrations with Azure Data Factory pipelines, Azure Batch, Azure Cosmos DB, Azure Key Vault, ACI, ADLS Gen2, Azure Kusto, as well as hundreds of built-in and community-created operators and sensors.
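The automatic scaling behavior in the feature list can be pictured as clamping a demand estimate between the configured minimum and maximum worker counts. The Python sketch below is only an illustration under that assumption. The actual autoscaling algorithm isn't documented here, and `target_workers`, `queued_tasks`, and `tasks_per_worker` are hypothetical names.

```python
import math


def target_workers(queued_tasks: int, tasks_per_worker: int,
                   min_workers: int, max_workers: int) -> int:
    """Estimate a worker count, bounded by the configured minimum and maximum.

    Assumption: demand is modeled as queued tasks divided by a per-worker
    capacity. The real service may use different signals.
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    needed = math.ceil(queued_tasks / tasks_per_worker)
    # Never drop below the minimum and never exceed the maximum.
    return max(min_workers, min(needed, max_workers))


if __name__ == "__main__":
    print(target_workers(queued_tasks=40, tasks_per_worker=16,
                         min_workers=1, max_workers=5))
```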
44+
45+
## Architecture
46+
:::image type="content" source="media/concept-managed-airflow/architecture.png" alt-text="Screenshot shows architecture in Managed Airflow.":::
47+
48+
## Region availability (public preview)
49+
50+
* EastUs
51+
* SouthCentralUs
52+
* WestUs
53+
* UKSouth
54+
* NorthEurope
55+
* WestEurope
56+
* SouthEastAsia
57+
* EastUS2
58+
* WestUS2
59+
* GermanyWestCentral
60+
* AustraliaEast
61+
62+
> [!NOTE]
63+
> By GA, all ADF regions will be supported. The Airflow environment region defaults to the Data Factory region and isn't configurable, so make sure you use a Data Factory in one of the supported regions listed above to access the Managed Airflow preview.
64+
65+
## Supported Apache Airflow versions
66+
67+
* 1.10.14
68+
* 2.2.4
69+
70+
## Integrations
71+
72+
Apache Airflow integrates with Microsoft Azure services through the microsoft.azure [provider](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/index.html).
73+
74+
You can install any provider package by editing the Airflow environment from the Azure Data Factory UI. Installing the package takes a couple of minutes.
75+
76+
:::image type="content" source="media/concept-managed-airflow/airflow-integration.png" lightbox="media/concept-managed-airflow/airflow-integration.png" alt-text="Screenshot shows airflow integration.":::
77+
78+
## Next steps
79+
80+
- [Run an existing pipeline with Managed Airflow](tutorial-run-existing-pipeline-with-airflow.md)
81+
- [Refresh a Power BI dataset with Managed Airflow](tutorial-refresh-power-bi-dataset-with-airflow.md)
82+
- [Managed Airflow pricing](airflow-pricing.md)
83+
- [How to change the password for Managed Airflow environments](password-change-airflow.md)
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
---
2+
title: How does Managed Airflow work?
3+
titleSuffix: Azure Data Factory
4+
description: This article explains how to create a Managed Airflow instance and use DAG to make it work.
5+
ms.service: data-factory
6+
ms.topic: conceptual
7+
author: nabhishek
8+
ms.author: abnarain
9+
ms.date: 01/20/2023
10+
---
11+
12+
# How does Azure Data Factory Managed Airflow work?
13+
14+
[!INCLUDE[appliesto-adf-xxx-md](includes/appliesto-adf-xxx-md.md)]
15+
16+
> [!NOTE]
17+
> Managed Airflow for Azure Data Factory relies on the open source Apache Airflow application. Documentation and more tutorials for Airflow can be found on the Apache Airflow [Documentation](https://airflow.apache.org/docs/) or [Community](https://airflow.apache.org/community/) pages.
18+
19+
Azure Data Factory Managed Airflow orchestrates your workflows using Directed Acyclic Graphs (DAGs) written in Python. You must provide your DAGs and plugins in Azure Blob Storage. Airflow requirements or library dependencies can be installed during the creation of the new Managed Airflow environment or by editing an existing Managed Airflow environment. Then run and monitor your DAGs by launching the Airflow UI from ADF using a command line interface (CLI) or a software development kit (SDK).
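To make the DAG idea concrete without requiring an Airflow installation, the sketch below models a workflow as a dependency graph and derives a valid run order with Python's standard `graphlib` module. This isn't Airflow code, and the task names are made up for illustration. In a real DAG file you would declare tasks with Airflow operators instead.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps: dict) -> list:
    """Return one valid execution order, or raise CycleError if deps is not acyclic."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(run_order(dependencies))
    try:
        # A cycle is exactly what the "acyclic" in DAG rules out.
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError as exc:
        print("not a DAG:", exc.args[0])
```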
20+
21+
## Create a Managed Airflow environment
22+
The following steps set up and configure your Managed Airflow environment.
23+
24+
### Prerequisites
25+
**Azure subscription**: If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
26+
Create or select an existing Data Factory in a region where the Managed Airflow preview is supported.
27+
28+
### Steps to create the environment
29+
1. Create a new Managed Airflow environment.
30+
Go to **Manage** hub -> **Airflow (Preview)** -> **+New** to create a new Airflow environment.
31+
32+
:::image type="content" source="media/how-does-managed-airflow-work/create-new-airflow.png" alt-text="Screenshot that shows how to create a new Managed Apache Airflow environment.":::
33+
34+
1. Provide the details (Airflow config)
35+
36+
:::image type="content" source="media/how-does-managed-airflow-work/airflow-environment-details.png" alt-text="Screenshot that shows some Managed Airflow environment details.":::
37+
38+
> [!IMPORTANT]
39+
> When using **Basic** authentication, remember the username and password specified on this screen. You'll need them to log in to the Managed Airflow UI later. The default option is **AAD**, which doesn't require creating a username and password for your Airflow environment. Instead, it uses the logged-in user's Azure Data Factory credentials to log in and monitor DAGs.
40+
1. **Environment variables** are a simple key-value store within Airflow to store and retrieve arbitrary content or settings.
41+
1. **Requirements** can be used to pre-install Python libraries. You can update these later as well.
42+
43+
## Import DAGs
44+
45+
The following steps describe how to import DAGs into Managed Airflow.
46+
47+
### Prerequisite
48+
49+
You will need to upload a sample DAG onto an accessible Storage account.
50+
51+
> [!NOTE]
52+
> Blob Storage accounts behind a VNet aren't supported during the preview. Support will be added shortly.
53+
54+
[Sample Apache Airflow v2.x DAG](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html).
55+
[Sample Apache Airflow v1.10 DAG](https://airflow.apache.org/docs/apache-airflow/1.10.11/_modules/airflow/example_dags/tutorial.html).
56+
57+
1. Copy and paste the content (either v2.x or v1.10, based on the Airflow environment that you have set up) into a new file called **tutorial.py**.
58+
59+
Upload **tutorial.py** to Blob Storage. ([How to upload a file into blob](/storage/blobs/storage-quickstart-blobs-portal.md))
60+
61+
> [!NOTE]
62+
> You will need to select a directory path from a blob storage account that contains folders named **dags** and **plugins** to import those into the Airflow environment. **Plugins** are not mandatory. You can also have a container named **dags** and upload all Airflow files within it.
63+
64+
1. Click on **Airflow (Preview)** under **Manage** hub. Then hover over the earlier created **Airflow** environment and click on **Import files** to import all DAGs and dependencies into the Airflow environment.
65+
66+
:::image type="content" source="media/how-does-managed-airflow-work/import-files.png" alt-text="Screenshot shows import files in manage hub.":::
67+
68+
1. Create a new Linked Service to the accessible storage account mentioned in the prerequisite (or use an existing one if you already have your own DAGs).
69+
70+
:::image type="content" source="media/how-does-managed-airflow-work/create-new-linked-service.png" alt-text="Screenshot that shows how to create a new linked service.":::
71+
72+
1. Use the storage account where you uploaded the DAG (check prerequisite). Test connection, then click **Create**.
73+
74+
:::image type="content" source="media/how-does-managed-airflow-work/linked-service-details.png" alt-text="Screenshot shows some linked service details.":::
75+
76+
1. Browse and select **airflow** if using the sample SAS URL, or select the folder that contains the **dags** folder with DAG files.
77+
78+
> [!NOTE]
79+
> You can import DAGs and their dependencies through this interface. You will need to select a directory path from a blob storage account that contains folders named **dags** and **plugins** to import those into the Airflow environment. **Plugins** are not mandatory.
80+
81+
:::image type="content" source="media/how-does-managed-airflow-work/browse-storage.png" alt-text="Screenshot shows browse storage in import files.":::
82+
83+
:::image type="content" source="media/how-does-managed-airflow-work/browse.png" alt-text="Screenshot that shows browse in airflow.":::
84+
85+
:::image type="content" source="media/how-does-managed-airflow-work/import-in-import-files.png" alt-text="Screenshot shows import in import files.":::
86+
87+
:::image type="content" source="media/how-does-managed-airflow-work/import-dags.png" alt-text="Screenshot shows import dags.":::
88+
89+
> [!NOTE]
90+
> Importing DAGs could take a couple of minutes during **Preview**. The notification center (bell icon in ADF UI) can be used to track the import status updates.
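The folder layout described in the notes above (a directory that contains **dags** and, optionally, **plugins**) can be checked before you start an import. The following Python sketch works on a plain list of blob paths, so it makes no assumption about how you list the blobs. The function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def check_import_folder(blob_paths: list, root: str) -> dict:
    """Report whether `root` holds non-empty `dags` and `plugins` folders."""
    root_path = PurePosixPath(root)
    top_level_folders = set()
    for raw in blob_paths:
        try:
            relative = PurePosixPath(raw).relative_to(root_path)
        except ValueError:
            continue  # The blob lives outside the selected directory.
        # At least two parts means "<folder>/<file>", so the folder has content.
        if len(relative.parts) >= 2:
            top_level_folders.add(relative.parts[0])
    return {
        "has_dags": "dags" in top_level_folders,
        "has_plugins": "plugins" in top_level_folders,  # Optional per the note.
    }


if __name__ == "__main__":
    paths = ["airflow/dags/tutorial.py", "airflow/plugins/my_plugin.py"]
    print(check_import_folder(paths, "airflow"))
```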
91+
92+
## Troubleshooting import DAG issues
93+
94+
* Problem: DAG import takes more than 5 minutes.
95+
Mitigation: Reduce the size of the DAGs imported in a single import. One way to achieve this is to create multiple DAG folders, each with fewer DAGs, across multiple containers.
96+
97+
* Problem: Imported DAGs don't show up when you log in to the Airflow UI.
98+
Mitigation: Log in to the Airflow UI and check for DAG parsing errors. This can happen if the DAG files contain incompatible code. The Airflow UI shows the exact files and line numbers that have the issue.
99+
100+
:::image type="content" source="media/how-does-managed-airflow-work/import-dag-issues.png" alt-text="Screenshot shows import dag issues.":::
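Some parsing errors can be caught locally before you upload anything. The sketch below uses Python's standard `ast` module to report syntax errors with their line numbers. It is a partial check only: it won't detect missing packages or code that is incompatible with your Airflow version, and the function name and sample sources are hypothetical.

```python
import ast


def find_syntax_errors(sources: dict) -> dict:
    """Map each file name with a syntax error to a short "line N: message" string."""
    errors = {}
    for filename, code in sources.items():
        try:
            ast.parse(code, filename=filename)
        except SyntaxError as exc:
            errors[filename] = f"line {exc.lineno}: {exc.msg}"
    return errors


if __name__ == "__main__":
    sample_sources = {
        "good_dag.py": "x = 1\n",
        "bad_dag.py": "def broken(:\n    pass\n",
    }
    for name, problem in find_syntax_errors(sample_sources).items():
        print(name, problem)
```

In practice you would read each file under your **dags** folder into the dictionary before calling the function.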
101+
102+
103+
## Monitor DAG runs
104+
105+
To monitor the Airflow DAGs, log in to the Airflow UI with the username and password you created earlier.
106+
107+
1. Click on the Airflow environment created.
108+
109+
:::image type="content" source="media/how-does-managed-airflow-work/airflow-environment-monitor-dag.png" alt-text="Screenshot that shows the Airflow environment created.":::
110+
111+
1. Log in using the username and password provided during the Airflow Integration Runtime creation. ([You can reset the username or password by editing the Airflow Integration runtime]() if needed)
112+
113+
:::image type="content" source="media/how-does-managed-airflow-work/login-in-dags.png" alt-text="Screenshot that shows login using the username-password provided during the Airflow Integration Runtime creation.":::
114+
115+
## Remove DAGs from the Airflow environment
116+
117+
If you're using Airflow version 1.x, to delete DAGs that are deployed on an Airflow environment (IR), you need to delete them in two different places.
118+
119+
1. Delete the DAG from Airflow UI
120+
1. Delete the DAG in ADF UI
121+
122+
> [!NOTE]
123+
> This is the current experience during the Public Preview, and we will be improving this experience. 
124+
125+
## Next steps
126+
127+
* [Run an existing pipeline with Managed Airflow](tutorial-run-existing-pipeline-with-airflow.md)
128+
* [Refresh a Power BI dataset with Managed Airflow](tutorial-refresh-power-bi-dataset-with-airflow.md)
129+
* [Managed Airflow pricing](airflow-pricing.md)
130+
* [How to change the password for Managed Airflow environments](password-change-airflow.md)

articles/data-factory/index.yml

Lines changed: 26 additions & 2 deletions
@@ -15,14 +15,22 @@ metadata:
1515
ms.custom: contperf-fy21q4
1616

1717
landingContent:
18+
- title: What's New in Azure Data Factory
19+
linkLists:
20+
- linkListType: overview
21+
links:
22+
- text: What's New in Azure Data Factory
23+
url: ./whats-new.md
24+
- text: Change data capture
25+
url: ./concepts-change-data-capture.md
26+
- text: Managed Airflow
27+
url: ./concept-managed-airflow.md
1828
- title: About Azure Data Factory
1929
linkLists:
2030
- linkListType: overview
2131
links:
2232
- text: Introduction to Azure Data Factory
2333
url: ./introduction.md
24-
- text: What's New in Azure Data Factory
25-
url: ./whats-new.md
2634
- linkListType: reference
2735
links:
2836
- text: Whitepapers
@@ -111,6 +119,22 @@ landingContent:
111119
url: ./data-transformation-functions.md
112120
- text: Change Data Capture (CDC)
113121
url: ./concepts-change-data-capture.md
122+
- title: Managed Airflow
123+
linkLists:
124+
- linkListType: concept
125+
links:
126+
- text: Managed Airflow
127+
url: concept-managed-airflow.md
128+
- linkListType: how-to-guide
129+
links:
130+
- text: How does Managed Airflow work?
131+
url: how-does-managed-airflow-work.md
132+
- linkListType: tutorial
133+
links:
134+
- text: Run an existing pipeline with Managed Airflow
135+
url: tutorial-run-existing-pipeline-with-airflow.md
136+
- text: Refresh a Power BI dataset with Managed Airflow
137+
url: tutorial-refresh-power-bi-dataset-with-airflow.md
114138
- title: SAP knowledge center
115139
linkLists:
116140
- linkListType: concept