Skip to content

Commit ba7f5cc

Browse files
authored
Update Airflow 3.0.6 examples with enhanced documentation and IAM policies (#94)
Co-authored-by: John Jackson <john-jac@users.noreply.github.com>
1 parent 9f1079d commit ba7f5cc

File tree

8 files changed

+1559
-3
lines changed

8 files changed

+1559
-3
lines changed
Lines changed: 340 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,345 @@
1-
# Airflow 3.0.6 on MWAA examples
1+
# Airflow 3.0.6 Examples for Amazon MWAA
22

3-
This folder provides example dags to be used when running Apache Airflow verison 3.0.6 on Amazon MWAA.
3+
This directory contains example Apache Airflow 3.0.6 DAGs demonstrating integration with various AWS services. These examples are designed to work with Amazon Managed Workflows for Apache Airflow (MWAA) and showcase best practices for workflow orchestration.
4+
5+
## Overview
6+
7+
These DAGs demonstrate how to use Airflow 3.0.6 with AWS services, including proper IAM permissions, error handling, and resource management. Each DAG is self-contained and includes inline documentation.
48

59
## Prerequisites
610

7-
- Amazon MWAA environment (version 3.0.6)
11+
- Amazon MWAA environment (version 3.0.6 or later)
12+
- AWS CLI configured with appropriate permissions
13+
- IAM roles with required permissions (see [IAM Policy](#iam-policy))
14+
15+
## DAG Examples
16+
17+
### S3 Operations (`s3_airflow3_dag.py`)
18+
Demonstrates S3 bucket and object operations including:
19+
- Creating and deleting S3 buckets
20+
- Creating, listing, and deleting S3 objects
21+
- Using S3KeySensor to wait for objects
22+
23+
### Lambda Functions (`lambda_airflow3_dag.py`)
24+
Shows Lambda function lifecycle management:
25+
- Creating Lambda functions with inline code
26+
- Waiting for function activation
27+
- Invoking functions with payloads
28+
- Self-cleanup functionality
29+
30+
### AWS Glue Jobs (`glue_airflow3_dag.py`)
31+
Illustrates Glue ETL job execution:
32+
- Creating and uploading Glue scripts to S3
33+
- Running Glue 5.0 jobs with Python
34+
- Monitoring job completion with sensors
35+
36+
### Amazon Athena (`athena_airflow3_dag.py`)
37+
Demonstrates Athena query execution:
38+
- Running SQL queries against data lakes
39+
- Managing query results and outputs
40+
- Integration with CloudFormation for resource creation
41+
42+
### AWS Batch (`batch_airflow3_dag.py`)
43+
Shows containerized job execution:
44+
- Creating Batch compute environments and job queues
45+
- Submitting and monitoring Batch jobs
46+
- Using Fargate for serverless compute
47+
48+
### EMR Serverless (`emr_serverless_airflow3_dag.py`)
49+
Illustrates big data processing:
50+
- Creating EMR Serverless applications
51+
- Submitting Spark jobs
52+
- Managing application lifecycle
53+
54+
### SageMaker Processing (`sagemaker_processing_airflow3_dag.py`)
55+
Demonstrates ML data processing:
56+
- Creating SageMaker processing jobs
57+
- Using scikit-learn containers
58+
- Processing data with automatic scaling
59+
60+
## Setup
61+
62+
1. **Upload DAG files to your MWAA S3 bucket:**
63+
```bash
64+
aws s3 cp . s3://amzn-s3-demo-bucket/dags/ --recursive --exclude "*.md" --exclude "*.json"
65+
```
66+
67+
2. **Create the required IAM execution role:**
68+
```bash
69+
aws iam create-role \
70+
--role-name mwaa-airflow3-execution-role \
71+
--assume-role-policy-document file://trust-policy.json
72+
73+
aws iam put-role-policy \
74+
--role-name mwaa-airflow3-execution-role \
75+
--policy-name mwaa-airflow3-policy \
76+
--policy-document file://mwaa-comprehensive-policy.json
77+
```
78+
79+
3. **Update your MWAA environment configuration** to use the new execution role.
80+
81+
## IAM Policy
82+
83+
The comprehensive IAM policy required for all DAGs includes permissions for:
84+
85+
- S3 operations (bucket and object management)
86+
- Lambda function lifecycle management
87+
- Glue job creation and execution
88+
- Athena query execution
89+
- Batch job submission and monitoring
90+
- EMR Serverless application management
91+
- SageMaker processing jobs
92+
- CloudFormation stack operations
93+
- CloudWatch logging
94+
95+
```json
96+
{
97+
"Version": "2012-10-17",
98+
"Statement": [
99+
{
100+
"Sid": "S3Permissions",
101+
"Effect": "Allow",
102+
"Action": [
103+
"s3:CreateBucket",
104+
"s3:DeleteBucket",
105+
"s3:ListBucket",
106+
"s3:GetObject",
107+
"s3:PutObject",
108+
"s3:DeleteObject"
109+
],
110+
"Resource": [
111+
"arn:aws:s3:::*",
112+
"arn:aws:s3:::*/*"
113+
]
114+
},
115+
{
116+
"Sid": "LambdaPermissions",
117+
"Effect": "Allow",
118+
"Action": [
119+
"lambda:CreateFunction",
120+
"lambda:DeleteFunction",
121+
"lambda:GetFunction",
122+
"lambda:InvokeFunction"
123+
],
124+
"Resource": "*"
125+
},
126+
{
127+
"Sid": "GluePermissions",
128+
"Effect": "Allow",
129+
"Action": [
130+
"glue:CreateJob",
131+
"glue:GetJob",
132+
"glue:StartJobRun",
133+
"glue:GetJobRun",
134+
"glue:GetTable",
135+
"glue:CreateTable",
136+
"glue:DeleteTable",
137+
"glue:CreateDatabase",
138+
"glue:DeleteDatabase"
139+
],
140+
"Resource": "*"
141+
},
142+
{
143+
"Sid": "AthenaPermissions",
144+
"Effect": "Allow",
145+
"Action": [
146+
"athena:StartQueryExecution",
147+
"athena:GetQueryExecution",
148+
"athena:GetQueryResults"
149+
],
150+
"Resource": "*"
151+
},
152+
{
153+
"Sid": "BatchPermissions",
154+
"Effect": "Allow",
155+
"Action": [
156+
"batch:SubmitJob",
157+
"batch:DescribeJobs",
158+
"batch:CreateJobQueue",
159+
"batch:DeleteJobQueue",
160+
"batch:UpdateJobQueue",
161+
"batch:RegisterJobDefinition",
162+
"batch:DeregisterJobDefinition",
163+
"batch:CreateComputeEnvironment",
164+
"batch:UpdateComputeEnvironment",
165+
"batch:DeleteComputeEnvironment",
166+
"batch:DescribeJobQueues",
167+
"batch:DescribeJobDefinitions",
168+
"batch:DescribeComputeEnvironments",
169+
"batch:TagResource"
170+
],
171+
"Resource": "*"
172+
},
173+
{
174+
"Sid": "CloudwatchPermissions",
175+
"Effect": "Allow",
176+
"Action": [
177+
"logs:CreateLogGroup",
178+
"logs:CreateLogStream",
179+
"logs:PutLogEvents"
180+
],
181+
"Resource": "*"
182+
},
183+
{
184+
"Sid": "EMRServerlessPermissions",
185+
"Effect": "Allow",
186+
"Action": [
187+
"emr-serverless:StartJobRun",
188+
"emr-serverless:GetJobRun",
189+
"emr-serverless:CreateApplication",
190+
"emr-serverless:GetApplication",
191+
"emr-serverless:StartApplication",
192+
"emr-serverless:DeleteApplication",
193+
"emr-serverless:ListJobRuns",
194+
"emr-serverless:StopApplication"
195+
],
196+
"Resource": "*"
197+
},
198+
{
199+
"Sid": "SageMakerPermissions",
200+
"Effect": "Allow",
201+
"Action": [
202+
"sagemaker:CreateProcessingJob",
203+
"sagemaker:DescribeProcessingJob"
204+
],
205+
"Resource": "*"
206+
},
207+
{
208+
"Sid": "CloudFormationPermissions",
209+
"Effect": "Allow",
210+
"Action": [
211+
"cloudformation:CreateStack",
212+
"cloudformation:DeleteStack",
213+
"cloudformation:DescribeStacks"
214+
],
215+
"Resource": "*"
216+
},
217+
{
218+
"Sid": "ECRPermissions",
219+
"Effect": "Allow",
220+
"Action": [
221+
"ecr:GetAuthorizationToken",
222+
"ecr:BatchCheckLayerAvailability",
223+
"ecr:GetDownloadUrlForLayer",
224+
"ecr:BatchGetImage"
225+
],
226+
"Resource": "*"
227+
},
228+
{
229+
"Sid": "ECSPermissions",
230+
"Effect": "Allow",
231+
"Action": [
232+
"ecs:DescribeTaskDefinition"
233+
],
234+
"Resource": "*"
235+
},
236+
{
237+
"Sid": "CloudWatchLogsPermissions",
238+
"Effect": "Allow",
239+
"Action": [
240+
"logs:CreateLogGroup",
241+
"logs:CreateLogStream",
242+
"logs:PutLogEvents"
243+
],
244+
"Resource": "*"
245+
},
246+
{
247+
"Sid": "IAMPassRolePermissions",
248+
"Effect": "Allow",
249+
"Action": [
250+
"iam:PassRole",
251+
"iam:GetRole"
252+
],
253+
"Resource": "*"
254+
}
255+
]
256+
}
257+
```
258+
259+
### Trust Relationships
260+
261+
Your execution role needs trust relationships for:
262+
263+
264+
**Service Roles (for PassRole operations):**
265+
```json
266+
{
267+
"Version": "2012-10-17",
268+
"Statement": [
269+
{
270+
"Effect": "Allow",
271+
"Principal": {
272+
"Service": [
273+
"airflow.amazonaws.com",
274+
"airflow-env.amazonaws.com",
275+
"lambda.amazonaws.com",
276+
"glue.amazonaws.com",
277+
"sagemaker.amazonaws.com",
278+
"batch.amazonaws.com",
279+
"emr-serverless.amazonaws.com",
280+
"ecs-tasks.amazonaws.com"
281+
]
282+
},
283+
"Action": "sts:AssumeRole"
284+
}
285+
]
286+
}
287+
```
288+
289+
## Usage
290+
291+
1. **Enable DAGs** in the Airflow UI
292+
2. **Configure parameters** as needed for your environment
293+
3. **Trigger DAGs** manually or via schedule
294+
4. **Monitor execution** through CloudWatch logs
295+
296+
## Configuration
297+
298+
Most DAGs use parameters that can be customized:
299+
300+
- `role_arn`: IAM role for service operations
301+
- `s3_bucket`: S3 bucket for data and scripts
302+
- `region`: AWS region for resources
303+
304+
Update these parameters in the Airflow UI or via environment variables.
305+
306+
## Troubleshooting
307+
308+
### Common Issues
309+
310+
- **IAM Permission Errors**: Ensure your execution role has all required permissions
311+
- **Resource Not Found**: Verify S3 buckets and objects exist before running DAGs
312+
- **Timeout Issues**: Adjust timeout values for long-running jobs
313+
314+
### Logs
315+
316+
Check CloudWatch logs for detailed error information:
317+
- Airflow task logs: `/aws/amazonmwaa/[environment-name]/task`
318+
- Service-specific logs: `/aws/glue/`, `/aws/batch/`, etc.
319+
320+
## Clean Up
321+
322+
Each DAG includes cleanup tasks where appropriate. For manual cleanup:
323+
324+
```bash
325+
# Remove uploaded files
326+
aws s3 rm s3://amzn-s3-demo-bucket/dags/ --recursive
327+
328+
# Delete IAM role and policies
329+
aws iam delete-role-policy --role-name mwaa-airflow3-execution-role --policy-name mwaa-airflow3-policy
330+
aws iam delete-role --role-name mwaa-airflow3-execution-role
331+
```
332+
333+
## Security
334+
335+
- Always use least-privilege IAM permissions
336+
- Sensitive data should be stored in AWS Secrets Manager
337+
- Network access is controlled through VPC configuration
338+
339+
## Contributing
340+
341+
See [CONTRIBUTING](../../../CONTRIBUTING.md) for guidelines on contributing to this repository.
342+
343+
## License
8344

345+
This library is licensed under the MIT-0 License. See the [LICENSE](../../../LICENSE) file.

0 commit comments

Comments
 (0)