|
1 | | -# Airflow 3.0.6 on MWAA examples |
| 1 | +# Airflow 3.0.6 Examples for Amazon MWAA |
2 | 2 |
|
3 | | -This folder provides example dags to be used when running Apache Airflow verison 3.0.6 on Amazon MWAA. |
| 3 | +This directory contains example Apache Airflow 3.0.6 DAGs demonstrating integration with various AWS services. These examples are designed to work with Amazon Managed Workflows for Apache Airflow (MWAA) and showcase best practices for workflow orchestration. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +These DAGs demonstrate how to use Airflow 3.0.6 with AWS services, including proper IAM permissions, error handling, and resource management. Each DAG is self-contained and includes inline documentation. |
4 | 8 |
|
5 | 9 | ## Prerequisites |
6 | 10 |
|
7 | | -- Amazon MWAA environment (version 3.0.6) |
| 11 | +- Amazon MWAA environment (version 3.0.6 or later) |
| 12 | +- AWS CLI configured with appropriate permissions |
| 13 | +- IAM roles with required permissions (see [IAM Policy](#iam-policy)) |
| 14 | + |
| 15 | +## DAG Examples |
| 16 | + |
| 17 | +### S3 Operations (`s3_airflow3_dag.py`) |
| 18 | +Demonstrates S3 bucket and object operations including: |
| 19 | +- Creating and deleting S3 buckets |
| 20 | +- Creating, listing, and deleting S3 objects |
| 21 | +- Using S3KeySensor to wait for objects |
| 22 | + |
| 23 | +### Lambda Functions (`lambda_airflow3_dag.py`) |
| 24 | +Shows Lambda function lifecycle management: |
| 25 | +- Creating Lambda functions with inline code |
| 26 | +- Waiting for function activation |
| 27 | +- Invoking functions with payloads |
| 28 | +- Self-cleanup functionality |
| 29 | + |
| 30 | +### AWS Glue Jobs (`glue_airflow3_dag.py`) |
| 31 | +Illustrates Glue ETL job execution: |
| 32 | +- Creating and uploading Glue scripts to S3 |
| 33 | +- Running Glue 5.0 jobs with Python |
| 34 | +- Monitoring job completion with sensors |
| 35 | + |
| 36 | +### Amazon Athena (`athena_airflow3_dag.py`) |
| 37 | +Demonstrates Athena query execution: |
| 38 | +- Running SQL queries against data lakes |
| 39 | +- Managing query results and outputs |
| 40 | +- Integration with CloudFormation for resource creation |
| 41 | + |
| 42 | +### AWS Batch (`batch_airflow3_dag.py`) |
| 43 | +Shows containerized job execution: |
| 44 | +- Creating Batch compute environments and job queues |
| 45 | +- Submitting and monitoring Batch jobs |
| 46 | +- Using Fargate for serverless compute |
| 47 | + |
| 48 | +### EMR Serverless (`emr_serverless_airflow3_dag.py`) |
| 49 | +Illustrates big data processing: |
| 50 | +- Creating EMR Serverless applications |
| 51 | +- Submitting Spark jobs |
| 52 | +- Managing application lifecycle |
| 53 | + |
| 54 | +### SageMaker Processing (`sagemaker_processing_airflow3_dag.py`) |
| 55 | +Demonstrates ML data processing: |
| 56 | +- Creating SageMaker processing jobs |
| 57 | +- Using scikit-learn containers |
| 58 | +- Processing data with automatic scaling |
| 59 | + |
| 60 | +## Setup |
| 61 | + |
| 62 | +1. **Upload DAG files to your MWAA S3 bucket:** |
| 63 | + ```bash |
| 64 | + aws s3 cp . s3://amzn-s3-demo-bucket/dags/ --recursive --exclude "*.md" --exclude "*.json" |
| 65 | + ``` |
| 66 | + |
| 67 | +2. **Create the required IAM execution role:** |
| 68 | + ```bash |
| 69 | + aws iam create-role \ |
| 70 | + --role-name mwaa-airflow3-execution-role \ |
| 71 | + --assume-role-policy-document file://trust-policy.json |
| 72 | + |
| 73 | + aws iam put-role-policy \ |
| 74 | + --role-name mwaa-airflow3-execution-role \ |
| 75 | + --policy-name mwaa-airflow3-policy \ |
| 76 | + --policy-document file://mwaa-comprehensive-policy.json |
| 77 | + ``` |
| 78 | + |
| 79 | +3. **Update your MWAA environment configuration** to use the new execution role. |
| 80 | + |
| 81 | +## IAM Policy |
| 82 | + |
| 83 | +The comprehensive IAM policy required for all DAGs includes permissions for: |
| 84 | + |
| 85 | +- S3 operations (bucket and object management) |
| 86 | +- Lambda function lifecycle management |
| 87 | +- Glue job creation and execution |
| 88 | +- Athena query execution |
| 89 | +- Batch job submission and monitoring |
| 90 | +- EMR Serverless application management |
| 91 | +- SageMaker processing jobs |
| 92 | +- CloudFormation stack operations |
| 93 | +- CloudWatch logging |
| 94 | + |
| 95 | +```json |
| 96 | +{ |
| 97 | + "Version": "2012-10-17", |
| 98 | + "Statement": [ |
| 99 | + { |
| 100 | + "Sid": "S3Permissions", |
| 101 | + "Effect": "Allow", |
| 102 | + "Action": [ |
| 103 | + "s3:CreateBucket", |
| 104 | + "s3:DeleteBucket", |
| 105 | + "s3:ListBucket", |
| 106 | + "s3:GetObject", |
| 107 | + "s3:PutObject", |
| 108 | + "s3:DeleteObject" |
| 109 | + ], |
| 110 | + "Resource": [ |
| 111 | + "arn:aws:s3:::*", |
| 112 | + "arn:aws:s3:::*/*" |
| 113 | + ] |
| 114 | + }, |
| 115 | + { |
| 116 | + "Sid": "LambdaPermissions", |
| 117 | + "Effect": "Allow", |
| 118 | + "Action": [ |
| 119 | + "lambda:CreateFunction", |
| 120 | + "lambda:DeleteFunction", |
| 121 | + "lambda:GetFunction", |
| 122 | + "lambda:InvokeFunction" |
| 123 | + ], |
| 124 | + "Resource": "*" |
| 125 | + }, |
| 126 | + { |
| 127 | + "Sid": "GluePermissions", |
| 128 | + "Effect": "Allow", |
| 129 | + "Action": [ |
| 130 | + "glue:CreateJob", |
| 131 | + "glue:GetJob", |
| 132 | + "glue:StartJobRun", |
| 133 | + "glue:GetJobRun", |
| 134 | + "glue:GetTable", |
| 135 | + "glue:CreateTable", |
| 136 | + "glue:DeleteTable", |
| 137 | + "glue:CreateDatabase", |
| 138 | + "glue:DeleteDatabase" |
| 139 | + ], |
| 140 | + "Resource": "*" |
| 141 | + }, |
| 142 | + { |
| 143 | + "Sid": "AthenaPermissions", |
| 144 | + "Effect": "Allow", |
| 145 | + "Action": [ |
| 146 | + "athena:StartQueryExecution", |
| 147 | + "athena:GetQueryExecution", |
| 148 | + "athena:GetQueryResults" |
| 149 | + ], |
| 150 | + "Resource": "*" |
| 151 | + }, |
| 152 | + { |
| 153 | + "Sid": "BatchPermissions", |
| 154 | + "Effect": "Allow", |
| 155 | + "Action": [ |
| 156 | + "batch:SubmitJob", |
| 157 | + "batch:DescribeJobs", |
| 158 | + "batch:CreateJobQueue", |
| 159 | + "batch:DeleteJobQueue", |
| 160 | + "batch:UpdateJobQueue", |
| 161 | + "batch:RegisterJobDefinition", |
| 162 | + "batch:DeregisterJobDefinition", |
| 163 | + "batch:CreateComputeEnvironment", |
| 164 | + "batch:UpdateComputeEnvironment", |
| 165 | + "batch:DeleteComputeEnvironment", |
| 166 | + "batch:DescribeJobQueues", |
| 167 | + "batch:DescribeJobDefinitions", |
| 168 | + "batch:DescribeComputeEnvironments", |
| 169 | + "batch:TagResource" |
| 170 | + ], |
| 171 | + "Resource": "*" |
| 172 | + }, |
| 173 | + { |
| 174 | + "Sid": "CloudwatchPermissions", |
| 175 | + "Effect": "Allow", |
| 176 | + "Action": [ |
| 177 | + "logs:CreateLogGroup", |
| 178 | + "logs:CreateLogStream", |
| 179 | + "logs:PutLogEvents" |
| 180 | + ], |
| 181 | + "Resource": "*" |
| 182 | + }, |
| 183 | + { |
| 184 | + "Sid": "EMRServerlessPermissions", |
| 185 | + "Effect": "Allow", |
| 186 | + "Action": [ |
| 187 | + "emr-serverless:StartJobRun", |
| 188 | + "emr-serverless:GetJobRun", |
| 189 | + "emr-serverless:CreateApplication", |
| 190 | + "emr-serverless:GetApplication", |
| 191 | + "emr-serverless:StartApplication", |
| 192 | + "emr-serverless:DeleteApplication", |
| 193 | + "emr-serverless:ListJobRuns", |
| 194 | + "emr-serverless:StopApplication" |
| 195 | + ], |
| 196 | + "Resource": "*" |
| 197 | + }, |
| 198 | + { |
| 199 | + "Sid": "SageMakerPermissions", |
| 200 | + "Effect": "Allow", |
| 201 | + "Action": [ |
| 202 | + "sagemaker:CreateProcessingJob", |
| 203 | + "sagemaker:DescribeProcessingJob" |
| 204 | + ], |
| 205 | + "Resource": "*" |
| 206 | + }, |
| 207 | + { |
| 208 | + "Sid": "CloudFormationPermissions", |
| 209 | + "Effect": "Allow", |
| 210 | + "Action": [ |
| 211 | + "cloudformation:CreateStack", |
| 212 | + "cloudformation:DeleteStack", |
| 213 | + "cloudformation:DescribeStacks" |
| 214 | + ], |
| 215 | + "Resource": "*" |
| 216 | + }, |
| 217 | + { |
| 218 | + "Sid": "ECRPermissions", |
| 219 | + "Effect": "Allow", |
| 220 | + "Action": [ |
| 221 | + "ecr:GetAuthorizationToken", |
| 222 | + "ecr:BatchCheckLayerAvailability", |
| 223 | + "ecr:GetDownloadUrlForLayer", |
| 224 | + "ecr:BatchGetImage" |
| 225 | + ], |
| 226 | + "Resource": "*" |
| 227 | + }, |
| 228 | + { |
| 229 | + "Sid": "ECSPermissions", |
| 230 | + "Effect": "Allow", |
| 231 | + "Action": [ |
| 232 | + "ecs:DescribeTaskDefinition" |
| 233 | + ], |
| 234 | + "Resource": "*" |
| 235 | + }, |
| 236 | + { |
| 237 | + "Sid": "CloudWatchLogsPermissions", |
| 238 | + "Effect": "Allow", |
| 239 | + "Action": [ |
| 240 | + "logs:CreateLogGroup", |
| 241 | + "logs:CreateLogStream", |
| 242 | + "logs:PutLogEvents" |
| 243 | + ], |
| 244 | + "Resource": "*" |
| 245 | + }, |
| 246 | + { |
| 247 | + "Sid": "IAMPassRolePermissions", |
| 248 | + "Effect": "Allow", |
| 249 | + "Action": [ |
| 250 | + "iam:PassRole", |
| 251 | + "iam:GetRole" |
| 252 | + ], |
| 253 | + "Resource": "*" |
| 254 | + } |
| 255 | + ] |
| 256 | +} |
| 257 | +``` |
| 258 | + |
| 259 | +### Trust Relationships |
| 260 | + |
| 261 | +Your execution role needs trust relationships for: |
| 262 | + |
| 263 | + |
| 264 | +**Service Roles (for PassRole operations):** |
| 265 | +```json |
| 266 | +{ |
| 267 | + "Version": "2012-10-17", |
| 268 | + "Statement": [ |
| 269 | + { |
| 270 | + "Effect": "Allow", |
| 271 | + "Principal": { |
| 272 | + "Service": [ |
| 273 | + "airflow.amazonaws.com", |
| 274 | + "airflow-env.amazonaws.com", |
| 275 | + "lambda.amazonaws.com", |
| 276 | + "glue.amazonaws.com", |
| 277 | + "sagemaker.amazonaws.com", |
| 278 | + "batch.amazonaws.com", |
| 279 | + "emr-serverless.amazonaws.com", |
| 280 | + "ecs-tasks.amazonaws.com" |
| 281 | + ] |
| 282 | + }, |
| 283 | + "Action": "sts:AssumeRole" |
| 284 | + } |
| 285 | + ] |
| 286 | +} |
| 287 | +``` |
| 288 | + |
| 289 | +## Usage |
| 290 | + |
| 291 | +1. **Enable DAGs** in the Airflow UI |
| 292 | +2. **Configure parameters** as needed for your environment |
| 293 | +3. **Trigger DAGs** manually or via schedule |
| 294 | +4. **Monitor execution** through CloudWatch logs |
| 295 | + |
| 296 | +## Configuration |
| 297 | + |
| 298 | +Most DAGs use parameters that can be customized: |
| 299 | + |
| 300 | +- `role_arn`: IAM role for service operations |
| 301 | +- `s3_bucket`: S3 bucket for data and scripts |
| 302 | +- `region`: AWS region for resources |
| 303 | + |
| 304 | +Update these parameters in the Airflow UI or via environment variables. |
| 305 | + |
| 306 | +## Troubleshooting |
| 307 | + |
| 308 | +### Common Issues |
| 309 | + |
| 310 | +- **IAM Permission Errors**: Ensure your execution role has all required permissions |
| 311 | +- **Resource Not Found**: Verify S3 buckets and objects exist before running DAGs |
| 312 | +- **Timeout Issues**: Adjust timeout values for long-running jobs |
| 313 | + |
| 314 | +### Logs |
| 315 | + |
| 316 | +Check CloudWatch logs for detailed error information: |
| 317 | +- Airflow task logs: `/aws/amazonmwaa/[environment-name]/task` |
| 318 | +- Service-specific logs: `/aws/glue/`, `/aws/batch/`, etc. |
| 319 | + |
| 320 | +## Clean Up |
| 321 | + |
| 322 | +Each DAG includes cleanup tasks where appropriate. For manual cleanup: |
| 323 | + |
| 324 | +```bash |
| 325 | +# Remove uploaded files |
| 326 | +aws s3 rm s3://amzn-s3-demo-bucket/dags/ --recursive |
| 327 | + |
| 328 | +# Delete IAM role and policies |
| 329 | +aws iam delete-role-policy --role-name mwaa-airflow3-execution-role --policy-name mwaa-airflow3-policy |
| 330 | +aws iam delete-role --role-name mwaa-airflow3-execution-role |
| 331 | +``` |
| 332 | + |
| 333 | +## Security |
| 334 | + |
| 335 | +- Always use least-privilege IAM permissions |
| 336 | +- Sensitive data should be stored in AWS Secrets Manager |
| 337 | +- Network access is controlled through VPC configuration |
| 338 | + |
| 339 | +## Contributing |
| 340 | + |
| 341 | +See [CONTRIBUTING](../../../CONTRIBUTING.md) for guidelines on contributing to this repository. |
| 342 | + |
| 343 | +## License |
8 | 344 |
|
| 345 | +This library is licensed under the MIT-0 License. See the [LICENSE](../../../LICENSE) file. |
0 commit comments