Commit 5567efa

Author: Kamen Sharlandjiev (committed)

cleaning up and adding README
1 parent 0704b10 commit 5567efa

File tree

6 files changed: +464 / -307 lines


usecases/mwaa-dag-factory-example/README.md

Lines changed: 41 additions & 4 deletions
@@ -1,6 +1,6 @@
 # MWAA Data Pipeline Workshop
 
-One-command deployment of a complete MWAA data pipeline with VPC, Redshift, Glue, and EMR Serverless.
+One-command deployment of a provisioned MWAA instance with DAG Factory installed. The stack also deploys a data pipeline with VPC, Redshift, Glue, and EMR Serverless.
 
 ## Quick Start
 
@@ -19,13 +19,50 @@ python3 cleanup.py
 
 ## What Gets Deployed
 
+### Provisioned MWAA Instance
+- **MWAA 3.0.6** environment running Apache Airflow
+- **DAG Factory** pre-installed for YAML-based DAG definitions
+- Small environment size (suitable for development/testing)
+
+### Data Pipeline Examples
+1. **Python DAG** (`dags/data_pipeline.py`) - Traditional Airflow DAG written in Python
+2. **YAML DAG** (`yaml/example_data_pipeline.yaml`) - DAG Factory representation of the same pipeline
+
+### Infrastructure Components
 - VPC with public/private subnets, NAT gateways
-- MWAA 2.10.3 small environment
 - Redshift Serverless (8 RPU)
 - S3 bucket with DAGs, scripts, sample data
 - IAM roles for Glue, EMR, Redshift
-- Python data pipeline DAG example
-- YAML data pipeline DAG example
+
+## Deploying to MWAA Serverless
+
+The YAML-based DAG can be deployed to MWAA Serverless using the following commands:
+
+### 1. Convert Python DAG to YAML
+```bash
+dag-converter convert data_pipeline.py --output yaml/
+```
+
+### 2. Upload YAML to S3
+```bash
+aws s3 sync yaml/ s3://YOUR-BUCKET-NAME/yaml/
+```
+
+### 3. Create Serverless Workflow
+```bash
+aws mwaa-serverless create-workflow \
+  --name example_data_pipeline \
+  --definition-s3-location '{ "Bucket": "YOUR-BUCKET-NAME", "ObjectKey": "yaml/example_data_pipeline.yaml" }' \
+  --role-arn arn:aws:iam::YOUR-ACCOUNT-ID:role/service-role/YOUR-MWAA-EXECUTION-ROLE \
+  --region us-east-2
+```
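The `--definition-s3-location` value is a JSON object. Building it programmatically can be sketched as below (the helper name and placeholder values are illustrative, not part of this repo):

```python
import json

def definition_s3_location(bucket: str, key: str) -> str:
    """Build the JSON string passed to --definition-s3-location."""
    if not bucket or not key:
        raise ValueError("bucket and key are both required")
    return json.dumps({"Bucket": bucket, "ObjectKey": key})

# Placeholders, as in the commands above -- substitute real identifiers.
location = definition_s3_location("YOUR-BUCKET-NAME", "yaml/example_data_pipeline.yaml")
print(location)
```

Generating the JSON with `json.dumps` avoids quoting mistakes that are easy to make when hand-writing the string inline in a shell command.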
+
+### 4. List Serverless Workflows
+```bash
+aws mwaa-serverless list-workflows --region us-east-2
+```
+
+> **Note:** Replace `YOUR-BUCKET-NAME`, `YOUR-ACCOUNT-ID`, and `YOUR-MWAA-EXECUTION-ROLE` with your actual AWS resource identifiers.
 
 ## Pipeline Flow
 
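The `yaml/example_data_pipeline.yaml` file referenced in the README is not shown in this commit. As a rough sketch of what a DAG Factory YAML definition for this pipeline could look like (task names and operator import paths are assumptions inferred from the Python DAG, not taken from the commit):

```yaml
example_data_pipeline:
  default_args:
    owner: airflow
    start_date: 2024-01-01
  schedule_interval: "@daily"
  tasks:
    glue_job:
      operator: airflow.providers.amazon.aws.operators.glue.GlueJobOperator
      job_name: immersion_day_glue_job
    start_emr_serverless_job:
      operator: airflow.providers.amazon.aws.operators.emr.EmrServerlessStartJobOperator
      dependencies: [glue_job]
```

DAG Factory keys each DAG by its top-level name and maps each task to a fully qualified operator path, with `dependencies` expressing the task ordering.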

usecases/mwaa-dag-factory-example/dags/dag_converter_pipeline.py

Lines changed: 0 additions & 162 deletions
This file was deleted.

usecases/mwaa-dag-factory-example/dags/data_pipeline.py

Lines changed: 2 additions & 2 deletions
@@ -61,7 +61,7 @@
     script_args={
         '--dag_name': 'immersion_day_data_pipeline',
         '--task_id': 'glue_job',
-        '--correlation_id': '{{ run_id }}'
+        '--correlation_id': 'start_emr_serverless_job'
     }
 )

@@ -86,7 +86,7 @@
         "s3://{{S3_BUCKET_NAME}}/data/aggregated/green",
         "immersion_day_data_pipeline",
         "start_emr_serverless_job",
-        "{{ run_id }}"
+        "start_emr_serverless_job"
     ],
     "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=4G --conf spark.executor.cores=2 --conf spark.executor.memoryOverhead=1G"
 }
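Note that this change swaps the Jinja-templated Airflow `run_id` for a static string, so every run of the task now reports the same correlation ID. The resulting `script_args` mapping (values copied from the diff):

```python
# script_args after this commit: the correlation ID is now the static
# task name rather than the per-run Airflow run_id ('{{ run_id }}').
script_args = {
    "--dag_name": "immersion_day_data_pipeline",
    "--task_id": "glue_job",
    "--correlation_id": "start_emr_serverless_job",
}
print(script_args["--correlation_id"])  # -> start_emr_serverless_job
```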

usecases/mwaa-dag-factory-example/dags/data_pipeline.yaml

Lines changed: 0 additions & 139 deletions
This file was deleted.
