A high-throughput, event-driven data pipeline built on Apache Kafka (Amazon MSK), with AWS Lambda for serverless transformation, Kubernetes-hosted consumers, Apache Airflow for batch ETL, and Snowflake as the analytics warehouse.
| Component | Tool | Purpose |
|---|---|---|
| Message Broker | Apache Kafka (MSK) | Event streaming backbone |
| Producers | Python | Generate 500+ msg/sec event stream |
| Consumers | Kubernetes | Auto-scaling consumer group |
| Serverless | AWS Lambda + EventBridge | Event transformation |
| Batch ETL | Apache Airflow | Daily aggregation to Snowflake |
| Data Warehouse | Snowflake | Analytics and reporting |
| IaC | Terraform | MSK cluster + Lambda provisioning |
| Monitoring | Prometheus + Grafana | Consumer lag and throughput dashboards |
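The Lambda + EventBridge stage in the table above consumes record batches from MSK through an event source mapping, which delivers each record's value base64-encoded, grouped under `event["records"]` by `"topic-partition"` keys. A minimal handler sketch; the enrichment logic, field names, and the omitted EventBridge hand-off are illustrative assumptions, not taken from this repo:

```python
import base64
import json


def handler(event, context):
    """Transform a batch of records delivered by an MSK event source mapping.

    MSK invocations group records under event["records"], keyed by
    "topic-partition" strings, with each record value base64-encoded.
    """
    transformed = []
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            payload["lambda_request_id"] = context.aws_request_id  # illustrative enrichment
            transformed.append(payload)
    # Forwarding to EventBridge (e.g., boto3 events.put_events) is omitted in this sketch.
    return {"batch_size": len(transformed)}
```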
```bash
# Provision Kafka cluster
cd terraform/modules/msk && terraform init && terraform apply
```
```bash
# Start event producer
python producers/src/producer.py --rate 500
```
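A rough sketch of what `producers/src/producer.py` could look like, assuming kafka-python, an `events` topic, a `KAFKA_BOOTSTRAP_SERVERS` environment variable, and an illustrative event schema (none of these names are confirmed by the repo):

```python
import argparse
import json
import os
import time
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--rate", type=int, default=500, help="target messages per second")
    args = parser.parse_args()

    producer = KafkaProducer(
        bootstrap_servers=os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        acks="all",   # wait for full in-sync-replica acknowledgement
        linger_ms=5,  # small batching window helps sustain throughput
    )

    interval = 1.0 / args.rate
    while True:
        event = {
            "event_id": str(uuid.uuid4()),
            "event_type": "page_view",  # illustrative schema
            "ts": datetime.now(timezone.utc).isoformat(),
        }
        producer.send("events", value=event)
        time.sleep(interval)  # crude pacing; real pacing should subtract send time


if __name__ == "__main__":
    main()
```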
```bash
# Deploy auto-scaling consumers
kubectl apply -f k8s/consumers/
```
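The manifests in `k8s/consumers/` run the consumer application as a scalable Deployment. A minimal consumer loop, again assuming kafka-python and the illustrative `events` topic and schema from the producer sketch:

```python
import json
import os

from kafka import KafkaConsumer  # kafka-python


def handle_event(event: dict) -> None:
    # Placeholder for real processing (validation, enrichment, sink writes).
    print(event.get("event_id"), event.get("event_type"))


consumer = KafkaConsumer(
    "events",  # illustrative topic name
    bootstrap_servers=os.environ.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
    group_id="events-consumers",  # every replica joins the same group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    enable_auto_commit=False,     # commit offsets only after processing
)

for message in consumer:
    handle_event(message.value)
    consumer.commit()  # at-least-once semantics: commit after handling
```

Because every replica shares one `group_id`, Kafka rebalances partitions across pods as the Deployment scales, which is what makes the consumer group auto-scaling.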
```bash
# Trigger batch ETL
airflow dags trigger daily_events_to_snowflake
```
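`daily_events_to_snowflake` is the DAG id being triggered above. A sketch of how such a DAG might be defined in `airflow/dags/`, assuming the Snowflake provider package and illustrative connection and table names:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="daily_events_to_snowflake",
    schedule="@daily",  # "schedule" keyword requires Airflow 2.4+
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    aggregate_events = SnowflakeOperator(
        task_id="aggregate_daily_events",
        snowflake_conn_id="snowflake_default",  # illustrative connection id
        sql="""
            -- illustrative table names
            INSERT INTO analytics.daily_event_counts
            SELECT event_type, DATE(ts) AS event_date, COUNT(*) AS event_count
            FROM raw.events
            WHERE DATE(ts) = '{{ ds }}'
            GROUP BY 1, 2
        """,
    )
```

Templating the filter on `{{ ds }}` keeps each run idempotent for its own logical date, which is what makes the daily aggregation safe to retry.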
| Metric | Result |
|---|---|
| Throughput | 500K+ messages/hour sustained |
| Consumer lag | < 500ms at steady state |
| End-to-end latency | < 3 seconds |
| Airflow DAG success rate | > 99% |
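The lag and throughput figures above are surfaced by the Prometheus + Grafana stack configured under `monitoring/prometheus/`; consumer lag itself is typically scraped broker-side (for example via a Kafka exporter). One way the consumer could expose its own throughput counters for scraping, assuming the prometheus_client library (metric names are illustrative):

```python
from prometheus_client import Counter, Gauge, start_http_server

# Expose a /metrics endpoint for Prometheus to scrape.
start_http_server(8000)

MESSAGES_CONSUMED = Counter(
    "pipeline_messages_consumed_total",
    "Total events consumed from Kafka",
)
LAST_PROCESSING_SECONDS = Gauge(
    "pipeline_last_event_processing_seconds",
    "Wall-clock time spent handling the most recent event",
)


def record_event(duration_seconds: float) -> None:
    # Called once per handled message inside the consumer loop.
    MESSAGES_CONSUMED.inc()
    LAST_PROCESSING_SECONDS.set(duration_seconds)
```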
```
├── airflow/dags/           # Batch ETL DAG definitions
├── consumers/src/          # Kafka consumer application
├── docker/                 # Dockerfiles (producer + consumer)
├── k8s/consumers/          # Consumer deployment manifests
├── monitoring/prometheus/  # Monitoring configuration
├── producers/src/          # Event producer application
└── terraform/modules/      # MSK + Lambda IaC
```
This project is for portfolio/demonstration purposes.
