This project demonstrates a complete end-to-end ETL pipeline that extracts real estate data from the Zillow API (via RapidAPI), transforms it, loads it into an Amazon Redshift data warehouse, and finally visualizes it with Amazon QuickSight.
- 🔄 ETL Orchestration with Apache Airflow
- ☁️ AWS Services used: EC2, S3, Lambda, Redshift, IAM, QuickSight
- 📡 Data Source: Zillow (via RapidAPI)
- 🤖 Automated using Airflow DAGs and AWS Lambda functions
- 📊 Interactive dashboard built on Amazon QuickSight
1. Extract:
- Python script pulls real estate data from Zillow RapidAPI.
- Data is stored as JSON and uploaded to S3 (Landing Zone).
2. Transform:
- AWS Lambda functions copy the JSON to a staging bucket and convert it to CSV.
- CSV is stored in a processed S3 bucket.
3. Load:
- CSV data is loaded into Amazon Redshift using COPY commands.
- Airflow DAG monitors each step using S3 sensors and triggers.
4. Visualize:
- Redshift tables are connected to Amazon QuickSight for dashboarding.
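The extract step above can be sketched as follows (along the lines of `scripts/extract_zillow_data.py`). The RapidAPI host, endpoint, and query parameters are illustrative assumptions, so substitute the ones from your own subscription:

```python
"""Hedged sketch of the extract step: fetch listings, upload raw JSON to S3.

The host/endpoint below are illustrative; check your RapidAPI subscription
for the exact values.
"""
import datetime
import json

def build_s3_key(city: str, run_date: datetime.date) -> str:
    """Partition raw JSON by city and date, e.g. raw/houston/2024-01-01.json."""
    return f"raw/{city.lower().replace(' ', '-')}/{run_date.isoformat()}.json"

def extract(api_key: str, city: str, bucket: str = "zillow-raw-data") -> str:
    """Fetch listings from the Zillow RapidAPI and drop them in the landing zone."""
    import boto3      # assumed available on the Airflow EC2 host
    import requests

    resp = requests.get(
        "https://zillow-com1.p.rapidapi.com/propertyExtendedSearch",  # illustrative endpoint
        headers={
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": "zillow-com1.p.rapidapi.com",
        },
        params={"location": city},
        timeout=30,
    )
    resp.raise_for_status()
    key = build_s3_key(city, datetime.date.today())
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=json.dumps(resp.json()).encode("utf-8")
    )
    return key
```

Keeping the key-building logic in its own small function makes the date/city partitioning easy to unit-test without touching the network.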
| Component | Technology |
|---|---|
| API Source | Zillow via RapidAPI |
| Workflow Engine | Apache Airflow |
| Compute | AWS EC2 |
| Storage | Amazon S3 |
| Processing | AWS Lambda |
| Data Warehouse | Amazon Redshift |
| Visualization | Amazon QuickSight |
| Language | Python |
- Python script fetches real estate listings from the Zillow API via RapidAPI
- JSON data is saved locally and uploaded to an S3 bucket
- AWS Lambda function is triggered when a new JSON is uploaded
- Transforms the JSON into a CSV file and stores it in another S3 bucket
- Apache Airflow DAG waits for the transformed CSV in S3
- Data is loaded into Redshift using the COPY command
- Data in Redshift is connected to Amazon QuickSight
- Dashboards are created for business insights
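The load step above boils down to rendering a Redshift COPY statement. A minimal sketch follows; the table name, CSV layout, region, and IAM role ARN are placeholders to adapt to your cluster and the schema in `redshift/create_tables.sql`:

```python
"""Hedged sketch of the load step: building the Redshift COPY statement."""

def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Render a COPY command that loads a header-prefixed CSV from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "CSV IGNOREHEADER 1 REGION 'us-east-1';"   # adjust REGION to your buckets
    )

# In the DAG this string would be executed against the cluster, e.g. via
# psycopg2 or Airflow's Redshift hook (names below are placeholders):
#   cur.execute(build_copy_sql(
#       "zillow_listings",
#       "s3://zillow-transformed-data/listings.csv",
#       "arn:aws:iam::123456789012:role/redshift-s3-read"))
```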
- Sign up on RapidAPI
- Subscribe to the Zillow API and get your API key
- Launch an EC2 instance (Ubuntu recommended)
- Install Python, Apache Airflow, AWS CLI, and boto3
- Clone this repository and configure the DAGs
Create the following S3 buckets:
- `zillow-raw-data`
- `zillow-staged-data`
- `zillow-transformed-data`
Deploy two Lambda functions:
- `copy_raw_to_stage`: Moves raw JSON to the staging bucket
- `transform_json_to_csv`: Converts JSON to CSV and stores it in the final bucket
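A hedged sketch of what the transform function might look like; the JSON shape (a top-level `props` list) and the column set are assumptions, so adjust them to the payload your subscription actually returns:

```python
"""Hedged sketch of transform_json_to_csv: flatten listings JSON into CSV."""
import csv
import io
import json

FIELDS = ["zpid", "address", "price", "bedrooms", "bathrooms"]  # illustrative columns

def json_to_csv(raw_json: str) -> str:
    """Flatten the listings array into CSV text with a header row."""
    records = json.loads(raw_json).get("props", [])
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def lambda_handler(event, context):
    """Triggered by an S3 PUT event; writes the CSV to the final bucket."""
    import boto3   # provided by the Lambda runtime

    s3 = boto3.client("s3")
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    out_key = key.rsplit(".", 1)[0] + ".csv"
    s3.put_object(
        Bucket="zillow-transformed-data",   # final bucket from the setup above
        Key=out_key,
        Body=json_to_csv(body).encode("utf-8"),
    )
    return {"status": "ok", "key": out_key}
```

Separating the pure `json_to_csv` conversion from the S3 plumbing keeps the transform logic testable outside Lambda.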
- Set up a Redshift cluster
- Create the necessary tables using the SQL schema provided (`redshift/create_tables.sql`)
- Ensure IAM roles and bucket permissions are configured for COPY
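For illustration, a table definition compatible with the COPY step might look like the following; the real schema ships in `redshift/create_tables.sql`, and the table name and columns here are placeholders:

```python
"""Illustrative DDL for the COPY target table; adapt to create_tables.sql."""

CREATE_LISTINGS_TABLE = """
CREATE TABLE IF NOT EXISTS zillow_listings (
    zpid       BIGINT,
    address    VARCHAR(256),
    price      NUMERIC(12, 2),
    bedrooms   SMALLINT,
    bathrooms  NUMERIC(3, 1),
    load_date  DATE
);
"""
```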
- Connect Redshift as a data source
- Build visualizations for metrics like:
  - Top cities by listings
  - Average property prices
  - Monthly trends in real estate listings
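For example, the "Top cities by listings" visual could be backed by a query like this (assuming a `city` column in a hypothetical `zillow_listings` table):

```python
"""Illustrative SQL behind the 'Top cities by listings' QuickSight visual."""

TOP_CITIES_SQL = """
SELECT city, COUNT(*) AS listing_count
FROM zillow_listings
GROUP BY city
ORDER BY listing_count DESC
LIMIT 10;
"""
```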
    .
    ├── dags/
    │   └── zillow_etl_dag.py
    ├── lambda/
    │   ├── copy_raw_to_stage.py
    │   └── transform_json_to_csv.py
    ├── scripts/
    │   └── extract_zillow_data.py
    ├── redshift/
    │   └── create_tables.sql
    ├── dashboard/
    │   └── quicksight_screenshots/
    └── requirements.txt
Insert dashboard screenshots here to show data insights and visualizations.
- How to orchestrate an ETL workflow using Apache Airflow
- Leveraging AWS Lambda for serverless data processing
- Automating data pipelines using S3 event triggers and Airflow DAGs
- Building insightful dashboards with Amazon QuickSight
- Integrating multiple AWS services in a production-grade pipeline
- AWS account with necessary permissions (IAM roles, S3, Lambda, Redshift, QuickSight)
- RapidAPI account
- Python 3.8+
- Airflow 2.x
📧 Email: nandinirathod909@gmail.com
🔗 LinkedIn: Nandini Rathod