Skip to content

Nandini2233/Zillow_ETL_Pipeline_Project

Repository files navigation

🏠 Zillow Real Estate Data Analytics ETL Pipeline

This project demonstrates a complete end-to-end ETL pipeline that extracts real estate data from Zillow’s RapidAPI, transforms and loads it into an Amazon Redshift data warehouse, and finally visualizes it using Amazon QuickSight.


📌 Project Highlights

  • 🔄 ETL Orchestration with Apache Airflow
  • ☁️ AWS Services used: EC2, S3, Lambda, Redshift, IAM, QuickSight
  • 📡 Data Source: Zillow (via RapidAPI)
  • 🤖 Automated using Airflow DAGs and AWS Lambda functions
  • 📊 Interactive dashboard built on Amazon QuickSight

📂 Project Architecture

1. Extract:
   - Python script pulls real estate data from Zillow RapidAPI.
   - Data is stored as JSON and uploaded to S3 (Landing Zone).

2. Transform:
   - AWS Lambda copies JSON to a new bucket and converts it to CSV.
   - CSV is stored in a processed S3 bucket.

3. Load:
   - CSV data is loaded into Amazon Redshift using COPY commands.
   - Airflow DAG monitors each step using S3 sensors and triggers.

4. Visualize:
   - Redshift tables are connected to Amazon QuickSight for dashboarding.

🛠️ Tech Stack

Component Technology
API Source Zillow via RapidAPI
Workflow Engine Apache Airflow
Compute AWS EC2
Storage Amazon S3
Processing AWS Lambda
Data Warehouse Amazon Redshift
Visualization Amazon QuickSight
Language Python

🚀 End-to-End Workflow

Data Extraction

  • Python script fetches real estate listings from Zillow's RapidAPI
  • JSON data is saved locally and uploaded to an S3 bucket

Data Transformation

  • AWS Lambda function is triggered when a new JSON is uploaded
  • Transforms the JSON into a CSV file and stores it in another S3 bucket

Data Loading

  • Apache Airflow DAG waits for the transformed CSV in S3
  • Data is loaded into Redshift using the COPY command

Visualization

  • Data in Redshift is connected to Amazon QuickSight
  • Dashboards are created for business insights

🧪 Setup Instructions

🔐 1. API Key (Zillow)

  1. Sign up on RapidAPI
  2. Subscribe to the Zillow API and get your API key

🖥️ 2. EC2 Environment

  1. Launch an EC2 instance (Ubuntu recommended)
  2. Install Python, Apache Airflow, AWS CLI, and boto3
  3. Clone this repository and configure the DAGs

☁️ 3. S3 Buckets

Create the following S3 buckets:

  • zillow-raw-data
  • zillow-staged-data
  • zillow-transformed-data

🔄 4. AWS Lambda

Deploy two Lambda functions:

  1. copy_raw_to_stage: Moves raw JSON to a staging bucket
  2. transform_json_to_csv: Converts JSON to CSV and stores it in the final bucket

🧱 5. Redshift

  1. Set up a Redshift cluster
  2. Create necessary tables using SQL schema provided
  3. Ensure IAM roles and bucket permissions are configured for COPY

📊 6. QuickSight

  1. Connect Redshift as a data source
  2. Build visualizations for metrics like:
    • Top cities by listings
    • Average property prices
    • Monthly trends in real estate listings

📁 Directory Structure

.
├── dags/
│ └── zillow_etl_dag.py
├── lambda/
│ ├── copy_raw_to_stage.py
│ └── transform_json_to_csv.py
├── scripts/
│ └── extract_zillow_data.py
├── redshift/
│ └── create_tables.sql
├── dashboard/
│ └── quicksight_screenshots/
├── requirements.txt

📸 Sample Dashboard (Amazon QuickSight)

Insert dashboard screenshots here to show data insights and visualizations.

🧠 Key Learnings

  • How to orchestrate an ETL workflow using Apache Airflow
  • Leveraging AWS Lambda for serverless data processing
  • Automating data pipelines using S3 event triggers and Airflow DAGs
  • Building insightful dashboards with Amazon QuickSight
  • Integrating multiple AWS services in a production-grade pipeline

🛡️ Prerequisites

  • AWS account with necessary permissions (IAM roles, S3, Lambda, Redshift, QuickSight)
  • RapidAPI account
  • Python 3.8+
  • Airflow 2.x

💬 Contact

📧 Email: nandinirathod909@gmail.com
🔗 LinkedIn: Nandini Rathod

About

End-to-end real estate data pipeline built using Python, Apache Airflow, AWS (S3, EC2), and RapidAPI to extract, transform, and load Zillow housing data into Amazon Redshift for scalable analytics.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages