
Nihaa-20/Real_Estate


🏒 Real Estate Investment Dashboard (India Housing Data)

This project is a Streamlit-based interactive dashboard built to analyze real estate property data across India.
It helps users explore price trends, location insights, amenities impact, property features, and investment-related factors through clean visualizations.

  1. Overview
    This project builds a complete data analytics dashboard to explore:
  • Property prices
  • Size distribution
  • Locality/City/State comparisons
  • Amenities, owner type, BHK, facing, furnishing
  • Multiple filters: State, City, Furnishing, Facing, Amenities, Property type
  • Interactive Plotly charts (bar, pie, heatmaps, comparisons)

The cleaned dataset and engineered features are created using eda_cleaning_pipeline.py, which includes:

  • Missing/duplicate handling
  • Numeric coercion
  • Feature engineering (Price per SqFt, Age, Density Scores, Amenities Count, Furnishing Flag, Facing Simplification, Ready-to-Move Flag)
  • Output: cleaned_india_housing.csv + preprocessor.joblib
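The README lists the cleaning steps but not their code. The following is a minimal pandas sketch of duplicate removal, numeric coercion, dropping fully empty numeric columns, and median / "Unknown" imputation. It is not the actual contents of eda_cleaning_pipeline.py. The function name `basic_clean` is hypothetical, and it assumes the caller passes the list of columns that should be numeric.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: dedupe, coerce numeric text, drop empty numeric columns, impute."""
    out = df.drop_duplicates().copy()

    # Coerce text such as "1,200" or " 85 " to numbers; anything unparseable becomes NaN.
    for col in numeric_cols:
        text = out[col].astype(str).str.replace(",", "", regex=False).str.strip()
        out[col] = pd.to_numeric(text, errors="coerce")

    # Drop numeric columns that are entirely empty after coercion.
    empty_cols = [c for c in numeric_cols if out[c].isna().all()]
    out = out.drop(columns=empty_cols)
    kept_numeric = [c for c in numeric_cols if c not in empty_cols]

    # Median imputation for numeric columns, "Unknown" for all remaining columns.
    for col in kept_numeric:
        out[col] = out[col].fillna(out[col].median())
    other_cols = [c for c in out.columns if c not in kept_numeric]
    out[other_cols] = out[other_cols].fillna("Unknown")
    return out.reset_index(drop=True)
```

Duplicates are removed before coercion, which follows the order given in the pipeline description. As a result, rows that differ only in formatting (for example "1,200" vs "1200") are not treated as duplicates.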

  2. Data Cleaning Pipeline (eda_cleaning_pipeline.py)

This script:

  • Loads india_housing_prices.csv
  • Renames and standardizes columns
  • Removes duplicates
  • Converts numeric text → numbers
  • Handles missing values (median imputation for numeric, "Unknown" for categorical)
  • Drops numeric columns that are fully empty
  • Recomputes missing Price_per_SqFt
  • Creates several engineered features:
    • Price_per_SqFt_final
    • Age_of_Property
    • School_Density_Score
    • Hospital_Density_Score
    • Amenities_count
    • Is_Fully_Furnished
    • Is_Ready_to_Move
    • Facing_simple
  • Saves:
    • cleaned_india_housing.csv
    • preprocessor.joblib
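The README names the engineered features but does not define them. Below is one plausible pandas sketch for several of them. These parts are assumptions:

- the raw column names `Price_in_Lakhs`, `Size_in_SqFt`, `Price_per_SqFt`, `Year_Built`, `Amenities` (comma-separated), `Furnished_Status`, `Availability_Status` and `Facing`;
- that price per sq ft is expressed in rupees (1 lakh = 100,000 rupees);
- the "Facing" simplification rule.

The density scores are omitted because the README gives no definition for them.

```python
import pandas as pd

RUPEES_PER_LAKH = 100_000  # 1 lakh = 100,000 rupees
CARDINAL_DIRECTIONS = {"north", "south", "east", "west"}


def count_amenities(value: object) -> int:
    """Count comma-separated amenities; missing or "Unknown" counts as zero."""
    if not isinstance(value, str) or value.strip().lower() in {"", "unknown"}:
        return 0
    return sum(1 for item in value.split(",") if item.strip())


def add_features(df: pd.DataFrame, current_year: int) -> pd.DataFrame:
    """Hypothetical feature builder; column names and rules are assumptions."""
    out = df.copy()

    # Keep the raw price per sq ft where present; otherwise derive it from
    # total price (lakhs) and size (sq ft), assuming rupees per sq ft.
    derived = out["Price_in_Lakhs"] * RUPEES_PER_LAKH / out["Size_in_SqFt"]
    out["Price_per_SqFt_final"] = out["Price_per_SqFt"].fillna(derived)

    out["Age_of_Property"] = current_year - out["Year_Built"]
    out["Amenities_count"] = out["Amenities"].map(count_amenities)

    # 1 only for fully furnished listings ("Semi-furnished" stays 0).
    furnishing = out["Furnished_Status"].fillna("").str.strip().str.lower()
    out["Is_Fully_Furnished"] = (furnishing == "furnished").astype(int)

    # Accept both "Ready_to_Move" and "Ready to Move" spellings.
    availability = (
        out["Availability_Status"].fillna("")
        .str.replace("_", " ", regex=False).str.strip().str.lower()
    )
    out["Is_Ready_to_Move"] = (availability == "ready to move").astype(int)

    # Hypothetical simplification: keep the four cardinal directions, map the rest to "Other".
    facing = out["Facing"].fillna("").str.strip().str.lower()
    out["Facing_simple"] = facing.where(facing.isin(CARDINAL_DIRECTIONS), "other").str.title()
    return out
```

If the raw `Price_per_SqFt` column uses different units, the conversion factor in the derived value must change to match. Otherwise the filled values would be inconsistent with the existing ones.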

  3. Dashboard (app.py)

The Streamlit dashboard provides:

Key Features

  • Full-width dashboard layout
  • 5 KPIs
  • 4 analysis pages:
    1. Price & Size Analysis
    2. Location Analysis
    3. Relationship & Correlation
    4. Ownership / Investment / Amenities
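The README does not say which five KPIs the dashboard shows. As an illustration only, here is a hypothetical set computed from the filtered DataFrame. It assumes the cleaned column names `Price_in_Lakhs`, `Price_per_SqFt_final`, `Size_in_SqFt` and `Amenities_count`.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame) -> dict[str, float]:
    """Five illustrative KPIs for the currently filtered frame (NaN values when it is empty)."""
    return {
        "Listings": len(df),
        "Median Price (Lakhs)": float(df["Price_in_Lakhs"].median()),
        "Median Price per SqFt": float(df["Price_per_SqFt_final"].median()),
        "Avg Size (SqFt)": float(df["Size_in_SqFt"].mean()),
        "Avg Amenities": float(df["Amenities_count"].mean()),
    }
```

In Streamlit, the five values could be laid out across `st.columns(5)`, with `col.metric(label, value)` called on each column.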

Tech Highlights

  • Multi-dropdown filters with “Select All” logic
  • Plotly visualizations
  • Clean UI with HTML/CSS adjustments
  • Custom reusable function to render charts in a two-column layout
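The “Select All” filter logic itself is not shown in the README. The sketch below shows one way it can work. It assumes each dropdown offers a sentinel option labelled "Select All", and it treats an empty selection as "no filter". Both are design choices, not confirmed behaviour of app.py. The helper names are hypothetical.

```python
import pandas as pd

SELECT_ALL = "Select All"  # hypothetical sentinel label prepended to each dropdown


def resolve_selection(selected: list[str], options: list[str]) -> list[str]:
    """Treat "Select All" (or an empty selection) as every available option."""
    if not selected or SELECT_ALL in selected:
        return list(options)
    return [s for s in selected if s in options]


def apply_filters(df: pd.DataFrame, selections: dict[str, list[str]]) -> pd.DataFrame:
    """AND together one isin() mask per filtered column."""
    mask = pd.Series(True, index=df.index)
    for column, selected in selections.items():
        options = df[column].dropna().unique().tolist()
        mask &= df[column].isin(resolve_selection(selected, options))
    return df[mask]
```

In the app, each dropdown could be created as `st.sidebar.multiselect("State", [SELECT_ALL] + options, default=[SELECT_ALL])`. The returned lists would then be passed to `apply_filters` keyed by column name.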

Run the dashboard using:

streamlit run app.py


πŸ“ Project Structure

📦 Real_Estate
│
├── app.py                      # Streamlit dashboard
├── eda_cleaning_pipeline.py    # Data cleaning & preprocessing script
├── train_models.py             # (Optional) ML training script
│
├── india_housing_prices.csv    # Raw dataset (not included)
├── cleaned_india_housing.csv   # Cleaned dataset (generated)
├── preprocessor.joblib         # Saved preprocessing pipeline
├── investment_clf.joblib       # (Optional) Trained classifier
├── price_reg.joblib            # (Optional) Trained regressor
├── feature_names.joblib        # OHE-expanded feature names
│
├── README.md
└── requirements.txt

πŸ› οΈ Installation & Setup

1️⃣ Clone the repository

git clone https://github.com/your-username/real-estate-dashboard.git
cd real-estate-dashboard

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Run the data cleaning pipeline

python eda_cleaning_pipeline.py

This generates cleaned_india_housing.csv and preprocessor.joblib.

4️⃣ Launch the Streamlit dashboard

streamlit run app.py

📊 Data Used

The dataset contains cleaned housing information, including:

  • Price (Lakhs)
  • Price per SqFt
  • Size in SqFt
  • Amenities count
  • Age of property
  • Furnished status
  • BHK
  • Facing direction
  • Ownership & availability details
  • Location: State, City, Locality
  • Nearby hospitals & schools
  • Public transport accessibility

🧹 Cleaning & Feature Engineering

All cleaning happens in eda_cleaning_pipeline.py, which:

  • Removes duplicates
  • Converts numeric columns
  • Handles missing values
  • Creates new columns:
    • Price_per_SqFt_final
    • School_Density_Score
    • Hospital_Density_Score
    • Amenities_count
    • Is_Fully_Furnished
    • Facing_simple
    • Is_Ready_to_Move
  • Saves the cleaned dataset

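🎯 Purpose of This Project

This dashboard was created to:

  • Visualize property trends
  • Understand location-based pricing
  • Compare the impact of amenities on price
  • Explore investment-related metrics

No machine learning models are used in the dashboard; the ML training script and model files listed in the project structure are optional.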

πŸ§‘β€πŸ’» Tech Stack Python 3.10+ Streamlit Pandas Plotly Express NumPy Scikit-learn Seaborn & Matplotlib (used inside cleaning script)
