Real Estate Investment Dashboard (India Housing Data)
This project is a Streamlit-based interactive dashboard built to analyze real estate property data across India.
It helps users explore price trends, location insights, amenities impact, property features, and investment-related factors through clean visualizations.
- Overview
This project builds a complete data analytics dashboard to explore:
- Property prices
- Size distribution
- Locality/City/State comparisons
- Amenities, owner type, BHK, facing, furnishing
- Multiple filters: State, City, Furnishing, Facing, Amenities, Property type
- Interactive Plotly charts (bar, pie, heatmaps, comparisons)
The cleaned dataset and engineered features are created using eda_cleaning_pipeline.py, which includes:
- Missing/duplicate handling
- Numeric coercion
- Feature engineering (Price per SqFt, Age, Density Scores, Amenities Count, Furnishing Flag, Facing Simplification, Ready-to-Move Flag)
- Output: cleaned_india_housing.csv and preprocessor.joblib
- Data Cleaning Pipeline (eda_cleaning_pipeline.py)
This script:
- Loads india_housing_prices.csv
- Renames and standardizes columns
- Removes duplicates
- Converts numeric text to numbers
- Handles missing values (median imputation for numeric, "Unknown" for categorical)
- Drops numeric columns that are fully empty
- Recomputes missing Price_per_SqFt
- Creates several engineered features:
  - Price_per_SqFt_final
  - Age_of_Property
  - School_Density_Score
  - Hospital_Density_Score
  - Amenities_count
  - Is_Fully_Furnished
  - Is_Ready_to_Move
  - Facing_simple
- Saves:
  - cleaned_india_housing.csv
  - preprocessor.joblib
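The cleaning steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not the contents of eda_cleaning_pipeline.py: the function name `clean_housing`, the `numeric_cols` argument, and the column-name normalization rule are assumptions made for the example.

```python
import pandas as pd


def clean_housing(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: deduplicate, coerce numerics, and fill gaps."""
    df = df.copy()

    # Standardize column names (assumed rule: trim and use underscores).
    df.columns = [c.strip().replace(" ", "_") for c in df.columns]

    # Remove exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    # Coerce numeric text such as "1,200" to numbers; unparseable text becomes NaN.
    for col in numeric_cols:
        text = df[col].astype(str).str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(text, errors="coerce")

    # Drop numeric columns that are entirely empty.
    empty = [c for c in numeric_cols if df[c].isna().all()]
    df = df.drop(columns=empty)

    # Median imputation for the remaining numeric columns.
    for col in numeric_cols:
        if col not in empty:
            df[col] = df[col].fillna(df[col].median())

    # Treat every other column as categorical and fill gaps with "Unknown".
    for col in df.columns:
        if col not in numeric_cols:
            df[col] = df[col].fillna("Unknown")

    return df
```

A call such as `clean_housing(pd.read_csv("india_housing_prices.csv"), numeric_cols=[...])` would then return a frame ready to be written out with `to_csv`; the actual list of numeric columns depends on the raw dataset.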
- Dashboard (app.py)
The Streamlit dashboard provides:
Key Features
- Full-width dashboard layout
- 5 KPIs
- 4 analysis pages:
  - Price & Size Analysis
  - Location Analysis
  - Relationship & Correlation
  - Ownership / Investment / Amenities
Tech Highlights
- Multi-dropdown filters with "Select All" logic
- Plotly visualizations
- Clean UI with HTML/CSS adjustments
- Custom reusable function to render charts in a two-column layout
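As a rough illustration of the "Select All" filter logic and the two-column chart helper mentioned above, the sketch below shows one possible shape. The names `SELECT_ALL`, `resolve_selection`, `chunk_pairs`, and `render_two_columns` are hypothetical and are not taken from app.py. Treating an empty selection as "everything" is also an assumption.

```python
from typing import Sequence

SELECT_ALL = "Select All"  # hypothetical sentinel shown as the first dropdown option


def resolve_selection(selected: Sequence[str], options: Sequence[str]) -> list[str]:
    """Return the concrete options a filter should keep."""
    # Assumption: an empty selection, or one containing the sentinel, means no filtering.
    if not selected or SELECT_ALL in selected:
        return list(options)
    return [opt for opt in options if opt in selected]


def chunk_pairs(items: Sequence) -> list[list]:
    """Group items two at a time so each group fills one dashboard row."""
    return [list(items[i:i + 2]) for i in range(0, len(items), 2)]


def render_two_columns(figures: Sequence) -> None:
    """Render Plotly figures side by side, two per row."""
    # Imported lazily so the pure helpers above can be used without Streamlit installed.
    import streamlit as st

    for pair in chunk_pairs(figures):
        columns = st.columns(2)
        for column, figure in zip(columns, pair):
            column.plotly_chart(figure)
```

In a Streamlit app, the first helper could take its input from something like `st.multiselect("State", [SELECT_ALL] + states, default=[SELECT_ALL])`, and the resolved list could then be passed to `DataFrame.isin` to filter the data.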
Run the dashboard using:
streamlit run app.py
Project Structure
    Real_Estate
    │
    ├── app.py                      # Streamlit dashboard
    ├── eda_cleaning_pipeline.py    # Data cleaning & preprocessing script
    ├── train_models.py             # (Optional) ML training script
    │
    ├── india_housing_prices.csv    # Raw dataset (not included)
    ├── cleaned_india_housing.csv   # Cleaned dataset (generated)
    ├── preprocessor.joblib         # Saved preprocessing pipeline
    ├── investment_clf.joblib       # (Optional) Trained classifier
    ├── price_reg.joblib            # (Optional) Trained regressor
    ├── feature_names.joblib        # OHE-expanded feature names
    │
    ├── README.md
    └── requirements.txt
Installation & Setup
1. Clone the repository:
   git clone https://github.com/your-username/real-estate-dashboard.git
   cd real-estate-dashboard
2. Install dependencies:
   pip install -r requirements.txt
3. Run the data cleaning pipeline:
   python eda_cleaning_pipeline.py
   This generates cleaned_india_housing.csv and preprocessor.joblib.
4. Launch the Streamlit dashboard:
   streamlit run app.py
Data Used
The dataset contains cleaned housing information including:
- Price (Lakhs)
- Price per SqFt
- Size in SqFt
- Amenities count
- Age of property
- Furnished status
- BHK
- Facing direction
- Ownership & availability details
- Location: State, City, Locality
- Nearby hospitals & schools
- Public transport accessibility
Cleaning & Feature Engineering
All cleaning happens in eda_cleaning_pipeline.py, which:
- Removes duplicates
- Converts numeric columns
- Handles missing values
- Creates new columns:
  - Price_per_SqFt_final
  - School_Density_Score
  - Hospital_Density_Score
  - Amenities_count
  - Is_Fully_Furnished
  - Facing_simple
  - Is_Ready_to_Move
- Saves the cleaned dataset
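To make a few of the engineered columns concrete, here is a hedged sketch of how they might be derived. The raw column names (`Price_in_Lakhs`, `Size_in_SqFt`, `Price_per_SqFt`, `Amenities`, `Furnished_Status`, `Availability_Status`, `Year_Built`), the category labels, and the function name `add_features` are all assumptions, since the raw schema is not documented here. The density scores and Facing_simple are omitted because their definitions are not stated.

```python
import pandas as pd


def count_amenities(value) -> int:
    """Count comma-separated amenities; missing or "Unknown" counts as zero."""
    if not isinstance(value, str) or value.strip() in ("", "Unknown"):
        return 0
    return len([item for item in value.split(",") if item.strip()])


def add_features(df: pd.DataFrame, current_year: int) -> pd.DataFrame:
    """Hypothetical feature-engineering step; column names are assumed."""
    df = df.copy()

    # Keep the reported value and recompute only where it is missing.
    # Assumption: Price_per_SqFt uses the same units as Price_in_Lakhs / Size_in_SqFt.
    recomputed = df["Price_in_Lakhs"] / df["Size_in_SqFt"]
    df["Price_per_SqFt_final"] = df["Price_per_SqFt"].fillna(recomputed)

    df["Age_of_Property"] = current_year - df["Year_Built"]
    df["Amenities_count"] = df["Amenities"].map(count_amenities)

    # Binary flags; the labels "furnished" and "ready_to_move" are assumed.
    furnished = df["Furnished_Status"].fillna("").str.strip().str.lower()
    df["Is_Fully_Furnished"] = furnished.eq("furnished").astype(int)

    status = df["Availability_Status"].fillna("").str.strip().str.lower()
    status = status.str.replace(r"[\s\-]+", "_", regex=True)
    df["Is_Ready_to_Move"] = status.eq("ready_to_move").astype(int)

    return df
```

Passing `current_year` explicitly keeps the age calculation reproducible; the real script may instead derive it from the system clock.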
Purpose of This Project
This dashboard was created to:
- Visualize property trends
- Understand location-based pricing
- Compare amenities impact on price
- Explore investment-related metrics
No machine learning models are included.
Tech Stack
- Python 3.10+
- Streamlit
- Pandas
- Plotly Express
- NumPy
- Scikit-learn
- Seaborn & Matplotlib (used inside the cleaning script)