🏠 Advanced House Prices Prediction

A complete machine learning solution for predicting house prices using the Ames Housing dataset. This project implements an end-to-end pipeline from data preprocessing to web deployment, achieving competitive performance on the Kaggle leaderboard.

🚀 Features

Data Processing Pipeline: Automated cleaning and feature engineering. See data/data_description.txt for an explanation of every feature in the dataset.
Multiple ML Models: Linear Regression and XGBoost implementations with performance comparison
Interactive Web Application: Streamlit-based interface for real-time predictions and custom model training
Production Ready: Docker containerization and comprehensive evaluation metrics

📂 Folder Structure

├── 📂 data/                # Datasets and description
│   ├── 📄 train.csv        # Training data
│   ├── 📄 test.csv         # Test data
│   ├── 📄 new_train.csv    # Processed training data
│   ├── 📄 new_test.csv     # Processed test data
│   └── 📄 data_description.txt
│
├── 📂 models/              # Trained models and schemas
│   ├── 📦 linear_regression_model.pkl
│   └── 📄 schemas.py       # Pydantic data validation
│
├── 📂 configs/             # Application configuration
│   └── ⚙️ config.py
│
├── 📂 views/               # Streamlit application components
│   ├── 🏡 house_price.py   # Main prediction interface
│   ├── 🛠️ custom_linear_app.py    # Custom model training
│   └── 🚀 custom_xgboost.py       # XGBoost implementation
│
├── 📓 data_cleaning_and_feature_engineering.ipynb
├── 🐳 Dockerfile           # Container configuration
├── environment.yml         # Conda environment
├── Instructions.md         # Student instructions
├── LICENSE                 # License information
├── 🚀 main.py              # Application entry point
├── Party_Time.ipynb        # Google Colab notebook
├── 📦 requirements.txt     # Dependencies
├── server-instructions.md  # Server setup guide
└── 🖥️ start.sh             # Launch script

🚀 Quick Start

Prerequisites

Python 3.8+
pip or conda
Kaggle account (for dataset access)

Installation

Clone the repository

git clone <repository-url>
cd Kaggle_Advanced_House_Prices

⚙️ Environment Setup

🅰️ Option 1: Create and activate a virtual environment

🐧 Linux/macOS

python3 -m venv .venv
source .venv/bin/activate

🪟 Windows

python -m venv .venv
.\.venv\Scripts\activate

Alternatively, use Conda (all operating systems):

conda env create -f environment.yml
conda activate kaggle-house-prices

Register the Conda environment as a Jupyter kernel so you can use it in Jupyter Notebook.

conda install ipykernel
python -m ipykernel install --user --name kaggle-house-prices --display-name "Kaggle House Prices"

In VS Code, press Ctrl+Shift+P, select "Python: Select Interpreter", and choose the interpreter from the kaggle-house-prices environment.
When you open the Jupyter Notebook, it should use the "Kaggle House Prices" kernel automatically. If it does not, restart VS Code. If that does not help, click the kernel picker in the top-right corner of the notebook and select "Kaggle House Prices" from the Jupyter kernel list.

2️⃣ Install dependencies

pip install -r requirements.txt

🅱️ Option 2: Run with Docker

Build and start the app using Docker:

docker build -t streamlit_app .
docker run --rm -p 8501:8501 streamlit_app

Or use the provided shell script:

bash start.sh

🛠️ Step-by-Step Guide

🏆 Kaggle Setup

To use Kaggle datasets or APIs, you need to set up your Kaggle credentials:

🔗 Go to your Kaggle account settings: https://www.kaggle.com/account.
🛡️ Scroll down to the API section and click Create New API Token.
📥 This will download a file named kaggle.json.
📂 Place kaggle.json in the following location:
- 🐧 Linux/macOS: ~/.kaggle/kaggle.json (some setups use ~/.config/kaggle/kaggle.json instead)
- 🪟 Windows: C:\Users\<YourUsername>\.kaggle\kaggle.json
🔒 Make sure the file permissions are secure (Linux/macOS):
```
chmod 600 ~/.kaggle/kaggle.json
```
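The permission check above can also be sketched in Python. This is a minimal, illustrative helper rather than part of the project: the function name `check_kaggle_credentials` is hypothetical, and it assumes the token lives at the default ~/.kaggle/kaggle.json location.

```python
import os
import stat
from pathlib import Path
from typing import List


def check_kaggle_credentials(path: Path) -> List[str]:
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Permission bits are only meaningful on POSIX systems (Linux/macOS).
    if os.name == "posix":
        mode = stat.S_IMODE(path.stat().st_mode)
        # Any bit set for group or others means other users can access the token.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"{path} has mode {oct(mode)}; run: chmod 600 {path}")
    return problems


if __name__ == "__main__":
    # Assumed default location of the token on Linux/macOS.
    default_path = Path.home() / ".kaggle" / "kaggle.json"
    for problem in check_kaggle_credentials(default_path):
        print(problem)
```

An empty result means the file exists and, on Linux/macOS, is accessible only by its owner.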
To download the dataset, you must first join the competition and accept its rules.

📦 You can now use the Kaggle CLI to download datasets:

kaggle competitions download -c house-prices-advanced-regression-techniques -p data/
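The download typically arrives as a single zip archive inside data/. A minimal sketch for unpacking it with the standard library follows; the helper name `extract_archive` is hypothetical, and the archive file name (the competition slug plus ".zip") is an assumption that may differ on your machine.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, dest: Path) -> List[str]:
    """Extract every member of `archive` into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Assumed archive name: the competition slug plus ".zip".
    archive = Path("data") / "house-prices-advanced-regression-techniques.zip"
    if archive.exists():
        print(extract_archive(archive, Path("data")))
    else:
        print(f"{archive} not found; run the kaggle download command first")
```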

1️⃣ Data Cleaning & Feature Engineering

Open and run data_cleaning_and_feature_engineering.ipynb to preprocess and engineer features from the raw data.
Then continue in the notebook to train a Linear Regression model on the processed data.

2️⃣ Train Model (if needed)

Use the scripts or the notebook to train models and save them in models/ (a default model is provided). Note: Running the data_cleaning_and_feature_engineering.ipynb notebook trains the model and saves it automatically.
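The train-then-save pattern can be sketched without any third-party libraries. This is a dependency-free stand-in, not the notebook's code: it fits a one-feature least-squares line, keeps the coefficients in a plain dict, and round-trips them through pickle. All function names and the toy numbers are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, Sequence


def fit_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Closed-form ordinary least squares for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model: Dict[str, float], x: float) -> float:
    return model["intercept"] + model["slope"] * x


def save_model(model: object, path: Path) -> None:
    """Pickle the model, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path: Path) -> object:
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Made-up toy data (living area vs. price), not taken from the dataset.
    model = fit_simple_linear([1000.0, 1500.0, 2000.0], [150000.0, 200000.0, 250000.0])
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "models" / "toy_linear_model.pkl"
        save_model(model, path)
        print(predict(load_model(path), 1800.0))
```

Only unpickle model files you trust, since loading a pickle can execute arbitrary code.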

3️⃣ Launch the Streamlit App

Run the following command, or start the app with Docker:
```
streamlit run main.py
```
Use the sidebar to select between house price prediction and custom regression modules.

Train your own model on your own data using the custom regression module.

Virtual Environment:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Conda Environment:

conda env create -f environment.yml
conda activate kaggle-house-prices

Download dataset (requires Kaggle account)

kaggle competitions download -c house-prices-advanced-regression-techniques -p data/

Run the application
```
streamlit run main.py
```

Docker Deployment

# Build and run with Docker
docker build -t house-prices-app .
docker run --rm -p 8501:8501 house-prices-app

# Or use the provided script
bash start.sh

💻 Usage

Web Application

House Price Prediction: Input property features to get price predictions
Custom Model Training: Upload your own dataset and train models interactively
Model Comparison: Compare performance between different algorithms

Jupyter Notebook

Run data_cleaning_and_feature_engineering.ipynb to:

Explore the dataset through comprehensive EDA
Apply feature engineering techniques
Train and evaluate machine learning models
Generate submission files for Kaggle
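For the last item, the competition expects a two-column CSV with the header Id,SalePrice. A minimal sketch of writing such a file with the standard library is shown here; the helper name `write_submission` and the example prices are placeholders, not output from the project's models.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def write_submission(rows: Iterable[Tuple[int, float]], path: Path) -> None:
    """Write (Id, SalePrice) pairs as a two-column CSV with a header row."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "SalePrice"])
        for house_id, price in rows:
            # Two decimals are plenty for a price prediction.
            writer.writerow([house_id, f"{price:.2f}"])


if __name__ == "__main__":
    # Placeholder predictions for illustration only.
    write_submission([(1461, 169000.0), (1462, 187500.5)], Path("submission.csv"))
```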

Configuration

Key configuration options in configs/config.py:

Model file paths
Feature definitions and default values
Application settings and parameters
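As an illustration of how these three kinds of options might be grouped, here is a hypothetical sketch. It is not the contents of configs/config.py: the class name `AppConfig`, the field names, and the default values are all assumptions; only the model path comes from the folder structure above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


def _placeholder_defaults() -> Dict[str, float]:
    # Placeholder values for illustration only, not the project's defaults.
    return {"overall_qual": 5, "gr_liv_area": 1500.0, "garage_cars": 2.0}


@dataclass(frozen=True)
class AppConfig:
    # Model file path, as listed in the folder structure.
    model_path: Path = Path("models") / "linear_regression_model.pkl"
    # Feature definitions with default values (hypothetical).
    feature_defaults: Dict[str, float] = field(default_factory=_placeholder_defaults)
    # Example application setting (hypothetical).
    page_title: str = "House Price Prediction"

    def with_defaults(self, user_input: Dict[str, float]) -> Dict[str, float]:
        """Fill in any feature the user left out with its default value."""
        return {**self.feature_defaults, **user_input}


if __name__ == "__main__":
    config = AppConfig()
    print(config.model_path)
    print(config.with_defaults({"gr_liv_area": 2000.0}))
```

Freezing the dataclass keeps settings from being reassigned at runtime, which is one reasonable choice for shared configuration.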

📝 API Reference

Model Schema (Pydantic)

The application uses Pydantic schemas for data validation:

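from typing import Optional, Tuple

from pydantic import BaseModel

# Input features validation
class HouseFeatures(BaseModel):
    overall_qual: int
    gr_liv_area: float
    garage_cars: float
    # ... additional features

# Prediction output
class PricePrediction(BaseModel):
    predicted_price: float
    # Optional (lower, upper) interval; defaults to None when not computed
    confidence_interval: Optional[Tuple[float, float]] = None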
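The schema's field names look like snake_case versions of Ames column names such as OverallQual, GrLivArea, and GarageCars. Assuming that mapping (the source does not confirm how the project renames columns), a small standard-library sketch of the conversion could look like this; both function names are hypothetical.

```python
import re
from typing import Dict


def to_snake_case(column: str) -> str:
    """Insert "_" before every capital letter except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", column).lower()


def rename_row(row: Dict[str, object]) -> Dict[str, object]:
    """Convert the keys of one raw data row to schema-style field names."""
    return {to_snake_case(key): value for key, value in row.items()}


if __name__ == "__main__":
    raw = {"OverallQual": 7, "GrLivArea": 1710.0, "GarageCars": 2.0}
    print(rename_row(raw))
```

This simple rule splits runs of capitals letter by letter (MSSubClass becomes m_s_sub_class), so columns with acronyms would need an explicit mapping instead.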

Model Performance Results

| Model | MAE | MSE | RMSE |
|---|---|---|---|
| Linear Regression | 19,452.09 | 711,102,117.35 | 26,666.50 |
| XGBoost | 16,483.55 | 578,007,680.00 | 24,041.79 |

XGBoost has lower error than Linear Regression on all three metrics: its MAE is about 15% lower and its RMSE about 10% lower.
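For reference, the three metrics reported here can be computed as in the following sketch. It uses only the standard library and is not the project's evaluation code; the function name `regression_metrics` and the sample numbers are illustrative.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, and RMSE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


if __name__ == "__main__":
    # Made-up prices for illustration only.
    print(regression_metrics([200000.0, 150000.0], [210000.0, 140000.0]))
```

RMSE is the square root of MSE, which is why the RMSE figures reported for both models equal the square roots of their MSE figures.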

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Kaggle for providing the dataset and competition platform
The Ames Housing dataset compiled by Dean De Cock
Streamlit for the excellent web framework
The open-source community for the amazing machine learning libraries

📚 Documentation

For detailed setup and learning instructions, see:

Student Instructions - Complete learning guide
Environment Setup - Development environment
Kaggle Setup - API and data access
GitHub Setup - Version control setup
