This project predicts house sale prices from features such as living area, overall quality, and garage details, using the Kaggle House Prices dataset.
- `train.csv`: training data (1460 rows, 81 columns)
- `test.csv`: test data without `SalePrice`
- `sample_submission.csv`: sample output format for submission
- Features include:
  - `GrLivArea`: above-ground living area (sq ft)
  - `OverallQual`: overall material and finish quality
  - `GarageCars`: number of cars the garage holds
  - `GarageArea`: size of garage (sq ft)
  - and many more...
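A minimal sketch of loading `train.csv` with pandas and pulling out the example features listed above. The helper name `load_xy` and the tiny in-memory CSV (made-up values, used only so the snippet runs without downloading the Kaggle files) are illustrative assumptions; with the real data you would pass the path `"train.csv"` instead.

```python
import io

import pandas as pd

# The four example features named in this README (train.csv has many more).
FEATURES = ["GrLivArea", "OverallQual", "GarageCars", "GarageArea"]
TARGET = "SalePrice"


def load_xy(csv_source):
    """Read a House Prices CSV and return (X, y) for the selected columns.

    `csv_source` may be a file path such as "train.csv" or a file-like object.
    """
    df = pd.read_csv(csv_source)
    return df[FEATURES], df[TARGET]


# Hypothetical stand-in for train.csv; these values are illustrative only.
sample_csv = io.StringIO(
    "Id,GrLivArea,OverallQual,GarageCars,GarageArea,SalePrice\n"
    "1,1500,6,2,480,180000\n"
    "2,2100,8,3,720,290000\n"
)
X, y = load_xy(sample_csv)
print(X.shape, y.shape)
```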
- Data cleaning (`dropna()`)
- Exploratory Data Analysis (EDA)
- Outlier detection and removal using scatterplots
- Feature selection using correlation analysis
- Scaling with `StandardScaler`
- Linear Regression model building
- Custom user-input-based prediction using `.predict()`
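The workflow above could be sketched roughly as follows with pandas and scikit-learn. This is an illustrative reconstruction, not the author's code: the synthetic data, the extra `YrSold` column, and the 0.5 correlation cut-off are assumptions. Two details are worth noting. First, `dropna()` is applied only to the columns in use, because calling it on all 81 columns of the real `train.csv` would discard nearly every row (columns such as `PoolQC` are mostly empty). Second, the scaler is fitted on the training split only, so no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train.csv (assumption) so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "OverallQual": rng.integers(1, 11, n),
    "GarageCars": rng.integers(0, 4, n),
    "YrSold": rng.integers(2006, 2011, n),  # deliberately unrelated to price
})
df["GarageArea"] = df["GarageCars"] * 250 + rng.normal(0, 40, n)
df["SalePrice"] = (
    100 * df["GrLivArea"] + 15000 * df["OverallQual"]
    + 50 * df["GarageArea"] + rng.normal(0, 15000, n)
)
df.loc[::50, "GarageArea"] = np.nan  # inject a few missing values to clean

# 1. Data cleaning: drop rows with missing values in the columns we use.
df = df.dropna()

# 2. Feature selection: keep features whose absolute correlation with
#    SalePrice exceeds a threshold (0.5 here is an assumed cut-off).
corr = df.corr()["SalePrice"].drop("SalePrice")
selected = corr[corr.abs() > 0.5].index.tolist()

# 3. Split, then fit the scaler on the training split only.
X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df["SalePrice"], test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Fit ordinary least-squares linear regression and report test R^2.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
print("Selected features:", selected)
print("Test R^2:", round(model.score(X_test_scaled, y_test), 3))
```

A design note: for plain least-squares `LinearRegression` with an intercept, standardizing the features rescales the coefficients but leaves the predictions (and therefore RMSE and R²) unchanged. Scaling matters more for regularized models such as Ridge and Lasso, and it makes the linear coefficients easier to compare.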
- RMSE: 42682.00
- R² score: 0.76
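For reference, here is one way to compute the two metrics with scikit-learn. RMSE is in the same units as `SalePrice`, while R² is the fraction of the target's variance that the model explains (1.0 is a perfect fit). The helper `regression_report` and the four sample prices are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return (RMSE, R^2) for a set of regression predictions."""
    # RMSE: square root of the mean squared error, in the target's units.
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    # R^2: 1 - (residual sum of squares / total sum of squares).
    r2 = r2_score(y_true, y_pred)
    return rmse, r2


# Hypothetical actual vs. predicted sale prices.
y_true = np.array([200000.0, 150000.0, 320000.0, 180000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0, 190000.0])
rmse, r2 = regression_report(y_true, y_pred)
print(f"RMSE: {rmse:.2f}")
print(f"R² score: {r2:.2f}")
```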
Initially, the R² score was about 0.60; it improved to 0.76 after:
- Better feature selection
- Removing outliers
- Applying scaling
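Once a scatterplot (for example `plt.scatter(df["GrLivArea"], df["SalePrice"])` with matplotlib) shows where the outliers lie, the removal step can be written as a simple filter. A sketch, assuming the plot reveals a few very large houses that sold for unusually low prices; the 4000 sq ft and 300,000 thresholds, the helper name `remove_area_outliers`, and the sample rows are assumptions, not values taken from this project.

```python
import pandas as pd


def remove_area_outliers(df, area_col="GrLivArea", price_col="SalePrice",
                         max_area=4000, max_price=300000):
    """Drop rows with a very large living area but an unusually low price.

    The default thresholds are assumptions read off a scatterplot;
    adjust them to whatever your own plot shows.
    """
    is_outlier = (df[area_col] > max_area) & (df[price_col] < max_price)
    return df.loc[~is_outlier]


# Hypothetical rows: the 4600 and 5600 sq ft houses are cheap for their size.
data = pd.DataFrame({
    "GrLivArea": [1500, 2200, 4600, 5600, 4300],
    "SalePrice": [180000, 260000, 185000, 160000, 550000],
})
cleaned = remove_area_outliers(data)
print(len(data), "->", len(cleaned))
```

The large but expensive 4300 sq ft house is kept: only points that break the overall area/price trend are removed, which is what a scatterplot-based inspection is meant to catch.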
The model allows you to input:
- Living Area
- Overall Quality
- Garage Info
It then returns a predicted house price, for example `Estimated Price: 247337.64` (in US dollars, the unit of `SalePrice` in the dataset).
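A minimal sketch of the user-input prediction, assuming the model is trained on the four features listed in the dataset section. It wraps `StandardScaler` and `LinearRegression` in a scikit-learn pipeline so that new inputs are scaled exactly like the training data before `.predict()` is called. The synthetic training data and the helper name `predict_price` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["GrLivArea", "OverallQual", "GarageCars", "GarageArea"]

# Synthetic training data (assumption) so the example runs on its own.
rng = np.random.default_rng(1)
n = 300
train = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "OverallQual": rng.integers(1, 11, n),
    "GarageCars": rng.integers(0, 4, n),
})
train["GarageArea"] = train["GarageCars"] * 250 + rng.normal(0, 40, n)
price = (80 * train["GrLivArea"] + 15000 * train["OverallQual"]
         + 40 * train["GarageArea"] + rng.normal(0, 15000, n))

# The pipeline re-applies the scaler learned on the training data to any
# new input, then calls LinearRegression.predict() on the scaled values.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(train[FEATURES], price)


def predict_price(living_area, overall_qual, garage_cars, garage_area):
    """Predict a sale price from four user-supplied feature values."""
    row = pd.DataFrame(
        [[living_area, overall_qual, garage_cars, garage_area]],
        columns=FEATURES,
    )
    return float(model.predict(row)[0])


estimate = predict_price(1800, 7, 2, 500)
print(f"Estimated Price: {estimate:,.2f}")
```

To read the values interactively, each argument could come from `float(input("Living area (sq ft): "))` and similar prompts.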
# house-price-prediction
## Future Improvements
- Try different regression models: Ridge, Lasso, Random Forest
- Add cross-validation
- Perform hyperparameter tuning
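These improvements could start from something like the following: 5-fold cross-validated R² for Linear Regression, Ridge, Lasso, and Random Forest, plus a grid search over Ridge's `alpha`. The synthetic `make_regression` data, the `alpha` grid, the default `alpha=1.0` values, and the fold count are assumptions standing in for the prepared House Prices features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data stands in for the prepared feature matrix.
X, y = make_regression(n_samples=400, n_features=8, noise=20.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
candidates = {
    "LinearRegression": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0)),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Cross-validation: mean R^2 across the 5 folds for each candidate model.
cv_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=cv, scoring="r2")
    cv_scores[name] = scores.mean()
    print(f"{name:16s} mean R^2 = {scores.mean():.3f}")

# Hyperparameter tuning: grid-search Ridge's regularization strength.
# make_pipeline names the Ridge step "ridge", hence "ridge__alpha".
search = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=cv,
    scoring="r2",
)
search.fit(X, y)
print("Best alpha:", search.best_params_["ridge__alpha"])
print("Best CV R^2:", round(search.best_score_, 3))
```

Keeping the scaler inside each pipeline means it is refitted on every training fold, so the cross-validation scores are not inflated by statistics from the held-out fold.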
## Requirements
Install the dependencies with `pip install -r requirements.txt`.
## Learning Outcome
This was my first end-to-end ML regression project, in which I applied:
- EDA
- Feature engineering
- Visualization
- Modeling
- Evaluation

It boosted my confidence in applying what I have learned to a real problem.
## Project Author
**Ravi Roy**
- B.Tech CSE, IILM University
- Core IIC Club Member; former intern at Oasis Infobyte, YBI Foundation, and CodeAlpha
- Passionate about AI/ML and solving real-world problems
## Kaggle Dataset
[House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)