A beginner-friendly machine learning project built on the Ames Housing dataset.
It predicts house sale prices using Linear Regression, combined with data cleaning,
outlier handling, and One-Hot Encoding of categorical variables.

The project walks through the full process of building a regression model, step by step:
- Data Loading & Inspection
- Data Cleaning
  - Handling missing values
  - Removing outliers (`GrLivArea > 4000 & SalePrice < 300000`)
- Exploratory Data Analysis (EDA)
  - Correlation heatmaps
  - Key relationships (`GrLivArea`, `OverallQual`, etc.)
- Model Building
  - Linear Regression (Baseline + Improved)
  - One-Hot Encoding for categorical variables
- Model Evaluation
  - R² Score
  - RMSE
  - Visualization of Actual vs Predicted values
  - Residuals distribution
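
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on a small synthetic DataFrame (an assumption, so the example needs no CSV file), fills missing values with the median (one possible strategy; the notebook's may differ), and uses `Neighborhood` as a stand-in categorical column. Only the outlier rule, the column names `GrLivArea`, `OverallQual`, and `SalePrice`, and the choice of Linear Regression with One-Hot Encoding come from this README. The scores it prints will not match the results reported for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Ames data (assumption: the notebook loads the real dataset).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "GrLivArea": rng.integers(600, 3500, size=n).astype(float),
    "OverallQual": rng.integers(1, 11, size=n),
    "Neighborhood": rng.choice(["NAmes", "CollgCr", "OldTown"], size=n),
})
df["SalePrice"] = (
    60 * df["GrLivArea"]
    + 15_000 * df["OverallQual"]
    + df["Neighborhood"].map({"NAmes": 0, "CollgCr": 20_000, "OldTown": -10_000})
    + rng.normal(0, 15_000, size=n)
)

# Add two very large but cheap houses, plus a few missing values, to clean up below.
outliers = pd.DataFrame({
    "GrLivArea": [4500.0, 5200.0],
    "OverallQual": [10, 10],
    "Neighborhood": ["NAmes", "OldTown"],
    "SalePrice": [180_000.0, 160_000.0],
})
df = pd.concat([df, outliers], ignore_index=True)
df.loc[:4, "GrLivArea"] = np.nan

# Data cleaning: fill missing values (median is an assumed strategy).
df["GrLivArea"] = df["GrLivArea"].fillna(df["GrLivArea"].median())

# Outlier removal, using the rule stated in this README.
mask = (df["GrLivArea"] > 4000) & (df["SalePrice"] < 300000)
df = df[~mask]

# One-Hot Encoding of the categorical column; numeric columns pass through unchanged.
X = pd.get_dummies(
    df.drop(columns="SalePrice"),
    columns=["Neighborhood"],
    drop_first=True,
    dtype=float,
)
y = df["SalePrice"]

# Fit and evaluate on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"R2: {r2:.3f}  RMSE: {rmse:,.0f}")
```

`drop_first=True` drops one dummy column per category to avoid perfectly collinear features, which is a common choice for unregularized linear models.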
| Metric | Value |
|---|---|
| R² Score | ~0.89 |
| RMSE | ~23,000 |
For a plain Linear Regression model, an R² of about 0.89 is a solid result.
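
For readers new to these metrics, here is a small sketch of how R² and RMSE are computed by hand with NumPy. The three prices are made-up illustration values, not results from this project.

```python
import numpy as np

# Hypothetical actual and predicted sale prices (illustration only).
y_true = np.array([200_000.0, 150_000.0, 300_000.0])
y_pred = np.array([210_000.0, 140_000.0, 290_000.0])

residuals = y_true - y_pred

# RMSE: square root of the mean squared residual, in the same units as the target.
rmse = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus the ratio of residual variation to total variation around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse:,.0f}")  # RMSE: 10,000
print(f"R2: {r2:.4f}")       # R2: 0.9743
```

RMSE is expressed in the units of `SalePrice`, so it can be read as a typical prediction error, while R² is unitless and describes the share of price variation the model explains.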
- Importance of outlier removal before fitting a linear model
- How categorical encoding (One-Hot) improves regression performance
- Basic workflow of a data science project, from EDA → modeling → evaluation
- Python 🐍
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- `House_Price_Prediction_Ishu_Final.ipynb`: Final Jupyter notebook
- `README.md`: Project overview
⭐ If you like this project, give it a star on GitHub! ⭐
- Try more advanced models such as Ridge, Lasso, or RandomForest
- Tune hyperparameters for better performance
- Deploy the model using Streamlit
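
As a starting point for the first two items, the sketch below compares several scikit-learn regressors with 5-fold cross-validation and then tunes Ridge's `alpha` with a grid search. It is only an illustration: the data comes from `make_regression` rather than the Ames dataset, and the candidate models, `alpha` grid, and fold count are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic regression data standing in for the prepared housing features.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=8, noise=20.0, random_state=0
)

# Compare candidate models by mean cross-validated R².
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="r2").mean()
    for name, estimator in candidates.items()
}
for name, score in scores.items():
    print(f"{name:>7}: mean R2 = {score:.3f}")

# Hyperparameter tuning: search over Ridge's regularization strength.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print("Best alpha:", search.best_params_["alpha"])
print(f"Best CV RMSE: {-search.best_score_:.2f}")
```

scikit-learn reports RMSE as a negated score so that higher is always better, which is why the sign is flipped when printing.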
Ishu Singh
📧 Email: [email protected]
🌐 GitHub: https://github.com/ishuz-data-Git