This repository contains an end-to-end regression project using the
Ames Housing dataset from Kaggle.
The goal is to predict house sale prices by applying:
- classical data preprocessing techniques
- feature engineering
- a neural network built with the TensorFlow Keras Functional API
This project is primarily for learning and experimentation, not leaderboard chasing.
- Source: Kaggle competition "House Prices: Advanced Regression Techniques"
- Target variable: SalePrice
- Data includes:
  - numerical features
  - categorical features
  - missing values
  - skewed distributions
.
├── data/ # Local dataset (ignored by git)
├── notebooks/ # EDA & experiments
├── src/ # Preprocessing & model code
├── models/ # Saved models (optional)
├── README.md
├── .gitignore
└── requirements.txt
- Download the dataset using the Kaggle CLI
- Store raw data locally under data/
- Understand feature distributions
- Analyze the target variable (SalePrice)
- Identify:
  - missing values
  - skewness
  - outliers
- Visualizations:
  - histograms
  - correlation heatmaps
  - pairplots (feature vs target)
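
The checks above can be sketched with pandas. This is a minimal illustration on synthetic data, not the project's notebook code: the column names other than SalePrice, the random data, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training data (assumption); the real project
# would load the Kaggle CSV stored under data/ instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, 200),
    "LotArea": rng.lognormal(9, 0.6, 200),
    "SalePrice": rng.lognormal(12, 0.4, 200),
})
# Inject some missing values so the report has something to show.
df.loc[df.sample(20, random_state=0).index, "LotArea"] = np.nan

# Missing values per column, largest first.
missing = df.isna().sum().sort_values(ascending=False)

# Skewness per column (NaN values are skipped by default).
skewness = df.skew()

# Pairwise correlations, the input to a correlation heatmap.
corr = df.corr()

# Simple 1.5 * IQR rule to flag potential outliers in the target (assumption).
q1, q3 = df["SalePrice"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["SalePrice"] < q1 - 1.5 * iqr) | (df["SalePrice"] > q3 + 1.5 * iqr)]

print(missing)
print(skewness)
print(len(outliers), "potential outliers in SalePrice")
```

The `corr` frame can be passed to a heatmap function such as Seaborn's `heatmap`, and `df.hist()` gives a quick view of the distributions.
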
- Handle missing values
- Encode categorical variables
- Scale numerical features
- Log-transform skewed variables
- Train–test split
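
One possible way to wire these steps together is a scikit-learn `ColumnTransformer`. The sketch below uses a tiny synthetic frame; the feature columns, the median and most-frequent imputation strategies, and the 25% test size are assumptions for illustration. It splits before fitting the transformers so that test rows do not influence the imputation or scaling statistics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic frame with one numeric and one categorical feature (assumption).
df = pd.DataFrame({
    "GrLivArea": [1500.0, 2000.0, np.nan, 1200.0, 1800.0, 900.0, 2400.0, 1600.0],
    "Neighborhood": ["A", "B", "A", np.nan, "C", "B", "A", "C"],
    "SalePrice": [200000, 250000, 180000, 150000, 230000, 120000, 310000, 210000],
})

X = df.drop(columns="SalePrice")
y = np.log1p(df["SalePrice"])  # log-transform the skewed target

# Split first, then fit preprocessing on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["GrLivArea"]),
        ("cat", categorical, ["Neighborhood"]),
    ],
    sparse_threshold=0,  # always return a dense array, convenient for Keras
)

X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

Because the target is transformed with `np.log1p`, predictions need `np.expm1` applied before they are reported in the original price units.
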
- Define input layers explicitly
- Build a dense neural network with:
  - multiple hidden layers
  - ReLU activations
  - an output layer for regression
- Compile the model with:
  - optimizer (Adam)
  - loss function (MSE / MAE)
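
A minimal sketch of such a model with the Keras Functional API follows. The helper name `build_model`, the layer widths (64 and 32), the learning rate, and the feature count of 10 are illustrative assumptions, not the values used in this repository.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_features: int, learning_rate: float = 1e-3) -> keras.Model:
    """Build and compile a small dense regression network (hypothetical helper)."""
    # Explicit input layer: one vector of preprocessed features per house.
    inputs = keras.Input(shape=(n_features,), name="features")

    # Hidden layers with ReLU activations (widths are assumptions).
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(32, activation="relu")(x)

    # Single linear unit for regression.
    outputs = layers.Dense(1, name="price")(x)

    model = keras.Model(inputs=inputs, outputs=outputs, name="house_price_mlp")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",       # MAE is the alternative named in the list above
        metrics=["mae"],
    )
    return model


model = build_model(n_features=10)
model.summary()
```
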
- Train on training data
- Validate on hold-out set
- Monitor:
  - training loss
  - validation loss
- Tune:
  - number of layers
  - neurons per layer
  - learning rate
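
The training loop can be sketched as below. The data is synthetic and the model is deliberately tiny so the example runs quickly; the epoch count, batch size, and learning rate are assumptions to tune, not recommendations.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic regression data with a known linear relationship (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)).astype("float32")
weights = np.array([1.0, -2.0, 0.5, 0.0, 3.0], dtype="float32")
y = (X @ weights).reshape(-1, 1)

# Hold out the last 20% of rows for validation.
X_train, X_val = X[:160], X[160:]
y_train, y_val = y[:160], y[160:]

inputs = keras.Input(shape=(5,))
x = layers.Dense(16, activation="relu")(inputs)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2), loss="mse")

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),  # evaluated at the end of every epoch
    epochs=20,
    batch_size=32,
    verbose=0,
)

# Per-epoch curves to monitor for under- or overfitting.
train_loss = history.history["loss"]
val_loss = history.history["val_loss"]
print(f"final train loss: {train_loss[-1]:.4f}, final val loss: {val_loss[-1]:.4f}")
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting, which is where changing the number of layers, neurons, or learning rate comes in.
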
- Evaluate the model using:
  - RMSE
  - MAE
- Compare with baseline ML models (optional)
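
Both metrics are short enough to write out with NumPy. The helper names `rmse` and `mae` and the three example prices are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: penalizes large errors more heavily."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error: average size of the error in price units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up prices: the errors are -10000, 10000 and 20000.
y_true = np.array([200000.0, 150000.0, 300000.0])
y_pred = np.array([210000.0, 140000.0, 280000.0])

print(f"RMSE: {rmse(y_true, y_pred):.2f}")  # RMSE: 14142.14
print(f"MAE:  {mae(y_true, y_pred):.2f}")   # MAE:  13333.33
```

If the model was trained on a log-transformed target, apply `np.expm1` to the predictions first so the metrics are in the original price units.
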
- Feature selection
- Regularization (Dropout / L2)
- Hyperparameter tuning
- Architecture experiments
- More flexible than Sequential
- Explicit control over:
  - inputs
  - outputs
  - complex architectures
- Widely used for non-trivial models, such as multi-input or multi-output networks
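
As an illustration of that flexibility, the sketch below builds a model with two separate inputs that are merged before the output, something Sequential cannot express. The split into a numeric and a one-hot categorical input, and all shapes and layer widths, are assumptions for the example rather than the architecture used in this project.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two named inputs (shapes are illustrative assumptions).
num_in = keras.Input(shape=(8,), name="numeric")
cat_in = keras.Input(shape=(12,), name="categorical_onehot")

# Each input gets its own branch.
num_branch = layers.Dense(32, activation="relu")(num_in)
cat_branch = layers.Dense(16, activation="relu")(cat_in)

# Merge the branches and finish with a single regression output.
merged = layers.Concatenate()([num_branch, cat_branch])
x = layers.Dense(32, activation="relu")(merged)
price = layers.Dense(1, name="price")(x)

model = keras.Model(inputs=[num_in, cat_in], outputs=price)
model.compile(optimizer="adam", loss="mse")
model.summary()
```
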
- Python
- NumPy, Pandas
- Matplotlib, Seaborn
- Scikit-learn
- TensorFlow / Keras
- Kaggle API
- Git & GitHub
- The data/ directory is ignored using .gitignore
- Kaggle datasets should not be committed
- This repo focuses on learning and clarity, not competition ranking
This project is licensed under the Apache License 2.0.
See the LICENSE file for details.
Ashiq KM
Learning-focused ML & Deep Learning experiments 🚀
