This repo contains my solution to Kaggle's House Prices: Advanced Regression Techniques competition.
I reached the top 5% of the leaderboard on August 16, 2017, with a score of 0.11459.
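For context, the competition scores submissions by the root mean squared error between the logarithm of the predicted sale price and the logarithm of the observed one, so lower is better. A minimal sketch of that metric in plain Python follows. The function name `rmsle` and the sample prices are illustrative assumptions, not taken from utils.py.

```python
import math


def rmsle(y_true, y_pred):
    """RMSE between log(actual) and log(predicted) prices.

    Illustrative helper (hypothetical name); assumes strictly positive prices.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(pred) - math.log(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices, only to show the call.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
print(round(rmsle(actual, predicted), 5))
```

Because the error is taken on logarithms, being off by 10% costs about the same on a cheap house as on an expensive one.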
The code was written (and has only been tested) on a Mac using Anaconda Python 3.6. See requirements.txt for the modules used and their versions.
- Explorative_Data_Analysis.ipynb: Jupyter notebook which shows how I analyzed the data, including observations and conclusions.
- Model.ipynb: Jupyter notebook with machine learning code.
- crossval.py: Cross-validation helper functions.
- preprocess.py: Data pre-processing functions, K-Nearest Neighbour imputation.
- utils.py: Various functions for scoring metrics, numeric transformations, plots etc.
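To illustrate the K-Nearest Neighbour imputation mentioned for preprocess.py, here is a rough pure-Python sketch of the general idea: a missing value is filled with the mean of that column over the k most similar complete rows. This is not the code in preprocess.py. The function name `knn_impute`, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries using the k nearest complete rows.

    Hypothetical helper: distance is Euclidean over the columns that the
    incomplete row actually has; the fill value is the neighbours' mean.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")
    filled = []
    for row in rows:
        missing = [i for i, v in enumerate(row) if v is None]
        if not missing:
            filled.append(list(row))
            continue
        known = [i for i, v in enumerate(row) if v is not None]
        # Rank complete rows by distance on the columns this row does have.
        neighbours = sorted(
            complete,
            key=lambda c: math.sqrt(sum((row[i] - c[i]) ** 2 for i in known)),
        )[:k]
        new_row = list(row)
        for i in missing:
            new_row[i] = sum(n[i] for n in neighbours) / len(neighbours)
        filled.append(new_row)
    return filled


# Toy data: the last row is missing its second value.
toy_rows = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
print(knn_impute(toy_rows, k=2)[3])
```

In practice the features would usually be scaled first, since otherwise the column with the largest range dominates the distance.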
Get the data from Kaggle and place it in a directory called data. Install the prerequisite packages listed in requirements.txt, then start Jupyter Notebook using a Python 3.6 kernel.
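Before opening the notebooks, it can help to confirm that the data landed where the code expects it. The snippet below is a small sketch of such a check. It assumes the Kaggle download contains train.csv and test.csv, and the function name `missing_data_files` is hypothetical; adjust the file list if your download differs.

```python
from pathlib import Path

# Assumed file names from the Kaggle download; not confirmed by this repo.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files found.")
```

Run it from the repository root so that the relative path data resolves correctly.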