Machine learning project with PySpark for UVa Data Science Big Data Class
We are using the Lending Club dataset on Kaggle: https://www.kaggle.com/wordsforthewise/lending-club
Our goal is to clean the data with PySpark, use PySpark's ML library for feature selection and hyperparameter tuning, and compare several machine learning algorithms on how well they predict whether a borrower will default on their loan.
Contributors:
Max McGaw https://github.com/mmcgaw182
Will Carruthers https://github.com/wcarruthers
Liam Mulcahy https://github.com/liamtmul
Matt Thomas