Purpose: To use a supervised learning algorithm to predict the income bracket of individuals from their census data. This is for a fictional charity that wants to identify potential donors.
Method: After a brief shortlisting exercise, scikit-learn was used to evaluate the AdaBoost, Random Forest and Support Vector Machine algorithms. AdaBoost was then taken forward for tuning and testing.
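A comparison of the three candidate algorithms could be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code. It assumes a synthetic stand-in dataset from `make_classification` in place of the census data, 5-fold cross-validation, and an F-beta score with `beta=0.5` (favouring precision) as the comparison metric. All of these choices are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the census data (assumption): a binary target
# with roughly 25% positives, loosely mimicking a high/low income split.
X, y = make_classification(
    n_samples=1000, n_features=13, weights=[0.75, 0.25], random_state=0
)

# Assumed metric: F-beta with beta=0.5 weights precision above recall.
scorer = make_scorer(fbeta_score, beta=0.5)

candidates = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(random_state=0),
}

results = {}
for name, model in candidates.items():
    # Mean 5-fold cross-validated score for each candidate model.
    scores = cross_val_score(model, X, y, cv=5, scoring=scorer)
    results[name] = scores.mean()
    print(f"{name}: mean F0.5 = {scores.mean():.3f}")
```

The model with the best cross-validated score would then be carried forward. On real data, the SVM would usually also need its features scaled first.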
All steps completed for this project can be found in the Jupyter Notebook. An HTML version of the notebook is also included in this repository. The steps include:
- Data exploration.
- Data preparation.
- Algorithm evaluation and selection.
- Model tuning.
- Feature selection.
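The tuning and feature-selection steps could look roughly like the sketch below. The notebook's actual parameter grid, metric and number of retained features are not stated here. The grid values, the F0.5 metric, the synthetic data and `k = 5` are all hypothetical. The sketch tunes AdaBoost with `GridSearchCV`, scores it on a held-out test set, and then retrains on only the features ranked most important by the tuned model's `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared census features (assumption).
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=5,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

scorer = make_scorer(fbeta_score, beta=0.5)  # assumed metric

# Model tuning: a small hypothetical grid over two AdaBoost hyperparameters.
param_grid = {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]}
grid = GridSearchCV(
    AdaBoostClassifier(random_state=0), param_grid, scoring=scorer, cv=3
)
grid.fit(X_train, y_train)
best = grid.best_estimator_
full_score = fbeta_score(y_test, best.predict(X_test), beta=0.5)

# Feature selection: keep the k features the tuned model ranks highest.
k = 5  # hypothetical choice
top_idx = np.argsort(best.feature_importances_)[::-1][:k]

# Retrain with the same tuned hyperparameters on the reduced feature set.
reduced = AdaBoostClassifier(random_state=0, **grid.best_params_)
reduced.fit(X_train[:, top_idx], y_train)
reduced_score = fbeta_score(y_test, reduced.predict(X_test[:, top_idx]), beta=0.5)

print("Best parameters:", grid.best_params_)
print(f"F0.5 with all features: {full_score:.3f}")
print(f"F0.5 with top {k} features: {reduced_score:.3f}")
```

Comparing the two test scores shows how much predictive power is lost, if any, by training on the smaller feature set.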