These lessons employ over a dozen models to build ensemble models for the UCI heart disease dataset.
Special focus is on developing a framework for finding the right train-test split and correcting bias when training over a dozen models.
These topics have led to articles on Medium:
https://medium.com/towards-data-science/exploring-best-test-size-number-of-folds-and-repeated-hold-out-bbf773f370b6
https://medium.com/mlearning-ai/a-new-safe-and-gaussian-softmax-function-6e0419f28679
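
As a rough illustration of the test-size question, the sketch below runs a repeated hold-out loop: it reshuffles the data many times, scores a model on each split, and reports the mean and spread of the accuracy for several test sizes. This is not the code from the lessons. The synthetic data, the toy threshold classifier, and the names `make_data`, `fit_threshold` and `repeated_holdout` are all assumptions made for the example, and only the Python standard library is used.

```python
import random
import statistics

def make_data(n, seed=0):
    """Synthetic 1-D binary data (illustrative only): class 1 is shifted upward."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data

def fit_threshold(train):
    """Toy classifier: a threshold halfway between the two class means.

    Assumes both classes appear in `train` and that class 1 has the larger mean.
    """
    mean0 = statistics.fmean(x for x, y in train if y == 0)
    mean1 = statistics.fmean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, test):
    """Fraction of test points on the correct side of the threshold."""
    hits = sum((x > threshold) == bool(y) for x, y in test)
    return hits / len(test)

def repeated_holdout(data, test_size, repeats=200, seed=0):
    """Return (mean, standard deviation) of accuracy over random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_size * len(data)))
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_threshold(train), test))
    return statistics.fmean(scores), statistics.stdev(scores)

data = make_data(300)
results = {size: repeated_holdout(data, size) for size in (0.1, 0.3, 0.5)}
for size, (mean_acc, std_acc) in results.items():
    print(f"test_size={size:.1f}  mean={mean_acc:.3f}  std={std_acc:.3f}")
```

A small test set leaves more data for training but makes each accuracy estimate noisier, so the spread across repeats is one reasonable quantity to watch when comparing split sizes.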
Using an ensemble model, we are able to obtain 86% accuracy on heart disease prediction given hospital data (from 30 years ago).
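
To show the basic idea of combining models, here is a minimal sketch of hard (majority) voting. The ensemble method actually used in the lessons may differ. The labels and the three prediction lists are hand-made so that the models err on different samples, and the names `majority_vote` and `accuracy` are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # Pick the label that most models predicted for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hand-made example data (not results from the heart disease dataset).
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 1, 0]
model_c = [1, 0, 0, 1, 0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))
```

Voting helps here only because the individual mistakes do not overlap. Models that make the same errors gain little from being combined.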
We also explore heart disease from an economic perspective, using ensemble methods to reach a weighted accuracy of 78% when predicting heart disease from economic and self-identifying poll data alone.
The lessons also include custom functions for plotting, feature selection and cross-correlation, aimed especially at feature selection and model comparison.