@gwaygenomics presented evidence that median absolute deviation (MAD) feature selection (selecting genes with the highest MADs) can eliminate most features without hurting performance: #18 (comment). In fact, it appears that performance increased with the feature selection, which could make sense if the selection enriched for predictive features, increasing the signal-to-noise ratio.
Therefore, I think we should investigate this method of feature selection further. Specifically, I'm curious whether:
- @gwaygenomics' findings hold true for outcomes other than RAS?
- MAD is better than MAD / median? I suspect raw MAD is biased against genes that are lowly expressed but still variable, since it is measured in the same units as expression.
- MAD outperforms random selection of the same feature set size?
- MAD performs well for other algorithms besides logistic regression?
I'm labeling this issue a task, so please investigate if you feel inclined.