Median absolute deviation feature selection #22

@dhimmel

@gwaygenomics presented evidence that median absolute deviation (MAD) feature selection (selecting genes with the highest MADs) can eliminate most features without hurting performance: #18 (comment). In fact, it appears that performance increased with the feature selection, which could make sense if the selection enriched for predictive features, increasing the signal-to-noise ratio.
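For anyone picking this up, here is a minimal sketch of what "selecting genes with the highest MADs" means. It is not the code from #18: the function names and the toy expression values are made up for illustration, and it uses only the standard library so it runs without any project dependencies.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def select_top_mad(expression, n_features):
    """Return the n_features gene names with the highest MAD.

    `expression` maps a gene name to its expression values across samples
    (a hypothetical layout chosen for this sketch).
    """
    scores = {gene: mad(values) for gene, values in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_features]


# Toy data, invented for illustration only.
expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],    # nearly constant across samples
    "GENE_B": [0.0, 5.0, 10.0, 15.0],  # highly variable
    "GENE_C": [2.0, 4.0, 2.0, 4.0],    # moderately variable
}
selected = select_top_mad(expression, 2)
```

With real data the same ranking would more likely be computed column-wise on the expression matrix; the dictionary here just keeps the example self-contained.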

Therefore, I think we should investigate this method of feature selection further. Specifically, I'm curious whether:

  • @gwaygenomics' findings hold true for outcomes other than RAS?
  • MAD is better than MAD / median? I suspect raw MAD is biased against genes that are lowly expressed but still variable.
  • MAD outperforms random selection of the same feature set size?
  • MAD performs well for other algorithms besides logistic regression?

I'm labeling this issue a task, so please investigate if you feel inclined.
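To make the second and third questions concrete, here is a rough sketch of the three selection strategies side by side: raw MAD, MAD / median, and a random baseline of the same size. Everything in it is an assumption for illustration: the helper names, the toy values, the choice to score a gene as 0 when its median is zero, and the assumption that expression values are non-negative.

```python
import random
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_over_median(values):
    """MAD scaled by the median, assuming non-negative values.

    Returns 0.0 when the median is zero; that is an arbitrary choice for
    this sketch, not an established convention.
    """
    center = median(values)
    return mad(values) / center if center > 0 else 0.0


def top_n(expression, score, n_features):
    """Rank genes by `score` (highest first) and keep the first n_features."""
    ranked = sorted(expression, key=lambda gene: score(expression[gene]), reverse=True)
    return ranked[:n_features]


def random_n(expression, n_features, seed=0):
    """Random baseline: the same number of genes, chosen without regard to variability."""
    rng = random.Random(seed)
    return rng.sample(sorted(expression), n_features)


# Toy data, invented for illustration only.
expression = {
    "LOW_VARIABLE": [0.1, 0.2, 0.4, 0.8],         # low expression, large relative spread
    "HIGH_STABLE": [100.0, 102.0, 98.0, 101.0],   # high expression, small relative spread
    "CONSTANT": [10.0, 10.0, 10.0, 10.0],         # no variation at all
}

by_mad = top_n(expression, mad, 1)
by_ratio = top_n(expression, mad_over_median, 1)
baseline = random_n(expression, 1)
```

On this toy data raw MAD picks `HIGH_STABLE` while MAD / median picks `LOW_VARIABLE`, which is the bias I'm worried about. Whether that matters for classifier performance is exactly what needs testing, with the random baseline repeated over several seeds.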
