Multiple comparisons problems #83

@patrick-miller

I'm still working my way through the paper published by @gwaygenomics, @allaway and @cgreene, but it made me think of an issue that I believe we should try to deal with in our final product. In the paper they had a specific hypothesis that they tested; however, we are going to provide people with the ability to test out hypotheses on thousands of different mutations.

This ability creates some problems, such as non-response bias. Many genes are bound to give uninteresting results (AUROC ≈ 0.5), and people will tend to gloss over them. I can easily imagine someone iterating through many different genes until they reach one whose mutation a model predicts well, and then reporting that result as if it were the only hypothesis tested.
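To make the concern concrete, here is a minimal simulation sketch. It assumes a purely hypothetical setup (1,000 "genes", 40 samples split evenly between mutated and wild-type, and classifier scores that are pure noise), none of which comes from the paper or our data. Every true AUROC is 0.5 by construction, yet the best-looking gene can still appear predictive.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U identity).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate_null_aurocs(n_genes=1000, n_samples=40, seed=0):
    """Score each hypothetical gene with random noise, so no gene is truly predictable."""
    rng = random.Random(seed)
    n_pos = n_samples // 2
    labels = [1] * n_pos + [0] * (n_samples - n_pos)
    aurocs = []
    for _ in range(n_genes):
        scores = [rng.random() for _ in labels]  # noise, unrelated to the labels
        aurocs.append(auroc(labels, scores))
    return aurocs


null_aurocs = simulate_null_aurocs()
mean_auroc = sum(null_aurocs) / len(null_aurocs)
best_auroc = max(null_aurocs)
print(f"mean AUROC over all genes: {mean_auroc:.3f}")
print(f"best AUROC found by searching: {best_auroc:.3f}")
```

The mean stays close to 0.5, while the maximum over many genes drifts well above it. That maximum is what a user would see after browsing until something "works".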

We could approach this issue in a few different ways:

  1. hold out some data for validation -- only to be used for publication
  2. apply some sort of correction (e.g. Bonferroni)
  3. place strong emphasis on effect sizes
  4. list a clear disclaimer
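As a rough illustration of option 2, the sketch below adjusts a list of per-gene p-values. The p-values are made-up numbers, and the function names are hypothetical rather than anything in our codebase. It shows the Bonferroni correction named above and, as one alternative I am adding for comparison, the less conservative Benjamini-Hochberg false discovery rate adjustment.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five hypothetical genes (illustration only).
raw_p = [0.001, 0.01, 0.03, 0.04, 0.2]
bonf_p = bonferroni(raw_p)
bh_p = benjamini_hochberg(raw_p)
for raw, bonf, bh in zip(raw_p, bonf_p, bh_p):
    print(f"raw={raw:.3f}  bonferroni={bonf:.3f}  benjamini-hochberg={bh:.3f}")
```

Both adjustments need to know how many tests were run, so the product would have to count every gene a user queried, not only the ones they chose to report.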

I wanted to open this issue up so we can discuss the importance of the problem and possible solutions.
