How to compare #2

@MitraDarja

The methods are compared by how well they represent the sequence data and by their performance.

  • distribution of the selected elements
  • minimal, average and maximal gap between elements
  • speed
  • RAM usage
  • number of elements created (compression factor: number of elements divided by number of k-mers?)
  • capability to find transcripts (true positives, false positives, true negatives, false negatives)
  • conservation (how many minimizers stay the same when the sequence is slightly mutated; conservation should not be based solely on the number of minimizers, but on the number of nucleotides covered, in order to account for overlapping minimizers, see the syncmer paper)
  • distance on the mutated sequence compared to the unmutated sequence
  • handling of repetitive and erroneous k-mers (simple cutoffs vs. none vs. weighting)
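
As a rough illustration of the gap and compression criteria above, here is a minimal Python sketch. It assumes plain window minimizers with lexicographic ordering and leftmost tie-breaking, which is only one of the methods that could be compared; the function names `minimizer_positions`, `gap_stats` and `density` are hypothetical and not taken from any existing tool.

```python
from statistics import mean


def minimizer_positions(seq, k, w):
    """Start positions of window minimizers.

    Assumption: the minimizer of a window of w consecutive k-mers is the
    lexicographically smallest k-mer, leftmost on ties.
    """
    n_kmers = len(seq) - k + 1
    positions = []
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (seq[i:i + k], i))
        # Consecutive windows often share a minimizer; store it only once.
        if not positions or positions[-1] != best:
            positions.append(best)
    return positions


def gap_stats(positions):
    """Minimal, average and maximal gap between consecutive elements."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return None  # fewer than two elements, no gap defined
    return min(gaps), mean(gaps), max(gaps)


def density(positions, seq_len, k):
    """Number of selected elements divided by the number of k-mers."""
    return len(positions) / (seq_len - k + 1)


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    pos = minimizer_positions(seq, k=3, w=2)
    print(pos)                        # [0, 1, 2, 4, 5, 6]
    print(gap_stats(pos))             # (1, 1.2, 2)
    print(density(pos, len(seq), 3))  # 0.75
```

The same `gap_stats` and `density` helpers would work on the positions produced by any other sampling scheme, which is what makes them usable for a side-by-side comparison.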

The strobemer paper uses two simulated data sets and defines:

  • the number of matches, where a match between a sequence A and its mutated version A' is an identical subsequence at position i

  • the positions covered by the matching subsequences

  • an island: a maximal interval of consecutive positions not covered by any match

  • an analysis of how many submers are unique in a given genome and how similar these submers are to each other (edit distance)
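
The match, coverage and island definitions can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the paper's implementation: mutations are substitutions only, so position i in A corresponds to position i in A', and the usage example samples every k-mer instead of a real sampling scheme. The names `matches`, `coverage` and `islands` are hypothetical.

```python
def matches(pos_a, pos_b, seq_a, seq_b, k):
    """Positions i selected in both sequences whose k-mers are identical."""
    shared = set(pos_a) & set(pos_b)
    return sorted(i for i in shared if seq_a[i:i + k] == seq_b[i:i + k])


def coverage(match_positions, k, length):
    """Boolean mask of the nucleotides covered by at least one match."""
    covered = [False] * length
    for i in match_positions:
        for j in range(i, min(i + k, length)):
            covered[j] = True
    return covered


def islands(covered):
    """Maximal runs of uncovered positions as half-open (start, end) pairs."""
    result = []
    start = None
    for i, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            result.append((start, i))
            start = None
    if start is not None:
        result.append((start, len(covered)))
    return result


if __name__ == "__main__":
    k = 3
    a = "ACGTACGTAC"
    b = "ACGTTCGTAC"  # substitution at position 4
    all_positions = list(range(len(a) - k + 1))  # assumption: every k-mer is sampled
    m = matches(all_positions, all_positions, a, b, k)
    cov = coverage(m, k, len(a))
    print(m)             # [0, 1, 5, 6, 7]
    print(sum(cov))      # 9 covered nucleotides
    print(islands(cov))  # [(4, 5)]
```

Counting covered nucleotides rather than matches means that overlapping matches are not counted twice, which is the point made in the conservation criterion above.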

They also check how many unique subsequences there are in the five largest human chromosomes to measure the precision of a method.
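
A minimal Python sketch of such a uniqueness count is shown below. Since "unique" could mean either distinct subsequences or subsequences that occur exactly once, it reports both; it also uses plain k-mers as the subsequences, which is an assumption made for illustration. The function name `kmer_uniqueness` is hypothetical.

```python
from collections import Counter


def kmer_uniqueness(seq, k):
    """Return (number of distinct k-mers, number of k-mers occurring exactly once)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    distinct = len(counts)
    occurring_once = sum(1 for c in counts.values() if c == 1)
    return distinct, occurring_once


if __name__ == "__main__":
    # k-mers of "AAAAC" for k=2: AA, AA, AA, AC
    print(kmer_uniqueness("AAAAC", 2))  # (2, 1)
```

For whole chromosomes a dictionary of all k-mers may not fit comfortably in memory, so a real evaluation would likely need a dedicated k-mer counter.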

Labels: enhancement