generated from seqan/app-template
Open
Labels
enhancement (New feature or request)
Description
The methods are compared by their capability to represent the sequence data as well as their performance.
- distribution
- minimal, average and maximal gap between elements
- speed
- RAM usage
- number of elements created (compression factor? number of elements divided by number of k-mers)
- capability to find transcripts (True Positives, False Positives, True Negatives, False Negatives)
- conservation (how many minimizers stay the same when the sequence is slightly mutated; conservation should not be based solely on the number of minimizers but on the number of nucleotides covered, in order to account for overlapping minimizers, see the syncmer paper)
- distance on the mutated sequence compared to the unmutated sequence
- handling of repetitive and erroneous k-mers (simple cutoffs vs. none vs. weighting)
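The gap and element-count criteria in the list above could be computed roughly as in the following sketch. It assumes that a method reports the sorted start positions of its selected elements, that a "gap" is the distance between consecutive start positions, and that the compression factor is the number of elements divided by the number of k-mers. The function name `gap_stats` and its parameters are hypothetical and not part of this repository.

```python
from statistics import mean


def gap_stats(positions, seq_len, k):
    """Summarise spacing and density of selected element start positions.

    positions: sorted start positions of the selected elements (assumed input).
    seq_len:   length of the underlying sequence.
    k:         k-mer size used to count the total number of k-mers.
    """
    num_kmers = seq_len - k + 1
    if num_kmers <= 0:
        raise ValueError("sequence is shorter than k")

    # Assumed definition: a gap is the distance between consecutive starts.
    gaps = [b - a for a, b in zip(positions, positions[1:])]

    return {
        "min_gap": min(gaps) if gaps else None,
        "avg_gap": mean(gaps) if gaps else None,
        "max_gap": max(gaps) if gaps else None,
        # Assumed definition: number of elements divided by number of k-mers.
        "density": len(positions) / num_kmers,
    }


stats = gap_stats([0, 4, 5, 11], seq_len=20, k=5)
print(stats)
```

Reporting all three gap values side by side makes it easier to see whether a method with a low density also leaves long stretches of the sequence unsampled.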
In the strobemer paper, they use two simulated data sets and define:
- the number of matches, where a match between sequence A and mutated sequence A' is an identical subsequence at position i
- the positions covered by the subsequences
- an island: a maximal interval of consecutive positions that are not covered
- an analysis of how many submers are unique in a given genome and how similar these submers are to each other (edit distance)
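A minimal sketch of the first three definitions, under simplifying assumptions: each method's output is represented as a dict from start position to extracted subsequence, the subsequences are contiguous (which is not true for strobemers in general), and A and A' share the same coordinates. The helper names `match_positions`, `covered_positions` and `islands` are hypothetical.

```python
def match_positions(sub_a, sub_b):
    """Positions i at which A and A' yield the identical subsequence."""
    return sorted(i for i, s in sub_a.items() if sub_b.get(i) == s)


def covered_positions(matches, sub_a):
    """Set of sequence positions covered by the matching subsequences.

    Assumes each subsequence is contiguous and starts at its key.
    """
    covered = set()
    for i in matches:
        covered.update(range(i, i + len(sub_a[i])))
    return covered


def islands(covered, seq_len):
    """Maximal runs of consecutive uncovered positions as half-open (start, end)."""
    result = []
    start = None
    for pos in range(seq_len):
        if pos not in covered:
            if start is None:
                start = pos          # an uncovered run begins here
        elif start is not None:
            result.append((start, pos))  # the run ended just before pos
            start = None
    if start is not None:
        result.append((start, seq_len))  # run extends to the sequence end
    return result


# Toy data: the subsequence at position 3 differs after mutation.
a = {0: "ACG", 3: "TTA", 8: "GGC"}
a_mut = {0: "ACG", 3: "TCA", 8: "GGC"}

matches = match_positions(a, a_mut)
covered = covered_positions(matches, a)
gaps = islands(covered, seq_len=12)
print(len(matches), sorted(covered), gaps)
```

Island lengths follow directly as `end - start` for each interval, so their distribution can be summarised the same way as the gaps between elements.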
Also, they check how many unique subsequences there are in the five largest human chromosomes to measure the precision of a method.
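One way such a uniqueness check could look is sketched below. The exact definition used in the strobemer paper is not restated here, so this sketch assumes "unique" means a subsequence that occurs exactly once, and reports the fraction of distinct subsequences that are unique. Plain k-mers stand in for whichever method is being evaluated; `unique_fraction` and `kmers` are hypothetical helper names.

```python
from collections import Counter


def kmers(seq, k):
    """All contiguous k-mers of seq, as a stand-in for any subsequence method."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_fraction(subsequences):
    """Fraction of distinct subsequences that occur exactly once (assumed definition)."""
    counts = Counter(subsequences)
    if not counts:
        return 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(counts)


frac = unique_fraction(kmers("ACGTACGA", 3))
print(frac)
```

For whole chromosomes the same counting would need a more memory-efficient structure than an in-memory `Counter`, but the definition stays the same.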