How to compare

The methods are compared by their capability to represent the sequence data as well as their performance.

- [ ] distribution
- [ ] minimal, average and maximal gap between elements
- [x] speed
- [ ] RAM usage
- [ ] number of elements created (compression factor? number of elements divided by number of k-mers)
- [ ] capability to find transcripts (True Positives, False Positives, True Negatives, False Negatives)
- [ ] conservation (how many minimizers stay the same when the sequence is slightly mutated; conservation should not be based solely on the number of minimizers, but should count the number of nucleotides covered in order to account for overlapping minimizers, see the syncmer paper)
- [ ] distance on the mutated sequence compared to the unmutated sequence
- [ ] handling of repetitive and erroneous k-mers (simple cutoffs vs. none vs. weighting)
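As a starting point for the gap and compression-factor items, here is a minimal sketch. It assumes a plain lexicographic window minimizer as a stand-in for whichever method is being compared; `minimizer_positions`, `gap_stats` and `density` are hypothetical helper names, not part of any existing code in this repository.

```python
from statistics import mean


def minimizer_positions(seq, k, w):
    """Start positions of window minimizers (assumption: lexicographic order,
    w consecutive k-mers per window, leftmost k-mer wins ties)."""
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (seq[i:i + k], i)))
    return sorted(selected)


def gap_stats(positions):
    """Minimal, average and maximal gap between consecutive selected positions."""
    if len(positions) < 2:
        raise ValueError("need at least two positions to compute gaps")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return min(gaps), mean(gaps), max(gaps)


def density(positions, seq_len, k):
    """Number of selected elements divided by the number of k-mers."""
    return len(positions) / (seq_len - k + 1)


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    pos = minimizer_positions(seq, k=3, w=3)
    print(pos, gap_stats(pos), density(pos, len(seq), k=3))
```

Any other method can be plugged in by replacing `minimizer_positions` with a function that returns the sorted start positions of its selected elements.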

In the strobemer paper, the authors use two simulated data sets and define:
- the number of matches, where a match between sequence A and mutated sequence A' is an identical subsequence at position i
- the positions covered by the matching subsequences
- an island: a maximal interval of consecutive positions that are not covered
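These three definitions could be computed roughly as follows. This is a sketch under simplifying assumptions: seeds are plain k-mers, and A' differs from A by substitutions only, so that positions in both sequences line up. All function names are hypothetical.

```python
def kmer_seeds(seq, k):
    """Map each start position to its k-mer (stand-in for any seeding method)."""
    return {i: seq[i:i + k] for i in range(len(seq) - k + 1)}


def matches(seeds_a, seeds_b):
    """Positions where both sequences carry an identical seed."""
    return sorted(p for p, s in seeds_a.items() if seeds_b.get(p) == s)


def covered_positions(match_positions, span, seq_len):
    """Boolean mask of nucleotides covered by at least one matching seed."""
    covered = [False] * seq_len
    for p in match_positions:
        for i in range(p, min(p + span, seq_len)):
            covered[i] = True
    return covered


def islands(covered):
    """Maximal runs of uncovered positions as half-open (start, end) intervals."""
    result, start = [], None
    for i, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            result.append((start, i))
            start = None
    if start is not None:
        result.append((start, len(covered)))
    return result


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTCGTAC"  # one substitution at position 4
    m = matches(kmer_seeds(a, 3), kmer_seeds(b, 3))
    cov = covered_positions(m, span=3, seq_len=len(a))
    print(len(m), sum(cov), islands(cov))
```

Counting covered nucleotides rather than matches is also what the conservation item above asks for, since overlapping seeds are then not counted twice.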


- [ ] analysis of how many submers are unique in a given genome and how similar these submers are to each other (edit distance)
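For the similarity part, one possible measure is the edit distance from each distinct submer to its nearest neighbour. The sketch below uses a textbook Levenshtein distance; the all-pairs comparison is quadratic in the number of distinct submers, so it only suits small samples. `nearest_neighbour_distances` is a hypothetical helper, and choosing the nearest neighbour as the summary is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def nearest_neighbour_distances(submers):
    """For each distinct submer, the edit distance to the closest other submer."""
    distinct = sorted(set(submers))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct submers")
    return {s: min(edit_distance(s, t) for t in distinct if t != s)
            for s in distinct}


if __name__ == "__main__":
    print(nearest_neighbour_distances(["ACG", "ACG", "ACT", "TTT"]))
```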


They also count how many unique subsequences there are in the five largest human chromosomes to measure the precision of a method.
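A uniqueness count of this kind might look like the sketch below. It assumes "unique" means a seed that occurs exactly once, and reports the share of distinct seeds that are unique; the paper's exact normalisation may differ. Plain k-mers again stand in for the method under test, and `unique_fraction` is a hypothetical name.

```python
from collections import Counter


def unique_fraction(seeds):
    """Share of distinct seeds that occur exactly once (assumed definition)."""
    counts = Counter(seeds)
    if not counts:
        return 0.0
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(counts)


if __name__ == "__main__":
    seq = "ACGTACGG"
    kmers = [seq[i:i + 3] for i in range(len(seq) - 3 + 1)]
    print(unique_fraction(kmers))
```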
