Description
What are your thoughts on profiling contigs without Burrows-Wheeler-based read mapping? Could VAMB work with such alternative sources of counts? I ask because mapping is currently the main bottleneck.
If reads were simply split into sufficiently long k-mers, like the sizes typically used for assembly, those k-mers should be specific enough to map uniquely. Right? Or not? Sequencing errors would interfere with exact string searches, but they are rare, especially if the reads are QC'd.
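A minimal sketch of what this could look like, assuming exact lookups of canonical k-mers (so a read matches either strand) against only those k-mers that occur in exactly one contig. A read is counted for a contig if all of its indexed k-mers agree; reads hitting several contigs are dropped as ambiguous. All function names are hypothetical, this is not how VAMB or bwa compute abundances, and it yields raw per-contig read counts rather than whatever abundance input VAMB expects.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """Yield every substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def build_unique_index(contigs, k):
    """Map canonical k-mer -> contig name, keeping only k-mers found in a single contig."""
    owners = defaultdict(set)
    for name, seq in contigs.items():
        for km in kmers(seq, k):
            owners[canonical(km)].add(name)
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def count_reads(reads, index, k):
    """Count reads per contig; reads with no hits or hits to several contigs are skipped."""
    counts = Counter()
    for read in reads:
        hits = set()
        for km in kmers(read, k):
            c = canonical(km)
            if c in index:
                hits.add(index[c])
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return counts
```

For example, with `contigs = {"c1": "ACCAACACCA", "c2": "AGGAAGAGGA"}` and `k=4`, the read `"ACCAACT"` is still assigned to `c1` even though its last base differs from the contig, because a sequencing error only knocks out the k-mers that overlap it; the chimeric read `"ACCAAGGA"` hits both contigs and is dropped. Real data would need much larger k and a more memory-efficient index than a Python dict.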
With default settings, bwa mem is very unspecific for metagenomic profiling, and its output should be filtered with something like msamtools, which removes half of the read mappings, before generating e.g. an OTU table.
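For context, here is a rough sketch of the kind of per-alignment filter meant here, assuming percent identity is computed as `(alignment columns - NM) / alignment columns`, where columns are the `M`, `=`, `X`, `I` and `D` CIGAR operations. The function names and the 95% identity / 45-column thresholds are illustrative assumptions, not msamtools' implementation or defaults.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def nm_tag(fields):
    """Return the NM (edit distance) tag value from SAM optional fields, or None."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def passes_filter(sam_line, min_identity=0.95, min_columns=45):
    """Keep a SAM record only if it is mapped, long enough and similar enough."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 4 or cigar == "*":  # flag bit 0x4 = segment unmapped
        return False
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    nm = nm_tag(fields)
    if nm is None or columns == 0:
        return False
    identity = (columns - nm) / columns
    return columns >= min_columns and identity >= min_identity
```

A 100-column alignment with `NM:i:2` (98% identity) passes, while the same alignment with `NM:i:10` (90%) or a 30-column alignment is discarded.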
VAMB is magically able to see past bwa's noise and still define biologically meaningful clusters. Shouldn't it be able to do the same with k-mer counts?