Description
Hello,

I tried sylph profile (v0.9.0) on samples generated from cultures that were expected to be pure, using the -u flag to estimate the percentage of unknown sequence. I ran sylph against a database containing the previously generated genomes of the bacteria of interest. For all samples I got adjusted ANI values of 100%, but for some of them the sequence abundance is as low as 75%.

The genomes are quite complete (BUSCO completeness of 98%+). The samples do contain a fair number of unmapped reads, but the unmapped fraction according to bwa mem is much lower than the unknown fraction estimated by sylph (8% vs 25%). A quick BLAST suggests that these unmapped reads most likely come from parts of the genome that weren't properly assembled (possibly plasmids?) rather than from a contaminant.

Is this a known issue? I understand that sylph wasn't designed for this application, but I would still be curious to hear the authors' thoughts and whether there is any advice for improving this behavior.

Thank you for your help.