Sylph overestimates percentage of unknown reads?

Hello,

I tried sylph profile (v0.9.0) with the -u flag, which estimates the percentage of unknown sequence, on samples generated from cultures that were expected to be pure. The database I profiled against contained the previously generated genomes of the bacteria of interest.

For all samples I got adjusted ANI values of 100%, but for some of them the sequence abundance is as low as 75%. The genomes are quite complete (BUSCO completeness of 98% or higher). There is definitely a fair number of unmapped reads in the samples, but the fraction of reads left unmapped by bwa mem is much lower than the fraction sylph estimates as unknown (8% vs 25%). A quick BLAST search suggests that these unmapped reads most likely come from parts of the genome that weren't properly assembled (possibly plasmids?) and are unlikely to be contamination.

Is this a known issue? I understand that sylph wasn't designed for this application, but I would still like to hear the authors' thoughts on it and whether there is any advice for improving this behavior.

Thank you for your help.
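For reference, the comparison described above comes down to the following arithmetic. This is only a minimal sketch using the numbers quoted in this post: the function names are hypothetical, and it assumes that with -u the reported sequence abundances are percentages whose shortfall from 100 is the estimated unknown fraction.

```python
def unknown_percent(sequence_abundances):
    """Estimated unknown percentage, assuming the abundances are percentages
    that sum to the known fraction of the sample (assumption, see above)."""
    return 100.0 - sum(sequence_abundances)


def discrepancy(sylph_unknown_pct, unmapped_pct):
    """Difference between sylph's unknown estimate and the aligner's
    unmapped-read percentage, in percentage points."""
    return sylph_unknown_pct - unmapped_pct


# Numbers from the post: one genome at 75% sequence abundance,
# 8% of reads unmapped by bwa mem.
sylph_unknown = unknown_percent([75.0])
gap = discrepancy(sylph_unknown, 8.0)
print(f"sylph unknown: {sylph_unknown:.1f}%, bwa mem unmapped: 8.0%, gap: {gap:.1f} points")
```

The gap of interest is the part of the unknown estimate that read mapping does not account for.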

Sylph overestimates percentage of unknown reads? #73
