Compression parameter and coverage inconsistencies #74

@MicroBTM

Hi,

I was wondering whether you think it would be worth adding an option that builds databases with the compression parameter scaled to each input genome's size. Using a fixed -c 100 for virus genomes seems inconsistent, because viral genome sizes vary enormously (spanning around 4 orders of magnitude, with some viral genomes larger than bacterial ones). If users picked a minimum and a maximum compression value for the scaling approach, they could then use the minimum value as the compression parameter when profiling against the resulting database.

Currently, it looks like small viruses will have much lower sensitivity than large ones, judging from the sylph inspect output and the metadata for my database genomes:
Image
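
To make the idea concrete, here is a rough sketch of what I have in mind. It is not how sylph works today: the function names, the min/max compression values, and the genome-length bounds are all made up for illustration, and it assumes that -c keeps roughly 1 in c k-mers and that k = 31.

```python
import math

def scaled_compression(genome_len, c_min=10, c_max=100,
                       len_min=2_000, len_max=2_000_000):
    """Hypothetical rule: interpolate c between c_min and c_max in log space.

    All defaults are illustrative assumptions, not sylph parameters.
    """
    if genome_len <= len_min:
        return c_min
    if genome_len >= len_max:
        return c_max
    # Position of this genome between the length bounds, on a log scale.
    frac = (math.log10(genome_len) - math.log10(len_min)) / (
        math.log10(len_max) - math.log10(len_min))
    log_c = math.log10(c_min) + frac * (math.log10(c_max) - math.log10(c_min))
    return round(10 ** log_c)

def expected_sketch_size(genome_len, c, k=31):
    """Approximate number of k-mers kept if about 1 in c k-mers is sampled."""
    return max(genome_len - k + 1, 0) / c

if __name__ == "__main__":
    for length in (3_000, 30_000, 300_000, 2_500_000):
        c = scaled_compression(length)
        print(length, "fixed c=100:", round(expected_sketch_size(length, 100)),
              "| scaled c =", c, ":", round(expected_sketch_size(length, c)))
```

With a fixed -c 100, a genome of a few kb would keep only a few dozen k-mers under these assumptions, while a Mb-scale genome keeps tens of thousands. A scaled value narrows that gap for the small genomes.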
