Use multiple consolidated databases #49

@boasvdp

Hi Jim, thanks for the amazing tool. We're starting to use it more and more within our team and really like it so far.

We use GTDB as the standard database and although the database is great, it's collapsed a couple of clinically relevant species into comprehensive species clusters. We end up having to supplement GTDB to still be able to differentiate several species of interest.

The new consolidated database functionality in v0.3 is great to reduce our inode footprint, but it means we now have to rebuild the whole GTDB set of representative genomes even if we only want to add a handful of new genomes.

Would it be possible to have some sort of middle-ground approach, where a user can specify multiple consolidated databases? This would really facilitate using skani in a more modular way, e.g. swapping out databases for different analyses or easily adding new genomes without rebuilding the whole set.

I have little understanding of the internal workings of skani so I have no idea whether this is possible or how much work this would require. Feel free to close the issue if this doesn't make sense!
