Consider adding low-complexity sequence masking to genomes before building #96

@jfy133

Description of feature

In some cases this can improve classification, because the classifier no longer tries to find a best placement for non-specific reads (assuming the tool accepts masked genomes).

It may also make databases smaller, since masking removes redundant, non-specific regions (e.g. by reducing the number of k-mers).

For example, with `dustmasker`.

Idea from: https://github.com/khyox/recentrifuge/wiki/Centrifuge-nt#step-by-step-instructions
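
A minimal sketch of what such a masking step could look like, loosely following the approach in the linked wiki: run `dustmasker` to soft-mask low-complexity regions (lowercase bases), then convert the lowercase bases to `N` so that tools ignoring case still see them as masked. The function names, the `.soft` intermediate file name, and `level=20` are assumptions for illustration, not an existing part of the pipeline. Note that any lowercase bases already present in the input genome would also end up as `N`.

```python
import subprocess


def dustmasker_command(fasta_in, fasta_out, level=20):
    """Build a dustmasker call that writes soft-masked (lowercase) FASTA.

    The level value is an assumption; tune it per database.
    """
    return [
        "dustmasker",
        "-in", str(fasta_in),
        "-infmt", "fasta",
        "-outfmt", "fasta",
        "-level", str(level),
        "-out", str(fasta_out),
    ]


def hard_mask(lines):
    """Replace soft-masked (lowercase) bases with N; headers stay untouched."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield "".join("N" if base.islower() else base for base in line)


def mask_genome(fasta_in, fasta_out, level=20):
    """Soft-mask with dustmasker, then write a hard-masked copy."""
    soft = str(fasta_out) + ".soft"  # hypothetical intermediate file name
    subprocess.run(dustmasker_command(fasta_in, soft, level), check=True)
    with open(soft) as src, open(fasta_out, "w") as dst:
        for line in hard_mask(src):
            dst.write(line + "\n")
```

Calling `mask_genome("genome.fna", "genome.masked.fna")` would require `dustmasker` (from BLAST+) on the `PATH`. Whether hard masking with `N` is appropriate depends on how each downstream classifier treats masked bases.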

Labels: enhancement (Improvement for existing functionality)
