-The pipeline can optionally scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with or without abundance correction by [Bracken](https://ccb.jhu.edu/software/bracken/). Because Bracken is computationally inexpensive, we recommend using it to correct the abundance estimates from Kraken2. Note that Kraken2 results are [sensitive to the database](https://doi.org/10.1099/mgen.0.000949) used. It is [particularly important](https://doi.org/10.1128/mbio.01607-23) that the host genome is part of the database. If you are concerned about specific contaminants, it may be worthwhile to use a smaller database that contains primarily those contaminants rather than the full standard database. Various pre-built databases are [available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database are in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Genomes of contaminants detected in previous sequencing experiments can be found on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php). Finally, although one of the primary strengths of Kraken2 is its ability to detect low-abundance contaminants in a sample, false positives can also occur. If only a very small number of reads is assigned to a contaminating species, treat that result with caution.
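
As a rough illustration of treating low-read detections with caution, the sketch below splits the species-level rows of a Kraken2 report into well-supported and weakly supported hits. It assumes the standard six-column, tab-separated report format (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, indented name). The function name `flag_low_read_species`, the `min_reads` cutoff of 10, and the example report rows are hypothetical and chosen only for illustration; they are not part of the pipeline or a recommended threshold.

```python
def flag_low_read_species(report_text, min_reads=10):
    """Split species-level rows of a Kraken2 report by read support.

    Assumes the standard six-column, tab-separated report layout.
    `min_reads` is an arbitrary illustrative cutoff, not a recommendation.
    Returns (confident, low_support), each a list of
    (name, taxid, clade_reads, percent) tuples.
    """
    confident, low_support = [], []
    for line in report_text.splitlines():
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, _direct_reads, rank, taxid, name = fields[:6]
        if rank != "S":
            continue  # keep only species-level rows (rank code "S")
        record = (name.strip(), int(taxid), int(clade_reads), float(percent))
        if record[2] >= min_reads:
            confident.append(record)
        else:
            low_support.append(record)
    return confident, low_support


# Made-up report rows, for illustration only.
EXAMPLE_REPORT = "\n".join([
    "  2.00\t200\t200\tU\t0\tunclassified",
    " 98.00\t9800\t5\tR\t1\troot",
    " 97.50\t9750\t9750\tS\t9606\t      Homo sapiens",
    "  0.03\t3\t3\tS\t562\t      Escherichia coli",
])

if __name__ == "__main__":
    confident, low_support = flag_low_read_species(EXAMPLE_REPORT)
    for name, taxid, reads, percent in low_support:
        print(f"Low support: {name} (taxid {taxid}), {reads} reads, {percent}%")
```

A hit in the low-support list is not necessarily a false positive; it only marks results that may deserve a closer look, for example by checking whether the species is plausible for the sample or appears consistently across replicates.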