Commit 75a10f7

Update docs/usage.md
Co-authored-by: Matthias Zepper <[email protected]>
1 parent 2a322c4 commit 75a10f7

docs/usage.md

Lines changed: 5 additions & 1 deletion
@@ -298,7 +298,11 @@ By default, the input GTF file will be filtered to ensure that sequence names co
## Contamination screening options

Removed (old line 301):
The pipeline provides the option to scan unaligned reads for contamination from other species by using [Kraken2](https://ccb.jhu.edu/software/kraken2/) with or without corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). As Bracken is not a particularly expensive algorithm, we recommend using it to correct the abundance estimations from Kraken. An important note is that Kraken2 is [sensitive to the database](https://doi.org/10.1099/mgen.0.000949) that it is used with. It is [particularly important](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is part of the database, and if you are particularly concerned about specific contaminants, it may be worthwhile to use a smaller database that contains primarily those contaminants rather than the full standard database. Various pre-built databases can be found [here](https://benlangmead.github.io/aws-indexes/k2) and instructions for building a custom database can be found at the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Genomes of contaminants detected in previous sequencing experiments can be found on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php). Additionally, note that while one of the primary strengths of Kraken2 is that it can detect low abundance contaminants in a sample, false positives can also occur. If a very low number of reads of some contaminating species is detected, these results should be treated with caution.
Added (new lines 301-305):
The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.

It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).

While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution.
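To make the caution about low read counts concrete, here is a minimal Python sketch (not part of the pipeline) that reads a Kraken2 report and flags species-level calls supported by only a few reads. It assumes the standard six-column, tab-separated report that Kraken2 writes with `--report` (percentage of fragments in the clade, fragments in the clade, fragments assigned directly, rank code, NCBI taxonomy ID, indented name). The function `flag_low_support` and the `MIN_READS` threshold of 10 are hypothetical choices for illustration, not pipeline parameters or a recommended cut-off.

```python
"""Flag species-level Kraken2 calls supported by few reads (illustrative sketch)."""
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cut-off for illustration only; choose one that suits your data.
MIN_READS = 10


@dataclass
class SpeciesCall:
    name: str
    taxid: int
    clade_reads: int
    percent: float
    low_support: bool


def flag_low_support(report_lines: Iterable[str], min_reads: int = MIN_READS) -> List[SpeciesCall]:
    """Parse a standard 6-column Kraken2 report and mark species with < min_reads reads."""
    calls: List[SpeciesCall] = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank lines and reports with extra (e.g. minimizer) columns
        percent, clade_reads, _direct_reads, rank, taxid, name = fields
        if rank.strip() != "S":
            continue  # keep species-level rows only (not S1, S2, ... sub-ranks)
        reads = int(clade_reads)
        calls.append(
            SpeciesCall(
                name=name.strip(),  # names are indented to show tree depth
                taxid=int(taxid),
                clade_reads=reads,
                percent=float(percent),
                low_support=reads < min_reads,
            )
        )
    return calls


if __name__ == "__main__":
    # Toy report lines made up for this example, not real pipeline output.
    toy_report = [
        "  2.00\t200\t200\tU\t0\tunclassified",
        " 98.00\t9800\t0\tR\t1\troot",
        " 97.00\t9700\t9700\tS\t9606\t                    Homo sapiens",
        "  0.03\t3\t3\tS\t562\t                    Escherichia coli",
    ]
    for call in flag_low_support(toy_report):
        verdict = "interpret with caution" if call.low_support else "ok"
        print(f"{call.name} (taxid {call.taxid}): {call.clade_reads} reads, {verdict}")
```

With the toy input, Escherichia coli (3 reads) is flagged while Homo sapiens is not. A fixed read-count cut-off is only the simplest option; a threshold scaled to library size may be more appropriate for deeply sequenced samples.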

## Running the pipeline