Update docs/usage.md

egreenberg7 · MatthiasZepper · web-flow · commit 75a10f7f119d · 2024-08-19T12:01:54.000-04:00
Co-authored-by: Matthias Zepper <6963520+MatthiasZepper@users.noreply.github.com>
diff --git a/docs/usage.md b/docs/usage.md
@@ -298,7 +298,11 @@ By default, the input GTF file will be filtered to ensure that sequence names co
 
 ## Contamination screening options
 
-The pipeline provides the option to scan unaligned reads for contamination from other species by using [Kraken2](https://ccb.jhu.edu/software/kraken2/) with or without corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). As Bracken is not a particularly expensive algorithm, we recommend using it to correct the abundance estimations from Kraken. An important note is that Kraken2 is [sensitive to the database](https://doi.org/10.1099/mgen.0.000949) that it is used with. It is [particularly important](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is part of the database, and if you are particularly concerned about specific contaminants, it may be worthwhile to use a smaller database that contains primarily those contaminants rather than the full standard database. Various pre-built databases can be found [here](https://benlangmead.github.io/aws-indexes/k2) and instructions for building a custom database can be found at the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Genomes of contaminants detected in previous sequencing experiments can be found on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php). Additionally, note that while one of the primary strengths of Kraken2 is that it can detect low abundance contaminants in a sample, false positives can also occur. If a very low number of reads of some contaminating species is detected, these results should be treated with caution.
+The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.
+
+It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
+
+While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution.
 
 ## Running the pipeline