Update usage

egreenberg7 · egreenberg7 · commit c0099d4a3698 · 2024-08-15T17:55:14.000-04:00
diff --git a/docs/usage.md b/docs/usage.md
@@ -296,6 +296,10 @@ Notes:
 
 By default, the input GTF file will be filtered to ensure that sequence names correspond to those in the genome fasta file, and to remove rows with empty transcript identifiers. Filtering can be bypassed completely where you are confident it is not necessary, using the `--skip_gtf_filter` parameter. If you just want to skip the 'transcript_id' checking component of the GTF filtering script used in the pipeline this can be disabled specifically using the `--skip_gtf_transcript_filter` parameter.
 
+## Contamination screening options
+
+The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with or without abundance re-estimation by [Bracken](https://ccb.jhu.edu/software/bracken/). Because Bracken is computationally inexpensive, we recommend using it to correct the abundance estimates produced by Kraken2. Note that Kraken2 results are [sensitive to the database](https://doi.org/10.1099/mgen.0.000949) used. It is [particularly important](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are concerned about specific contaminants, it may be worthwhile to use a smaller database that contains primarily those contaminants rather than the full standard database. Various pre-built databases are available from the [Kraken2/Bracken index collection](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Genomes of contaminants detected in previous sequencing experiments are listed on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php). Finally, although one of the primary strengths of Kraken2 is its ability to detect low-abundance contaminants in a sample, false positives can also occur. If only a very small number of reads is assigned to a contaminating species, treat that result with caution.
+
 ## Running the pipeline
 
 The typical command for running the pipeline is as follows: