Merge pull request #790 from jenmuell/master

drpatelh · web-flow · commit f5d57078a496 · 2022-05-03T10:08:39.000+01:00
Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765
diff --git a/docs/usage.md b/docs/usage.md
@@ -77,6 +77,20 @@ If you are using [GENCODE](https://www.gencodegenes.org/) reference genome files
 - The `--gtf_group_features_type` parameter will automatically be set to `gene_type` as opposed to `gene_biotype`, respectively.
 - If you are running Salmon, the `--gencode` flag will also be passed to the index building step to overcome parsing issues resulting from the transcript IDs in GENCODE fasta files being separated by vertical pipes (`|`) instead of spaces (see [this issue](https://github.com/COMBINE-lab/salmon/issues/15)).
 
+## Adapt pipeline parameters for prokaryotes
+
+The default settings of the pipeline are tuned for eukaryotes and need slight adjustments for prokaryotes, mainly because of the different genetic architecture of prokaryotes. The parameters listed below work if a `gff` file is provided as reference.
+
+Changes and parameter specifications for prokaryotes:
+
+* Use `--featurecounts_feature_type transcript`, since features of the default type `exon` do not carry the attribute required by `--featurecounts_group_type gene_biotype`.
+* You can use `--featurecounts_feature_type CDS` in combination with `--featurecounts_group_type product`, but then featureCounts will no longer reflect the biotypes of your RNA. This can be helpful for identifying the number of hypothetical proteins.
+* If your run struggles with Salmon as the aligner, change `--aligner` to `hisat2`.
+* Use `--skip_rseqc` to skip RSeQC, since features like splice junctions and transcription start (TSS) and end sites (TES) are less informative in prokaryotes than in eukaryotes.
+* Use `--skip_biotype_qc` if the biotypes of your RNA data are of no interest.
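+
+As a rough sketch, a prokaryotic run combining some of the options above might look like the following. The samplesheet, genome and annotation file names, the output directory and the `docker` profile are placeholders chosen for illustration, not values prescribed by the pipeline; check the flag names against the version you are running.
+
+```bash
+# Hypothetical file names: replace them with your own samplesheet, genome and GFF annotation.
+nextflow run nf-core/rnaseq \
+    --input samplesheet.csv \
+    --outdir results \
+    --fasta genome.fasta \
+    --gff annotation.gff \
+    --featurecounts_feature_type transcript \
+    --featurecounts_group_type gene_biotype \
+    --skip_rseqc \
+    -profile docker
+```
+
+`--skip_biotype_qc` is left out here because this sketch keeps the biotype summary; add it, or switch to the `CDS`/`product` combination, depending on what you want featureCounts to report.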
+
+> **NB:** For older versions of the pipeline the parameter names may be different. Check the parameters docs for details.
+
 ## Running the pipeline
 
 The typical command for running the pipeline is as follows: