Commit f5d5707

Merge pull request #790 from jenmuell/master
Added paragraph about the usage of rnaseq with prokaryotic data based on Issue #765
2 parents d7946a8 + 7b047ce commit f5d5707

1 file changed: +14 -0 lines changed

docs/usage.md

Lines changed: 14 additions & 0 deletions
@@ -77,6 +77,20 @@ If you are using [GENCODE](https://www.gencodegenes.org/) reference genome files
- The `--gtf_group_features_type` parameter will automatically be set to `gene_type` as opposed to `gene_biotype`, respectively.
- If you are running Salmon, the `--gencode` flag will also be passed to the index building step to overcome parsing issues resulting from the transcript IDs in GENCODE fasta files being separated by vertical pipes (`|`) instead of spaces (see [this issue](https://github.com/COMBINE-lab/salmon/issues/15)).

## Adapt pipeline parameters for prokaryotes

The default settings of the pipeline are mainly adapted for eukaryotes and have to be changed slightly for prokaryotes, mainly because of their different genetic architecture. The parameters listed below work if a `gff` file is provided as the reference.

Changes and parameter specifications for prokaryotes:

* Use `--featurecounts_feature_type transcript`, since features of the default type `exon` do not contain the `gene_biotype` attribute required by `--featurecounts_group_type gene_biotype`.
* You can use `--featurecounts_feature_type CDS` in combination with `--featurecounts_group_type product`, but then featureCounts will no longer reflect the biotypes of your RNA. This combination can be helpful for identifying the number of hypothetical proteins.
* If your run struggles with Salmon as the aligner, set `--aligner` to `hisat2`.
* `--skip_rseqc`: skip RSeQC, since features such as splice junctions and transcription start (TSS) and end sites (TES) are less informative in prokaryotes than in eukaryotes.
* `--skip_biotype_qc`: use this if the biotypes of your RNA data are of no interest.

> **NB:** For older versions of the pipeline the parameter names may be different. Check the parameter docs for details.
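
As a rough illustration of how the prokaryote-specific options above might be combined, the following sketch assembles a command line and prints it for review instead of launching a run. The pipeline name `nf-core/rnaseq`, the `-profile docker` setting, the `--input`, `--fasta` and `--gff` options and all file names are assumptions made for this example; check them against the parameter docs of your pipeline version.

```shell
# Sketch only: samplesheet.csv, genome.fasta and genome.gff are hypothetical
# placeholder paths, and the pipeline name and profile are assumptions.

# Options taken from the prokaryote recommendations in this section.
PROK_ARGS="--featurecounts_feature_type transcript --featurecounts_group_type gene_biotype --aligner hisat2 --skip_rseqc"

# Assemble the full command as a string so it can be inspected first.
CMD="nextflow run nf-core/rnaseq -profile docker --input samplesheet.csv --fasta genome.fasta --gff genome.gff $PROK_ARGS"

# Print the command; paste it into your shell once the paths are correct.
echo "$CMD"
```

Printing the command first makes it easy to spot misspelled parameter names before starting a long run.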
## Running the pipeline

The typical command for running the pipeline is as follows:
