Commit 4cff9e9

author
jenmuell
committed
added adaptations for prokaryotic data below the paragraph reference genomes
1 parent 7106bd7 commit 4cff9e9

1 file changed

+14
-0
lines changed

docs/usage.md

Lines changed: 14 additions & 0 deletions
@@ -75,6 +75,20 @@ If you are using [GENCODE](https://www.gencodegenes.org/) reference genome files
* The `--gtf_group_features_type` parameter will automatically be set to `gene_type` as opposed to `gene_biotype`, respectively.
* If you are running Salmon, the `--gencode` flag will also be passed to the index building step to overcome parsing issues resulting from the transcript IDs in GENCODE fasta files being separated by vertical pipes (`|`) instead of spaces (see [this issue](https://github.com/COMBINE-lab/salmon/issues/15)).

## Adapt pipeline parameters for prokaryotes

The default settings of the pipeline are tuned mainly for eukaryotes and have to be changed slightly for prokaryotes, mainly because of the different genetic architecture of prokaryotes. The parameters listed below work if a `gff` file is provided as the reference annotation.

Changes and parameter specifications for prokaryotes:

* Set `--featurecounts_feature_type transcript`, since features of the default type `exon` do not carry the attribute required by `--featurecounts_group_type gene_biotype`.
* You can use `--featurecounts_feature_type CDS` in combination with `--featurecounts_group_type product`, but then featureCounts will no longer reflect the biotypes of your RNA. This combination can be helpful for identifying the number of hypothetical proteins.
* If your run struggles with Salmon as the aligner, change `--aligner` to `hisat2`.
* You can skip RSeQC with `--skip_rseqc`, since it mainly focuses on eukaryotic features such as splice junctions and transcription start (TSS) and end sites (TES).
* If you aren't interested in the biotypes of your RNA data, you can skip the biotype QC step entirely with `--skip_biotype_qc`.

> **NB:** Parameter names may differ in older versions of the pipeline. Check the parameters docs for details.

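As a rough illustration, the prokaryote settings listed above could be combined into a single command along the following lines. This is only a sketch: the pipeline name `nf-core/rnaseq`, the `--input`, `--fasta` and `--gff` options, and all file names are assumptions that do not come from this section, so adapt them to your own setup. The script only prints the assembled command so that it can be reviewed before running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (placeholders, not taken from the text above):
# the pipeline name and every file path below are hypothetical.
pipeline="nf-core/rnaseq"
samplesheet="samplesheet.csv"
fasta="genome.fasta"
gff="genome.gff"

# Prokaryote-specific overrides from the list above.
prokaryote_args=(
  # Count on 'transcript' features and group them by biotype
  --featurecounts_feature_type transcript
  --featurecounts_group_type gene_biotype
  # Optional: only needed if the run struggles with Salmon
  --aligner hisat2
  # RSeQC mainly targets eukaryotic features
  --skip_rseqc
)

# Assemble the full command as an array so that quoting stays intact.
cmd=(nextflow run "$pipeline" --input "$samplesheet" --fasta "$fasta" --gff "$gff" "${prokaryote_args[@]}")

# Print the command for review instead of executing it.
cmd_string="${cmd[*]}"
echo "$cmd_string"
```

To launch the run once the printed command looks right, replace the final `echo` with `"${cmd[@]}"`. If biotypes are not of interest, `--skip_biotype_qc` can be added to the array in place of the two featureCounts options.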
## Running the pipeline

The typical command for running the pipeline is as follows:
