docs/usage.md: 14 additions & 0 deletions
@@ -77,6 +77,20 @@ If you are using [GENCODE](https://www.gencodegenes.org/) reference genome files
- The `--gtf_group_features_type` parameter will automatically be set to `gene_type` as opposed to `gene_biotype`.
- If you are running Salmon, the `--gencode` flag will also be passed to the index building step to overcome parsing issues resulting from the transcript IDs in GENCODE fasta files being separated by vertical pipes (`|`) instead of spaces (see [this issue](https://github.com/COMBINE-lab/salmon/issues/15)).
## Adapt pipeline parameters for prokaryotes

The default settings of the pipeline are mainly adapted for eukaryotes and have to be changed slightly for prokaryotes. The main reason for this is the different genetic architecture of prokaryotes. The parameters mentioned below work if a `gff` file is provided as the reference.

Changes and parameter specifications for prokaryotes:

* Use `--featurecounts_feature_type transcript`, since the default value `exon` does not contain the required `--featurecounts_group_type gene_biotype` specification.
* You can use `--featurecounts_feature_type CDS` in combination with `--featurecounts_group_type product`, but then featureCounts will no longer reflect the biotypes of your RNA. This combination can be helpful to identify the number of hypothetical proteins.
* If your run struggles with Salmon as the aligner, set `--aligner` to `hisat2`.
* `--skip_rseqc`: skip RSeQC, since features like splice junctions, transcription start sites (TSS) and transcription end sites (TES) are less informative in prokaryotes than in eukaryotes.
* `--skip_biotype_qc`: use this if the biotypes of your RNA data are of no interest.

> **NB:** For older versions of the pipeline the parameter names may be different. Check the parameters docs for details.
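
As a rough illustration, the prokaryote settings listed above could be combined on the command line as sketched below. The file names (`samplesheet.csv`, `genome.fasta`, `genome.gff`), the output directory and the `docker` profile are placeholders chosen for this example, and the `--input`, `--fasta`, `--gff` and `--outdir` options are assumed to match your pipeline version. The snippet only prints the command so it can be inspected first.

```bash
# Prokaryote-specific options from the list above.
# genome.gff is a hypothetical annotation file name.
set -- \
  --gff genome.gff \
  --featurecounts_feature_type transcript \
  --featurecounts_group_type gene_biotype \
  --skip_rseqc

# Print the full command for inspection; remove `echo` to launch the run.
# samplesheet.csv, genome.fasta, results and the docker profile are assumptions.
echo nextflow run nf-core/rnaseq \
  --input samplesheet.csv \
  --fasta genome.fasta \
  --outdir results \
  "$@" \
  -profile docker
```

If Salmon causes problems, `--aligner hisat2` can be appended to the option list; if biotypes are of no interest, `--skip_biotype_qc` can be added in the same way.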
## Running the pipeline
The typical command for running the pipeline is as follows: