Description of the bug
I'm running a simple analysis on 4 samples (2 case replicates and 2 control replicates), just the DESeq2 part. In the results, the output .txt file and the plots show gene IDs instead of gene names. In contrast, when I run the test pipeline, I get gene names. I couldn't find any argument that controls this, but I'm not completely sure there isn't one.
After checking a bit more, I found that the software version files are pretty much identical. I suspect the root cause is the structure of the tx2gene.tsv file output by the nf-core/rnaseq pipeline: in the test_dataset folder (https://github.com/lconde-ucl/DGE2/blob/1c8d3da5ee82c04d121c6a8d78c4367c4d9e76de/assets/test_datasets/results_rnaseq/star_salmon/tx2gene.tsv) it looks like this:
rna0 DDX11L1 DDX11L1
rna1 WASH7P WASH7P
...
while mine looks different:
transcript_id gene_id gene_name
ENST00000511072 ENSG00000142611 PRDM16
ENST00000607632 ENSG00000142611 PRDM16
...
I'm guessing that when DESeq2 is run in R, it picks the second column instead of the third one, which is why I get gene_ids instead of gene_names in the final results text file.
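To make the difference between the two files concrete, here is a minimal Python sketch. It is not the pipeline's code (the pipeline runs DESeq2 in R), and it only illustrates the guess above. The function name `load_gene_names` is hypothetical, the files are assumed to be tab-separated, and the sample rows are the ones quoted in this report.

```python
import csv
import io


def load_gene_names(handle):
    """Map column 2 (gene ID) to column 3 (gene name) of a 3-column tx2gene table.

    Hypothetical helper for illustration; assumes tab-separated input.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        transcript_id, gene_id, gene_name = row[:3]
        if transcript_id == "transcript_id":
            continue  # skip the header row, present in only one of the two files
        mapping[gene_id] = gene_name
    return mapping


# Rows as quoted from the test dataset: columns 2 and 3 are identical, no header.
test_style = "rna0\tDDX11L1\tDDX11L1\nrna1\tWASH7P\tWASH7P\n"

# Rows as quoted from my own run: header row, Ensembl ID in column 2, name in column 3.
mine_style = (
    "transcript_id\tgene_id\tgene_name\n"
    "ENST00000511072\tENSG00000142611\tPRDM16\n"
    "ENST00000607632\tENSG00000142611\tPRDM16\n"
)

test_map = load_gene_names(io.StringIO(test_style))
mine_map = load_gene_names(io.StringIO(mine_style))

print(test_map)  # {'DDX11L1': 'DDX11L1', 'WASH7P': 'WASH7P'}
print(mine_map)  # {'ENSG00000142611': 'PRDM16'}
```

In the test file, columns 2 and 3 hold the same value, so a reader that takes column 2 still ends up with gene names. In my file the two columns differ, so taking column 2 would give Ensembl IDs, which would be consistent with what I see in the output.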
Command used and terminal output
nextflow run -bg lconde-ucl/DGE2 -profile docker -params-file params.yaml
Relevant files
System information
CREATE_PARAM_FILE:
  bash: 5.0.17 3
CUSTOM_DUMPSOFTWAREVERSIONS:
  python: 3.11.7
  yaml: 5.4.1
DESEQ2:
  deseq2 version: 1.42.0
  r_version: R version 4.3.2 (2023-10-31)
REPORT_DESIGN:
  ComplexHeatmap version: 2.18.0
  DESeq2 version: 1.42.0
  ReportingTools version: 2.42.2
  ashr version: 2.2.63
  ggplot2 version: 3.4.4
  hwriter version: 1.3.2.1
  r_version: R version 4.3.2 (2023-10-31)
Workflow:
  Nextflow: 24.10.5
  lconde-ucl/DGE2: '1.0'