Description
Hi! I am running oarfish on single-cell long-read data generated with PacBio Kinnex. The dataset contains hematopoietic cells, which abundantly express the PTPRC (CD45) gene. PTPRC has well-known alternative isoforms, so we are treating it as a positive control.
In the alignment, we see reads spanning the canonical isoform (ENST00000442510.8), which is expected to be expressed (see the attached screenshot and the table of reads from the minimap2 alignment). However, oarfish reports expression only for the shortest annotated isoform (ENST00000367364.5) and assigns no counts to the longer ones.
This is unexpected given both the known biology and the presence of long reads supporting longer isoforms.
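A minimal sketch for tallying per-isoform read support directly from the BAM. It assumes the reads were aligned to transcript sequences, so the SAM reference name (column 3) is a transcript ID. minimap2's splice presets are designed for spliced alignment against a genome, and in that case column 3 holds chromosome names and this tally does not apply. The function name `count_primary_per_transcript` is hypothetical. It skips header lines and unmapped (flag 4), secondary (flag 256) and supplementary (flag 2048) records, so each read is counted once, at its primary target.

```shell
# Hypothetical helper: read SAM text on stdin and print "<transcript>\t<count>"
# for primary, mapped alignments whose reference name matches the regex in $1.
count_primary_per_transcript() {
  local pattern="$1"
  awk -v pat="$pattern" '
    BEGIN { FS = "\t" }
    /^@/ { next }                                    # skip SAM header lines
    {
      flag = $2
      if (int(flag / 4) % 2 == 1) next               # 0x4: unmapped
      if (int(flag / 256) % 2 == 1) next             # 0x100: secondary
      if (int(flag / 2048) % 2 == 1) next            # 0x800: supplementary
      if ($3 ~ pat) n[$3]++                          # count by reference name
    }
    END { for (t in n) print t "\t" n[t] }
  ' | sort
}
```

Example usage: `samtools view Lib_alignment.bam | count_primary_per_transcript 'ENST00000442510|ENST00000367364'`. Add the IDs of any other PTPRC isoforms in the annotation to the pattern. Because the minimap2 run uses `--secondary=no`, these counts reflect only where minimap2 placed each read, not every isoform the read is compatible with.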
Is this behavior expected? If so, are there any recommended parameter tweaks, annotation tricks, or general workarounds to help oarfish recover these longer isoforms? I like the tool and would prefer to adapt my setup rather than switch methods.
The details of my minimap2 and oarfish runs are below. Thank you!
Minimap2
```bash
minimap2 -t 16 -ax splice:hq --secondary=no /path/dataset_assembly.mmi /path/Lib_dedup.fasta \
  | samtools sort -@16 -o Lib_alignment.bam
```
Oarfish
```bash
oarfish --single-cell --alignments /path/Lib_alignment_namesorted.bam --output Lib_out \
  --threads 8 --filter-group nanocount-filters --model-coverage --bin-width 100
```