oarfish reports only shortest isoform despite long-read support for longer isoforms #55

@FenFotop

Hi! I am running oarfish on single-cell long-read data generated with PacBio Kinnex. The dataset contains hematopoietic cells, which abundantly express the PTPRC (CD45) gene. PTPRC has well-characterized alternative isoforms, so we are treating it as a positive control.
In the alignment, we see some reads spanning the canonical isoform (ENST00000442510.8), which is expected to be expressed (see the attached image and the table of reads from the minimap2 alignment). However, the oarfish output reports expression only for the shortest annotated isoform (ENST00000367364.5) and assigns no counts to the longer ones.
This is unexpected given both the known biology and the presence of long reads supporting the longer isoforms.
Is this behavior expected? If so, are there any recommended parameter changes, annotation adjustments, or general workarounds that would help oarfish recover these longer isoforms? I like the tool and would prefer to adapt my setup rather than switch methods.
The details of my minimap2 and oarfish runs are below. Thank you!

[Attached images: alignment view of PTPRC reads; table of reads from the minimap2 alignment]

Minimap2

minimap2 -t 16 -ax splice:hq --secondary=no /path/dataset_assembly.mmi /path/Lib_dedup.fasta \
        | samtools sort -@16 -o Lib_alignment.bam

Oarfish

oarfish --single-cell --alignments /path/Lib_alignment_namesorted.bam --output Lib_out --threads 8 --filter-group nanocount-filters --model-coverage --bin-width 100
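
For anyone who wants to reproduce a per-isoform read table like the one attached, a tally of primary alignments per reference sequence could be produced roughly as follows. This is a minimal sketch, not the exact procedure used for the attached table: the function name `count_primary_per_ref` is hypothetical, and it assumes that column 3 (RNAME) of the alignments holds transcript IDs, which is only true when reads were aligned to a transcriptome reference.

```shell
# Tally primary alignments per reference name from SAM text on stdin.
# ASSUMPTION: column 3 (RNAME) holds transcript IDs, i.e. the reads were
# aligned to a transcriptome reference. count_primary_per_ref is a
# hypothetical helper name used only for this illustration.
count_primary_per_ref() {
    awk -F '\t' '
        /^@/ { next }    # skip SAM header lines
        {
            flag = $2
            # drop unmapped (0x4), secondary (0x100) and supplementary (0x800) records
            if (int(flag / 4) % 2 || int(flag / 256) % 2 || int(flag / 2048) % 2) next
            n[$3]++
        }
        END { for (r in n) printf "%s\t%d\n", r, n[r] }
    ' | sort
}

# Usage (requires samtools; the BAM name is taken from the minimap2 command above):
# samtools view -h Lib_alignment.bam | count_primary_per_ref \
#     | grep -E 'ENST00000442510|ENST00000367364'
```

Comparing these raw counts with the oarfish output for the same transcripts shows whether reads for the longer isoforms are present in the input but removed or reassigned during quantification.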
