from_pileup_mate_aware very inefficient? #64

@brentp

Hi Seth, if I understand correctly, from_pileup_mate_aware is run for each column in a pileup. This means it is grouping, sorting, grouping again, etc. for every position, often with the same set of reads for consecutive columns.

Am I misreading this? If not, this will be an extreme bottleneck, since it's O(n^2) or worse.
We could mitigate it by checking whether the end of the left-most read is less than the start of the right-most read, i.e. whether the mates overlap at all, and skipping the mate-aware work when they don't. Then the cost would at least be limited to the fraction of reads that actually overlap.
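
To make the suggested check concrete, here is a minimal sketch in Python. It is not the project's code: the `Pair` type, its fields, and `split_pairs` are hypothetical names used only for illustration, and coordinates are assumed to be half-open (end exclusive), so two mates overlap only when the left end is strictly greater than the right start.

```python
from typing import List, NamedTuple, Tuple


class Pair(NamedTuple):
    """Hypothetical read pair; coordinates are 0-based, end-exclusive."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int


def mates_overlap(pair: Pair) -> bool:
    # With half-open intervals the mates share a reference position
    # only if the left mate ends after the right mate starts.
    return pair.left_end > pair.right_start


def split_pairs(pairs: List[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Partition pairs into (non_overlapping, overlapping).

    Only the second list would need the per-column grouping and sorting;
    the first can take the cheaper, non-mate-aware path.
    """
    non_overlapping: List[Pair] = []
    overlapping: List[Pair] = []
    for pair in pairs:
        (overlapping if mates_overlap(pair) else non_overlapping).append(pair)
    return non_overlapping, overlapping


pairs = [
    Pair(100, 250, 300, 450),  # gap between mates: no overlap
    Pair(100, 250, 200, 350),  # mates share positions 200..249
    Pair(100, 250, 250, 400),  # adjacent but not overlapping
]
cheap, expensive = split_pairs(pairs)
```

The check is a single comparison per pair, so the expensive path is paid only for the pairs in `expensive`. How the mate's coordinates would be obtained in the real pileup code is left open here.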

Labels: enhancement (New feature or request)
