CSIv2 #148

@pd3


I created a new branch (9b11795) which supports reading and writing of the CSIv2 index (samtools/hts-specs@b131ffc). There is currently no API to use it. The motivation for the extension was to allow queries like:

  1. get N-th to M-th record
  2. create a list of regions with N records each (optionally overlapping). This is useful in pipelines which split BCF/BAM into smaller chunks and process them in parallel.

Shane suggested that the first type could be easily integrated with existing tools by using "::" instead of ":". Then chr::N-M would be interpreted as a range of record indexes, while chr:from-to would still be interpreted as genomic coordinates.
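
To make the "::" idea concrete, here is a minimal Python sketch of how a region string could be told apart. It is only an illustration, not code from the branch: the names parse_query and Query are hypothetical, and it assumes contig names contain no ":" and that both bounds are always given.

```python
import re
from typing import NamedTuple


class Query(NamedTuple):
    """Hypothetical container for a parsed region string."""
    contig: str
    start: int
    end: int
    by_record: bool  # True: start/end are record indexes; False: genomic coordinates


# Assumption: contig names contain no ':' and both bounds are present.
_PATTERN = re.compile(r"^(?P<contig>[^:]+)(?P<sep>::?)(?P<start>\d+)-(?P<end>\d+)$")


def parse_query(text: str) -> Query:
    """Split 'chr::N-M' (record indexes) from 'chr:from-to' (coordinates)."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"cannot parse region: {text!r}")
    return Query(m["contig"], int(m["start"]), int(m["end"]), m["sep"] == "::")
```

With this sketch, parse_query("chr1::10-20") is flagged as a record-index query, while parse_query("chr1:100-200") is treated as a coordinate query. Whether the indexes are 0-based or 1-based is left open here.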

For the second, I was thinking of adding a new switch to tabix, so that one could run something like:
tabix --new-switch CHUNK_SIZE[,OVERLAP_SIZE] file.bcf [REGION]
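
As a rough illustration of what such a switch might compute, the Python sketch below splits a known number of records into chunks. Everything in it is an assumption: the function name chunk_records is hypothetical, ranges are 0-based and half-open, each chunk holds at most CHUNK_SIZE records, and OVERLAP_SIZE is read as the number of records shared by consecutive chunks. Mapping the record ranges back to file regions via the index is not shown.

```python
from typing import List, Tuple


def chunk_records(total: int, chunk_size: int, overlap: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (first, last) record ranges covering `total` records.

    Assumed semantics: each chunk has at most `chunk_size` records and
    consecutive chunks share `overlap` records.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each chunk start advances
    chunks = []
    start = 0
    while start < total:
        end = min(start + chunk_size, total)
        chunks.append((start, end))
        if end == total:  # last record reached; avoid a trailing sub-chunk
            break
        start += step
    return chunks
```

For example, under these assumptions 10 records with a chunk size of 4 and an overlap of 1 give the ranges (0, 4), (3, 7) and (6, 10).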

What do you think? Comments and feedback are welcome.
