Iterator query over network stuck with CRAM on FTP #1877

@rick-heig

Hello,
I am accessing CRAM files over the network and sometimes sam_itr_querys gets stuck indefinitely (while still downloading data).

I have tested HTSlib 1.16 and 1.21 (built from the git tags) and get the same behaviour with both.

This may be related to issue #604.


I open my files as follows and iterate over regions with sam_itr_querys():

        htsFile *fp = hts_open(cram_file.c_str(), "r");
        if (!fp) {
            std::string error("Cannot open ");
            error += cram_file;
            throw DataCallerError(error);
        }
        hts_idx_t *idx = sam_index_load(fp, std::string(cram_file + ".crai").c_str());
        if (!idx) {
            throw DataCallerError(std::string("Failed to load index file"));
        }
        sam_hdr_t *hdr = sam_hdr_read(fp);
        if (!hdr) {
            std::string error("Failed to read header from file ");
            error += cram_file;
            throw DataCallerError(error);
        }

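        hts_itr_t *iter = NULL; /* must start as NULL so the first check below is safe */
        while(...) { /* Iterate over many regions */
            if (iter) {
                 sam_itr_destroy(iter);
                 iter = NULL;
            }
            /* Assign to the outer iter rather than declaring a new one that shadows it */
            iter = sam_itr_querys(idx, hdr, region.c_str());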
            ... do some work, e.g., pile up of reads ...
        }
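
For illustration, the destroy-then-recreate step in the loop above can also be expressed with `std::unique_ptr` and a custom deleter, so the previous iterator is always released before the next query. This is only a minimal sketch: `FakeIter`, `fake_query`, `fake_destroy` and `run_queries` are hypothetical stand-ins so that it compiles without HTSlib. With HTSlib the pointee would be `hts_itr_t` and the deleter would call `hts_itr_destroy()`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for hts_itr_t, so the sketch builds without HTSlib.
struct FakeIter { std::string region; };

static int g_live_iters = 0;  // number of iterators created but not yet destroyed

// Hypothetical stand-in for sam_itr_querys().
FakeIter *fake_query(const std::string &region) {
    ++g_live_iters;
    return new FakeIter{region};
}

// Hypothetical stand-in for hts_itr_destroy().
void fake_destroy(FakeIter *it) {
    assert(g_live_iters > 0);
    --g_live_iters;
    delete it;
}

// Deleter so std::unique_ptr releases the iterator through fake_destroy().
struct IterDeleter {
    void operator()(FakeIter *it) const { fake_destroy(it); }
};
using IterPtr = std::unique_ptr<FakeIter, IterDeleter>;

// Runs one query per region and returns the largest number of iterators
// that were alive at the same time.
int run_queries(const std::vector<std::string> &regions) {
    int max_live = 0;
    IterPtr iter;  // starts out null, so there is nothing to destroy yet
    for (const auto &region : regions) {
        iter.reset();                    // release the previous iterator first
        iter.reset(fake_query(region));  // then take ownership of the new one
        if (g_live_iters > max_live) max_live = g_live_iters;
        // ... do some work, e.g., pile up of reads ...
    }
    return max_live;  // the last iterator is released when iter goes out of scope
}
```

Because the pointer owns the iterator, there is no uninitialized or shadowed `iter` variable to get wrong, and the last iterator is released when the loop's scope ends.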
       

Sometimes this works and I can read the CRAM data; at other times the call hangs and never returns. While it is stuck, network activity shows data being downloaded continuously. If I rerun the program, the same query normally returns quickly and downloads only a small amount of data.

When I interrupt my program I get the following backtrace:

  * frame #0: 0x00007ff80ba8dd1a libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010010430c phase_caller`wait_perform(fp=0x000000010124c340) at hfile_libcurl.c:729:17 [opt]
    frame #2: 0x0000000100105710 phase_caller`libcurl_read(fpv=0x000000010124c340, bufferv=0x0000000102809000, nbytes=<unavailable>) at hfile_libcurl.c:834:17 [opt]
    frame #3: 0x0000000100049d86 phase_caller`refill_buffer(fp=0x000000010124c340) at hfile.c:186:13 [opt]
    frame #4: 0x000000010004a0ee phase_caller`hread2(fp=<unavailable>, destv=0x0000700007d75960, nbytes=43, nread=65493) at hfile.c:339:23 [opt]
    frame #5: 0x00000001000ca179 phase_caller`cram_seek [inlined] hread(fp=0x000000010124c340, buffer=0x0000700007d75960, nbytes=65536) at hfile.h:244:56 [opt]
    frame #6: 0x00000001000ca127 phase_caller`cram_seek(fd=<unavailable>, offset=11493247130, whence=<unavailable>) at cram_io.c:5453:20 [opt]
    frame #7: 0x00000001000bea42 phase_caller`cram_seek_to_refpos(fd=0x00000001003af000, r=0x0000700007d85af8) at cram_index.c:583:22 [opt]
    frame #8: 0x00000001000cabd4 phase_caller`cram_set_voption(fd=0x00000001003af000, opt=<unavailable>, args=0x0000700007d85ac0) at cram_io.c:5815:17 [opt]
    frame #9: 0x00000001000ca789 phase_caller`cram_set_option(fd=<unavailable>, opt=<unavailable>) at cram_io.c:5703:9 [opt]
    frame #10: 0x0000000100063b94 phase_caller`cram_itr_query(idx=0x000000010124c9d0, tid=16, beg=<unavailable>, end=248678, readrec=<unavailable>) at sam.c:1696:19 [opt]
    frame #11: 0x0000000100057b9e phase_caller`hts_itr_querys(idx=0x000000010124c9d0, reg="chr17:248676-248678", getid=(phase_caller`bam_name2id at sam.h:780), hdr=0x000000010124ce30, itr_query=(phase_caller`cram_itr_query at sam.c:1681), readrec=<unavailable>) at hts.c:4161:12 [opt]
    frame #12: 0x0000000100063d21 phase_caller`sam_itr_querys(idx=<unavailable>, hdr=<unavailable>, region=<unavailable>) at sam.c:1757:12 [opt] [artificial]
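
Frames #5 and #6 show cram_seek() calling hread() with nbytes=65536 while its offset argument is 11493247130. One possible reading, which is an assumption and not confirmed here, is that a seek could not be performed on the remote stream and was replaced by reading forward to the target. The sketch below illustrates that generic pattern only; `ForwardOnlyStream` and `seek_or_skip` are hypothetical names, and this is not the HTSlib implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical forward-only stream: seeking always fails, reading always succeeds.
struct ForwardOnlyStream {
    std::uint64_t pos = 0;         // current offset in the stream
    std::uint64_t bytes_read = 0;  // total bytes pulled from the transport
    std::uint64_t read_calls = 0;  // number of read() calls issued

    // Models a transport that cannot reposition itself.
    bool seek(std::uint64_t /*target*/) { return false; }

    // Models a read that consumes n bytes and discards them.
    std::size_t read(std::size_t n) {
        pos += n;
        bytes_read += n;
        ++read_calls;
        return n;
    }
};

// Tries a real seek first; if that fails, skips forward by reading and
// discarding data in 64 KiB chunks until the target offset is reached.
bool seek_or_skip(ForwardOnlyStream &s, std::uint64_t target) {
    if (s.seek(target)) return true;
    if (target < s.pos) return false;  // reading cannot move backwards

    constexpr std::uint64_t kChunk = 65536;
    std::uint64_t remaining = target - s.pos;
    while (remaining > 0) {
        const std::size_t len =
            static_cast<std::size_t>(std::min<std::uint64_t>(kChunk, remaining));
        assert(len > 0);
        if (s.read(len) != len) return false;
        remaining -= len;
    }
    return true;
}
```

Under that assumption, skipping 11493247130 bytes in 65536-byte chunks would take about 175,000 reads and transfer roughly 11.5 GB, which would look like an indefinite hang with continuous download activity.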

I tested with the following CRAM file and its index:

ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram.crai

Sometimes I manage to execute a few thousand queries, and sometimes it gets stuck after only a few.

If you have any insights into what to look for, I can try some debugging.
Thanks.
Rick
