Hello,
I am accessing CRAM files over the network and sometimes sam_itr_querys gets stuck indefinitely (while still downloading data).
I have tested HTSlib 1.16 and 1.21 (built from a git checkout of each tag) and get the same behaviour with both.
This may be related to issue #604.
I open my files as follows and iterate over regions with sam_itr_querys():
htsFile *fp = hts_open(cram_file.c_str(), "r");
if (!fp) {
    std::string error("Cannot open ");
    error += cram_file;
    throw DataCallerError(error);
}
// Load the .crai index that sits next to the CRAM file.
hts_idx_t *idx = sam_index_load(fp, std::string(cram_file + ".crai").c_str());
if (!idx) {
    throw DataCallerError(std::string("Failed to load index file"));
}
sam_hdr_t *hdr = sam_hdr_read(fp);
if (!hdr) {
    std::string error("Failed to read header from file ");
    error += cram_file;
    throw DataCallerError(error);
}
hts_itr_t *iter = NULL;
while(...) { /* Iterate over many regions */
    // Release the iterator left over from the previous region.
    if (iter) {
        sam_itr_destroy(iter);
        iter = NULL;
    }
    iter = sam_itr_querys(idx, hdr, region.c_str());
    ... do some work, e.g., pile up of reads ...
}
Sometimes this works and I can read the CRAM data; other times the query gets stuck and runs indefinitely. When I check network activity during a hang, data is being downloaded continuously. If I rerun the program, the query normally returns quickly and downloads only a small amount of data.
When I interrupt my program, I get the following backtrace:
* frame #0: 0x00007ff80ba8dd1a libsystem_kernel.dylib`__select + 10
frame #1: 0x000000010010430c phase_caller`wait_perform(fp=0x000000010124c340) at hfile_libcurl.c:729:17 [opt]
frame #2: 0x0000000100105710 phase_caller`libcurl_read(fpv=0x000000010124c340, bufferv=0x0000000102809000, nbytes=<unavailable>) at hfile_libcurl.c:834:17 [opt]
frame #3: 0x0000000100049d86 phase_caller`refill_buffer(fp=0x000000010124c340) at hfile.c:186:13 [opt]
frame #4: 0x000000010004a0ee phase_caller`hread2(fp=<unavailable>, destv=0x0000700007d75960, nbytes=43, nread=65493) at hfile.c:339:23 [opt]
frame #5: 0x00000001000ca179 phase_caller`cram_seek [inlined] hread(fp=0x000000010124c340, buffer=0x0000700007d75960, nbytes=65536) at hfile.h:244:56 [opt]
frame #6: 0x00000001000ca127 phase_caller`cram_seek(fd=<unavailable>, offset=11493247130, whence=<unavailable>) at cram_io.c:5453:20 [opt]
frame #7: 0x00000001000bea42 phase_caller`cram_seek_to_refpos(fd=0x00000001003af000, r=0x0000700007d85af8) at cram_index.c:583:22 [opt]
frame #8: 0x00000001000cabd4 phase_caller`cram_set_voption(fd=0x00000001003af000, opt=<unavailable>, args=0x0000700007d85ac0) at cram_io.c:5815:17 [opt]
frame #9: 0x00000001000ca789 phase_caller`cram_set_option(fd=<unavailable>, opt=<unavailable>) at cram_io.c:5703:9 [opt]
frame #10: 0x0000000100063b94 phase_caller`cram_itr_query(idx=0x000000010124c9d0, tid=16, beg=<unavailable>, end=248678, readrec=<unavailable>) at sam.c:1696:19 [opt]
frame #11: 0x0000000100057b9e phase_caller`hts_itr_querys(idx=0x000000010124c9d0, reg="chr17:248676-248678", getid=(phase_caller`bam_name2id at sam.h:780), hdr=0x000000010124ce30, itr_query=(phase_caller`cram_itr_query at sam.c:1681), readrec=<unavailable>) at hts.c:4161:12 [opt]
frame #12: 0x0000000100063d21 phase_caller`sam_itr_querys(idx=<unavailable>, hdr=<unavailable>, region=<unavailable>) at sam.c:1757:12 [opt] [artificial]
I tested with the following CRAM file and index:
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239334/NA12878.final.cram.crai
In some runs I have managed to execute a few thousand queries; in others it gets stuck after only a few.
If you have any insight into what to look for, I can try some debugging.
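One way to find out which regions hang is to run a small watchdog alongside the query loop. The sketch below is only an illustration: `StallWatchdog` and its methods are hypothetical names, not part of HTSlib, and it uses only the C++ standard library. It does not cancel a stuck call. It only reports the label of any operation that stays in flight longer than a chosen timeout.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical helper (not an HTSlib API): flags any operation that stays
// "in flight" longer than a timeout. It only reports; it cannot cancel.
class StallWatchdog {
public:
    explicit StallWatchdog(std::chrono::milliseconds timeout)
        : timeout_(timeout), worker_([this] { run(); }) {}

    ~StallWatchdog() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Call just before the potentially blocking call.
    void begin(const std::string &label) {
        std::lock_guard<std::mutex> lock(mu_);
        label_ = label;
        start_ = std::chrono::steady_clock::now();
        active_ = true;
        reported_ = false;
    }

    // Call right after the blocking call returns.
    void end() {
        std::lock_guard<std::mutex> lock(mu_);
        active_ = false;
    }

    // Label of the most recent operation flagged as stalled ("" if none).
    std::string last_stalled() {
        std::lock_guard<std::mutex> lock(mu_);
        return last_stalled_;
    }

private:
    void run() {
        // Poll a few times per timeout period, but at least every 1 ms.
        const auto poll = std::max(timeout_ / 4, std::chrono::milliseconds(1));
        std::unique_lock<std::mutex> lock(mu_);
        while (!stop_) {
            cv_.wait_for(lock, poll);
            if (active_ && !reported_ &&
                std::chrono::steady_clock::now() - start_ > timeout_) {
                reported_ = true;  // report each operation at most once
                last_stalled_ = label_;
                std::cerr << "possible stall in query: " << label_ << '\n';
            }
        }
    }

    std::chrono::milliseconds timeout_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false, active_ = false, reported_ = false;
    std::string label_, last_stalled_;
    std::chrono::steady_clock::time_point start_;
    std::thread worker_;  // declared last so the other members exist first
};
```

In the loop above, this would mean calling `begin(region)` immediately before `sam_itr_querys()` and `end()` immediately after it returns, with a timeout well above the normal query time. The logged region strings could then be compared across runs to see whether the hang is tied to particular offsets in the file.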
Thanks.
Rick