Description
When ramntupleview is queried for a chromosome that is not present in the RAM file (e.g., chrXYZ), GetRefId() is called to map the reference name to its integer id. If the name is unknown, GetRefId() inserts it into the internal reference vector fRefVec.
This behaviour is useful when scanning a SAM file during conversion to build up the fRefVec vector. However, read-only querying should not modify this vector.
Root Cause
The query path obtains the refid here:
ramtools/src/ramcore/RAMNTupleView.cxx
Lines 37 to 38 in 83334a1
   auto refid = RAMNTupleRecord::GetRnameRefs()->GetRefId(rname.Data());
GetRefId() appends the name when it is not already in fRefVec:
ramtools/src/rntuple/RAMNTupleRecord.cxx
Lines 45 to 66 in 83334a1
int RAMNTupleRefs::GetRefId(const std::string &rname)
{
   if (rname == "*") {
      return -1;
   }
   if (rname == fLastName) {
      return fLastId;
   }
   auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
   if (it != fRefVec.end()) {
      fLastId = static_cast<int>(std::distance(fRefVec.begin(), it));
      fLastName = rname;
      return fLastId;
   }
   if (static_cast<int>(fRefVec.size()) >= static_cast<int>(fRefVec.capacity())) {
      fRefVec.reserve(fRefVec.capacity() * 2);
   }
   fRefVec.push_back(rname);
Runtime Inspection:
gdb ./tools/ramntupleview
Reading symbols from ./tools/ramntupleview...
(gdb) set breakpoint pending on
(gdb) break RAMNTupleRefs::GetRefId
Function "RAMNTupleRefs::GetRefId" not defined.
Breakpoint 1 (RAMNTupleRefs::GetRefId) pending.
(gdb) run ./../../output.root "chrXYZ:100-200"
Starting program: /home/neo-0007/dev/Workspaces/Compiler-Research/ramtools/build/tools/ramntupleview ./../../output.root "chrXYZ:100-200"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5fff6c0 (LWP 1315048)]
[New Thread 0x7ffff57fe6c0 (LWP 1315049)]
[Thread 0x7ffff57fe6c0 (LWP 1315049) exited]
[New Thread 0x7ffff57fe6c0 (LWP 1315050)]
[Detaching after vfork from child process 1315051]
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libtinfo.so.6
[Detaching after vfork from child process 1315053]
[Detaching after vfork from child process 1315057]
[Thread 0x7ffff57fe6c0 (LWP 1315050) exited]
Thread 1 "ramntupleview" hit Breakpoint 1, RAMNTupleRefs::GetRefId (this=0x5555555e9320, rname="chrXYZ")
at /home/neo-0007/dev/Workspaces/Compiler-Research/ramtools/src/rntuple/RAMNTupleRecord.cxx:46
46 {
(gdb) print rname
$1 = "chrXYZ"
(gdb) print fRefVec
$2 = std::vector of length 24, capacity 100 = {"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrM", "chrX"}
(gdb) next
47 if (rname == "*") {
(gdb) next
51 if (rname == fLastName) {
(gdb) next
55 auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
(gdb) next
56 if (it != fRefVec.end()) {
(gdb) next
62 if (static_cast<int>(fRefVec.size()) >= static_cast<int>(fRefVec.capacity())) {
(gdb) next
66 fRefVec.push_back(rname);
(gdb) next
67 fLastId = static_cast<int>(fRefVec.size() - 1);
(gdb) print fRefVec
$3 = std::vector of length 25, capacity 100 = {"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrM", "chrX", "chrXYZ"}
(gdb) exit
So querying an invalid chromosome modifies the internal reference vector: fRefVec grows from 24 to 25 entries, with "chrXYZ" appended.
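The same effect can be reproduced outside of gdb with a small standalone sketch. MutatingRefTable and EntriesAddedByLookup are hypothetical names used only for illustration; the class models just the find-or-append logic of GetRefId() and leaves out the fLastName/fLastId cache and the capacity handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RAMNTupleRefs: only the find-or-append
// logic is modelled, not the real class.
class MutatingRefTable {
public:
   explicit MutatingRefTable(std::vector<std::string> refs) : fRefVec(std::move(refs)) {}

   // Same shape as the GetRefId() shown above: an unknown name is appended.
   int GetRefId(const std::string &rname)
   {
      if (rname == "*") {
         return -1;
      }
      auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
      if (it != fRefVec.end()) {
         return static_cast<int>(std::distance(fRefVec.begin(), it));
      }
      fRefVec.push_back(rname);
      return static_cast<int>(fRefVec.size() - 1);
   }

   std::size_t Size() const { return fRefVec.size(); }

private:
   std::vector<std::string> fRefVec;
};

// Returns how many entries a single lookup of `rname` adds to the table.
// A read-only lookup would always return 0.
std::size_t EntriesAddedByLookup(MutatingRefTable &table, const std::string &rname)
{
   const std::size_t before = table.Size();
   table.GetRefId(rname);
   assert(table.Size() >= before); // the table never shrinks on lookup
   return table.Size() - before;
}
```

Looking up a known name such as "chr2" adds nothing, while looking up "chrXYZ" adds one entry, which matches the 24 to 25 growth seen in the debugger.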
Expected Behaviour
Querying an invalid chromosome should:
- Not modify fRefVec
- Return an error like:
Error: Unknown reference sequence chrXYZ
Possible Solution
- Add a new function that does not modify state and use that during querying:
int FindRefId(const std::string &rname) const;
- Additionally, GetRefId can be renamed to something like GetOrCreateRefId for clarity
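One way the proposed split could look is sketched below. This is only an illustration, not the actual ramtools code: RefTableSketch, ResolveQueryRef, and the kNotFound sentinel (-2) are assumptions made here, chosen so that an unknown name can be told apart from the -1 already returned for "*". The fLastName/fLastId cache is left out; keeping it in a const method would require mutable members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RAMNTupleRefs with the lookup split in two.
class RefTableSketch {
public:
   static constexpr int kUnmapped = -1; // "*" as in the existing GetRefId()
   static constexpr int kNotFound = -2; // assumed sentinel, not in the original code

   explicit RefTableSketch(std::vector<std::string> refs) : fRefVec(std::move(refs)) {}

   // Read-only lookup: never modifies fRefVec.
   int FindRefId(const std::string &rname) const
   {
      if (rname == "*") {
         return kUnmapped;
      }
      auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
      if (it == fRefVec.end()) {
         return kNotFound;
      }
      return static_cast<int>(std::distance(fRefVec.begin(), it));
   }

   // Conversion-time lookup: appends unknown names, as GetRefId() does today.
   int GetOrCreateRefId(const std::string &rname)
   {
      const int id = FindRefId(rname);
      if (id != kNotFound) {
         return id;
      }
      fRefVec.push_back(rname);
      return static_cast<int>(fRefVec.size() - 1);
   }

   std::size_t Size() const { return fRefVec.size(); }

private:
   std::vector<std::string> fRefVec;
};

// Query-side helper (hypothetical): resolves `rname` without touching the
// table. On failure it returns false and fills `error` with the message
// suggested under Expected Behaviour.
bool ResolveQueryRef(const RefTableSketch &refs, const std::string &rname, int &refid, std::string &error)
{
   refid = refs.FindRefId(rname);
   if (refid == RefTableSketch::kNotFound) {
      error = "Error: Unknown reference sequence " + rname;
      return false;
   }
   error.clear();
   return true;
}
```

Because ResolveQueryRef takes the table by const reference, the compiler rejects any accidental call to GetOrCreateRefId on the query path, so the read-only guarantee is enforced at compile time rather than by convention.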