[Bug] Querying an invalid chromosome using ramntupleview adds it to fRefVec #23

@neo-0007

Description

When ramntupleview is used to query a chromosome that is not present in the RAM file (e.g., chrXYZ), it calls GetRefId() to map the reference name to its integer ID. If the name is unknown, GetRefId() inserts it into the internal reference vector fRefVec.
This behaviour is useful when scanning a SAM file during conversion to build up the fRefVec vector. However, read-only querying should not modify this vector.

Root Cause

The query path obtains the reference ID with:

auto refid = RAMNTupleRecord::GetRnameRefs()->GetRefId(rname.Data());

GetRefId() appends the name to fRefVec if it is not already there:

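int RAMNTupleRefs::GetRefId(const std::string &rname)
{
   // "*" means "no reference" and maps to -1.
   if (rname == "*") {
      return -1;
   }
   // Fast path: same name as the previous call.
   if (rname == fLastName) {
      return fLastId;
   }
   // Known name: return its index and update the one-entry cache.
   auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
   if (it != fRefVec.end()) {
      fLastId = static_cast<int>(std::distance(fRefVec.begin(), it));
      fLastName = rname;
      return fLastId;
   }
   // Unknown name: it is appended here, which is the side effect reported in this issue.
   if (static_cast<int>(fRefVec.size()) >= static_cast<int>(fRefVec.capacity())) {
      fRefVec.reserve(fRefVec.capacity() * 2);
   }
   fRefVec.push_back(rname);
   fLastId = static_cast<int>(fRefVec.size() - 1);
   // ... (remainder of the function not shown in this excerpt)
}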

Runtime Inspection:

gdb ./tools/ramntupleview  

Reading symbols from ./tools/ramntupleview...  

(gdb) set breakpoint pending on  
(gdb) break RAMNTupleRefs::GetRefId 

Function "RAMNTupleRefs::GetRefId" not defined.  
Breakpoint 1 (RAMNTupleRefs::GetRefId) pending.   

(gdb) run ./../../output.root "chrXYZ:100-200"  

Starting program: /home/neo-0007/dev/Workspaces/Compiler-Research/ramtools/build/tools/ramntupleview ./../../output.root "chrXYZ:100-200"  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".  
[New Thread 0x7ffff5fff6c0 (LWP 1315048)]  
[New Thread 0x7ffff57fe6c0 (LWP 1315049)]  
[Thread 0x7ffff57fe6c0 (LWP 1315049) exited]  
[New Thread 0x7ffff57fe6c0 (LWP 1315050)]  
[Detaching after vfork from child process 1315051]  
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libtinfo.so.6  
[Detaching after vfork from child process 1315053]  
[Detaching after vfork from child process 1315057]  
[Thread 0x7ffff57fe6c0 (LWP 1315050) exited]  
  
Thread 1 "ramntupleview" hit Breakpoint 1, RAMNTupleRefs::GetRefId (this=0x5555555e9320, rname="chrXYZ")  
   at /home/neo-0007/dev/Workspaces/Compiler-Research/ramtools/src/rntuple/RAMNTupleRecord.cxx:46  
46      {  
(gdb) print rname  
$1 = "chrXYZ"  
(gdb) print fRefVec  
$2 = std::vector of length 24, capacity 100 = {"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",    
 "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrM", "chrX"}  
(gdb) next  
47         if (rname == "*") {  
(gdb) next  
51         if (rname == fLastName) {  
(gdb) next  
55         auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);  
(gdb) next  
56         if (it != fRefVec.end()) {  
(gdb) next  
62         if (static_cast<int>(fRefVec.size()) >= static_cast<int>(fRefVec.capacity())) {  
(gdb) next  
66         fRefVec.push_back(rname);  
(gdb) next  
67         fLastId = static_cast<int>(fRefVec.size() - 1);  
(gdb) print fRefVec  
$3 = std::vector of length 25, capacity 100 = {"chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",    
 "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrM", "chrX", "chrXYZ"}  
(gdb) exit 

So querying an invalid chromosome modifies the internal reference vector: fRefVec grows from 24 to 25 entries and now contains "chrXYZ".
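
The same effect can be reproduced outside ramtools with a small stand-in. This is only a sketch: RefsModel and EntriesAddedByLookup are hypothetical names, not part of the code base, and the model keeps just the get-or-insert logic from the GetRefId() excerpt above (the fLastName/fLastId cache is left out).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RAMNTupleRefs, reduced to the lookup logic quoted above.
class RefsModel {
public:
   explicit RefsModel(std::vector<std::string> refs) : fRefVec(std::move(refs)) {}

   // Same get-or-insert behaviour as the excerpt: unknown names are appended.
   int GetRefId(const std::string &rname)
   {
      if (rname == "*")
         return -1;
      auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
      if (it != fRefVec.end())
         return static_cast<int>(std::distance(fRefVec.begin(), it));
      fRefVec.push_back(rname);
      return static_cast<int>(fRefVec.size() - 1);
   }

   std::size_t Size() const { return fRefVec.size(); }

private:
   std::vector<std::string> fRefVec;
};

// Returns how many entries a single lookup of rname added to the table.
std::size_t EntriesAddedByLookup(RefsModel &refs, const std::string &rname)
{
   const std::size_t before = refs.Size();
   refs.GetRefId(rname);
   return refs.Size() - before;
}
```

Looking up a known name adds nothing, while looking up an unknown name adds one entry, which matches what the gdb session shows.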

Expected Behaviour

Querying an invalid chromosome should:

  • Not modify fRefVec
  • Return an error like: Error: Unknown reference sequence chrXYZ
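
A caller-side check along these lines would satisfy both points. It is a sketch under assumptions: ExtractRname and CheckReference are hypothetical helpers, the region format is taken from the "chrXYZ:100-200" argument used in the gdb session, and the message text follows the example above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: takes the reference name from a region such as "chrXYZ:100-200".
std::string ExtractRname(const std::string &region)
{
   // find() returns npos when there is no ':', so substr() then yields the whole string.
   return region.substr(0, region.find(':'));
}

// Hypothetical helper: returns an empty string if rname is known, otherwise an error message.
// The table is taken by const reference, so the check cannot modify it.
std::string CheckReference(const std::vector<std::string> &knownRefs, const std::string &rname)
{
   if (std::find(knownRefs.begin(), knownRefs.end(), rname) != knownRefs.end())
      return "";
   return "Error: Unknown reference sequence " + rname;
}
```

The viewer could print the returned message and exit early instead of running the query.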

Possible Solution

  • Add a new function that does not modify state and use it during querying:
      int FindRefId(const std::string &rname) const;
  • Additionally, GetRefId can be renamed to something like GetOrCreateRefId for clarity.
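
One way the two functions could fit together is sketched below. The FindRefId signature and the GetOrCreateRefId name come from the proposal above; everything else is an assumption: RefTable is a hypothetical stand-in for RAMNTupleRefs, returning -1 for an unknown name is an illustrative choice (it coincides with the value GetRefId already returns for "*"), and the fLastName/fLastId cache is omitted because a const method could only update it if those members were declared mutable.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for RAMNTupleRefs.
class RefTable {
public:
   explicit RefTable(std::vector<std::string> refs) : fRefVec(std::move(refs)) {}

   // Read-only lookup for querying: returns the index of rname, or -1 if it is unknown.
   // Being const, it cannot append to fRefVec.
   int FindRefId(const std::string &rname) const
   {
      auto it = std::find(fRefVec.begin(), fRefVec.end(), rname);
      if (it == fRefVec.end())
         return -1;
      return static_cast<int>(std::distance(fRefVec.begin(), it));
   }

   // Mutating lookup for conversion: appends rname when it is not yet known.
   int GetOrCreateRefId(const std::string &rname)
   {
      if (rname == "*")
         return -1;
      const int id = FindRefId(rname);
      if (id >= 0)
         return id;
      fRefVec.push_back(rname);
      return static_cast<int>(fRefVec.size() - 1);
   }

   std::size_t Size() const { return fRefVec.size(); }

private:
   std::vector<std::string> fRefVec;
};
```

With this split, ramntupleview would call FindRefId and treat a negative result as an unknown reference, while the SAM conversion path would keep the inserting behaviour through GetOrCreateRefId.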
