Switch from strtol to hts_str2uint in mod parsing#1957
jkbonfield merged 5 commits into samtools:develop
Looks like a good idea. If you lift the declarations of
I'd agree that lofting particularly It would also be a good idea to change the
sam_mods.c
    if (cp != state->MM[i])
        state->MMcount[i] = strtol(cp+1, NULL, 10);
    else
    if (cp != state->MM[i]) {
And now there's no need to modify a bunch of surrounding lines by adding { … }.
        : 0;
    if (!cp_end) {
        // empty list
    if (*cp == ',') {
@jkbonfield This site made me think for a minute; the interplay of the ternary and the if(!cp_end) worried me. I believe I have the logic correct, and it reads a little more directly now as a consequence.
In fact, this highlights a bug in the old code.
Earlier on we have:
    char *ms = cp, *me; // mod code start and end
    char *cp_end = NULL;
    int chebi = 0;
    if (isdigit_c(*cp)) {
        chebi = strtol(cp, &cp_end, 10);
        cp = cp_end;
        ms = cp-1;
    } else {
        while (*cp && isalpha_c(*cp))
            cp++;
        if (*cp == '\0')
            return -1;
    }
However, after this block cp_end is non-NULL if we have a CHEBI code, and NULL otherwise.
Moving on to the code you commented on above: if we have no comma, and hence an empty list, then we're now affected by the CHEBI vs otherwise logic. Indeed this test file shows a problem:
$ cat /tmp/MM-chebi.sam
* 0 * 0 0 * * 0 0 ACGCT * Mm:Z:C+m;C+76792;N+n;
$ ./test/test_mod /tmp/MM-chebi.sam 2>/dev/null
0 A
1 C C+(76792).
2 G
3 C
4 T
---
Present: m. #-76792. n.
1 C C+(76792).
If I put a cp_end = NULL after the above code block, so that it's always NULL regardless of code-vs-CHEBI, then the rogue mod vanishes.
The revised code side-steps the issue, and I agree its logic is easier to understand. A good spot.
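To make the stale-pointer problem concrete, here is a minimal sketch in C. It is not the htslib source: `skip_code`, `list_empty_old`, `list_empty_direct` and the `reset_end` flag are hypothetical names, and the ternary is reconstructed on the assumption that it only writes `cp_end` when a comma follows the mod code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not htslib code): step past a mod code, which is
 * either numeric (CHEBI) or alphabetic. On return *cp_end is non-NULL
 * only when the code was numeric, mirroring the block quoted above. */
static const char *skip_code(const char *cp, char **cp_end)
{
    assert(cp != NULL && cp_end != NULL);
    *cp_end = NULL;
    if (isdigit((unsigned char)*cp)) {
        (void) strtol(cp, cp_end, 10);   /* sets *cp_end past the digits */
        cp = *cp_end;
    } else {
        while (*cp && isalpha((unsigned char)*cp))
            cp++;
    }
    return cp;
}

/* Assumed shape of the old test: "empty list" is inferred from cp_end
 * still being NULL. reset_end = 1 applies the cp_end = NULL workaround. */
static int list_empty_old(const char *cp, int reset_end)
{
    char *cp_end;
    cp = skip_code(cp, &cp_end);
    if (reset_end)
        cp_end = NULL;
    long delta = (*cp == ',') ? strtol(cp + 1, &cp_end, 10) : 0;
    (void) delta;
    return cp_end == NULL;               /* stale after a numeric code */
}

/* Direct test in the spirit of the revised code: look for the comma. */
static int list_empty_direct(const char *cp)
{
    char *unused;
    cp = skip_code(cp, &unused);
    return *cp != ',';
}
```

With these definitions, `list_empty_old("76792;", 0)` reports a non-empty list even though no comma follows the code, whereas `list_empty_old("76792;", 1)` and `list_empty_direct("76792;")` both report an empty one.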
Thank you.
This saves around 10-15% in programs calling the modified base APIs.