Switch from strtol to hts_str2uint in mod parsing by cjw85 · Pull Request #1957 · samtools/htslib

cjw85 · 2025-09-30T12:50:54Z

This saves around 10-15% in programs calling the modified base APIs.

jmarshall · 2025-09-30T22:08:58Z

Looks like a good idea.

If you lift the declarations of `tmp` (to the `char *cp` declaration) and of `of` (as, say, `failed`, to the top), these can stay as one-liners, which would be nice (clearer).

daviesrob · 2025-10-01T09:21:44Z

I'd agree that lifting `of` in particular could be useful. It could even be tested at the end to detect overflows, which isn't done at the moment.
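Below is a minimal sketch of the suggested shape. It assumes a hypothetical stand-in, `str2uint_sketch()`, that only mimics the `(in, end, bits, failed)` argument order of htslib's `hts_str2uint()` and is not the htslib code. It also uses a hypothetical `sum_deltas()` helper. The end pointer `tmp` and the `failed` flag are declared once, so each parse stays a single statement and overflow is checked once, after the loop.

```c
#include <stdint.h>

/* Hypothetical stand-in with the same (in, end, bits, failed) argument
 * order as htslib's hts_str2uint(); NOT the htslib implementation.
 * Parses decimal digits and sets *failed (never clears it) on overflow. */
static uint64_t str2uint_sketch(const char *in, char **end, int bits,
                                int *failed) {
    uint64_t limit = bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    uint64_t n = 0;
    const char *p = in;
    for (; *p >= '0' && *p <= '9'; p++) {
        uint64_t d = (uint64_t)(*p - '0');
        if (d > limit || n > (limit - d) / 10) {
            *failed = 1;   /* overflow: saturate at limit */
            n = limit;
        } else {
            n = n * 10 + d;
        }
    }
    if (end) *end = (char *)p;
    return n;
}

/* Sums a delta list such as ",5,0,12;" (hypothetical helper). The end
 * pointer `tmp` and the `failed` flag are declared once, up front, so
 * each parse stays a one-line statement and overflow is checked once. */
static int sum_deltas(const char *cp, uint64_t *total) {
    char *tmp;
    int failed = 0;
    uint64_t sum = 0;
    while (*cp == ',') {
        sum += str2uint_sketch(cp + 1, &tmp, 31, &failed);
        cp = tmp;
    }
    if (failed)
        return -1;   /* some delta did not fit in 31 bits */
    *total = sum;
    return 0;
}
```

With this shape, `sum_deltas(",5,0,12;", &t)` returns 0 with `t == 17`. A delta such as `99999999999`, which does not fit in 31 bits, makes it return -1. Because the flag is only ever set and never cleared, one test at the end covers every call.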

It would also be a good idea to change the strtol() calls in bam_parse_basemod2(). bam_mods_at_next_pos and bam_parse_basemod2 rely somewhat on the strings being parsed in the same way, so I'd be a bit worried about subtle differences between strtol() and hts_str2uint() resulting in odd things happening if the input is not strictly spec-compliant.
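The sketch below illustrates the kind of subtle difference at issue. It compares how much input `strtol()` consumes with how much a hypothetical digits-only parser, `digits_only()`, consumes. This stand-in is an assumption for illustration, not htslib's `hts_str2uint()`, whose exact rules (for example, whether a leading `+` is accepted) should be checked in `htslib/hts.h`. `parsers_disagree()` is likewise hypothetical.

```c
#include <stdlib.h>

/* Hypothetical digits-only parser, used only to contrast with strtol();
 * NOT htslib's hts_str2uint(). No overflow checking. */
static unsigned long digits_only(const char *in, char **end) {
    unsigned long n = 0;
    const char *p = in;
    while (*p >= '0' && *p <= '9')
        n = n * 10 + (unsigned long)(*p++ - '0');
    if (end) *end = (char *)p;
    return n;
}

/* Returns 1 if strtol() and the digits-only parser consume a different
 * number of characters from `s`, i.e. they would disagree about where
 * the next field of an MM string starts. */
static int parsers_disagree(const char *s) {
    char *e1, *e2;
    (void)strtol(s, &e1, 10);   /* skips leading whitespace, accepts +/- */
    (void)digits_only(s, &e2);  /* stops at the first non-digit */
    return e1 != e2;
}
```

Here `parsers_disagree("12,")` is 0, but inputs with leading whitespace or a sign, such as `" 12,"` or `"-3,"`, return 1. `strtol()` skips the whitespace and accepts the sign, while the digits-only parser consumes nothing. On non-compliant input, two functions using different parsers could therefore disagree about where the next field starts.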

jmarshall · 2025-10-01T19:40:49Z

sam_mods.c

-            if (cp != state->MM[i])
-                state->MMcount[i] = strtol(cp+1, NULL, 10);
-            else
+            if (cp != state->MM[i]) {


And now there's no need to modify a bunch of surrounding lines by adding { … }.

cjw85 · 2025-10-02T10:59:30Z

sam_mods.c

-                    : 0;
-                if (!cp_end) {
-                    // empty list
+                if (*cp == ',') {


@jkbonfield This site made me think for a minute; the interplay of the ternary and the `if (!cp_end)` worried me. I believe I have the logic correct, and it reads a little more directly now as a consequence.

This highlights a bug in the old code, in fact.

Earlier on we have:

    char *ms = cp, *me; // mod code start and end
    char *cp_end = NULL;
    int chebi = 0;
    if (isdigit_c(*cp)) {
        chebi = strtol(cp, &cp_end, 10);
        cp = cp_end;
        ms = cp-1;
    } else {
        while (*cp && isalpha_c(*cp))
            cp++;
        if (*cp == '\0')
            return -1;
    }

However, `cp_end` is non-NULL if we have a CHEBI code, and NULL otherwise.

Moving on to the code you commented on above: if we have no comma, and hence an empty list, then we're now affected by the CHEBI-vs-otherwise logic. Indeed, this test file shows the problem:

    $ cat /tmp/MM-chebi.sam
    * 0 * 0 0 * * 0 0 ACGCT * Mm:Z:C+m;C+76792;N+n;
    $ ./test/test_mod /tmp/MM-chebi.sam 2>/dev/null
    0 A
    1 C C+(76792).
    2 G
    3 C
    4 T
    ---
    Present: m. #-76792. n.
    1 C C+(76792).

If I put `cp_end = NULL;` after the above code block, so that it's always NULL regardless of code vs CHEBI, then the rogue mod vanishes.

The revised code sidesteps the issue, and I agree its logic is easier to understand. A good spot.
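Below is a simplified model of the bug described above. `old_is_empty()` and `new_is_empty()` are hypothetical functions that reproduce only the control flow around `cp_end`. They ignore the optional `.`/`?` flag and everything else in the real `sam_mods.c` parser. `cp` points just past the strand character, e.g. at `m;` or `76792;`.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the old control flow; NOT the real
 * sam_mods.c code. Returns 1 if the delta list is treated as empty. */
static int old_is_empty(const char *cp) {
    char *cp_end = NULL;
    if (isdigit((unsigned char)*cp)) {
        (void)strtol(cp, &cp_end, 10);   /* ChEBI: cp_end becomes non-NULL */
        cp = cp_end;
    } else {
        while (*cp && isalpha((unsigned char)*cp))
            cp++;                        /* letter code: cp_end stays NULL */
    }
    if (*cp == ',')
        (void)strtol(cp + 1, &cp_end, 10);
    /* Bug: with no comma, cp_end still holds whatever the code-parsing
     * branch left behind, so an empty ChEBI list looks non-empty. */
    return cp_end == NULL;
}

/* Model of the revised logic: test the next character directly. */
static int new_is_empty(const char *cp) {
    if (isdigit((unsigned char)*cp)) {
        char *end;
        (void)strtol(cp, &end, 10);
        cp = end;
    } else {
        while (*cp && isalpha((unsigned char)*cp))
            cp++;
    }
    return *cp != ',';
}
```

In this model, `old_is_empty("m;")` returns 1 but `old_is_empty("76792;")` returns 0. This mirrors the rogue `C+(76792)` call in the test output. `new_is_empty()` returns 1 for both because it checks for the comma directly.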

jkbonfield · 2025-10-06T15:22:48Z

Thank you.

Switch from strtol to hts_str2uint in mod parsing

0a2aea8

Replace more strtols in sam_mods.c

9a2b846

cjw85 force-pushed the mod-uint-conv branch from 22883a9 to 9a2b846 on October 1, 2025 15:09

cjw85 added 2 commits October 1, 2025 16:12

Keep one-liner flavour to things

cc79be2

Undo update to submodule

cdff1de

jmarshall reviewed Oct 1, 2025


daviesrob assigned jkbonfield Oct 2, 2025

Remove some extraneous braces

343e689

cjw85 commented Oct 2, 2025


jkbonfield merged commit 72422ef into samtools:develop Oct 6, 2025
9 checks passed

jkbonfield merged 5 commits into samtools:develop from cjw85:mod-uint-conv

4 participants
