Skip to content

fix: prevent integer overflow and truncation in MergedPileup weighted mean#56

Merged
nh13 merged 2 commits intomainfrom
fix/merged-pileup-integer-overflow
Mar 2, 2026
Merged

fix: prevent integer overflow and truncation in MergedPileup weighted mean#56
nh13 merged 2 commits intomainfrom
fix/merged-pileup-integer-overflow

Conversation

@nh13
Copy link
Member

@nh13 nh13 commented Feb 20, 2026

Summary

  • Fix integer overflow in MergedPileup weighted mean calculation where p.total * p.left_pos (Int*Int) can overflow for large genomic positions
  • Fix per-term truncation where integer division inside map() truncates each term before summing, producing inaccurate mean positions
  • Use Long arithmetic for the intermediate product and perform a single division at the end

Test plan

  • Existing tests pass (./mill tools.test)
  • Verify with large genomic positions (e.g. chr1 at ~200M) that mean is calculated correctly

@nh13 nh13 requested review from clintval and tfenne as code owners February 20, 2026 21:48
@nh13 nh13 force-pushed the fix/merged-pileup-integer-overflow branch from 23b6837 to 6f80ac3 Compare February 21, 2026 01:54
@coderabbitai
Copy link

coderabbitai bot commented Feb 21, 2026

Warning

Rate limit exceeded

@github-actions[bot] has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

This pull request makes two changes: the Tools index documentation version identifier is updated, and the FilterAndMerge calculation logic is modified to perform intermediate multiplication using 64-bit integers before division and casting to Int. The arithmetic change affects both left and right position calculations, replacing direct integer division with a two-step approach that computes the sum as a Long before converting the final result to Int.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and precisely describes the main change: fixing integer overflow and truncation in MergedPileup weighted mean calculation.
Description check ✅ Passed The description clearly explains the two bugs being fixed, the solution approach, and provides a test plan aligned with the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/merged-pileup-integer-overflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nh13
Copy link
Member Author

nh13 commented Feb 21, 2026

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/scala/com/fulcrumgenomics/sv/tools/FilterAndMerge.scala (1)

116-116: ⚠️ Potential issue | 🟡 Minor

Math.round(_.toFloat) re-introduces precision loss for large genomic positions.

leftMean / rightMean are already Int after the .toInt at lines 108–109. Converting them to Float (24-bit mantissa, exact only up to 2²³ ≈ 8.4 M) before rounding silently corrupts positions on any chromosome beyond ~8 Mbp — including the chr1 ~200 M case motivating this PR.

🐛 Proposed fix
-      left_mean      = Math.round(leftMean.toFloat),
+      left_mean      = leftMean,
...
-      right_mean     = Math.round(rightMean.toFloat),
+      right_mean     = rightMean,

Also applies to: 121-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/scala/com/fulcrumgenomics/sv/tools/FilterAndMerge.scala` at line
116, The code is converting already-truncated Ints back to Float and rounding
(via Math.round(leftMean.toFloat / rightMean.toFloat)), which re-introduces
precision loss for large genomic positions; update the assignments that set
left_mean and right_mean in FilterAndMerge.scala to use the existing Int values
(leftMean and rightMean) directly (or cast to Long/Int as required by the
surrounding record type) instead of converting to Float and calling Math.round;
apply the same change to the symmetric occurrence for right_mean so the stored
positions remain exact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/main/scala/com/fulcrumgenomics/sv/tools/FilterAndMerge.scala`:
- Line 116: The code is converting already-truncated Ints back to Float and
rounding (via Math.round(leftMean.toFloat / rightMean.toFloat)), which
re-introduces precision loss for large genomic positions; update the assignments
that set left_mean and right_mean in FilterAndMerge.scala to use the existing
Int values (leftMean and rightMean) directly (or cast to Long/Int as required by
the surrounding record type) instead of converting to Float and calling
Math.round; apply the same change to the symmetric occurrence for right_mean so
the stored positions remain exact.

@nh13 nh13 force-pushed the fix/merged-pileup-integer-overflow branch from 94ed9fd to 346bfc9 Compare February 21, 2026 05:57
@nh13 nh13 force-pushed the fix/merged-pileup-integer-overflow branch 3 times, most recently from 415c088 to fda903e Compare February 21, 2026 06:04
… mean

The weighted mean calculation `p.total * p.left_pos / total` had two bugs:

1. Integer overflow: `p.total * p.left_pos` is Int*Int which overflows
   for large genomic positions (up to ~250M) with moderate counts.

2. Per-term truncation: integer division inside map() truncates each
   term before summing, producing inaccurate results. E.g. three
   pileups with total=1 at pos=100 with overall total=3: each term
   becomes 1*100/3=33 (truncated), sum=99 instead of 100.

Fix uses Long arithmetic for the product and performs a single division
at the end.
@nh13 nh13 force-pushed the fix/merged-pileup-integer-overflow branch from 1eca25e to 21111db Compare February 21, 2026 06:20
@clintval clintval assigned nh13 and unassigned clintval Feb 26, 2026
@nh13 nh13 merged commit ca7b619 into main Mar 2, 2026
1 check passed
@nh13 nh13 deleted the fix/merged-pileup-integer-overflow branch March 2, 2026 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants