fix: prevent integer overflow and truncation in MergedPileup weighted mean #56
23b6837 to 6f80ac3
📝 Walkthrough

This pull request makes two changes: the Tools index documentation version identifier is updated, and the FilterAndMerge calculation logic is modified to perform intermediate multiplication using 64-bit integers before division and casting to Int. The arithmetic change affects both left and right position calculations, replacing direct integer division with a two-step approach that computes the sum as a Long before converting the final result to Int.

🚥 Pre-merge checks: ✅ 3 passed
@coderabbitai review
✅ Actions performed: Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/scala/com/fulcrumgenomics/sv/tools/FilterAndMerge.scala (1)
116-116:⚠️ Potential issue | 🟡 Minor
`Math.round(_.toFloat)` re-introduces precision loss for large genomic positions.
`leftMean`/`rightMean` are already `Int` after the `.toInt` at lines 108–109. Converting them to `Float` (24-bit significand, exact for integers only up to 2²⁴ ≈ 16.8 M) before rounding silently corrupts positions on any chromosome beyond ~16.8 Mbp, including the chr1 ~200 M case motivating this PR.

🐛 Proposed fix
- left_mean = Math.round(leftMean.toFloat),
+ left_mean = leftMean,
...
- right_mean = Math.round(rightMean.toFloat),
+ right_mean = rightMean,

Also applies to: 121-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/scala/com/fulcrumgenomics/sv/tools/FilterAndMerge.scala` at line 116, The code is converting already-truncated Ints back to Float and rounding (via Math.round(leftMean.toFloat / rightMean.toFloat)), which re-introduces precision loss for large genomic positions; update the assignments that set left_mean and right_mean in FilterAndMerge.scala to use the existing Int values (leftMean and rightMean) directly (or cast to Long/Int as required by the surrounding record type) instead of converting to Float and calling Math.round; apply the same change to the symmetric occurrence for right_mean so the stored positions remain exact.
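The precision concern can be checked in isolation. The following is a minimal sketch, not code from the repository: `FloatRoundTripSketch` and `roundTrip` are hypothetical names, and the function only mirrors the `Int` → `Float` → `Math.round` pattern flagged above. A `Float` has a 24-bit significand, so consecutive integers are exactly representable only up to 2²⁴.

```scala
object FloatRoundTripSketch {
  // Hypothetical helper mirroring the flagged pattern: Int -> Float -> Math.round -> Int.
  def roundTrip(pos: Int): Int = Math.round(pos.toFloat)

  def main(args: Array[String]): Unit = {
    // 2^24 is exactly representable as a Float, so it survives the round trip.
    println(roundTrip(16777216) == 16777216)
    // 2^24 + 1 is not representable and comes back as a different value.
    println(roundTrip(16777217) == 16777217)
    // A position near 200 M falls where adjacent Floats are 16 apart.
    println(roundTrip(200000001) == 200000001)
  }
}
```

Under these assumptions the first check holds and the other two do not, which is why assigning the `Int` values directly avoids the loss.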
94ed9fd to 346bfc9
415c088 to fda903e
… mean

The weighted mean calculation `p.total * p.left_pos / total` had two bugs:

1. Integer overflow: `p.total * p.left_pos` is Int*Int which overflows for large genomic positions (up to ~250M) with moderate counts.
2. Per-term truncation: integer division inside map() truncates each term before summing, producing inaccurate results. E.g. three pileups with total=1 at pos=100 with overall total=3: each term becomes 1*100/3=33 (truncated), sum=99 instead of 100.

Fix uses Long arithmetic for the product and performs a single division at the end.
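Both bugs and the fix described in this commit message can be reproduced with a small standalone sketch. This is an illustration under stated assumptions, not the code in `FilterAndMerge.scala`: the `Pileup` case class, `buggyMean`, and `fixedMean` are hypothetical, and only the field names `total` and `left_pos` are taken from the message.

```scala
// Hypothetical stand-in for the pileup record; only the fields named in the commit message.
final case class Pileup(total: Int, left_pos: Int)

object WeightedMeanSketch {
  // Old form: the Int*Int product can overflow, and each term is
  // divided (and truncated) before the terms are summed.
  def buggyMean(pileups: Seq[Pileup]): Int = {
    val total = pileups.map(_.total).sum
    pileups.map(p => p.total * p.left_pos / total).sum
  }

  // Fixed form: widen to Long before multiplying, sum the products,
  // then divide once and convert back to Int.
  def fixedMean(pileups: Seq[Pileup]): Int = {
    val total = pileups.map(_.total.toLong).sum
    (pileups.map(p => p.total.toLong * p.left_pos).sum / total).toInt
  }

  def main(args: Array[String]): Unit = {
    // The example from the commit message: three pileups of total=1 at pos=100.
    val small = Seq.fill(3)(Pileup(total = 1, left_pos = 100))
    println(buggyMean(small)) // 99: each term truncates to 33
    println(fixedMean(small)) // 100

    // 20 * 200,000,000 exceeds Int.MaxValue, so the Int product wraps around.
    val large = Seq(Pileup(total = 20, left_pos = 200000000))
    println(buggyMean(large)) // a negative value after overflow
    println(fixedMean(large)) // 200000000
  }
}
```

The single division at the end still truncates toward zero, but only once over the exact sum rather than once per term.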
1eca25e to 21111db
Summary
- `MergedPileup` weighted mean calculation where `p.total * p.left_pos` (Int*Int) can overflow for large genomic positions
- `map()` truncates each term before summing, producing inaccurate mean positions

Test plan
./mill tools.test)