Skip to content

[fix][broker] Guard AsyncTokenBucket against long overflow#25262

Merged
lhotari merged 2 commits intoapache:masterfrom
codelipenghui:penghui/fix-async-token-bucket-overflow
Feb 26, 2026
Merged

[fix][broker] Guard AsyncTokenBucket against long overflow#25262
lhotari merged 2 commits intoapache:masterfrom
codelipenghui:penghui/fix-async-token-bucket-overflow

Conversation

@codelipenghui
Copy link
Contributor

@codelipenghui codelipenghui commented Feb 25, 2026

Motivation

The async dispatch rate limiter (AsyncTokenBucket) performs multiply-then-divide arithmetic using long. With large rates and longer rate periods (for example, 980,000,000 tokens per 10s), the multiplication can overflow and produce incorrect negative/intermediate values. In dispatch throttling this can keep available tokens at invalid values and effectively stall consumption.

Modifications

  • Added overflow-safe floor multiply/divide helper in AsyncTokenBucket:
    • Fast path for non-overflowing long multiplication.
    • Fallback to BigInteger for overflow cases.
    • Clamp large results to Long.MAX_VALUE.
  • Replaced overflow-prone arithmetic in token refill path:
    • calculateNewTokensSinceLastUpdate now uses safe multiply/divide.
    • Remainder-nanos back-calculation now uses safe multiply/divide as well.
  • Hardened throttling duration calculation:
    • calculateThrottlingDuration(long requiredTokens) now handles subtraction overflow and uses safe multiply/divide.
  • Added regression tests in AsyncTokenBucketTest:
    • shouldRefillTokensWithoutOverflowForLargeRateAnd10sPeriod
    • shouldCalculateThrottlingDurationWithoutOverflowForLargeNeedTokens

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Ran unit/regression tests:
    • ./mvnw -pl pulsar-broker -Dtest=AsyncTokenBucketTest -Dsurefire.failIfNoSpecifiedTests=false test
  • Ran dispatch-throttling integration tests:
    • ./mvnw -pl pulsar-broker -Dtest=SubscriptionMessageDispatchThrottlingTest -Dsurefire.failIfNoSpecifiedTests=false -DargLine='-Djdk.attach.allowAttachSelf=true -XX:+EnableDynamicAgentLoading' test

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: codelipenghui#26

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 25, 2026
@codelipenghui codelipenghui self-assigned this Feb 25, 2026
@codelipenghui codelipenghui added ready-to-test type/bug The PR fixed a bug or issue reported a bug labels Feb 25, 2026
@codelipenghui codelipenghui added this to the 4.2.0 milestone Feb 25, 2026
@codecov-commenter
Copy link

codecov-commenter commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 62.50000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.57%. Comparing base (82846e5) to head (0ae2ae0).

Files with missing lines Patch % Lines
...org/apache/pulsar/broker/qos/AsyncTokenBucket.java 62.50% 5 Missing and 4 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #25262      +/-   ##
============================================
- Coverage     72.73%   72.57%   -0.17%     
- Complexity    34047    34408     +361     
============================================
  Files          1959     1959              
  Lines        155418   155438      +20     
  Branches      17731    17738       +7     
============================================
- Hits         113050   112803     -247     
- Misses        33378    33634     +256     
- Partials       8990     9001      +11     
Flag Coverage Δ
inttests 25.63% <0.00%> (-0.54%) ⬇️
systests 22.26% <0.00%> (-0.21%) ⬇️
unittests 73.55% <62.50%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...org/apache/pulsar/broker/qos/AsyncTokenBucket.java 82.14% <62.50%> (-8.49%) ⬇️

... and 102 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. There were also issues #23930 and #24012 which have been fixed in the past, fixed since 4.0.3/3.3.5.

@lhotari lhotari dismissed their stale review February 26, 2026 08:27

I'll check out the PR and take a closer look.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this @codelipenghui!
I pushed a change to optimize safeMulDivFloor by using Math.multiplyHigh and Long.divideUnsigned. I also modified the test to cover more large rate values.

@lhotari lhotari merged commit d207d9c into apache:master Feb 26, 2026
53 checks passed
lhotari added a commit that referenced this pull request Feb 26, 2026
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit d207d9c)
lhotari added a commit that referenced this pull request Feb 26, 2026
Co-authored-by: Lari Hotari <lhotari@apache.org>
(cherry picked from commit d207d9c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants