-
Notifications
You must be signed in to change notification settings - Fork 3.5k
fix: prevent int64 overflow in retry backoff calculation #15277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For high retry counts with exponential backoff, the calculation baseDuration * factor^retryNumber can overflow int64 (time.Duration). Example: 5s base * 2^31 factor exceeds MaxInt64 nanoseconds. This fix checks if the multiplication would overflow before performing it, and caps at math.MaxInt64 (max time.Duration) if so. Signed-off-by: Andre Kurait <[email protected]>
02910d2 to
b6dc454
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThe pull request adds integer overflow protection to the exponential backoff calculation in the retry logic. When computing backoff duration with a factor, the code now checks if the result would exceed the maximum int64 value and clamps it to prevent negative duration overflows that caused retries to skip backoff waits. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Joibel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix
Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
|
🍒 Cherry-pick PR created for 3.6: #15290 |
|
🍒 Cherry-pick PR created for 3.7: #15291 |
…15277 for 3.7) (#15291) Signed-off-by: Andre Kurait <[email protected]> Signed-off-by: Alan Clucas <[email protected]> Co-authored-by: Andre Kurait <[email protected]> Co-authored-by: Alan Clucas <[email protected]>
…15277 for 3.6) (#15290) Signed-off-by: Andre Kurait <[email protected]> Signed-off-by: Alan Clucas <[email protected]> Co-authored-by: Andre Kurait <[email protected]> Co-authored-by: Alan Clucas <[email protected]>
Update argo-workflows chart version from 0.45.24 to 0.47.1 and override images.tag to v3.7.9 to incorporate the retry delay overflow fix from argoproj/argo-workflows#15277 Signed-off-by: Andre Kurait <[email protected]>
Update argo-workflows chart version from 0.45.24 to 0.47.1 and override images.tag to v3.7.9 to incorporate the retry delay overflow fix from argoproj/argo-workflows#15277 Signed-off-by: Andre Kurait <[email protected]>
Signed-off-by: Andre Kurait <[email protected]>
|
🍒 Cherry-pick PR created for 4.0: #15508 |
…15277 for 4.0) (#15508) Signed-off-by: Andre Kurait <[email protected]> Co-authored-by: Andre Kurait <[email protected]>
Fixes #15276
Motivation
When using exponential backoff with high retry counts, the calculation
baseDuration * factor^retryNumbercan overflow Go'sint64(which backstime.Duration), causing:if timeToWait > capDurationfails (negative < positive)Overflow occurs at these retry counts (with 1s base duration):
Modifications
factor > MaxInt64/baseDurationbefore multiplication to detect potential overflowtimeToWaitatmath.MaxInt64when overflow would occurVerification
Unit test: Added
TestProcessNodeRetriesBackoffOverflowthat simulates 32 retries with factor=2, duration=5sManual E2E testing: Deployed fixed controller to minikube and ran test workflow with
duration=1s,factor=100,cap=30s:Before fix - retry intervals at overflow point:
After fix - all intervals consistent:
Controller logs confirm proper backoff message for all retries including overflow point:
Documentation
No documentation changes needed - this is a bug fix for an edge case.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.