fix: improve backoff retry mechanism #859

erayack · 2025-10-16T12:33:03Z

Summary

This PR fixes issue #599 by improving the backoff retry mechanism in the TAP agent. The issue reports that a denied sender aggregates too slowly: RAV requests are only triggered by UpdateReceiptFees messages, and those messages are only sent when a RAV request finishes or after a fixed retry_interval (~30 secs). As a result, aggregation stays slow even when the aggregator is up.

Problem

As described in issue #599:

- When a sender is denied, it aggregates too slowly
- RAV requests are only triggered by UpdateReceiptFees messages
- Those messages are only sent when the RAV request finishes or after retry_interval (~30 secs)
- This makes aggregation too slow when the aggregator is UP but the tap-agent was down for a while

Solution

Use backoff information to retry aggregating instead of a fixed retry_interval (as suggested in the issue):

Backoff System Improvements

- Refactored BackoffInfo: Replaced in_backoff() with remaining() method for more precise timing control
- Added min_remaining_backoff(): New method in SenderFeeTracker to get the minimum remaining backoff across all allocations
- Dynamic retry scheduling: Implemented schedule_retry() method that considers both global and allocation-specific backoff states
- Removed hardcoded intervals: Eliminated retry_interval from SenderAccountArgs in favor of dynamic backoff-based scheduling
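
To illustrate the first two items, here is a minimal sketch of how a remaining()-style query and a minimum across allocations could fit together. It is not the code from this PR: the struct layout, the fail_at() and remaining_at() helpers, the String allocation keys, and the free-function form of min_remaining_backoff() are all assumptions made for illustration. The explicit `now` parameter is there only to keep the sketch deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `BackoffInfo`; the real struct may differ.
#[derive(Debug, Clone, Copy, Default)]
pub struct BackoffInfo {
    /// Instant until which requests should be held back, if any.
    backoff_until: Option<Instant>,
}

impl BackoffInfo {
    /// Start (or restart) a backoff window of `delay`, beginning at `now`.
    /// (Assumed helper, not named in the PR description.)
    pub fn fail_at(&mut self, now: Instant, delay: Duration) {
        self.backoff_until = Some(now + delay);
    }

    /// Clear the backoff after a successful request.
    pub fn ok(&mut self) {
        self.backoff_until = None;
    }

    /// Time left in the backoff window as seen at `now`.
    /// Returns `None` when no backoff is active or it has already expired.
    pub fn remaining_at(&self, now: Instant) -> Option<Duration> {
        self.backoff_until
            .and_then(|until| until.checked_duration_since(now))
            .filter(|left| !left.is_zero())
    }

    /// Convenience wrapper that reads the current time.
    pub fn remaining(&self) -> Option<Duration> {
        self.remaining_at(Instant::now())
    }

    /// The old boolean check can be derived from `remaining()`.
    pub fn in_backoff(&self) -> bool {
        self.remaining().is_some()
    }
}

/// Smallest remaining backoff across all allocations that are still backing off.
/// Allocation ids are plain strings here purely for illustration.
pub fn min_remaining_backoff(
    allocations: &HashMap<String, BackoffInfo>,
    now: Instant,
) -> Option<Duration> {
    allocations
        .values()
        .filter_map(|info| info.remaining_at(now))
        .min()
}
```

Returning an Option<Duration> rather than a bool lets the caller decide how long to wait before the next attempt, which is the information a dynamic scheduler needs.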

Testing & Quality

- Unit tests: Added retry_helpers_tests module with tests for:
  - BackoffInfo::remaining() reset behavior after successful operations
  - SenderFeeTracker::min_remaining_backoff() roundtrip functionality
- Fixed syntax error: Resolved missing closing brace in tracker.rs
- Code formatting: Applied rustfmt standards to all modified files

Technical Details

The retry mechanism now works as follows:

- When a RAV request fails, both global (BackoffInfo) and allocation-specific (SenderFeeTracker) backoff states are updated
- The next_retry_delay() method calculates the maximum of global and allocation-specific remaining backoff times
- Retries are scheduled dynamically based on actual backoff state rather than fixed intervals
- This ensures more efficient retry behavior and prevents unnecessary rapid retries
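
The "maximum of global and allocation-specific remaining backoff" rule can be sketched as a small pure function. The signature below is an assumption for illustration (the PR describes next_retry_delay() as a method, and its real inputs are not shown here); `None` stands for "no backoff active".

```rust
use std::time::Duration;

/// Hypothetical free-function form of the retry-delay rule:
/// wait for whichever backoff (global or allocation-specific) ends last.
pub fn next_retry_delay(
    global_remaining: Option<Duration>,
    allocation_remaining: Option<Duration>,
) -> Option<Duration> {
    match (global_remaining, allocation_remaining) {
        // Both are backing off: the retry must wait for the longer one.
        (Some(global), Some(allocation)) => Some(global.max(allocation)),
        // At most one is backing off: use it, or `None` if neither is.
        (global, allocation) => global.or(allocation),
    }
}
```

Taking the maximum means a retry is never scheduled while either backoff window is still open, which is how rapid retries are avoided.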

Testing

The new functionality has been tested with standalone tests that verify:

- Exponential backoff calculation (100ms → 200ms → 400ms, etc.)
- 60-second backoff cap enforcement
- Proper reset behavior after successful operations
- Integration between global and allocation-specific backoff states
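
For reference, the doubling schedule and cap listed above can be written as follows. This is a sketch under assumptions, not the PR's implementation: the function name backoff_delay(), the constant names, and the choice that the first failure maps to 100ms are illustrative.

```rust
use std::time::Duration;

/// Assumed base delay, matching the 100ms starting point listed above.
const BASE_DELAY: Duration = Duration::from_millis(100);
/// Assumed cap, matching the 60-second limit listed above.
const MAX_DELAY: Duration = Duration::from_secs(60);

/// Hypothetical helper: delay after `failures_in_a_row` consecutive failures.
/// Zero failures means no backoff, which models the reset after a success.
pub fn backoff_delay(failures_in_a_row: u32) -> Duration {
    if failures_in_a_row == 0 {
        return Duration::ZERO;
    }
    // Doubling factor 2^(n-1); saturate instead of overflowing for large n.
    let factor = 2u32
        .checked_pow(failures_in_a_row - 1)
        .unwrap_or(u32::MAX);
    // Scale the base delay, then clamp to the cap.
    BASE_DELAY
        .checked_mul(factor)
        .unwrap_or(MAX_DELAY)
        .min(MAX_DELAY)
}
```

With these constants, the uncapped delay first exceeds 60 seconds on the 11th consecutive failure (100ms × 1024 = 102.4s), so the cap applies from that point on.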

Files Modified

- crates/tap-agent/src/agent/sender_account.rs - Main retry logic improvements
- crates/tap-agent/src/agent/sender_accounts_manager.rs - Removed hardcoded retry interval
- crates/tap-agent/src/backoff.rs - Enhanced backoff information tracking
- crates/tap-agent/src/test.rs - Added comprehensive unit tests
- crates/tap-agent/src/tracker.rs - Added min_remaining_backoff method and fixed syntax

Fixes

Closes #599

- Refactor BackoffInfo to use remaining() method instead of in_backoff()
- Add min_remaining_backoff() method to SenderFeeTracker
- Implement dynamic retry scheduling based on backoff state
- Remove hardcoded retry_interval from SenderAccountArgs
- Add comprehensive unit tests for backoff functionality
- Fix syntax error in tracker.rs (missing closing brace)
- Improve code formatting and documentation

This addresses issues with the retry mechanism by making it more adaptive and removing hardcoded retry intervals in favor of dynamic backoff-based scheduling.

erayack changed the title ~~fix: improve backoff retry mechanism and add comprehensive tests~~ fix: improve backoff retry mechanism Oct 16, 2025
