Pass elapsed time to get_timeout() to enable cumulative retry schedules #121

@matthiasgoergens

Currently get_timeout(attempt, response) receives the attempt number and the response, but has no way to know how much wall-clock time has elapsed since the retry chain started. This makes it impossible to implement cumulative retry schedules where jitter naturally mean-reverts across attempts.

Motivation: mean-reverting jitter

JitterRetry adds random.uniform(0, random_interval_size) ** factor to each attempt's timeout independently. The jitter is never negative and each draw ignores the earlier ones, so over multiple retries the extra wait accumulates: the total wait time drifts upward from the deterministic exponential schedule, and its variance grows with every attempt.
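To make the drift concrete, here is a small standalone simulation. It is only a sketch and does not use aiohttp_retry itself: the function names are hypothetical, the parameter values (start_timeout=0.1, factor=2.0, max_timeout=30.0, random_interval_size=2.0) are illustrative assumptions, and steps are indexed from zero to match the snippet further down.

```python
import random
import statistics


def deterministic_total(attempts, start_timeout=0.1, factor=2.0, max_timeout=30.0):
    """Total wait of the jitter-free exponential schedule over `attempts` steps."""
    return sum(min(start_timeout * factor**i, max_timeout) for i in range(attempts))


def jittered_total(attempts, rng, start_timeout=0.1, factor=2.0,
                   max_timeout=30.0, random_interval_size=2.0):
    """Total wait when every step independently adds uniform(0, size) ** factor."""
    return sum(
        min(start_timeout * factor**i, max_timeout)
        + rng.uniform(0, random_interval_size) ** factor
        for i in range(attempts)
    )


def excess_summary(attempts, trials=2000, seed=0):
    """Mean and variance of (jittered total - deterministic total)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    base = deterministic_total(attempts)
    excess = [jittered_total(attempts, rng) - base for _ in range(trials)]
    return statistics.mean(excess), statistics.pvariance(excess)


if __name__ == "__main__":
    for n in (1, 3, 5):
        mean, var = excess_summary(n)
        print(f"attempts={n}: mean excess={mean:.2f}s, variance={var:.2f}")
```

Both the mean and the variance of the excess grow roughly linearly with the number of attempts, because nothing in the per-attempt calculation compensates for jitter that has already been spent.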

With access to elapsed time, a retry strategy could compute:

import random
import time

# Method of a hypothetical JitterRetry subclass
def get_timeout(self, attempt, response=None, start_time=None):
    if start_time is None:
        return super().get_timeout(attempt, response)

    # Cumulative target: what the deterministic schedule says total wait should be by now
    # (A closed-form geometric series exists but isn't worth the complexity for ~3-5 attempts)
    cumulative_target = sum(
        min(self._start_timeout * self._factor**i, self._max_timeout)
        for i in range(attempt)
    )
    elapsed = time.monotonic() - start_time
    remaining = max(0, cumulative_target - elapsed)
    jitter = random.uniform(0, self._random_interval_size) ** self._factor
    return min(remaining + jitter, self._max_timeout)

If a previous retry slept longer due to jitter, remaining is smaller on the next attempt, so the next sleep is shorter: the jitter mean-reverts. As long as the accumulated overshoot does not exceed the next step of the schedule (remaining is clamped at 0), the total retry chain duration stays close to the deterministic schedule instead of drifting with each attempt.
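The effect can be checked with a simulated clock. The sketch below makes several assumptions: requests take zero time, elapsed is tracked by hand instead of calling time.monotonic(), all names are hypothetical, and the parameters are chosen so that each schedule step is larger than the maximum jitter. With a small start_timeout relative to the jitter, remaining is often clamped to 0 and the correction is weaker.

```python
import random
import statistics

# Illustrative values, not library defaults.
START, FACTOR, MAX_TIMEOUT, INTERVAL = 1.0, 2.0, 30.0, 1.0


def step(i):
    """Deterministic delay for zero-based step i, capped at MAX_TIMEOUT."""
    return min(START * FACTOR**i, MAX_TIMEOUT)


def jitter(rng):
    return rng.uniform(0, INTERVAL) ** FACTOR


def independent_chain(attempts, rng):
    """Total wait when every attempt adds fresh jitter to its own step."""
    return sum(step(i) + jitter(rng) for i in range(attempts))


def cumulative_chain(attempts, rng):
    """Total wait when each sleep is sized against the cumulative target."""
    elapsed = 0.0  # simulated clock; stands in for time.monotonic() - start_time
    for attempt in range(1, attempts + 1):
        target = sum(step(i) for i in range(attempt))
        remaining = max(0.0, target - elapsed)
        elapsed += min(remaining + jitter(rng), MAX_TIMEOUT)
    return elapsed


def excess_stats(chain, attempts=5, trials=2000, seed=0):
    """Mean and maximum overshoot of a chain relative to the deterministic total."""
    rng = random.Random(seed)
    target = sum(step(i) for i in range(attempts))
    excess = [chain(attempts, rng) - target for _ in range(trials)]
    return statistics.mean(excess), max(excess)


if __name__ == "__main__":
    print("independent:", excess_stats(independent_chain))
    print("cumulative: ", excess_stats(cumulative_chain))
```

Under these assumptions the cumulative chain overshoots the deterministic total by at most one jitter draw (the last one), while the independent chain overshoots by the sum of all draws.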

This isn't possible today because get_timeout can't see elapsed time, and _RequestContext._do_request() doesn't expose its timing to the retry strategy.

Proposed change

Add start_time to get_timeout():

# RetryOptionsBase
def get_timeout(
    self,
    attempt: int,
    response: ClientResponse | None = None,
    start_time: float | None = None,
) -> float:
    ...

And in _RequestContext._do_request(), record start_time before the loop and pass it on each call:

async def _do_request(self) -> ClientResponse:
    current_attempt = 0
    start_time = time.monotonic()

    while True:
        ...
        retry_wait = self._retry_options.get_timeout(
            attempt=current_attempt, response=response, start_time=start_time,
        )

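The default start_time=None keeps existing calls such as get_timeout(attempt) or get_timeout(attempt, response) working without changes, following the same pattern as the response parameter added in v2.5.6. One caveat: because _do_request() would pass start_time by keyword, a custom subclass that overrides get_timeout without declaring that parameter would need to add it to its signature.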
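If subclasses that override get_timeout with the older two-argument signature need to keep working, one option is for the caller to check the signature before passing the new keyword. This is not part of the proposal above, only a sketch of one possible approach; accepts_start_time, call_get_timeout, LegacyRetry and NewRetry are hypothetical names used for illustration.

```python
import inspect


def accepts_start_time(get_timeout):
    """Return True if the callable can be given a `start_time` keyword argument."""
    params = inspect.signature(get_timeout).parameters
    named = params.get("start_time")
    if named is not None and named.kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_get_timeout(retry_options, attempt, response, start_time):
    """Call get_timeout, passing start_time only when the override supports it."""
    kwargs = {"attempt": attempt, "response": response}
    if accepts_start_time(retry_options.get_timeout):
        kwargs["start_time"] = start_time
    return retry_options.get_timeout(**kwargs)


class LegacyRetry:
    """Stand-in for an existing override written before start_time existed."""

    def get_timeout(self, attempt, response=None):
        return 1.0


class NewRetry:
    """Stand-in for an override that uses the proposed parameter."""

    def get_timeout(self, attempt, response=None, start_time=None):
        return 2.0 if start_time is not None else 1.0


if __name__ == "__main__":
    print(call_get_timeout(LegacyRetry(), 1, None, 0.0))
    print(call_get_timeout(NewRetry(), 1, None, 0.0))
```

The signature check could be done once when the retry options are attached rather than on every attempt; whether that extra machinery is worth it compared with asking subclass authors to add the parameter is a judgment call for the maintainers.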
