Commit 24475e9

add proposal, unit tests and implement feedback (#5)
1 parent c8e6f99 commit 24475e9

6 files changed

+553
-19
lines changed

Evolution/NNNN-retry-backoff.md

Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
# Retry & Backoff

* Proposal: [NNNN](NNNN-retry-backoff.md)
* Authors: [Philipp Gabriel](https://github.com/ph1ps)
* Review Manager: TBD
* Status: **Implemented**

## Introduction

This proposal introduces a `retry` function and a suite of backoff strategies for Swift Async Algorithms, enabling robust retries of failed asynchronous operations with customizable delays and error-driven decisions.

Swift forums thread: [Discussion thread topic for that proposal](https://forums.swift.org/)

## Motivation

Retry logic with backoff is a common requirement in asynchronous programming, especially for operations subject to transient failures such as network requests. Today, developers must reimplement retry loops manually, leading to fragmented and error-prone solutions across the ecosystem.

Providing a standard `retry` function and reusable backoff strategies in Swift Async Algorithms ensures consistent, safe and well-tested patterns for handling transient failures.

## Proposed solution

This proposal introduces a retry function that executes an asynchronous operation up to a specified number of attempts, with customizable delays and error-based retry decisions between attempts.

```swift
@available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
public func retry<Result, ErrorType, ClockType>(
  maxAttempts: Int,
  tolerance: ClockType.Instant.Duration? = nil,
  clock: ClockType = ContinuousClock(),
  isolation: isolated (any Actor)? = #isolation,
  operation: () async throws(ErrorType) -> Result,
  strategy: (ErrorType) -> RetryAction<ClockType.Instant.Duration> = { _ in .backoff(.zero) }
) async throws -> Result where ClockType: Clock, ErrorType: Error
```

```swift
@available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
public enum RetryAction<Duration: DurationProtocol> {
  case backoff(Duration)
  case stop
}
```

Additionally, this proposal includes a suite of backoff strategies that can be used to generate delays between retry attempts. The core strategies provide different patterns for calculating delays: constant intervals, linear growth, exponential growth, and decorrelated jitter.

```swift
@available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
public enum Backoff {
  public static func constant<Duration: DurationProtocol>(_ constant: Duration) -> some BackoffStrategy<Duration>
  public static func constant(_ constant: Duration) -> some BackoffStrategy<Duration>
  public static func linear<Duration: DurationProtocol>(increment: Duration, initial: Duration) -> some BackoffStrategy<Duration>
  public static func linear(increment: Duration, initial: Duration) -> some BackoffStrategy<Duration>
  public static func exponential<Duration: DurationProtocol>(factor: Int, initial: Duration) -> some BackoffStrategy<Duration>
  public static func exponential(factor: Int, initial: Duration) -> some BackoffStrategy<Duration>
}
@available(iOS 18.0, macCatalyst 18.0, macOS 15.0, tvOS 18.0, visionOS 2.0, watchOS 11.0, *)
extension Backoff {
  public static func decorrelatedJitter<RNG: RandomNumberGenerator>(factor: Int, base: Duration, using generator: RNG = SystemRandomNumberGenerator()) -> some BackoffStrategy<Duration>
}
```

These strategies can be modified to enforce minimum or maximum delays, or to add jitter for preventing the thundering herd problem.

```swift
@available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
extension BackoffStrategy {
  public func minimum(_ minimum: Duration) -> some BackoffStrategy<Duration>
  public func maximum(_ maximum: Duration) -> some BackoffStrategy<Duration>
}
@available(iOS 18.0, macCatalyst 18.0, macOS 15.0, tvOS 18.0, visionOS 2.0, watchOS 11.0, *)
extension BackoffStrategy where Duration == Swift.Duration {
  public func fullJitter<RNG: RandomNumberGenerator>(using generator: RNG = SystemRandomNumberGenerator()) -> some BackoffStrategy<Duration>
  public func equalJitter<RNG: RandomNumberGenerator>(using generator: RNG = SystemRandomNumberGenerator()) -> some BackoffStrategy<Duration>
}
```

Constant, linear, and exponential backoff provide overloads for both `Duration` and `DurationProtocol`. This matches the `retry` overloads where the default clock is `ContinuousClock` whose duration type is `Duration`.

Jitter variants currently require `Duration` rather than a generic `DurationProtocol`, because only `Duration` exposes a numeric representation suitable for randomization (see [SE-0457](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0457-duration-attosecond-represenation.md)).

Each of those strategies conforms to the `BackoffStrategy` protocol:

```swift
@available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
public protocol BackoffStrategy<Duration> {
  associatedtype Duration: DurationProtocol
  mutating func nextDuration() -> Duration
}
```

## Detailed design

### Retry

The retry algorithm follows this sequence:
1. Execute the operation
2. If successful, return the result
3. If failed and this was not the final attempt:
   - Call the `strategy` closure with the error
   - If the strategy returns `.stop`, rethrow the error immediately
   - If the strategy returns `.backoff`, suspend for the given duration
   - Return to step 1
4. If failed on the final attempt, rethrow the error without consulting the strategy

Given this sequence, there are four termination conditions (when retrying will be stopped):
- The operation completes without throwing an error
- The operation has been attempted `maxAttempts` times
- The strategy closure returns `.stop`
- The clock throws

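
The sequence above can be sketched as a plain loop. This is a simplified, self-contained illustration rather than the package's implementation: `SketchAction` and `retrySketch` are hypothetical stand-ins for `RetryAction` and `retry`, fixed to the continuous clock and untyped `throws`, and they omit the `tolerance`, `clock` and `isolation` parameters.

```swift
// Hypothetical stand-in for `RetryAction`, fixed to `Duration`.
enum SketchAction {
  case backoff(Duration)
  case stop
}

// Simplified sketch of the retry sequence; not the package's implementation.
func retrySketch<Result>(
  maxAttempts: Int,
  operation: () async throws -> Result,
  strategy: (any Error) -> SketchAction = { _ in .backoff(.zero) }
) async throws -> Result {
  precondition(maxAttempts > 0, "Must have at least one attempt")
  // Every attempt except the last consults the strategy on failure.
  for _ in 0..<maxAttempts - 1 {
    do {
      return try await operation()  // Steps 1 and 2
    } catch {
      switch strategy(error) {      // Step 3
      case .backoff(let duration):
        try await Task.sleep(for: duration)  // Throws if the task is cancelled
      case .stop:
        throw error
      }
    }
  }
  // Step 4: the final attempt rethrows without consulting the strategy.
  return try await operation()
}
```

Keeping the final attempt outside the loop is what guarantees that the strategy closure is never called after the last failure.
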
#### Cancellation

`retry` does not introduce special cancellation handling. If your code cooperatively cancels by throwing, ensure your strategy returns `.stop` for that error. Otherwise, retries continue unless the clock throws on cancellation (which, at the time of writing, both `ContinuousClock` and `SuspendingClock` do).

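
One way to follow this advice is to have the strategy check for `CancellationError` before choosing a delay. In this sketch `LocalRetryAction` is a hypothetical local stand-in for `RetryAction<Duration>` so that the snippet compiles on its own, and the 200 ms delay is an arbitrary value chosen for illustration.

```swift
// Hypothetical local stand-in for `RetryAction<Duration>`.
enum LocalRetryAction: Equatable {
  case backoff(Duration)
  case stop
}

// Strategy that refuses to retry once the operation reports cancellation.
func cancellationAwareStrategy(_ error: any Error) -> LocalRetryAction {
  if error is CancellationError {
    return .stop  // Rethrow immediately instead of scheduling another attempt.
  }
  return .backoff(.milliseconds(200))  // Arbitrary delay for any other error.
}
```
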
### Backoff

All proposed strategies conform to `BackoffStrategy`, which allows for builder-like syntax:

```swift
var backoff = Backoff
  .exponential(factor: 2, initial: .milliseconds(100))
  .maximum(.seconds(5))
  .fullJitter()
```

#### Custom backoff

Adopters may choose to create their own strategies. There is no requirement to conform to `BackoffStrategy`, since retry and backoff are decoupled; however, to use the provided modifiers (`minimum`, `maximum`, `fullJitter`, `equalJitter`), a strategy must conform.

Each call to `nextDuration()` returns the delay for the next retry attempt. Strategies are naturally stateful. For instance, they may track the number of invocations or the previously returned duration to calculate the next delay.

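
As an illustration of a stateful custom strategy, the sketch below produces delays that follow the Fibonacci sequence. `FibonacciBackoff` is a hypothetical name, and the protocol is redeclared locally (mirroring the declaration shown earlier) only so that the snippet compiles without the package.

```swift
// Local copy of the proposed protocol so the sketch is self-contained.
protocol BackoffStrategy<Duration> {
  associatedtype Duration: DurationProtocol
  mutating func nextDuration() -> Duration
}

// Hypothetical custom strategy: unit, unit, 2 * unit, 3 * unit, 5 * unit, ...
struct FibonacciBackoff<Duration: DurationProtocol>: BackoffStrategy {
  private var current: Duration
  private var next: Duration

  init(unit: Duration) {
    precondition(unit >= .zero, "Unit must be greater than or equal to 0")
    self.current = unit
    self.next = unit
  }

  mutating func nextDuration() -> Duration {
    // Advance the stored state after the current value has been returned.
    defer { (current, next) = (next, current + next) }
    return current
  }
}
```

With a unit of 100 ms, the first five calls to `nextDuration()` return 100, 100, 200, 300 and 500 ms.
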
#### Standard backoff

As previously mentioned, this proposal introduces several common backoff strategies and modifiers:

- **Constant**: $f(n) = \text{constant}$
- **Linear**: $f(n) = \text{initial} + \text{increment} \cdot n$
- **Exponential**: $f(n) = \text{initial} \cdot \text{factor}^{n}$
- **Decorrelated Jitter**: $f(n) = \text{random}(\text{base}, f(n - 1) \cdot \text{factor})$ where $f(0) = \text{base}$
- **Minimum**: $f(n) = \max(\text{minimum}, g(n))$ where $g(n)$ is the base strategy
- **Maximum**: $f(n) = \min(\text{maximum}, g(n))$ where $g(n)$ is the base strategy
- **Full Jitter**: $f(n) = \text{random}(0, g(n))$ where $g(n)$ is the base strategy
- **Equal Jitter**: $f(n) = \text{random}(g(n) / 2, g(n))$ where $g(n)$ is the base strategy

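
To make the formulas concrete, the following sketch evaluates the exponential formula capped by the maximum modifier, using the values from the builder example earlier (factor 2, initial 100 ms, maximum 5 s). It computes the closed-form values directly and does not call the proposed API. `cappedExponential` is a hypothetical helper, and the sketch assumes that $n$ starts at 0.

```swift
// Hypothetical helper: f(n) = min(maximum, initial * factor^n), with n starting at 0.
func cappedExponential(n: Int, factor: Int, initial: Duration, maximum: Duration) -> Duration {
  var delay = initial
  for _ in 0..<n {
    delay = delay * factor  // One multiplication per elapsed attempt.
  }
  return min(maximum, delay)  // Apply the cap last, as the maximum modifier does.
}

let delays = (0..<8).map {
  cappedExponential(n: $0, factor: 2, initial: .milliseconds(100), maximum: .seconds(5))
}
print(delays)
```

Under these assumptions the delays are 100 ms, 200 ms, 400 ms, 800 ms, 1.6 s and 3.2 s, after which the cap holds every further value at 5 s.
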
##### Sendability

The proposed backoff strategies are not marked `Sendable`.
They are not meant to be shared across isolation domains, because their state evolves with each call to `nextDuration()`.
Re-creating the strategies when they are used in different domains is usually the correct approach.

### Case studies

The most common use cases encountered for recovering from transient failures are either:
- a system requiring its user to come up with a reasonable duration to let the system cool off
- a system providing its own duration which the user is supposed to honor to let the system cool off

Both of these use cases can be implemented using the proposed algorithm, respectively:

```swift
let rng = SystemRandomNumberGenerator() // or a seeded RNG for unit tests
var backoff = Backoff
  .exponential(factor: 2, initial: .milliseconds(100))
  .maximum(.seconds(10))
  .fullJitter(using: rng)

let response = try await retry(maxAttempts: 5) {
  try await URLSession.shared.data(from: url)
} strategy: { error in
  return .backoff(backoff.nextDuration())
}
```

```swift
// Assumed error type for this example; it is not part of the proposal.
struct TooManyRequestsError: Error {
  let retryAfter: Double
}

let response = try await retry(maxAttempts: 5) {
  let (data, response) = try await URLSession.shared.data(from: url)
  if
    let response = response as? HTTPURLResponse,
    response.statusCode == 429,
    let retryAfter = response.value(forHTTPHeaderField: "Retry-After"),
    let seconds = Double(retryAfter)
  {
    throw TooManyRequestsError(retryAfter: seconds)
  }
  return (data, response)
} strategy: { error in
  if let error = error as? TooManyRequestsError {
    return .backoff(.seconds(error.retryAfter))
  } else {
    return .stop
  }
}
```

(For demonstration purposes only, a network server is used as the remote system.)

## Effect on API resilience

This proposal introduces a purely additive API with no impact on existing functionality or API resilience.

## Future directions

The jitter variants introduced by this proposal support custom `RandomNumberGenerator` by **copying** it in order to perform the necessary mutations.
This is not optimal and does not match the standard library's signatures of e.g. `shuffle()` or `randomElement()` which take an **`inout`** random number generator.
Because the backoff strategies proposed here are composable, the `inout` form cannot be adopted in current Swift.
If Swift gains the capability to "store" `inout` variables, the jitter variants should adopt this by adding new `inout` overloads and deprecating the copying overloads.

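
The consequence of copying can be observed with any value-type generator. In this sketch `CountingGenerator` and `JitterSketch` are hypothetical types written for illustration; they only mimic how a strategy might store a copy of the generator it is given and are not the package's types.

```swift
// Hypothetical deterministic generator, used only to make the copy observable.
struct CountingGenerator: RandomNumberGenerator {
  var state: UInt64 = 0
  mutating func next() -> UInt64 {
    state &+= 1
    return state
  }
}

// Hypothetical strategy-like type that stores its own copy of the generator.
struct JitterSketch<RNG: RandomNumberGenerator> {
  var generator: RNG
  init(using generator: RNG) {
    self.generator = generator  // Value types are copied here.
  }
  mutating func nextRandom() -> UInt64 {
    generator.next()  // Advances the stored copy, not the caller's generator.
  }
}

let rng = CountingGenerator()
var first = JitterSketch(using: rng)
var second = JitterSketch(using: rng)
let a = first.nextRandom()
let b = second.nextRandom()
print(a == b, rng.state)
```

Because each value advances its own copy, both draws are identical and the caller's generator is left untouched. An `inout`-based signature would instead advance the caller's generator.
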
## Alternatives considered

Another option considered was to pass the current attempt number into the `BackoffStrategy`.

Although this initially seems useful, it conflicts with the idea of strategies being stateful. A strategy is supposed to track its own progression (e.g. by counting invocations or storing the last duration). If the attempt number were provided externally, strategies would become "semi-stateful": mutating because of internal components such as a `RandomNumberGenerator`, but at the same time relying on an external counter instead of their own stored history. This dual model is harder to reason about and less consistent, so it was deliberately avoided.

If adopters require access to the attempt number, they are free to implement this themselves, since the strategy is invoked each time a failure occurs, making it straightforward to maintain an external attempt counter.

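
A minimal sketch of such an external counter follows. The strategy closure is invoked directly here instead of through `retry`, and `CounterSketchAction`, `TransientError` and `makeCountingStrategy` are hypothetical names introduced only so that the snippet is self-contained.

```swift
// Hypothetical local stand-in for `RetryAction<Duration>`.
enum CounterSketchAction: Equatable {
  case backoff(Duration)
  case stop
}

struct TransientError: Error {}

// Builds a strategy closure that counts its own invocations.
func makeCountingStrategy(giveUpAfter limit: Int) -> (any Error) -> CounterSketchAction {
  var attempt = 0  // External counter, captured by the closure.
  return { _ in
    attempt += 1
    // Linear delay derived from the attempt number: 100 ms, 200 ms, ...
    return attempt > limit ? .stop : .backoff(.milliseconds(100 * attempt))
  }
}

let strategy = makeCountingStrategy(giveUpAfter: 2)
let actions = (0..<3).map { _ in strategy(TransientError()) }
```

Here the first two failures yield delays of 100 ms and 200 ms, and the third yields `.stop`.
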
## Acknowledgments

Thanks to [Philippe Hausler](https://github.com/phausler), [Franz Busch](https://github.com/FranzBusch) and [Honza Dvorsky](https://github.com/czechboy0) for their thoughtful feedback and suggestions that helped refine the API design and improve its clarity and usability.

Sources/AsyncAlgorithms/Retry/Backoff.swift

Lines changed: 1 addition & 2 deletions
@@ -50,7 +50,6 @@ public protocol BackoffStrategy<Duration> {
   @usableFromInline let factor: Int
   @usableFromInline init(factor: Int, initial: Duration) {
     precondition(initial >= .zero, "Initial must be greater than or equal to 0")
-    precondition(factor >= .zero, "Factor must be greater than or equal to 0")
     self.current = initial
     self.factor = factor
   }
@@ -158,7 +157,7 @@ public enum Backoff {

 @available(iOS 18.0, macCatalyst 18.0, macOS 15.0, tvOS 18.0, visionOS 2.0, watchOS 11.0, *)
 extension Backoff {
-  @inlinable public static func decorrelatedJitter<RNG: RandomNumberGenerator>(factor: Int, base: Duration, using generator: RNG) -> some BackoffStrategy<Duration> {
+  @inlinable public static func decorrelatedJitter<RNG: RandomNumberGenerator>(factor: Int, base: Duration, using generator: RNG = SystemRandomNumberGenerator()) -> some BackoffStrategy<Duration> {
     return DecorrelatedJitterBackoffStrategy(base: base, factor: factor, generator: generator)
   }
 }

Sources/AsyncAlgorithms/Retry/Retry.swift

Lines changed: 16 additions & 17 deletions
@@ -1,40 +1,39 @@
 @available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
-public struct RetryStrategy<Duration: DurationProtocol> {
-  @usableFromInline enum Strategy {
+public struct RetryAction<Duration: DurationProtocol> {
+  @usableFromInline enum Action {
     case backoff(Duration)
     case stop
   }
-  @usableFromInline let strategy: Strategy
-  @usableFromInline init(strategy: Strategy) {
-    self.strategy = strategy
+  @usableFromInline let action: Action
+  @usableFromInline init(action: Action) {
+    self.action = action
   }
   @inlinable public static var stop: Self {
-    return .init(strategy: .stop)
+    return .init(action: .stop)
   }
   @inlinable public static func backoff(_ duration: Duration) -> Self {
-    return .init(strategy: .backoff(duration))
+    return .init(action: .backoff(duration))
   }
 }

 @available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
 @inlinable public func retry<Result, ErrorType, ClockType>(
-  maxAttempts: Int = 3,
+  maxAttempts: Int,
   tolerance: ClockType.Instant.Duration? = nil,
   clock: ClockType,
   isolation: isolated (any Actor)? = #isolation,
-  operation: () async throws(ErrorType) -> sending Result,
-  strategy: (ErrorType) -> RetryStrategy<ClockType.Instant.Duration> = { _ in .backoff(.zero) }
+  operation: () async throws(ErrorType) -> Result,
+  strategy: (ErrorType) -> RetryAction<ClockType.Instant.Duration> = { _ in .backoff(.zero) }
 ) async throws -> Result where ClockType: Clock, ErrorType: Error {
   precondition(maxAttempts > 0, "Must have at least one attempt")
   for _ in 0..<maxAttempts - 1 {
     do {
       return try await operation()
-    } catch where Task.isCancelled {
-      throw error
     } catch {
-      switch strategy(error).strategy {
+      switch strategy(error).action {
       case .backoff(let duration):
-        try await Task.sleep(for: duration, tolerance: tolerance, clock: clock)
+        let deadline = clock.now.advanced(by: duration)
+        try await Task.sleep(until: deadline, tolerance: tolerance, clock: clock)
       case .stop:
         throw error
       }
@@ -45,11 +44,11 @@ public struct RetryStrategy<Duration: DurationProtocol> {

 @available(iOS 16.0, macCatalyst 16.0, macOS 13.0, tvOS 16.0, visionOS 1.0, watchOS 9.0, *)
 @inlinable public func retry<Result, ErrorType>(
-  maxAttempts: Int = 3,
+  maxAttempts: Int,
   tolerance: ContinuousClock.Instant.Duration? = nil,
   isolation: isolated (any Actor)? = #isolation,
-  operation: () async throws(ErrorType) -> sending Result,
-  strategy: (ErrorType) -> RetryStrategy<ContinuousClock.Instant.Duration> = { _ in .backoff(.zero) }
+  operation: () async throws(ErrorType) -> Result,
+  strategy: (ErrorType) -> RetryAction<ContinuousClock.Instant.Duration> = { _ in .backoff(.zero) }
 ) async throws -> Result where ErrorType: Error {
   return try await retry(
     maxAttempts: maxAttempts,
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+// Taken from: https://github.com/swiftlang/swift/blob/main/benchmark/utils/TestsUtils.swift#L257-L271
+public struct SplitMix64: RandomNumberGenerator {
+  private var state: UInt64
+
+  public init(seed: UInt64) {
+    self.state = seed
+  }
+
+  public mutating func next() -> UInt64 {
+    self.state &+= 0x9e37_79b9_7f4a_7c15
+    var z: UInt64 = self.state
+    z = (z ^ (z &>> 30)) &* 0xbf58_476d_1ce4_e5b9
+    z = (z ^ (z &>> 27)) &* 0x94d0_49bb_1331_11eb
+    return z ^ (z &>> 31)
+  }
+}
