| 1 | +# Retry Loop Retry
| 2 | + |
| 3 | +Some time ago I lamented that I don't know how to write a retry loop such that: |
| 4 | + |
| 5 | +- it is syntactically obvious that the number of retries is bounded,
| 6 | +- there's no spurious extra sleep after the last attempt, |
| 7 | +- the original error is reported if retrying fails, |
| 8 | +- there's no code duplication in the loop. |
| 9 | + |
| 10 | +<https://matklad.github.io/2023/12/21/retry-loop.html> |
| 11 | + |
| 12 | +To recap, we have |
| 13 | + |
| 14 | +```zig |
| 15 | +fn action() E!T { ... } |
| 16 | +fn is_transient_error(err: E) bool { ... } |
| 17 | +``` |
| 18 | + |
| 19 | +and we need to write |
| 20 | + |
| 21 | +```zig |
| 22 | +fn action_with_retries(retry_count: u32) E!T { ... } |
| 23 | +``` |
| 24 | + |
| 25 | +I've received many suggestions, and the best one was from |
| 26 | +[<https://www.joachimschipper.nl>,]{.display} |
| 27 | +though it was somewhat specific to Python: |
| 28 | + |
| 29 | +```python |
| 30 | +for tries_left in reversed(range(retry_count)):
| 31 | +    try:
| 32 | +        return action()
| 33 | +    except Exception as e:
| 34 | +        if tries_left == 0 or not is_transient_error(e):
| 35 | +            raise
| 36 | +        sleep()
| 37 | +else:
| 38 | +    assert False
| 39 | +``` |
| 40 | + |
| 41 | +A couple of days ago I learned to think better about the problem. You see, the first requirement, |
| 42 | +that the number of retries is bounded syntactically, was leading me down the wrong path. If we |
| 43 | +_start_ with that requirement, we get code shape like: |
| 44 | + |
| 45 | +```zig |
| 46 | +const result: E!T = for (0..retry_count) |_| {
| 47 | +    // ???
| 48 | +    action()
| 49 | +    // ???
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +The salient point here is that, no matter what we do, we need to get `E` or `T` out as a result, so |
| 54 | +we'll have to call `action()` at least once. But `retry_count` _could_ be zero. Looking at the |
| 55 | +static semantics, any non-`do while` loop's body can be skipped completely, so we'll have to have
| 56 | +some runtime asserts explaining to the compiler that we really did run `action` at least once. The
| 57 | +part of the loop which is guaranteed to be executed at least once is the condition. So it's more
| 58 | +fruitful to flip this around: it's not that we are looping until we are out of attempts, but, |
| 59 | +rather, we are looping while the underlying action returns an error, and then retries are an extra |
| 60 | +condition to exit the loop early: |
| 61 | + |
| 62 | +```zig |
| 63 | +var retries_left = retry_count; |
| 64 | +const result = try while(true) { |
| 65 | +    const err = if (action()) |ok| break ok else |err| err;
| 66 | +    if (!is_transient_error(err)) break err;
| 67 | +
| 68 | +    if (retries_left == 0) break err;
| 69 | +    retries_left -= 1;
| 70 | +    sleep();
| 71 | +}; |
| 72 | +``` |
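
For comparison, here is a minimal Python sketch of the same shape: loop while the action fails, and treat running out of retries as just one more reason to stop. `TransientError` and the injectable `sleep` parameter are assumptions made for illustration, they are not part of the original problem statement.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a retryable failure."""


def is_transient_error(err: Exception) -> bool:
    # Assumption: only TransientError is worth retrying.
    return isinstance(err, TransientError)


def action_with_retries(action, retry_count, sleep=lambda: time.sleep(0.1)):
    retries_left = retry_count
    while True:
        try:
            # The action always runs at least once, even if retry_count == 0.
            return action()
        except Exception as err:
            # A bare `raise` re-raises the original error unchanged.
            if not is_transient_error(err):
                raise
            if retries_left == 0:
                raise
            retries_left -= 1
            # Only reached when another attempt follows, so there is
            # no extra sleep after the last attempt.
            sleep()
```

With `retry_count = 2`, an action that fails transiently twice and then succeeds is called three times and sleeps twice.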
| 73 | + |
| 74 | +This shape of the loop also works if the condition for retries is not attempt-based but, say,
| 75 | +time-based. Sadly, this throws the "loop is obviously bounded" requirement out of the window. But
| 76 | +it can be restored by adding an _upper bound_ to the infinite loop:
| 77 | + |
| 78 | +```zig |
| 79 | +var retries_left = retry_count; |
| 80 | +const result = try for (0..retry_count + 1) |_| {
| 81 | +    const err = if (action()) |ok| break ok else |err| err;
| 82 | +    if (!is_transient_error(err)) break err;
| 83 | +
| 84 | +    if (retries_left == 0) break err;
| 85 | +    retries_left -= 1;
| 86 | +    sleep();
| 87 | +} else @panic("runaway loop"); |
| 88 | +``` |
| 89 | + |
| 90 | +I still don't like it (if you forget that `+1`, you'll get a panic!), but that's where I am at! |
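
To illustrate the earlier remark that the same shape works for a time-based condition, here is a rough Python sketch that swaps the retry counter for a deadline. The function name `action_with_deadline` and the injectable `sleep` and `clock` parameters are hypothetical, chosen only so the example is self-contained and testable.

```python
import time


def action_with_deadline(
    action,
    is_transient_error,
    timeout_seconds,
    sleep=lambda: time.sleep(0.1),
    clock=time.monotonic,
):
    # Assumption: the deadline is measured from the first attempt.
    deadline = clock() + timeout_seconds
    while True:
        try:
            # As before, the action runs at least once.
            return action()
        except Exception as err:
            if not is_transient_error(err):
                raise
            # Only the exit condition changed: time instead of attempts.
            if clock() >= deadline:
                raise
            sleep()
```

The loop body is otherwise identical to the counter-based version. This variant has no syntactic bound at all, so it only fits where a wall-clock limit is acceptable.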