The one remaining loophole is Redis `BLPOP` cancellation, which was also
the original problem when changing from per-request connections to
connection pools.
Once a `BLPOP` command is started, Python will switch to some other task
while waiting for the server to respond. If that other task decides to
cancel the `receive`, then one of two things will happen:
- the connection is dropped soon enough for the server to notice, so
that it won't send the response at all, or
- the response already reached Python's process, but the task processing
it is flagged for cancellation.
In the latter scenario, the message is lost: Redis has already removed it
from the list, but no task will ever see it. To solve this, `BLPOP`
could be shielded from cancellation, but this would lead to problematic
timeout handling, where a read command to Redis could easily outlive its
event loop, leading to all sorts of problems for shutdown code or
environments where the global channels layer instance is used from
multiple event loops.
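The loss in the second scenario can be reproduced with a purely in-memory simulation. `fake_blpop` and the `server` dict are hypothetical stand-ins, not Redis or a real client. The element is removed "server-side" first, and the reply then takes a moment to reach the caller, which is where the cancellation lands.

```python
import asyncio

# Hypothetical stand-in for the Redis server: one list per key.
server = {"chan": ["msg-1"]}


async def fake_blpop(key):
    # The "server" removes the element right away (destructive, server side)...
    value = server[key].pop(0)
    # ...and the reply then spends a moment in flight back to the client.
    await asyncio.sleep(0.01)
    return value


async def main():
    task = asyncio.create_task(fake_blpop("chan"))
    await asyncio.sleep(0)  # let the command reach the "server"
    task.cancel()           # another task cancels the receive
    try:
        await task
    except asyncio.CancelledError:
        pass
    return server["chan"]


remaining = asyncio.run(main())
print(remaining)  # [] : the message left the list but was never delivered
```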
Redis documentation suggests using `BRPOPLPUSH` instead, where messages
are atomically returned and copied to an auxiliary list. This makes the
Python implementation easier inasmuch as cancellation during the
`BRPOPLPUSH` no longer needs to be handled, but the message must still
be removed from the auxiliary list once processed, and only if it is
processed.
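Below is a minimal sketch of the `BRPOPLPUSH` pattern. It runs against a hypothetical in-memory `FakeRedis` rather than a real client. Its `brpoplpush` moves an element from the tail of the source list onto the head of the backup list in one step, as the Redis command does. The `:backup` key suffix is made up for illustration. If the receive is cancelled while the reply is in flight, the message stays in the backup list instead of being lost.

```python
import asyncio
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for two Redis list commands."""

    def __init__(self):
        self.lists = defaultdict(list)

    async def brpoplpush(self, src, dst):
        # Atomically move the tail of src onto the head of dst, as BRPOPLPUSH
        # does; assumes src is non-empty (the real command would block).
        value = self.lists[src].pop()
        self.lists[dst].insert(0, value)
        await asyncio.sleep(0.01)  # reply still in flight to the client
        return value

    async def lrem(self, key, count, value):
        # Remove up to `count` occurrences of value, scanning from the head.
        for _ in range(count):
            if value in self.lists[key]:
                self.lists[key].remove(value)
        await asyncio.sleep(0)


async def receive(redis, channel):
    backup = channel + ":backup"
    message = await redis.brpoplpush(channel, backup)
    # ... process the message, then drop its backup copy ...
    await redis.lrem(backup, 1, message)
    return message


async def demo(cancel):
    redis = FakeRedis()
    redis.lists["chan"].append("msg-1")
    task = asyncio.create_task(receive(redis, "chan"))
    await asyncio.sleep(0)  # let the command reach the "server"
    if cancel:
        task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return dict(redis.lists)


print(asyncio.run(demo(cancel=True)))   # backup list still holds "msg-1"
print(asyncio.run(demo(cancel=False)))  # both lists empty
```

The final `await redis.lrem(...)` is still an interruption point, and that removal step is what the options below have to deal with.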
One option for implementing this is to make the auxiliary removal
completely uninterruptible from the point of view of the code path
executing it. In a usual single-threaded asyncio program, the
`BRPOPLPUSH` would be the last interruptible operation before returning
a valid message; the code following this command is essentially atomic
with respect to any other task attached to the event loop. Any other
task might then reasonably expect that either:
- the receive succeeded completely, will return a valid message and has
cleaned the auxiliary list, or
- `BRPOPLPUSH` was interrupted, there will be no message to speak of and
there is a backup of it in the auxiliary list.
Making removal atomic violates the second expectation and also a fairly
basic principle of single-threaded asyncio code. Because it is still an
interruptible operation, other tasks will be run during this "atomic"
piece of code, meaning it could be cancelled. If it is cancelled, then
that cancellation must become visible outside, otherwise we've just
"run" two independent code paths at the same time in a single thread,
and also made life difficult for anyone wanting to make sure tasks
complete in a timely manner: the receive cancellation would essentially
be gobbled, and the task would need to be cancelled again at its next
interruption point.
If the cancellation must proceed, however, we could be left in the
following situation: the message has been unpacked, but the cancellation
occurred too late to stop the removal operation, so we're left with a
message that we somehow have to return, because there's no backup of it
in Redis.
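The situation above can be reproduced with `asyncio.shield`. This is one plausible way to make the removal uninterruptible; the text does not say which mechanism was considered. `FakeRedis` is a hypothetical in-memory stand-in, and its `lrem` is deliberately slow so that the cancellation arrives mid-removal. The sleep durations are arbitrary. The receive ends up cancelled, yet both lists are empty, so the message exists nowhere.

```python
import asyncio
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for two Redis list commands."""

    def __init__(self):
        self.lists = defaultdict(list)

    async def brpoplpush(self, src, dst):
        # Atomically move the tail of src onto the head of dst; assumes src is
        # non-empty (the real command would block instead).
        value = self.lists[src].pop()
        self.lists[dst].insert(0, value)
        await asyncio.sleep(0.01)  # reply in flight to the client
        return value

    async def lrem(self, key, count, value):
        # Deliberately slow round trip so a cancellation can land mid-removal.
        await asyncio.sleep(0.1)
        for _ in range(count):
            if value in self.lists[key]:
                self.lists[key].remove(value)


async def receive(redis, channel):
    backup = channel + ":backup"
    message = await redis.brpoplpush(channel, backup)
    # "Uninterruptible" removal: the shielded task keeps running, but the
    # cancellation still surfaces here as CancelledError.
    await asyncio.shield(redis.lrem(backup, 1, message))
    return message


async def demo():
    redis = FakeRedis()
    redis.lists["chan"].append("msg-1")
    task = asyncio.create_task(receive(redis, "chan"))
    await asyncio.sleep(0.05)  # BRPOPLPUSH done, shielded removal in progress
    task.cancel()
    try:
        await task
        outcome = "delivered"
    except asyncio.CancelledError:
        outcome = "cancelled"
    await asyncio.sleep(0.2)  # let the shielded removal finish
    return outcome, dict(redis.lists)


print(asyncio.run(demo()))  # receive was cancelled, yet both lists are empty
```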
One way to avoid losing the message would be to subclass the
cancellation exception and have it carry the message as a payload, but
that would be messy.
A second option is to create a detached removal task. This lets us still
have a properly atomic code path from when the message is unpacked to
when the function returns, and the removal will happen at some point
"soon", because the removal task contains no deliberate delays or
blocking commands.
Notably, ordering is not a problem: if two receives in quick succession
both fire off removals for their own messages, the end result is still
that two messages are deleted from a list into which Redis earlier put
two messages.
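Below is a sketch of the detached removal. `FakeRedis` is a hypothetical in-memory stand-in for Redis, and the `:backup` suffix and the `delay`/`removal_delay` knobs are made up for illustration. `asyncio.ensure_future` starts the removal without awaiting it, so there is no suspension point between `brpoplpush` returning and `receive` returning. The `background_cleaners` set only holds strong references to the pending tasks, as the asyncio documentation advises for fire-and-forget tasks. The demo runs two receives back to back and slows the first removal so the removals finish in reverse order.

```python
import asyncio
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for two Redis list commands."""

    def __init__(self):
        self.lists = defaultdict(list)

    async def brpoplpush(self, src, dst):
        # Atomic move of the tail of src onto the head of dst (src assumed non-empty).
        value = self.lists[src].pop()
        self.lists[dst].insert(0, value)
        await asyncio.sleep(0.01)  # reply in flight to the client
        return value

    async def lrem(self, key, count, value, delay=0):
        # `delay` is a made-up knob to control when the removal lands.
        await asyncio.sleep(delay)
        for _ in range(count):
            if value in self.lists[key]:
                self.lists[key].remove(value)


# Strong references to pending removal tasks, so they are not garbage collected.
background_cleaners = set()


async def receive(redis, channel, removal_delay=0):
    backup = channel + ":backup"
    # Last interruption point: a cancellation here leaves the backup in place.
    message = await redis.brpoplpush(channel, backup)
    # No await from here to `return`, so no other task runs in between and no
    # cancellation can be delivered; a detached task does the removal.
    cleaner = asyncio.ensure_future(redis.lrem(backup, 1, message, removal_delay))
    background_cleaners.add(cleaner)
    cleaner.add_done_callback(background_cleaners.discard)
    return message


async def demo():
    redis = FakeRedis()
    redis.lists["chan"].extend(["msg-2", "msg-1"])  # "msg-1" is popped first
    # The first removal is slowed down so that the second one lands first.
    first = await receive(redis, "chan", removal_delay=0.05)
    second = await receive(redis, "chan")
    backup_before_cleanup = list(redis.lists["chan:backup"])
    await asyncio.gather(*background_cleaners)
    return first, second, backup_before_cleanup, list(redis.lists["chan:backup"])


print(asyncio.run(demo()))  # removals finish out of order; backup still ends empty
```

Each removal uses `LREM`-style semantics with a count of 1, so it deletes exactly one matching copy. That holds regardless of when it runs, and even if two payloads happen to be identical.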
On the other hand, it is a problem if a receive starts after a previous
receive has returned but before that receive's cleanup has run: the new
receive would move the not-yet-removed backup back into the regular
message queue and then get that message again, even though the earlier
receive processed it successfully.
The solution to this is rather heavy-handed but easily reasoned about:
the stretch of time between starting a removal task and finishing it can
be protected by a per-channel lock. Acquiring the lock is interruptible
but doesn't change the cancellation semantics of the design, since
cancelling lock acquisition has the same effect as cancelling
`BRPOPLPUSH` before the command reaches Redis: no message is received
and Redis state is not corrupted.
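Below is a sketch combining the detached removal with a per-channel `asyncio.Lock`. Everything here is hypothetical. `FakeRedis` is an in-memory stand-in, and its `restore_backup` models the step described above, where a new receive moves a leftover backup back into the regular queue. The lock is acquired before the receive touches Redis and released by the cleaner's done callback. A second receive on the same channel therefore waits until the previous backup copy is gone. Passing the illustrative `use_lock=False` flag shows the duplicate delivery.

```python
import asyncio
from collections import defaultdict


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis list operations involved."""

    def __init__(self):
        self.lists = defaultdict(list)

    async def restore_backup(self, src, dst):
        # Hypothetical recovery step: put leftover backups back at the tail of
        # src (oldest last), so they are popped again before anything else.
        self.lists[src].extend(self.lists[dst])
        self.lists[dst].clear()
        await asyncio.sleep(0)

    async def brpoplpush(self, src, dst):
        # Atomic move of the tail of src onto the head of dst (src assumed non-empty).
        value = self.lists[src].pop()
        self.lists[dst].insert(0, value)
        await asyncio.sleep(0.01)
        return value

    async def lrem(self, key, count, value):
        for _ in range(count):
            if value in self.lists[key]:
                self.lists[key].remove(value)
        await asyncio.sleep(0)


class Receiver:
    def __init__(self, redis, use_lock=True):
        self.redis = redis
        self.use_lock = use_lock
        self.locks = defaultdict(asyncio.Lock)  # one lock per channel
        self.cleaners = set()  # strong references to pending removal tasks

    async def receive(self, channel):
        backup = channel + ":backup"
        lock = self.locks[channel]
        if self.use_lock:
            # Interruptible, but cancelling here is like cancelling BRPOPLPUSH
            # before it reaches Redis: nothing is received, nothing is lost.
            await lock.acquire()
        try:
            await self.redis.restore_backup(channel, backup)
            message = await self.redis.brpoplpush(channel, backup)
        except BaseException:
            if self.use_lock:
                lock.release()
            raise
        # No await until `return`; the detached removal releases the lock
        # once the backup copy is gone.
        cleaner = asyncio.ensure_future(self.redis.lrem(backup, 1, message))
        self.cleaners.add(cleaner)

        def finished(task):
            self.cleaners.discard(task)
            if self.use_lock:
                lock.release()

        cleaner.add_done_callback(finished)
        return message


async def demo(use_lock):
    redis = FakeRedis()
    redis.lists["chan"].extend(["msg-2", "msg-1"])  # "msg-1" is popped first
    receiver = Receiver(redis, use_lock)
    first = await receiver.receive("chan")
    # Starts before the first receive's removal task has had a chance to run.
    second = await receiver.receive("chan")
    await asyncio.gather(*receiver.cleaners)
    return first, second


print(asyncio.run(demo(use_lock=True)))   # ('msg-1', 'msg-2')
print(asyncio.run(demo(use_lock=False)))  # ('msg-1', 'msg-1'): delivered twice
```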
*NOTE*: It is useful to observe that `BRPOPLPUSH` is also the only Redis
command that requires this level of special handling. The others used by
`channels_redis` are either read-only, such as `LLEN`, or are
destructive without expecting a return value, such as `ZREM`.
`BRPOPLPUSH` is at the same time destructive (an element is removed
server-side from the list) and has an important return value.