The problem
If a client dispatches a lot of pipelined commands before it attempts to read any of the responses back, it might deadlock itself. Why? The AsyncFb will eventually block on writing a command response because the socket's send buffers are saturated (the client is not reading). This halts command execution on the AsyncFb because the fiber is blocked on I/O. The ConnFb will keep reading from the socket and dispatching requests asynchronously to the AsyncFb until it eventually saturates as well. At this point, neither ConnFb nor AsyncFb is making progress, while the client is still busy sending more pipelined commands. The client will not start reading until it has sent all of its pipelined requests, and Dragonfly will not read any of them because its dispatch_q_ is saturated and its ConnFb is blocked.
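For intuition, here is a minimal, self-contained sketch of the cycle. This is not Dragonfly code: plain threads and a toy bounded queue stand in for the fibers, dispatch_q_ and the socket send buffer, and all names are made up.

```cpp
// Toy model of the deadlock: two threads stand in for ConnFb and AsyncFb,
// bounded queues stand in for dispatch_q_ and the socket send buffer, and the
// "client" never reads because it is still busy pipelining.
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

template <typename T> class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

  void Push(T v) {  // blocks while the queue is full
    std::unique_lock lk(mu_);
    not_full_.wait(lk, [&] { return q_.size() < cap_; });
    q_.push_back(std::move(v));
    not_empty_.notify_one();
  }

  T Pop() {  // blocks while the queue is empty
    std::unique_lock lk(mu_);
    not_empty_.wait(lk, [&] { return !q_.empty(); });
    T v = std::move(q_.front());
    q_.pop_front();
    not_full_.notify_one();
    return v;
  }

 private:
  std::size_t cap_;
  std::deque<T> q_;
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
};

int main() {
  BoundedQueue<std::string> dispatch_q(4);  // ConnFb -> AsyncFb
  BoundedQueue<std::string> send_buf(4);    // AsyncFb -> client ("socket send buffer")

  // "AsyncFb": executes dispatched commands and writes replies. Once send_buf
  // is full (client not reading), it blocks and stops draining dispatch_q.
  std::thread async_fb([&] {
    for (;;) send_buf.Push("reply for " + dispatch_q.Pop());
  });

  // "ConnFb": keeps reading pipelined requests and dispatching them. Once
  // dispatch_q is full, it blocks as well.
  std::thread conn_fb([&] {
    for (int i = 0;; ++i) dispatch_q.Push("cmd " + std::to_string(i));
  });

  // Nobody ever pops send_buf, so after a moment both loops are parked on full
  // queues and no further progress is possible.
  std::this_thread::sleep_for(std::chrono::seconds(2));
  std::cout << "both loops are blocked; no progress is being made\n";
  std::_Exit(0);  // skip cleanup on purpose: the worker loops are intentionally stuck
}
```

With both loops parked on full queues and the client still in write mode, nobody drains anything, which is exactly the deadlock described above.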
The solution
The idea is to unblock the ConnFb by offloading socket backpressure to disk. Client requests won't freeze that way, and the ConnFb will keep serving until the client switches to reader mode (and when that happens, the AsyncFb will eventually unblock).
First approach: offload parsed commands to disk directly
This is the approach taken in the original prototype in #6011. To summarize the flow:
a. ConnFb -> b. ReadFromSocket -> c. ParseRequest -> d. Offload to disk or SendAsync
The problem with this approach is that the utility class that writes to and reads from disk in step (d) has to serialize the command again before offloading it. The reason is that our in-memory structure for a parsed command cannot be written to disk as is and deserialized later, because its binary strings lack separation characters. We would like to avoid this ping-pong behavior. Note that RESP3 is already a serialization protocol, so parsing and re-serializing a given command twice is, at the very least, redundant and inefficient.
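To make the ping-pong concrete, here is a hedged sketch (hypothetical names, not the actual prototype code) of what step (d) forces: the parsed command is just a list of binary-safe blobs, so it has to be framed again before it can be written to disk, and parsed a second time when it is read back.

```cpp
// Hypothetical illustration of step (d) in the first approach: the parsed
// command is a list of binary-safe blobs with no separators of its own, so it
// has to be framed again before hitting disk and unframed when read back.
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using ParsedCommand = std::vector<std::string>;

// Re-serialize with explicit length prefixes, because raw binary arguments
// cannot be concatenated and split apart again reliably.
std::string FrameForDisk(const ParsedCommand& cmd) {
  std::string out;
  for (const std::string& arg : cmd) {
    std::uint32_t len = static_cast<std::uint32_t>(arg.size());
    out.append(reinterpret_cast<const char*>(&len), sizeof(len));
    out.append(arg);
  }
  return out;
}

// ...and parse it a second time when it comes back from disk.
ParsedCommand UnframeFromDisk(const std::string& blob) {
  ParsedCommand cmd;
  std::size_t pos = 0;
  while (pos + sizeof(std::uint32_t) <= blob.size()) {
    std::uint32_t len;
    std::memcpy(&len, blob.data() + pos, sizeof(len));
    pos += sizeof(len);
    cmd.emplace_back(blob.substr(pos, len));
    pos += len;
  }
  return cmd;
}

int main() {
  ParsedCommand cmd = {"SET", "key", std::string("bin\0val", 7)};  // binary-safe value
  return UnframeFromDisk(FrameForDisk(cmd)) == cmd ? 0 : 1;        // round trip works
}
```

The client already shipped exactly this information in RESP framing, which is why doing it a second time is pure overhead.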
Second approach: offload socket data to disk as is. This way we don't need to serialize anything.
This avoids the redundancies discussed in the previous section, but it comes with its own set of side effects.
Notice that the flow within the ConnFb is synchronous: the fiber first reads from the socket and then dispatches. What happens if the ConnFb is stuck on a client read, the AsyncFb must make progress, and the only available data exists on disk (because it has drained the in-memory backpressure)? Naturally the answer should be that the AsyncFb reads from disk. Great, but who does the parsing if the ConnFb is blocked waiting for client input?
The problem here is that parsing is the responsibility of the ConnFb -- the AsyncFb is just there to execute. Without that separation of concerns, the producer/consumer behaviour of the two falls apart. And it's not only the logical semantics that matter: had we delegated some of the parsing responsibility to the AsyncFb, we would have ended up with a much more complicated state machine shared between the two. This is not a good direction.
#6006 discusses a redesign of pipelining.
Specifically, I quote:
- Stop using FiberRead and switch to OnRecv interface (chore: support onrecv hook for tcp sockets romange/helio#480). As OnRecv separates read notification from reading data, we can reduce reliance on per connection io_buf.
OnRecv can be invoked at any time a fiber is preempted, meaning the connection will be able to read data during its other preemption points, such as cmd.dispatch.
By switching the ConnFb socket read path to use OnRecv, we can unblock it from waiting for data to become available on the socket. That way, the ConnFb can still do the parsing, which solves the "parsing in separate fibers" issue.
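As a rough illustration only (the actual helio OnRecv signature in romange/helio#480 differs, and the parser below is a stand-in rather than real RESP), the read path could take roughly this shape: the hook appends bytes into the per-connection buffer, and the ConnFb keeps parsing at its own pace instead of blocking inside a socket read.

```cpp
// Sketch only: a hypothetical OnRecv-style hook feeding the per-connection
// buffer. The real helio interface (romange/helio#480) differs in details; the
// point is that ConnFb no longer blocks inside a socket read -- bytes arrive via
// the hook, and parsing stays inside ConnFb at its own preemption points.
#include <deque>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

class Connection {
 public:
  // Registered with the socket as its receive hook (signature is made up).
  void OnRecv(std::string_view data) { io_buf_.append(data); }

  // Called from the ConnFb loop: parse whatever is buffered and dispatch it.
  // Backpressure handling (e.g. spilling to disk) would hook in at dispatch.
  void ParseAndDispatch() {
    while (std::optional<std::string> cmd = TryParseOne()) {
      DispatchAsync(std::move(*cmd));
    }
  }

 private:
  // Placeholder parser: treats '\n' as a message boundary instead of real RESP.
  std::optional<std::string> TryParseOne() {
    auto pos = io_buf_.find('\n');
    if (pos == std::string::npos) return std::nullopt;
    std::string cmd = io_buf_.substr(0, pos);
    io_buf_.erase(0, pos + 1);
    return cmd;
  }

  void DispatchAsync(std::string cmd) { dispatch_q_.push_back(std::move(cmd)); }

  std::string io_buf_;                  // per-connection buffer fed by OnRecv
  std::deque<std::string> dispatch_q_;  // stands in for the real dispatch_q_
};

int main() {
  Connection conn;
  conn.OnRecv("PING\nSET k v\n");  // the hook delivers bytes as they arrive
  conn.ParseAndDispatch();         // ConnFb consumes them at its own pace
}
```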
A bonus is that the first step for #6006 is now checked off:
- Integrate and test OnRecv in ConnFb (#6028)
- Implement a utility class that writes or reads a stream of data to disk (disk based queue, #6029) -- a rough sketch follows after this list
- TlsSocket and OnRecv in helio (https://github.com/romange/helio/pull/480/files#diff-e23cdb5eee2e5d12ad3b9853e48dcfdc99ddbb3670ee960206c491dbcbe79c5aR472)
- Offload connection backpressure to disk + test (#6030)
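For the disk-based queue item (#6029), here is a hedged sketch of the general shape such a utility could take, assuming a simple length-prefixed append-only file. The real class would sit on the proactor's async file I/O and needs size limits, truncation and error handling.

```cpp
// Sketch of a disk-backed FIFO for raw connection bytes: append length-prefixed
// chunks at the tail, read them back from the head. Names, layout and the use of
// blocking fstreams are illustrative only.
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>
#include <string_view>

class DiskBackedQueue {
 public:
  explicit DiskBackedQueue(const std::string& path)
      : out_(path, std::ios::binary | std::ios::trunc),
        in_(path, std::ios::binary) {}

  // Producer side (ConnFb): spill a chunk of unparsed socket data to disk.
  void Push(std::string_view chunk) {
    std::uint32_t len = static_cast<std::uint32_t>(chunk.size());
    out_.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out_.write(chunk.data(), len);
    out_.flush();  // only complete records ever become visible to Pop()
  }

  // Consumer side: return the next chunk in FIFO order, or nullopt if the
  // on-disk queue is currently drained.
  std::optional<std::string> Pop() {
    std::uint32_t len = 0;
    if (!in_.read(reinterpret_cast<char*>(&len), sizeof(len))) {
      in_.clear();  // reached the end of what was written so far; retry later
      return std::nullopt;
    }
    std::string chunk(len, '\0');
    in_.read(chunk.data(), len);
    return chunk;
  }

 private:
  std::ofstream out_;  // tail (append position)
  std::ifstream in_;   // head (read position)
};

int main() {
  DiskBackedQueue q("/tmp/conn_backpressure.bin");  // path is illustrative
  q.Push("raw pipelined bytes read from the socket");
  auto chunk = q.Pop();
  return chunk && !chunk->empty() ? 0 : 1;
}
```

The sketch assumes Push and Pop run on the same thread (as the fibers would) and that whole records are flushed together, so Pop only ever fails cleanly at a record boundary; the real utility has to be more careful than that.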
We can also use #6011 to experiment and prototype.