Skip to content

Commit 7262433

Browse files
committed
BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
epoll_wait() may return EPOLLUP and/or EPOLLRDHUP after an asynchronous connect(), to indicate that the peer accepted the connection then immediately closed before epoll_wait() returned. When this happens, sock_conn_check() is called to check whether or not the connection correctly established, and after that the receive channel of the socket is assumed to already be closed. This lets haproxy send the request at best (if RDHUP and not HUP) then immediately close. Over the last two years, there were a few reports about this spuriously happening on connections where network captures proved that the server had not closed at all (and sometimes even received the request and responded to it after haproxy had closed). The logs show that a successful connection is immediately reported on error after the request was sent. After investigations, it appeared that a EPOLLUP, or eventually a EPOLLRDHUP, can be reported by epool_wait() during the connect() but in sock_conn_check(), the connect() reports a success. So the connection is validated but the HUP is handled on the first receive and an error is reported. The same behavior could be observed on health-checks, leading HAProxy to consider the server as DOWN while it is not. The only explanation at this point is that it is a kernel bug, notably because it does not even match the documentation for connect() nor epoll. In addition for now it was only observed with Ubuntu kernels 5.4 and 5.15 and was never reproduced on any other one. We have no reproducer but here is the typical strace observed: socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 114 fcntl(114, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 setsockopt(114, SOL_TCP, TCP_NODELAY, [1], 4) = 0 connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = -1 EINPROGRESS (Operation now in progress) epoll_ctl(19, EPOLL_CTL_ADD, 114, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=114, u64=114}}) = 0 epoll_wait(19, [{events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=151, u64=151}}, {events=EPOLLIN, data={u32=59, u64=59}}, {events=EPOLLIN|EPOLLRDHUP, data={u32=114, u64=114}}], 200, 0) = 4 epoll_ctl(19, EPOLL_CTL_MOD, 114, {events=EPOLLOUT, data={u32=114, u64=114}}) = 0 epoll_wait(19, [{events=EPOLLOUT, data={u32=114, u64=114}}, {events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=10, u64=10}}, {events=EPOLLIN, data={u32=165, u64=165}}], 200, 0) = 4 connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = 0 sendto(114, "POST "..., 1009, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 1009 close(114) = 0 Some ressources about this issue: - https://www.spinics.net/lists/netdev/msg876470.html - haproxy/haproxy#1863 - haproxy/haproxy#2368 So, to workaround the issue, we have decided to remove FD_POLL_HUP flag on the FD during the connection establishement if FD_POLL_ERR is not reported too in sock_conn_check(). This way, the call to connect() is able to validate or reject the connection. At the end, if the HUP or RDHUP flags were valid, either connect() would report the error itself, or the next recv() would return 0 confirming the closure that the poller tried to report. EPOLL_RDHUP is only an optimization to save a syscall anyway, and this pattern is so rare that nobody will ever notice the extra call to recv(). Please note that at least one reporter confirmed that using poll() instead of epoll() also addressed the problem, so that can also be a temporary workaround for those discovering the problem without the ability to immediately upgrade. The event is accounted via a COUNT_IF(), to be able to spot it in future issue. Just in case. This patch should fix the issue #1863 and #2368. It may be related to #2751. It should be backported as far as 2.4. In 3.0 and below, the COUNT_IF() must be removed.
1 parent eea2697 commit 7262433

File tree

1 file changed

+25
-0
lines changed

1 file changed

+25
-0
lines changed

src/sock.c

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -945,6 +945,31 @@ int sock_conn_check(struct connection *conn)
945945
goto done;
946946
if (!(fdtab[fd].state & (FD_POLL_ERR|FD_POLL_HUP)))
947947
goto wait;
948+
949+
/* Removing HUP if there is no ERR reported.
950+
*
951+
* After a first connect() returning EINPROGRESS, it seems
952+
* possible to have EPOLLHUP or EPOLLRDHUP reported by
953+
* epoll_wait() and turned to an error while the following
954+
* connect() will return a success. So the connection is
955+
* validated but the error is saved and reported on the first
956+
* subsequent read.
957+
*
958+
* We have no explanation for now. Why epoll report the
959+
* connection is closed while the connect() it able to validate
960+
* it ? no idea. But, it seems reasonnable in this case, and if
961+
* no error was reported, to remove the the HUP flag. At worst, if
962+
* the connection is really closed, this will be reported later.
963+
*
964+
* Only observed on Ubuntu kernel (5.4/5.15). See:
965+
* - https://github.com/haproxy/haproxy/issues/1863
966+
* - https://www.spinics.net/lists/netdev/msg876470.html
967+
*/
968+
if (unlikely((fdtab[fd].state & (FD_POLL_HUP|FD_POLL_ERR)) == FD_POLL_HUP)) {
969+
COUNT_IF(1, "Removing FD_POLL_HUP if no FD_POLL_ERR to let connect() decide");
970+
fdtab[fd].state &= ~FD_POLL_HUP;
971+
}
972+
948973
/* error present, fall through common error check path */
949974
}
950975

0 commit comments

Comments
 (0)