Skip to content

EPoll is not notified about a fatal error when receiving data from a socket #511

@maxsharabayko

Description

@maxsharabayko

We have a hidden issue in handling the fatal connection errors with epoll.
If EReadStatus CChannel::recvfrom(sockaddr* addr, CPacket& packet) const returns RST_ERROR, the epoll never gets notified about the error on that socket.
The test condition can be easily reproduced by returning RST_ERROR from the recvfrom function.
Also reverting changes of #468 (i.e. treating WSAECONNRESET as a fatal error on Windows) make the issue reproduceable on Windows (see the ConnectionTimeoutTest in the test folder).

What is happening in case recvfrom DOES NOT return an error?

The worker thread CRcvQueue::worker is in the while() loop. The socket, that attempts to connect, is in the list m_lRendezvousID. The CRcvQueue::worker thread calls void CRendezvousQueue::updateConnStatus() on every loop iteration. The function updates the socket status, in particular, when TTL is expired and the socket has to be closed.

// partial pseudocode
void* CRcvQueue::worker(void* param)
{
    // (...)
    while (!self->m_bClosing)
    {
        EReadStatus rst = self->worker_RetrieveUnit(Ref(id), Ref(unit), &sa);
        if (rst == RST_ERROR)
        {
            // !!! This break will skip updateConnStatus on the socket
            // and epoll is never notified
            break;
        }
    
        self->m_pRendezvousQueue->updateConnStatus(rst, cst, unit->m_Packet);
    }
    
    return;
}

One of the following calls to void CRendezvousQueue::updateConnStatus() triggers an opened socket to be closed by timeout.
After this the epoll gets notified about the error on that socket. And then all the threads are closed as expected.

What is happening in case recvfrom DOES return an error?

If the recvfrom return an error, the worker thread CRcvQueue::worker loop breaks and thread function returns. This way the next call to the updateConnStatus() is skipped and never happens. Therefore the epoll is never notified about the error on this socket. srt_epoll_wait can only return on a wait timeout, instead of handling the error event on the socket.

This is probably a minor issue, because in general we still have the epoll wait timeout, that handles these conditions.

Metadata

Metadata

Assignees

Labels

Priority: LowType: BugIndicates an unexpected problem or unintended behavior[core]Area: Changes in SRT library core

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions