Race-condition in socket close #40

@hlef

Unfortunately, our socket close is racy:

  • We shut down the socket, but the TCP FIN is not sent immediately: sending is done by the TCP/IP thread, which is not running at that point.

  • Then we remove the firewall endpoint. This takes effect immediately.

  • By the time the TCP/IP thread gets a chance to send the FIN, the firewall blocks the outgoing packet.
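
To make the ordering problem concrete, here is a toy model of the three steps above. It is only a sketch: `Model`, `app_close_socket`, `app_remove_endpoint` and `ip_thread_run` are hypothetical names made up for illustration, not functions from the network stack or from FreeRTOS+TCP, and the "threads" are just function calls made in a fixed order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one firewall endpoint and one queued FIN. */
typedef struct {
    bool endpointOpen; /* firewall lets this connection's packets through */
    bool finQueued;    /* close requested, FIN not yet transmitted */
    bool finOnWire;    /* FIN actually left the device */
} Model;

static void model_init(Model *m)
{
    assert(m != NULL);
    *m = (Model){ .endpointOpen = true };
}

/* Application thread: request the close; the FIN is only queued. */
static void app_close_socket(Model *m) { m->finQueued = true; }

/* Application thread: remove the firewall endpoint; effective at once. */
static void app_remove_endpoint(Model *m) { m->endpointOpen = false; }

/* TCP/IP thread: runs later and pushes the queued FIN through the firewall. */
static void ip_thread_run(Model *m)
{
    if (m->finQueued) {
        m->finQueued = false;
        m->finOnWire = m->endpointOpen; /* dropped if the endpoint is gone */
    }
}

/* Order described in this issue: endpoint removed before the FIN is sent. */
bool racy_close(void)
{
    Model m;
    model_init(&m);
    app_close_socket(&m);
    app_remove_endpoint(&m);
    ip_thread_run(&m);
    return m.finOnWire;
}

/* Order in which the TCP/IP thread gets to run before the endpoint goes. */
bool ordered_close(void)
{
    Model m;
    model_init(&m);
    app_close_socket(&m);
    ip_thread_run(&m);
    app_remove_endpoint(&m);
    return m.finOnWire;
}
```

In this model `racy_close()` reports that the FIN never reaches the wire, while `ordered_close()` reports that it does. The model only covers the outgoing FIN; it says nothing about packets exchanged after that point.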

We may not have detected this earlier because of the type of workloads we were running, but this is super obvious in the server API case I am about to PR.

Notes on how to fix this

Unfortunately, even if we add a quick trylock on the ipThreadLockState (which is there precisely to wake up the IP thread for that reason), it only allows the IP thread to send out the FIN but not to ACK the FIN ACK sent by the other party. Then we get a whole lot of retransmissions.

We cannot rely on a FreeRTOS+TCP callback (such as FREERTOS_SO_TCP_CONN_HANDLER) to tackle this. This is because FreeRTOS+TCP does not do a three-way handshake close when the socket is closed through FreeRTOS_closesocket(). Instead, FreeRTOS+TCP sends a FIN to the peer and destroys the socket immediately. Any further packet from the peer is treated as spurious and answered with a RST. Because the socket is destroyed in a state where the TCP connection has not been properly closed (it is only in the "FIN sent" stage, i.e., FIN-WAIT-1), the callback is not called.

The only viable way to fix this seems to be adding a "shutting down list" to the firewall that either lets packets through or replies with a RST, until a timer expires.
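
A rough sketch of what such a list could look like, to make the idea easier to discuss. Everything here is an assumption: the names (`closing_list_add`, `closing_list_check`, `Verdict`), the fixed capacity, the millisecond timestamps passed in by the caller, and the choice to store the allow-or-RST policy per entry are all invented for illustration and do not reflect the existing firewall interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the real firewall API. */
enum { ClosingCapacity = 8 };

typedef enum { VerdictDrop, VerdictAllow, VerdictReset } Verdict;

typedef struct {
    uint32_t remoteAddress;
    uint16_t remotePort;
    uint16_t localPort;
    uint64_t expiresAtMs; /* entry is dead once now >= this value */
    Verdict  action;      /* VerdictAllow or VerdictReset while alive */
    bool     inUse;
} ClosingEntry;

static ClosingEntry closingList[ClosingCapacity];

/*
 * Called when the endpoint is removed at close time: remember the
 * connection for graceMs so late packets are not silently dropped.
 * Returns false if the list is full.
 */
bool closing_list_add(uint32_t addr, uint16_t rport, uint16_t lport,
                      uint64_t nowMs, uint64_t graceMs, Verdict action)
{
    assert(action != VerdictDrop);
    for (size_t i = 0; i < ClosingCapacity; i++) {
        ClosingEntry *e = &closingList[i];
        /* Reuse free slots and slots whose timer has already expired. */
        if (!e->inUse || e->expiresAtMs <= nowMs) {
            *e = (ClosingEntry){ addr, rport, lport,
                                 nowMs + graceMs, action, true };
            return true;
        }
    }
    return false;
}

/*
 * Called for a packet that matches no open endpoint. Returns the stored
 * action if the connection is still in its grace period, else VerdictDrop.
 */
Verdict closing_list_check(uint32_t addr, uint16_t rport, uint16_t lport,
                           uint64_t nowMs)
{
    for (size_t i = 0; i < ClosingCapacity; i++) {
        ClosingEntry *e = &closingList[i];
        if (!e->inUse) {
            continue;
        }
        if (e->expiresAtMs <= nowMs) {
            e->inUse = false; /* lazily expire the entry */
            continue;
        }
        if (e->remoteAddress == addr && e->remotePort == rport &&
            e->localPort == lport) {
            return e->action;
        }
    }
    return VerdictDrop;
}
```

Expiry is checked lazily on lookup here, so no separate timer thread is needed; whether that is acceptable, and how long the grace period should be, would need to be decided against the real firewall design.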
