feat(fake-tcp) add TCP keep-alive support to client #93
andreadaoud wants to merge 1 commit into dndx:main from
I'd love this feature. In a very common IPv6 setup, a LAN client gets its IP address via SLAAC from the router's RA. If the ISP delegates a new prefix to the router, the client gets a new IP but still keeps the old one (not yet expired, but already unusable). As long as the phantun connection lasts, conntrack will not MASQUERADE to the new address. The current workaround is simply to restart phantun, but there should be a more elegant way.
@sakamoto-poteko Could you please test if this works?
@andreadaoud I've been running with
@dndx I really hope this keep-alive feature can be merged! Phantun works great on stable servers where upstream routers properly send RST packets, but it's a different story on edge devices with unstable networks. The problem: with PPPoE and dynamic IPs, an IP change is not signaled to existing phantun connections, so they just hang forever. The same happens when a router, ONU device, or Wi-Fi router overheats and hard reboots. Without keep-alive detection, these dead connections never get cleaned up. Why we need it at the phantun layer: even if applications have their own keep-alive, they can't detect broken fake-TCP connections. When NAT tables reset or routers fail silently, the UDP side keeps sending but the TCP connection is actually dead. This implementation looks solid: zero overhead on active connections, since any traffic resets the timer, and it's optional and configurable. For unstable networks, this isn't just nice-to-have, it's essential for reliability.
@luhengsw Yeah, I am open to merging this, but the branch needs to be rebased and the merge conflict resolved. If you or someone else could help, that would speed up the process. Otherwise I will rebase it once I have the time.
@dndx Would you like me to open a new PR, or would @andreadaoud prefer to update this PR directly?
This adds TCP keep-alive support to the client. Closes #30; also related to #87. Three arguments are added: keepalive-time, keepalive-interval, and keepalive-retries.
Let's say keepalive-time=300s, keepalive-interval=10s, keepalive-retries=3. The feature works like this: when the client detects that no TCP packet has arrived within 300 seconds, it sends a TCP keep-alive packet to the server. If the server doesn't respond in time, the client retries after 10 seconds. After 3 retries, the client gives up on the connection. When the server receives a keep-alive packet, it immediately responds with an ACK.
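The timing described above can be sketched as a small decision function. This is only an illustration, not the code in this PR: `KeepAliveConfig`, `Action`, and `next_action` are hypothetical names, and the sketch assumes that keepalive-retries counts the total number of probes sent (the first one included), which the description leaves open.

```rust
use std::time::Duration;

/// Hypothetical settings mirroring the three new arguments.
#[derive(Debug, Clone, Copy)]
pub struct KeepAliveConfig {
    pub time: Duration,     // keepalive-time: idle time before the first probe
    pub interval: Duration, // keepalive-interval: gap between unanswered probes
    pub retries: u32,       // keepalive-retries: assumed to be the total probe count
}

/// What the client should do for one connection right now.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Wait,
    SendProbe,
    GiveUp,
}

/// Decide the next step from how long the connection has gone without
/// receiving a packet (`idle`) and how many probes were already sent.
pub fn next_action(cfg: &KeepAliveConfig, idle: Duration, probes_sent: u32) -> Action {
    // The first probe is due after `time`; each later event is due one
    // `interval` after the previous probe.
    let deadline = cfg.time + cfg.interval * probes_sent;
    if idle < deadline {
        Action::Wait
    } else if probes_sent < cfg.retries {
        Action::SendProbe
    } else {
        Action::GiveUp
    }
}
```

Under these assumptions, the 300s/10s/3 example sends probes at 300, 310, and 320 seconds of silence and gives up at 330 seconds. Any packet received in between would reset both `idle` and `probes_sent`.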
Although the user application may already have its own keep-alive mechanism, it cannot detect a broken fake-TCP connection. To actually identify a broken "TCP" connection, keep-alive is still needed at the fake-tcp layer. For example, if the NAT mapping table on the router is somehow reset, packets can no longer get through. Without keep-alive detection, the client never marks the connection as expired, because the UDP client keeps sending packets, so the UDP client may never be able to communicate again. With keep-alive detection, once the phantun client notices that the connection is not receiving data, it actively sends keep-alive packets. When the maximum number of retries is reached without a response, the client knows that the connection is broken and removes it from memory, so a new connection can be established. This feature adds no extra overhead on active connections, because packets from the user application prevent the keepalive-time timer from firing.
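The cleanup behavior can be sketched as a connection table that is swept periodically. Again, this is a hedged illustration rather than the PR's implementation: `ConnTable`, `ConnState`, `on_packet_received`, and `tick` are hypothetical names, time is passed in explicitly so the logic is easy to test, and the sketch assumes that only received packets count as proof of liveness.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical per-connection keep-alive bookkeeping.
struct ConnState {
    last_rx: Instant, // when a packet was last received on this connection
    probes_sent: u32, // unanswered keep-alive probes so far
}

/// Hypothetical table of live fake-TCP connections, keyed by UDP peer.
pub struct ConnTable {
    conns: HashMap<SocketAddr, ConnState>,
    time: Duration,
    interval: Duration,
    retries: u32,
}

impl ConnTable {
    pub fn new(time: Duration, interval: Duration, retries: u32) -> Self {
        ConnTable { conns: HashMap::new(), time, interval, retries }
    }

    /// Any received packet (data or the ACK to a probe) resets the timer.
    pub fn on_packet_received(&mut self, peer: SocketAddr, now: Instant) {
        self.conns.insert(peer, ConnState { last_rx: now, probes_sent: 0 });
    }

    /// Sweep the table: return the peers that need a probe now and drop
    /// connections that exhausted their retries, so that the next UDP
    /// packet from that peer can trigger a fresh connection.
    pub fn tick(&mut self, now: Instant) -> Vec<SocketAddr> {
        let (time, interval, retries) = (self.time, self.interval, self.retries);
        let mut to_probe = Vec::new();
        self.conns.retain(|peer, st| {
            let idle = now.saturating_duration_since(st.last_rx);
            if idle < time + interval * st.probes_sent {
                return true; // still within the current deadline
            }
            if st.probes_sent < retries {
                st.probes_sent += 1;
                to_probe.push(*peer);
                true
            } else {
                false // considered dead: remove from memory
            }
        });
        to_probe
    }

    pub fn contains(&self, peer: &SocketAddr) -> bool {
        self.conns.contains_key(peer)
    }
}
```

On a connection that keeps receiving traffic, `tick` never returns the peer, which matches the claim that active connections see no extra overhead.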