Skip to content

tcp: packet dropped if received right after connection is open #58084

@vaussard

Description

@vaussard

Describe the bug

I have a firmware based on Zephyr 2.7 (+ some backported fixes like #56324) running on a STM32L4 which connects to a Linux host. I think that the issue is still present in the latest versions of Zephyr because the code in that area is not overly different.

What happens:

  • Zephyr opens a TCP connection. I see the 3-way handshake which completes
  • The host sends a packet as soon as the connection is established
  • net_context_recv() was not yet called at this point, since it is done at the end of zsock_connect_ctx()
    • As a result tcp_data_get() sees that conn->context->recv_cb is NULL, thus the result is NET_DROP. The packet is ACK'ed but never handled by the application
  • Finally net_context_recv() is called in order to install the receive callback zsock_received_cb(), but the received packet was already dropped

To summarize, it looks like there is a window during zsock_connect_ctx() where a received packet can be accepted but dropped between the calls to net_context_connect() and net_context_recv(). Since the packet was ACK'ed, it will never be retransmitted by the peer.

I did not observed this behaviour while using older versions of Zephyr with TCP1, but the overall architecture looks similar, so it was maybe just by luck or a different scheduling of the threads?

Expected behavior

All packets are correctly received and processed by the firmware as soon as the connection is established. Worst case, if the packet is dropped without sending an ACK, it would be retransmitted by the peer.

Impact

The host application waits for the answer and finally closes the connection after some timeout. This breaks our existing process.

Logs and console output

Here is a debug log where we see that net_tcp_recv() is called after tcp_in() from the 1st received packet:

[2023-05-19 15:44:53.931] [00:00:22.970,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): context: 0x20012a6c, local: ::, remote: fd11:8963:2111::1  
[2023-05-19 15:44:53.943] [00:00:22.980,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): conn: 0x200237c8 src: fd11:8963:1c1:9:60ef:c4ff:fe4d:ffbc, dst: fd11:8963:2111::1  
[2023-05-19 15:44:53.958] [00:00:23.000,000] <dbg> net_tcp.tcp_in: (xxx_send):  [LISTEN Seq=1148679436 Ack=0]  
[2023-05-19 15:44:53.968] [00:00:23.010,000] <dbg> net_tcp.tcp_out_ext: (xxx_send): SYN Seq=1148679436 Len=0    
[2023-05-19 15:44:53.977] [00:00:23.020,000] <dbg> net_tcp.tcp_send_process_no_lock: (xxx_send): SYN Seq=1148679436 Len=0·  
[2023-05-19 15:44:53.988] [00:00:23.030,000] <dbg> net_tcp.tcp_send: (xxx_send): SYN Seq=1148679436 Len=0    
[2023-05-19 15:44:54.000] [00:00:23.040,000] <dbg> net_tcp.tcp_in: (xxx_send): LISTEN->SYN_SENT  
[2023-05-19 15:44:54.010] [00:00:23.050,000] <dbg> net_tcp.tcp_in: (rx_q[0]): SYN,ACK Seq=2071107776 Ack=1148679437 Len=0 [SYN_SENT Seq=1148679437 Ack=0]  
[2023-05-19 15:44:54.023] [00:00:23.060,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): len=4    
[2023-05-19 15:44:54.031] [00:00:23.070,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): opt: 2, opt_len: 4  
[2023-05-19 15:44:54.040] [00:00:23.080,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): MSS=1440  
[2023-05-19 15:44:54.048] [00:00:23.090,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64800 to 6826  
[2023-05-19 15:44:54.058] [00:00:23.100,000] <dbg> net_tcp.tcp_send_timer_cancel: (rx_q[0]): SYN Seq=1148679436 Len=0    
[2023-05-19 15:44:54.068] [00:00:23.110,000] <dbg> net_tcp.tcp_conn_ref: (rx_q[0]): conn: 0x200237c8, ref_count: 2  
[2023-05-19 15:44:54.078] [00:00:23.120,000] <dbg> net_tcp.tcp_out_ext: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0
[2023-05-19 15:44:54.089] [00:00:23.130,000] <dbg> net_tcp.tcp_send_process_no_lock: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0·  
[2023-05-19 15:44:54.101] [00:00:23.140,000] <dbg> net_tcp.tcp_send: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0    

/* ACK sent, connection is open */

/* Receiving the first packet - it is ACK'ed but dropped */
[2023-05-19 15:44:54.111] [00:00:23.150,000] <db[00:00:23.160,000] <dbg> net_tcp.tcp_in: (rx_q[0]): ACK,PSH Seq=2071107777 Ack=1148679437 Len=10 [ESTABLISHED Seq=1148679437 Ack=2071107777]  
[2023-05-19 15:44:54.131] [00:00:23.170,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64800 to 6826  
[2023-05-19 15:44:54.142] [00:00:23.180,000] <dbg> net_tcp.tcp_out_ext: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0
[2023-05-19 15:44:54.152] [00:00:23.190,000] <dbg> net_tcp.tcp_send_process_no_lock: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0·  
[2023-05-19 15:44:54.164] [00:00:23.210,000] <dbg> net_tcp.tcp_send: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0

(...)

/* zsock_connect_ctx() completes */
[2023-05-19 15:44:54.243] [00:00:23.280,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): conn: 0x200237c8, ret=0    
[2023-05-19 15:44:54.254] [00:00:23.300,000] <dbg> net_tcp.net_tcp_recv: (xxx_send): context: 0x20012a6c, cb: 0x80383c1, user_data: (nil)    

Metadata

Metadata

Assignees

Labels

area: NetworkingbugThe issue is a bug, or the PR is fixing a bugpriority: mediumMedium impact/importance bug

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions