-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Description
Describe the bug
I have a firmware based on Zephyr 2.7 (+ some backported fixes like #56324) running on a STM32L4 which connects to a Linux host. I think that the issue is still present in the latest versions of Zephyr because the code in that area is not overly different.
What happens:
- Zephyr opens a TCP connection. I see the 3-way handshake which completes
- The host sends a packet as soon as the connection is established
net_context_recv()was not yet called at this point, since it is done at the end ofzsock_connect_ctx()- As a result
tcp_data_get()sees thatconn->context->recv_cbis NULL, thus the result is NET_DROP. The packet is ACK'ed but never handled by the application
- As a result
- Finally
net_context_recv()is called in order to install the receive callbackzsock_received_cb(), but the received packet was already dropped
To summarize, it looks like there is a window during zsock_connect_ctx() where a received packet can be accepted but dropped between the calls to net_context_connect() and net_context_recv(). Since the packet was ACK'ed, it will never be retransmitted by the peer.
I did not observed this behaviour while using older versions of Zephyr with TCP1, but the overall architecture looks similar, so it was maybe just by luck or a different scheduling of the threads?
Expected behavior
All packets are correctly received and processed by the firmware as soon as the connection is established. Worst case, if the packet is dropped without sending an ACK, it would be retransmitted by the peer.
Impact
The host application waits for the answer and finally closes the connection after some timeout. This breaks our existing process.
Logs and console output
Here is a debug log where we see that net_tcp_recv() is called after tcp_in() from the 1st received packet:
[2023-05-19 15:44:53.931] [00:00:22.970,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): context: 0x20012a6c, local: ::, remote: fd11:8963:2111::1
[2023-05-19 15:44:53.943] [00:00:22.980,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): conn: 0x200237c8 src: fd11:8963:1c1:9:60ef:c4ff:fe4d:ffbc, dst: fd11:8963:2111::1
[2023-05-19 15:44:53.958] [00:00:23.000,000] <dbg> net_tcp.tcp_in: (xxx_send): [LISTEN Seq=1148679436 Ack=0]
[2023-05-19 15:44:53.968] [00:00:23.010,000] <dbg> net_tcp.tcp_out_ext: (xxx_send): SYN Seq=1148679436 Len=0
[2023-05-19 15:44:53.977] [00:00:23.020,000] <dbg> net_tcp.tcp_send_process_no_lock: (xxx_send): SYN Seq=1148679436 Len=0·
[2023-05-19 15:44:53.988] [00:00:23.030,000] <dbg> net_tcp.tcp_send: (xxx_send): SYN Seq=1148679436 Len=0
[2023-05-19 15:44:54.000] [00:00:23.040,000] <dbg> net_tcp.tcp_in: (xxx_send): LISTEN->SYN_SENT
[2023-05-19 15:44:54.010] [00:00:23.050,000] <dbg> net_tcp.tcp_in: (rx_q[0]): SYN,ACK Seq=2071107776 Ack=1148679437 Len=0 [SYN_SENT Seq=1148679437 Ack=0]
[2023-05-19 15:44:54.023] [00:00:23.060,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): len=4
[2023-05-19 15:44:54.031] [00:00:23.070,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): opt: 2, opt_len: 4
[2023-05-19 15:44:54.040] [00:00:23.080,000] <dbg> net_tcp.tcp_options_check: (rx_q[0]): MSS=1440
[2023-05-19 15:44:54.048] [00:00:23.090,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64800 to 6826
[2023-05-19 15:44:54.058] [00:00:23.100,000] <dbg> net_tcp.tcp_send_timer_cancel: (rx_q[0]): SYN Seq=1148679436 Len=0
[2023-05-19 15:44:54.068] [00:00:23.110,000] <dbg> net_tcp.tcp_conn_ref: (rx_q[0]): conn: 0x200237c8, ref_count: 2
[2023-05-19 15:44:54.078] [00:00:23.120,000] <dbg> net_tcp.tcp_out_ext: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0
[2023-05-19 15:44:54.089] [00:00:23.130,000] <dbg> net_tcp.tcp_send_process_no_lock: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0·
[2023-05-19 15:44:54.101] [00:00:23.140,000] <dbg> net_tcp.tcp_send: (rx_q[0]): ACK Seq=1148679437 Ack=2071107777 Len=0
/* ACK sent, connection is open */
/* Receiving the first packet - it is ACK'ed but dropped */
[2023-05-19 15:44:54.111] [00:00:23.150,000] <db[00:00:23.160,000] <dbg> net_tcp.tcp_in: (rx_q[0]): ACK,PSH Seq=2071107777 Ack=1148679437 Len=10 [ESTABLISHED Seq=1148679437 Ack=2071107777]
[2023-05-19 15:44:54.131] [00:00:23.170,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64800 to 6826
[2023-05-19 15:44:54.142] [00:00:23.180,000] <dbg> net_tcp.tcp_out_ext: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0
[2023-05-19 15:44:54.152] [00:00:23.190,000] <dbg> net_tcp.tcp_send_process_no_lock: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0·
[2023-05-19 15:44:54.164] [00:00:23.210,000] <dbg> net_tcp.tcp_send: (rx_q[0]): ACK Seq=1148679437 Ack=2071107787 Len=0
(...)
/* zsock_connect_ctx() completes */
[2023-05-19 15:44:54.243] [00:00:23.280,000] <dbg> net_tcp.net_tcp_connect: (xxx_send): conn: 0x200237c8, ret=0
[2023-05-19 15:44:54.254] [00:00:23.300,000] <dbg> net_tcp.net_tcp_recv: (xxx_send): context: 0x20012a6c, cb: 0x80383c1, user_data: (nil)