Description
Describe the bug
FreeRTOS-Plus-TCP (tested with v4.2.0, but v4.3, which I'll cite herein, looks from inspection to behave identically here) fails to trigger an ARP request for its gateway when sending UDPv4 datagrams to off-segment addresses. As a result, the system is effectively unable to send such UDP packets until, AFAICT, either
- the application manually requests an ARP query (for the gateway?), or
- the gateway sends an ARP request for the system, which will suffice to populate the gateway's entry in the ARP cache.
Specifically, assuming I understand the packet traces and debug logs below and am reading the code correctly: the UDP/IPv4 egress path queries the ARP table, which has handling for gateways. But when a gateway is needed, the UDPv4 code, despite a comment naming the correct variable, looks up the same off-segment destination address that the gateway handling already looked up, and so takes the wrong path rather than the one that generates an ARP query.
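To make the intended behavior concrete, here is a minimal, self-contained sketch of next-hop selection. It is not FreeRTOS-Plus-TCP code: the helpers ipv4, on_link, and arp_target are hypothetical names invented for illustration, and the /24 netmask used in the note below is an assumption (the traces do not show the mask). The point is only that, for an off-segment destination, the address to resolve via ARP is the gateway's, not the destination's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four dotted-quad components into a host-order IPv4 address. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* True when dest lies on the same subnet as the local address. */
static bool on_link(uint32_t dest, uint32_t local, uint32_t netmask)
{
    return (dest & netmask) == (local & netmask);
}

/*
 * Hypothetical helper (not a FreeRTOS-Plus-TCP function): returns the
 * address whose MAC must be resolved before a frame to dest can be sent.
 * On-link destinations are resolved directly; everything else goes via
 * the gateway, so the gateway is the ARP target.
 */
static uint32_t arp_target(uint32_t dest, uint32_t local,
                           uint32_t netmask, uint32_t gateway)
{
    return on_link(dest, local, netmask) ? dest : gateway;
}
```

With the addresses from the traces (local 172.29.7.97, gateway 172.29.7.1), an assumed netmask of 255.255.255.0, and the off-segment destination 206.108.0.132, arp_target yields 172.29.7.1. Any later lookup or ARP request should then use that value rather than the original destination.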
Incidentally, there's code in the IP packet ingress handling path to refresh the ARP table or trigger an ARP request, but... it's disabled for UDP packets, which DHCP uses. Perhaps it should be disabled for {multi,broad}cast packets instead, and allow unicast (UDP and otherwise) packets to trigger it? This would have masked the above issue in my case, and in many common cases, because often the DHCP server and the default gateway are one and the same. I'm not sure if that's an argument for or against adopting this behavior!
Target
- Development board: CHERIoT Sonata
- Instruction Set Architecture: CHERIoT (a RV32E derivative)
- IDE and version: n/a
- Toolchain and version: CHERIoT LLVM 20
The curious are welcome to see the CHERIoT network stack interfaces to FreeRTOS-Plus-TCP, but I do not believe that our interface code differs significantly for the purposes of this bug from any other application.
Host
- Host OS: Linux
- Version: Debian Trixie
To Reproduce
Run a FreeRTOS-Plus-TCP application that performs DHCP and then attempts to send a UDP packet (perhaps specifically not DNS) to an off-segment address.
Expected behavior
I expect FreeRTOS-Plus-TCP to either
- initiate an ARP request when sending a UDP datagram to an address it does not know how to reach, buffering the UDP packet for transmission upon resolution, or
- indicate failure to the application's UDP sendto.
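The two acceptable outcomes can be sketched with a toy model. Everything here is hypothetical and deliberately tiny (a one-entry ARP cache, a one-slot pending queue, made-up names such as toy_send and toy_arp_reply); it is not how FreeRTOS-Plus-TCP is structured, only an illustration of "defer and resolve" versus "fail loudly", as opposed to silently dropping the datagram while reporting success.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of the toy send path. */
enum send_result {
    SEND_TRANSMITTED, /* next hop already resolved; frame went out */
    SEND_DEFERRED,    /* ARP request issued; datagram buffered */
    SEND_FAILED       /* caller is told the datagram was not sent */
};

/* Toy state: a one-entry ARP cache and a one-slot pending queue. */
struct toy_stack {
    uint32_t cached_ip;    /* resolved next hop, 0 if none */
    uint32_t pending_hop;  /* next hop a buffered datagram waits on, 0 if none */
    unsigned arp_requests; /* number of ARP requests emitted */
    unsigned frames_sent;  /* number of UDP frames emitted */
};

/* Send one datagram via next_hop (must be nonzero in this toy model). */
static enum send_result toy_send(struct toy_stack *s, uint32_t next_hop,
                                 bool may_buffer)
{
    if (s->cached_ip == next_hop) {
        s->frames_sent++;
        return SEND_TRANSMITTED;
    }
    if (may_buffer && s->pending_hop == 0) {
        /* Option 1: buffer the datagram and ask who has the next hop. */
        s->pending_hop = next_hop;
        s->arp_requests++;
        return SEND_DEFERRED;
    }
    /* Option 2: report the failure instead of claiming success. */
    return SEND_FAILED;
}

/* An ARP reply for ip arrived: fill the cache and flush a waiting datagram. */
static void toy_arp_reply(struct toy_stack *s, uint32_t ip)
{
    s->cached_ip = ip;
    if (s->pending_hop == ip) {
        s->pending_hop = 0;
        s->frames_sent++;
    }
}
```

In this model, a first toy_send toward an unresolved gateway either returns SEND_DEFERRED and emits exactly one ARP request, or returns SEND_FAILED; it never reports a transmission that did not happen.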
Wireshark logs
Here's an example application running on Sonata and trying to do DHCP followed by SNTP. We can see that there's a while at startup during which the system won't generate UDP packets destined for off-segment addresses (via the gateway).
-
The system initializes and performs DHCP successfully:
    02:14:55.803009 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 300: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 286) 0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 258, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:55.805299 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0xc0, ttl 64, id 1029, offset 0, flags [none], proto UDP (17), length 328) 172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:55.817799 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 307: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 293) 0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 265, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:56.269416 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 367: (tos 0xc0, ttl 64, id 1034, offset 0, flags [none], proto UDP (17), length 353) 172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 325, xid 0xcaabede, Flags [Broadcast] (0x8000)
The system logs this startup and DHCP state machine traversal thus:
    FreeRTOS_AddEndPoint: MAC: fe-7a IPv4: c0a801f8ip
    ip_thread_start
    ip_thread_entry starting, thread ID is 0x1
    prvIPTask started
    vIPSetDHCP_RATimerEnableState: Off
    prvCloseDHCPSocket[fe-7a]: closed, user count 0
    vDHCPProcessEndPoint: enter 0
    DHCP-socket[fe-7a]: DHCP Socket Create
    prvCreateDHCPSocket[fe-7a]: open, user count 1
    prvInitialiseDHCP: start after 25 ticks
    vDHCP_RATimerReload: 25
    vDHCPProcessEndPoint: exit 1
    vDHCPProcessEndPoint: enter 1
    vDHCPProcess: discover
    vDHCPProcessEndPoint: exit 2
    vDHCPProcessEndPoint: enter 2
    vDHCPProcess: discover
    vDHCPProcess: timeout 1000 ticks
    vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
    vDHCPProcess: reply ac1d0761ip
    vDHCPProcessEndPoint: exit 3
    vDHCPProcessEndPoint: enter 3
    vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
    vDHCPProcess: acked ac1d0761ip
    prvCloseDHCPSocket[fe-7a]: closed, user count 0
    vDHCP_RATimerReload: 4320000
    vDHCPProcessEndPoint: exit 5
-
Immediately thereafter, the system does a gratuitous ARP announcement as a duplicate-address check:
    02:14:56.282088 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 172.29.7.97 tell 172.29.7.97, length 46
No response is received, as no duplicate exists on the network.
-
The application now attempts to send a UDP packet (SNTP, specifically, using coreSNTP). No packets, UDP or ARP or otherwise, are emitted and the system logs the following:
    SNTP Info: Sending serialized SNTP request packet to the server: Addr=2214620366, Port=123
    FreeRTOS_FindEndPointOnNetMask[4]: No match for ce6c0084ip
    ARP ce6c0084ip miss using ac1d0701ip
    FreeRTOS_FindEndPointOnNetMask[11]: No match for ce6c0084ip
Note in particular that the 2nd FindEndPointOnNetMask call is for the same, off-segment address as the first! (Some additional instrumentation shows that the sendto has returned the expected 48, indicating success, which is a bit rude.) coreSNTP eventually times out and reports failure to the application, which goes to sleep (before retrying).
-
While the application is asleep, the gateway sends an ARP request to refresh its cache entry for the system, and the system responds:
    02:15:01.393897 b4:fb:e4:20:ca:6b > 3a:30:25:24:fe:7a, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 172.29.7.97 tell 172.29.7.1, length 46
    02:15:01.398879 3a:30:25:24:fe:7a > b4:fb:e4:20:ca:6b, ethertype ARP (0x0806), length 64: Ethernet (len 6), IPv4 (len 4), Reply 172.29.7.97 is-at 3a:30:25:24:fe:7a, length 50
The system logs this and the insertion into its ARP cache:
    pxEasyFit: ARP ac1d0701ip -> ac1d0761ip
    ipARP_REQUEST from ac1d0701ip to ac1d0761ip end-point ac1d0761ip
This happens only because the gateway and the DHCP server are one and the same. Were the gateway a different node, it might never issue an ARP request for the system.
-
The application wakes up and retries SNTP. At this point, there is a hit in the ARP cache and a packet is sent:
    02:15:06.321627 3a:30:25:24:fe:7a > b4:fb:e4:20:ca:6b, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 76) 172.29.7.97.15848 > 149.56.19.163.123: [no cksum] NTPv4, Client, length 48
The system logs:
    SNTP Info: Sending serialized SNTP request packet to the server: Addr=2214620366, Port=123
    FreeRTOS_FindEndPointOnNetMask[4]: No match for ce6c0084ip
    ARP ce6c0084ip hit using ac1d0701ip
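For anyone cross-referencing the debug logs with the packet traces, the hex values ending in "ip" can be converted to dotted-quad form with a small helper. This is a sketch under one assumption: that the logged value is the address with its first octet in the most significant byte, which is consistent with the traces above (ac1d0761ip lines up with 172.29.7.97 and ac1d0701ip with 172.29.7.1). The function name log_ip_to_dotted is made up for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a logged address such as "ac1d0761ip" (the "ip" suffix is
 * optional) into dotted-quad text in out. Returns 0 on success and -1 if
 * the input is not a 32-bit hex value or the buffer is too small.
 * ASSUMPTION: the first octet is the most significant byte of the value.
 */
static int log_ip_to_dotted(const char *text, char *out, size_t out_len)
{
    char *end = NULL;
    unsigned long value;
    int written;

    errno = 0;
    value = strtoul(text, &end, 16);
    if (end == text || errno == ERANGE || value > 0xFFFFFFFFUL) {
        return -1; /* no hex digits, or value does not fit in 32 bits */
    }
    if (*end != '\0' && strcmp(end, "ip") != 0) {
        return -1; /* unexpected trailing characters */
    }

    written = snprintf(out, out_len, "%lu.%lu.%lu.%lu",
                       (value >> 24) & 0xFFUL, (value >> 16) & 0xFFUL,
                       (value >> 8) & 0xFFUL, value & 0xFFUL);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

A 16-byte buffer is enough for any IPv4 address. Under the stated assumption, ce6c0084ip (the off-segment address in the "miss" and "hit" lines) decodes to 206.108.0.132.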