[BUG] UDP/IPv4 transmission does not ARP-query gateways #1244

@nwf

Describe the bug

FreeRTOS-Plus-TCP (tested with v4.2.0, but v4.3, which I'll cite herein, looks from inspection to behave identically here) fails to trigger an ARP request for its gateway when sending UDPv4 datagrams to off-segment addresses. As a result, the system is effectively unable to send such UDP packets until, AFAICT, either

Specifically, assuming I understand the packet traces and debug logs below and am reading the code correctly: the UDP/IPv4 egress path queries the ARP table, and that lookup has handling for gateways. But when a gateway is needed, the UDPv4 code, despite its comment naming the correct variable, queries for the same off-segment address that the gateway handling already did, and so takes the wrong path rather than the one that generates an ARP query.
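
To make the suspected control flow concrete, here is a minimal, self-contained model of it. This is only a sketch of my reading of the behavior: every name in it (`EndPoint`, `find_endpoint_on_netmask`, `arp_lookup_miss`, `send_udp_buggy`, `send_udp_fixed`) is hypothetical and is not a FreeRTOS-Plus-TCP API, and the /24 netmask used in the usage note is an assumption, since the traces below do not show the mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model for illustration only. None of these
 * names are FreeRTOS-Plus-TCP APIs. Addresses are in host byte order. */
typedef struct {
    uint32_t local_ip;
    uint32_t netmask;
    uint32_t gateway;
} EndPoint;

/* Return the end-point if ip lies on its subnet, otherwise NULL. */
static const EndPoint *find_endpoint_on_netmask(const EndPoint *ep, uint32_t ip)
{
    return (((ip ^ ep->local_ip) & ep->netmask) == 0u) ? ep : NULL;
}

/* Model of an ARP lookup against an empty cache. As a side effect, *ip is
 * rewritten to the next hop: the gateway for off-segment destinations,
 * the destination itself otherwise. */
static bool arp_lookup_miss(const EndPoint *ep, uint32_t *ip)
{
    if (find_endpoint_on_netmask(ep, *ip) == NULL) {
        *ip = ep->gateway;
    }
    return false; /* empty cache: always a miss */
}

/* Returns the address an ARP request would be sent for, or 0 if the
 * datagram is dropped without any ARP request. */
uint32_t send_udp_buggy(const EndPoint *ep, uint32_t dest)
{
    uint32_t next_hop = dest;
    (void)arp_lookup_miss(ep, &next_hop);
    /* Modeled bug: the route is looked up again with the original
     * destination instead of the rewritten next hop. */
    if (find_endpoint_on_netmask(ep, dest) == NULL) {
        return 0u;
    }
    return next_hop;
}

uint32_t send_udp_fixed(const EndPoint *ep, uint32_t dest)
{
    uint32_t next_hop = dest;
    (void)arp_lookup_miss(ep, &next_hop);
    /* Look up the route for the next hop, which is on-segment. */
    if (find_endpoint_on_netmask(ep, next_hop) == NULL) {
        return 0u;
    }
    return next_hop;
}
```

With the end-point at 172.29.7.97 (`0xAC1D0761`), the gateway at 172.29.7.1 (`0xAC1D0701`), and an assumed mask of 255.255.255.0, the buggy variant yields no ARP target for `0xCE6C0084`, while the fixed variant yields the gateway address. Both behave the same for on-segment destinations.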

Incidentally, there's code in the IP packet ingress handling path to refresh the ARP table or trigger an ARP request, but... it's disabled for UDP packets, which DHCP uses. Perhaps it should be disabled for {multi,broad}cast packets instead, and allow unicast (UDP and otherwise) packets to trigger it? This would have masked the above issue in my case, and in many common cases, because often the DHCP server and the default gateway are one and the same. I'm not sure if that's an argument for or against adopting this behavior!
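
As a rough sketch of the two ingress policies being compared, the predicates below decide whether a received IPv4 packet may refresh the ARP table. All names are hypothetical and do not correspond to FreeRTOS-Plus-TCP functions; the "current" predicate only restates the behavior as described above, and the netmask is passed in by the caller because the issue does not state it.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_PROTO_UDP 17u /* IANA protocol number for UDP */

/* Policy as described in this issue: ingress UDP never refreshes ARP.
 * Hypothetical name, not a FreeRTOS-Plus-TCP function. */
bool refresh_arp_current(uint8_t protocol)
{
    return protocol != IP_PROTO_UDP;
}

/* 224.0.0.0/4 is the IPv4 multicast range (host byte order). */
static bool is_multicast(uint32_t dst)
{
    return (dst & 0xF0000000u) == 0xE0000000u;
}

/* Limited broadcast, or the directed broadcast of the local subnet. */
static bool is_broadcast(uint32_t dst, uint32_t local_ip, uint32_t netmask)
{
    return (dst == 0xFFFFFFFFu) || (dst == (local_ip | ~netmask));
}

/* Suggested alternative: any unicast packet, UDP or otherwise, may
 * refresh the ARP table; multicast and broadcast packets may not. */
bool refresh_arp_proposed(uint32_t dst, uint32_t local_ip, uint32_t netmask)
{
    return !is_multicast(dst) && !is_broadcast(dst, local_ip, netmask);
}
```

Under this sketch, a packet addressed to 255.255.255.255 would still not refresh the cache; only unicast traffic would.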

Target

  • Development board: CHERIoT Sonata
  • Instruction Set Architecture: CHERIoT (a RV32E derivative)
  • IDE and version: n/a
  • Toolchain and version: CHERIoT LLVM 20

The curious are welcome to see the CHERIoT network stack interfaces to FreeRTOS-Plus-TCP, but I do not believe that our interface code differs significantly for the purposes of this bug from any other application.

Host

  • Host OS: Linux
  • Version: Debian Trixie

To Reproduce
Run a FreeRTOS-Plus-TCP application that performs DHCP and attempts to send a UDP packet (perhaps specifically not DNS).

Expected behavior
I expect FreeRTOS-Plus-TCP to either

  • initiate an ARP request when sending a UDP datagram to an address it does not know how to reach, buffering the UDP packet for transmission upon resolution, or
  • indicate failure to the application's UDP sendto.
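
The two acceptable outcomes above can be sketched as follows. This is an illustrative model with a single pending slot, not a proposal for the actual implementation: `SendResult`, `PendingPacket`, `udp_send`, `on_arp_reply`, and the 64-byte limit are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PENDING_MAX 64u /* hypothetical payload limit for the sketch */

typedef enum { SEND_OK, SEND_QUEUED, SEND_FAILED } SendResult;

/* One datagram parked while its next hop is being resolved. */
typedef struct {
    bool     in_use;
    uint32_t next_hop;
    size_t   len;
    uint8_t  data[PENDING_MAX];
} PendingPacket;

/* resolved says whether the ARP cache already holds next_hop. */
SendResult udp_send(PendingPacket *slot, uint32_t next_hop, bool resolved,
                    const uint8_t *payload, size_t len)
{
    if (resolved) {
        return SEND_OK; /* would be transmitted immediately */
    }
    if (slot->in_use || len > PENDING_MAX) {
        return SEND_FAILED; /* second option: report failure to the caller */
    }
    /* First option: park the datagram; an ARP request for next_hop
     * would be emitted at this point. */
    memcpy(slot->data, payload, len);
    slot->len = len;
    slot->next_hop = next_hop;
    slot->in_use = true;
    return SEND_QUEUED;
}

/* Called when an ARP reply arrives. Returns the length of the datagram
 * released for transmission, or 0 if nothing was waiting on that address. */
size_t on_arp_reply(PendingPacket *slot, uint32_t resolved_ip)
{
    if (!slot->in_use || slot->next_hop != resolved_ip) {
        return 0u;
    }
    slot->in_use = false;
    return slot->len;
}
```

Either way, the caller never sees a success return for a datagram that was silently discarded.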

Wireshark logs

Here's an example application running on Sonata and trying to do DHCP followed by SNTP. We can see that there's a while at startup where the system won't generate UDP packets destined for off-segment addresses (via the gateway).

  • The system initializes and performs DHCP successfully:

    02:14:55.803009 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 300: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 286)
        0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 258, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:55.805299 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: (tos 0xc0, ttl 64, id 1029, offset 0, flags [none], proto UDP (17), length 328)
        172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:55.817799 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 307: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 293)
        0.0.0.0.68 > 255.255.255.255.67: [no cksum] BOOTP/DHCP, Request from 3a:30:25:24:fe:7a, length 265, xid 0xcaabede, Flags [Broadcast] (0x8000)
    02:14:56.269416 b4:fb:e4:20:ca:6b > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 367: (tos 0xc0, ttl 64, id 1034, offset 0, flags [none], proto UDP (17), length 353)
        172.29.7.1.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 325, xid 0xcaabede, Flags [Broadcast] (0x8000)
    

    The system logs this startup and DHCP state machine traversal thus:

    FreeRTOS_AddEndPoint: MAC: fe-7a IPv4: c0a801f8ip
    ip_thread_start
    ip_thread_entry starting, thread ID is 0x1
    prvIPTask started
    vIPSetDHCP_RATimerEnableState: Off
    prvCloseDHCPSocket[fe-7a]: closed, user count 0
    vDHCPProcessEndPoint: enter 0
    DHCP-socket[fe-7a]: DHCP Socket Create
    prvCreateDHCPSocket[fe-7a]: open, user count 1
    prvInitialiseDHCP: start after 25 ticks
    vDHCP_RATimerReload: 25
    vDHCPProcessEndPoint: exit 1
    vDHCPProcessEndPoint: enter 1
    vDHCPProcess: discover
    vDHCPProcessEndPoint: exit 2
    vDHCPProcessEndPoint: enter 2
    vDHCPProcess: discover
    vDHCPProcess: timeout 1000 ticks
    vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
    vDHCPProcess: reply ac1d0761ip
    vDHCPProcessEndPoint: exit 3
    vDHCPProcessEndPoint: enter 3
    vDHCPProcess: offer ac1d0761ip for MAC address fe-7a
    vDHCPProcess: acked ac1d0761ip
    prvCloseDHCPSocket[fe-7a]: closed, user count 0
    vDHCP_RATimerReload: 4320000
    vDHCPProcessEndPoint: exit 5
    
  • Immediately thereafter, the system does a gratuitous ARP announcement as a duplicate check:

    02:14:56.282088 3a:30:25:24:fe:7a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 172.29.7.97 tell 172.29.7.97, length 46
    

    No response is received, as no duplicate exists on the network.

  • The application now attempts to send a UDP packet (SNTP, specifically, using coreSNTP). No packets, UDP or ARP or otherwise, are emitted and the system logs the following:

    SNTP Info: Sending serialized SNTP request packet to the server: Addr=2214620366, Port=123
    FreeRTOS_FindEndPointOnNetMask[4]: No match for ce6c0084ip
    ARP ce6c0084ip miss using ac1d0701ip
    FreeRTOS_FindEndPointOnNetMask[11]: No match for ce6c0084ip 
    

    Note in particular that the 2nd FindEndPointOnNetMask call is for the same, off-segment address as the first! (Some additional instrumentation shows that the sendto has returned the expected 48, indicating success, which is a bit rude.) coreSNTP eventually times out and reports failure to the application, which goes to sleep (before retrying).

  • While the application is asleep, the gateway sends an ARP request to refresh its cache entry for the system, and the system responds:

    02:15:01.393897 b4:fb:e4:20:ca:6b > 3a:30:25:24:fe:7a, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 172.29.7.97 tell 172.29.7.1, length 46
    02:15:01.398879 3a:30:25:24:fe:7a > b4:fb:e4:20:ca:6b, ethertype ARP (0x0806), length 64: Ethernet (len 6), IPv4 (len 4), Reply 172.29.7.97 is-at 3a:30:25:24:fe:7a, length 50
    

    The system logs this and the insertion into its ARP cache:

    pxEasyFit: ARP ac1d0701ip -> ac1d0761ip
    ipARP_REQUEST from ac1d0701ip to ac1d0761ip end-point ac1d0761ip
    

    This happens only because the gateway and the DHCP server are one and the same. Were the gateway a different node, it might never issue an ARP request for the system.

  • The application wakes up and retries SNTP. At this point, there is a hit in the ARP cache and a packet is sent:

    02:15:06.321627 3a:30:25:24:fe:7a > b4:fb:e4:20:ca:6b, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 76)
        172.29.7.97.15848 > 149.56.19.163.123: [no cksum] NTPv4, Client, length 48
    

    The system logs:

    SNTP Info: Sending serialized SNTP request packet to the server: Addr=2214620366, Port=123
    FreeRTOS_FindEndPointOnNetMask[4]: No match for ce6c0084ip
    ARP ce6c0084ip hit using ac1d0701ip
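
For readers matching the log lines to the packet capture, the small helper below decodes the hex `...ip` values into dotted-quad form. It is only a reading aid with hypothetical function names. It assumes the hex values are printed most-significant byte first, and that the decimal `Addr=` value in the SNTP log line is the same address read in the opposite byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Format an address whose most-significant byte is the first octet.
 * out must have room for "255.255.255.255" plus the terminator. */
void format_ipv4(uint32_t ip, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)((ip >> 24) & 0xFFu), (unsigned)((ip >> 16) & 0xFFu),
             (unsigned)((ip >> 8) & 0xFFu),  (unsigned)(ip & 0xFFu));
}
```

Under those assumptions, `ac1d0761ip` is 172.29.7.97, `ac1d0701ip` is 172.29.7.1, `ce6c0084ip` is 206.108.0.132, and `swap32(2214620366u)` gives `0xCE6C0084`.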
    
