GT-BE98 (AU): Frequent WAN DHCP renewal failures with relay-based ISP — "ISP's DHCP did not function properly"

## Router Model Affected

GT-BE98 (AU version)
**ISP: SuperLoop**

## Firmware Version Affected

3006.102.6_1-gnuton1

## Is this bug present in upstream Merlin releases too?

Unknown. I have not tested upstream Merlin on this model. However, the issue has been reproduced on **two separate GT-BE98 AU units** running the same firmware, which makes a hardware defect unlikely.

## Describe the bug

The router experiences frequent WAN DHCP renewal failures, causing periodic internet outages that usually last 3–4 minutes each (sometimes a few hours). The WebUI displays "ISP's DHCP did not function properly" during these events.

**Key facts:**
- The same ISP connection (NBN FTTB, static IP, 600-second lease) works **flawlessly with a different router** — zero DHCP errors ever
- The problem reproduces on **two different GT-BE98 AU units** — not a hardware issue
- The outages occur multiple times per day, at irregular intervals

### Root Cause Analysis

After extensive investigation, I identified that the ISP uses a **DHCP relay architecture** where the DHCP server is on a completely different subnet from the WAN:

```
WAN IP:           x.x.x.177
WAN Subnet:       x.x.x.0/24
DHCP Server ID:   114.129.184.1     ← NOT in WAN subnet
DHCP Relay:       172.21.68.206     ← NOT in WAN subnet
```

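These values come from `/tmp/wan0_bound.env`. The address labeled "DHCP Relay" above is the `siaddr` field, which RFC 2131 defines as the next-server address (the relay field is `giaddr`), so its role as a relay is an inference: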
```
interface=vlan4094
ip=x.x.x.177
siaddr=172.21.68.206
subnet=255.255.255.0
router=x.x.x.1
dns=119.40.106.35 119.40.106.36
lease=600
serverid=114.129.184.1
```
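The off-subnet observation can be checked mechanically. Below is a minimal Python sketch that parses a `wan0_bound.env`-style dump and lists which addresses fall outside the WAN subnet. The `203.0.113.x` prefix is a documentation-range placeholder standing in for the redacted `x.x.x` values, and the function names are illustrative only.

```python
import ipaddress

def parse_env(text):
    """Parse KEY=VALUE lines (as in the wan0_bound.env dump) into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            env[key] = value
    return env

def off_subnet(env, keys=("router", "siaddr", "serverid")):
    """Return the entries under `keys` whose address is outside the WAN subnet."""
    wan = ipaddress.ip_interface(f"{env['ip']}/{env['subnet']}").network
    return {k: env[k] for k in keys
            if k in env and ipaddress.ip_address(env[k]) not in wan}

# Placeholder prefix 203.0.113.x replaces the redacted x.x.x (assumption).
SAMPLE = """\
interface=vlan4094
ip=203.0.113.177
siaddr=172.21.68.206
subnet=255.255.255.0
router=203.0.113.1
dns=119.40.106.35 119.40.106.36
lease=600
serverid=114.129.184.1
"""

result = off_subnet(parse_env(SAMPLE))
print(result)
```

With the sample above, the gateway is inside the /24 while `siaddr` and `serverid` are both reported as outside it.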

Three firmware behaviors may interact to cause renewal failures (**this analysis is AI-generated and I have not verified it**):

### 1. udhcpc is launched with very conservative retry parameters

The DHCP client is started with:
```
/sbin/udhcpc -i vlan4094 -p /var/run/udhcpc0.pid -s /tmp/udhcpc_wan -t2 -T5 -A160 -O33 -O249
```

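- `-t2` = only **2 retries** per attempt (BusyBox help text: "Send up to N discover packets")
- `-T5` = 5-second timeout per retry (BusyBox help text: "Pause between packets")
- `-A160` = **160-second delay** before retrying after failure (BusyBox help text: "Wait if lease is not obtained")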

With a 600-second lease, the renewal timeline is extremely tight:
- T/2 renewal at ~300s → 2 retries (10s window) → if fails, waits 160s
- Next attempt at ~470s → 2 retries → if fails, waits 160s
- Next would be ~640s but **lease already expired at 600s**
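The arithmetic behind this timeline can be reproduced with a short Python sketch. It encodes only the simplified model described here (first attempt at T/2, `retries × timeout` seconds per attempt, then a fixed back-off). It assumes that `-t`, `-T` and `-A` govern renewals in this firmware, which is not confirmed, and it is not derived from the udhcpc source.

```python
def renewal_attempts(lease=600, retries=2, timeout=5, backoff=160):
    """Return (start, give_up) times in seconds for every renewal attempt
    that starts before the lease expires, under the simplified model."""
    attempts = []
    start = lease // 2                      # first renewal at T/2
    while start < lease:
        give_up = start + retries * timeout # window spent waiting for an ACK
        attempts.append((start, give_up))
        start = give_up + backoff           # idle until the next attempt
    return attempts

attempts = renewal_attempts()
listening = sum(end - begin for begin, end in attempts)
print(attempts)    # attempts that fit inside the 600 s lease
print(listening)   # total seconds spent actually waiting for a reply
```

Under these assumptions only two attempts fit inside the lease, and the client is waiting for a reply for 20 of the 600 seconds. Passing a smaller `backoff` shows how quickly the number of attempts grows.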

### 2. `wan0_dhcpfilter_enable=1` may silently drop valid DHCP ACKs

The DHCP filter validates incoming DHCP responses against an expected gateway MAC:
```
wan0_dhcpfilter_enable=1
wan0_gw_mac=EA:02:FE:A1:01:53
```

Since the ISP uses a DHCP relay, the DHCP ACK may arrive from a different source MAC than the gateway. If the MAC doesn't match, the filter silently drops the response, and the renewal appears to fail from udhcpc's perspective.

### 3. `wan0_dhcp_qry=0` uses unicast renewals

With `dhcp_qry=0`, udhcpc sends unicast renewal requests to the DHCP server at `114.129.184.1`. Since this address is **not in the WAN subnet** (`x.x.x.0/24`), the unicast must route through the ISP gateway. Any momentary routing disruption causes the unicast to be lost.

## Failure Sequence (from syslog)

Here is a typical failure cycle captured from `/tmp/syslog.log`:

```
Mar  7 14:24:37 dhcp_client: bound x.x.x.177/255.255.255.0 via x.x.x.1 for 600 seconds.

[~30 minutes pass — no visible renewal events in syslog — renewal failed silently]

Mar  7 14:54:35 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 14:54:35 dhcp_client: deconfig
Mar  7 14:54:44 rc_service: udhcpc_wan 31227:notify_rc restart_wan_if 0
Mar  7 14:54:44 rc_service: udhcpc_wan 31227:notify_rc restart_apg_eth_vlan
Mar  7 14:54:44 rc_service: waitting "restart_wan_if 0"(last_rc:restart_autowan) via udhcpc_wan ...
Mar  7 14:54:48 dhcp_client: deconfig
Mar  7 14:57:46 udhcpc_wan: hnd_get_phy_status: Temporarily Router cannot get the PHY() status...
Mar  7 14:57:55 rc_service: udhcpc_wan 1440:notify_rc start_dnsmasq 255
Mar  7 14:58:06 dhcp_client: bound x.x.x.177/255.255.255.0 via x.x.x.1 for 600 seconds.
```

**Total outage: ~3.5 minutes** (14:54:35 → 14:58:06)
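The outage length can also be computed from the log rather than read off by eye. This is a rough Python sketch that measures the time from a `wanduck`-triggered `restart_wan_if` to the next `bound` line. The helper names are illustrative, and because syslog lines carry no year, a placeholder year is assumed.

```python
from datetime import datetime

# Syslog lines carry no year; 2000 is an arbitrary placeholder (assumption)
# so that strptime receives a complete date.
PLACEHOLDER_YEAR = 2000

def stamp(line):
    """Parse the 15-character 'Mar  7 14:54:35' prefix of a syslog line."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {line[:15]}",
                             "%Y %b %d %H:%M:%S")

def outage_seconds(lines):
    """Seconds from each wanduck restart_wan_if to the next 'bound' line."""
    durations, start = [], None
    for line in lines:
        if start is None and "wanduck" in line and "restart_wan_if" in line:
            start = stamp(line)
        elif start is not None and "dhcp_client: bound" in line:
            durations.append(int((stamp(line) - start).total_seconds()))
            start = None
    return durations

LOG = """\
Mar  7 14:24:37 dhcp_client: bound x.x.x.177/255.255.255.0 via x.x.x.1 for 600 seconds.
Mar  7 14:54:35 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 14:54:35 dhcp_client: deconfig
Mar  7 14:58:06 dhcp_client: bound x.x.x.177/255.255.255.0 via x.x.x.1 for 600 seconds.
"""

durations = outage_seconds(LOG.splitlines())
print(durations)
```

For the excerpt above this gives a single outage of 211 seconds, which matches the ~3.5 minutes quoted. The sketch does not handle logs that cross a year boundary.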

Further occurrences from earlier the same day:
```
Mar  7 07:11:21 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 07:11:27 dnsmasq-dhcp[29204]: DHCP, IP range 192.168.50.2 -- 192.168.50.254, lease time 1d
Mar  7 07:14:32 udhcpc_wan: hnd_get_phy_status: Temporarily Router cannot get the PHY() status...
Mar  7 07:14:41 rc_service: udhcpc_wan 31419:notify_rc start_dnsmasq 255

Mar  7 07:27:07 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 07:27:16 rc_service: udhcpc_wan 5271:notify_rc restart_wan_if 0
Mar  7 07:30:17 udhcpc_wan: hnd_get_phy_status: Temporarily Router cannot get the PHY() status...
Mar  7 07:30:38 wan_up: Restart DDNS

Mar  7 12:52:52 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 12:53:01 rc_service: autowan 29364:notify_rc restart_wan_if 0
Mar  7 12:56:01 udhcpc_wan: hnd_get_phy_status: Temporarily Router cannot get the PHY() status...
Mar  7 12:56:19 wan_up: Restart DDNS
```

**Three more WAN restart events earlier the same day** (07:11, 07:27, 12:52), four in total with the 14:54 outage above. Each was triggered by `wanduck`, apparently after it detected connectivity loss following a silent DHCP renewal failure.
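To track how often this happens over a longer period, the `wanduck`-initiated restarts can be extracted from a saved copy of the syslog. The Python sketch below is one way to do that. The regular expression is written against the line format shown in the excerpts above and may need adjusting for other firmware builds.

```python
import re

# Matches lines such as:
#   Mar  7 07:11:21 rc_service: wanduck 3052:notify_rc restart_wan_if 0
# and captures the HH:MM part of the timestamp.
WANDUCK_RESTART = re.compile(
    r"^\w{3}\s+\d+ (\d\d:\d\d):\d\d rc_service: wanduck \d+:notify_rc restart_wan_if"
)

def wanduck_restarts(lines):
    """Return the HH:MM of every restart_wan_if issued by wanduck."""
    return [m.group(1) for line in lines if (m := WANDUCK_RESTART.match(line))]

LOG = """\
Mar  7 07:11:21 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 07:27:07 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 07:27:16 rc_service: udhcpc_wan 5271:notify_rc restart_wan_if 0
Mar  7 12:52:52 rc_service: wanduck 3052:notify_rc restart_wan_if 0
Mar  7 12:53:01 rc_service: autowan 29364:notify_rc restart_wan_if 0
"""

events = wanduck_restarts(LOG.splitlines())
print(events)
```

Restarts requested by `udhcpc_wan` or `autowan` are deliberately ignored, since in the excerpts they follow a `wanduck` event within seconds and would otherwise be double-counted.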

## To Reproduce

1. Use a GT-BE98 (AU) with firmware `3006.102.6_1-gnuton1`
2. Connect to an ISP that uses DHCP relay (DHCP server not in the WAN subnet)
3. ISP provides a short DHCP lease (600 seconds in my case)
4. Leave the router running — within hours, `wanduck` will trigger `restart_wan_if` due to lease expiry
5. The WebUI will show "ISP's DHCP did not function properly"

**Note:** According to the AI-generated analysis (unverified), this is more likely to occur when:
- The ISP DHCP server is on a different subnet than the WAN gateway
- The DHCP lease is short (≤ 1 hour)
- `wan0_dhcpfilter_enable=1` (default)
- `wan0_dhcp_qry=0` (default, unicast renewals)

## Expected behavior

DHCP lease renewals should succeed reliably, even with short leases and relay-based ISP DHCP servers.

## Environment Details

```
Router: ASUS GT-BE98 (AU)
Firmware: 3006.102.6_1-gnuton1
Kernel: Linux 4.19.294 aarch64
BusyBox: v1.25.1
ISP: SuperLoop
WAN Interface: vlan4094 (on eth1, 2.5G port)
WAN Proto: DHCP
DHCP Lease: 600 seconds
```
