Fix W5100S socket exhaustion blocking MQTT and additional TCP clients#9770
Fix W5100S socket exhaustion blocking MQTT and additional TCP clients#9770PhilipLykov wants to merge 6 commits intomeshtastic:developfrom
Conversation
The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631 Ethernet gateways with syslog and NTP enabled, all 4 sockets were permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP API client), leaving none for MQTT, DHCP lease renewal, or additional TCP connections. - NTP: Remove permanent timeClient.begin() at startup; NTPClient::update() auto-initializes when needed. Add timeClient.end() after each query to release the UDP socket immediately. - Syslog: Remove socket allocation from Syslog::enable(). Open and close the UDP socket on-demand in _sendLog() around each message send. - MQTT: Fix socket leak in isValidConfig() where a successful test connection was never closed (PubSubClient destructor does not call disconnect). Add explicit pubSub->disconnect() before returning. Made-with: Cursor
| delete[] message; | ||
| message = new char[len + 1]; | ||
|
|
||
| vsnprintf(message, len + 1, fmt, args); |
There was a problem hiding this comment.
args is reused without va_copy(); reusing a traversed va_list is undefined behavior and can produce corrupted logs or crashes
| @@ -193,6 +199,8 @@ | |||
| this->_client->print(message); | |||
| this->_client->endPacket(); | |||
There was a problem hiding this comment.
return value of endPacket() is ignored; true is returned unconditionally. On embedded Ethernet stacks, endPacket() is the point where TX failure is reported; this currently reports success even when the datagram send fails.
src/mqtt/MQTT.cpp
Outdated
| return connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection); | ||
| bool result = connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection); | ||
| if (client == nullptr) { | ||
| pubSub->disconnect(); |
There was a problem hiding this comment.
Socket cleanup is conditional on client == nullptr; when a non-null client is passed, successful validation can still leave the connection open. Since this function is a validator, that side effect can leak scarce sockets on Ethernet-class devices. Consider ensuring cleanup for all validation paths (or explicitly documenting and renaming if persistent connection is intended).
|
@robekl please stop using LLMs to review random PRs. We have CoPilot for this. Additionally the reviews are also commenting on existing unchanged code. This is your one and only warning. |
|
@robekl Thanks for the review.
|
Resolve conflict in MQTT.cpp: take upstream's rewritten isValidConfig() which replaced connectPubSub() validation with a lightweight TCP-only MQTTClient check. The upstream approach inherently fixes the socket leak our PR addressed (testClient.stop() is always called), and adds user-visible warnings via clientNotification. Made-with: Cursor
|
Resolved the merge conflict in The other two fixes in this PR are still needed and unaffected by upstream:
Without these two changes, the W5100S still exhausts all 4 hardware sockets (NTP UDP + Syslog UDP + TCP listener + TCP client), leaving none for MQTT, DHCP renewal, or additional connections. |
Summary
Fix W5100S socket exhaustion on RAK4631 Ethernet gateways that prevents MQTT connections, DHCP lease renewal, and additional TCP API clients.
The W5100S chip has only 4 hardware sockets. With NTP and Syslog enabled, all 4 sockets were permanently consumed:
This left zero sockets for MQTT, DHCP lease renewal, or NTP updates — causing MQTT to silently fail with no packets sent on the wire.
Changes
ethClient.cpp): Remove permanenttimeClient.begin()at startup.NTPClient::update()auto-initializes when_udpSetupis false. AddtimeClient.end()after each query to release the socket immediately.DebugConfiguration.cpp): Remove socket allocation fromSyslog::enable(). Open and close the UDP socket in_sendLog()around each message send. Syslog overhead is ~320μs per message for socket open/close, well under 1% CPU at typical log rates.MQTT.cpp): FixisValidConfig()where a successful test connection was never closed —PubSubClient::~PubSubClient()only frees buffers, it does not calldisconnect()or_client->stop(). The leaked socket in ESTABLISHED state permanently consumed a W5100S slot.After fix
Test plan