Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 69 additions & 0 deletions wled00/asyncDNS.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
#pragma once
/*
asyncDNS.h - wrapper class for asynchronous DNS lookups using lwIP
by @dedehai
*/

#ifndef ASYNC_DNS_H
#define ASYNC_DNS_H

#include <Arduino.h>
#include <lwip/dns.h>
#include <lwip/err.h>

enum class DnsResult { Idle, Busy, Success, Error };

class AsyncDNS {
private:
ip_addr_t _raw_addr;
volatile DnsResult _status = DnsResult::Idle;
uint16_t _errorcount = 0;

// callback for dns_gethostbyname(), called when lookup is complete or timed out
static void _dns_callback(const char *name, const ip_addr_t *ipaddr, void *arg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this class is destructed while a query is still outstanding, this will result in the arg pointer becoming invalid, resulting in memory corruption when the callback is eventually executed. Disconnecting from the wifi does not dequeue DNS callbacks -- they're supposedly guaranteed to be called "eventually" by the lwip stack.

Fixing this is nontrivial -- the state data needs to be stored on the heap separately from the interface class, and scoped with tools like std::shared_ptr so it won't be released until the callback is run. I made a little sketch of this, I can add it to the PR if you'd like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I am not opposed to "do it properly" but I also see two more trivial solutions:

  • make the class global and instantiate only one, add a "key" or a char array and share the class between callers. this adds the complexity of a shared resource though and probably has implications I am currently not thinking of.
  • simplest: do not delete if status is busy (works well for NTP use)
    both of these will not solve the issue for virtual buses though - maybe adding another function to the class that is actually blocking would simplify its use for places that "need an IP immediately" - what I have implemented in busmanager is way too complex IMHO but mostly due to the fact that it needs to dance around that bug gracefully.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just deletion: renew() and reset() are also invalid if status == Busy as the results of a new query will be overwritten by the old callback at some point. If a query is stuck, the state object has to be abandoned -- there's no real alternative.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it the way I did because I never got it to not run the callback - you are right it is not as robust as it could be but it was not my intention to write it "fool proof" i.e. reset while a request is going on.
So which one do you prefer:

  • using seperate heap for the callback as you propose
  • not use a pointer and make it fully static (like I did now in the busmanager) and not go down the road of managing dangling pointers?
    to me the option of not using it dynamically makes sense, at least ATM as DNS is only used for NTP and virtual bus. The class only uses a few bytes of RAM.

Copy link
Member

@willmmiles willmmiles Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to test/treat an AsyncDNS object as a unit, with safety by design -- "it doesn't crash because we don't happen to delete/reuse it while it's active" isn't good enough IMO, static or otherwise. Eventually that will bite us because the class violates the C++ object requirement that a destructor is always safe.

I took a stab at safety, while keeping the current object and re-use semantics:
willmmiles@26c202a

I think it might be done better with std::future<> style semantics though; the code would be closer to the current class design. Eg. the fundamental call is:

class AsyncDNS {
   static std::shared_ptr<AsyncDNS> query(const char* hostname, std::shared_ptr<AsyncDNS> previous = {});

   private:
       AsyncDNS();   // Can only be produced by making a query
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future-like formulation: willmmiles@efcfc5e

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say that I'm having a lot of trouble with DNS hanging on my test system. Do you have any references on that IDF v4 bug you mention?

I do too, that is why I am pretty sure its a bug, it just locks up and does not recover until reconnect. I did not really find concret info about it but the changelog mentions some fix for DNS deadlock: https://github.com/espressif/esp-idf/releases?page=1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future-like formulation: willmmiles@efcfc5e

that is a whole lot of fancy c++ magic I do not (yet) understand. Is it to make it multi-thread safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future-like formulation: willmmiles@efcfc5e

that is a whole lot of fancy c++ magic I do not (yet) understand. Is it to make it multi-thread safe?

Sort of. The goal is to make sure that the addresses that will be touched by the callback in the future cannot under any circumstances be released until the callback has run. std::shared_ptr<> is a reference-counting pointer: it keeps track of how many shared_ptrs address the same storage, so the pointed-to object won't be deleted until the last shared_ptr pointing at it is removed. shared_ptr guarantees that the reference count is atomic and so it is thread-safe in this context.

The first implementation hides the shared_ptr by making the AsyncDNS class a wrapper around it: the only member is a shared_ptr to the DNS results. A copy of the shared_ptr is "gifted" to the callback, who is then guaranteed to be able to write the state. If the AsyncDNS object is destroyed, or the request is abandoned via renew() or reset(), that copy keeps the DNS result struct alive until it's finished with it.

The second implementations strips off the wrapper layer, so clients always have to store a shared_ptr<AsyncDNS>. This is intended to make the behaviour to the client more clear, ie. the thing you hold is a handle to the real object (and removes an extra layer of code). The major change to make that work is the interface to query(): it lets a client pass in a shared_ptr<AsyncDNS> that's been created before, so it can re-use the context (ie. error count) or storage if that request has been completed. (If the request is still busy, or if you pass in an empty pointer, query() makes a new storage object for you.)

I liken the second approach to the use of std::promise<>/std::future<>. Promises and futures are part of the C++11 async library. You can think of them as a thread-safe single result queue: the end holding the std::promise can write a single result value in a safe way, such that the std::future can wait() for it and read it later. Unfortunately actual std::promise and std::future are quite heavyweight, particularly with their use of exceptions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this to your judgement. I would keep it as light-weight as possible. Feel free to add to this PR.

AsyncDNS* instance = static_cast<AsyncDNS*>(arg);
if (ipaddr) {
instance->_raw_addr = *ipaddr;
instance->_status = DnsResult::Success;
} else {
instance->_status = DnsResult::Error; // note: if query timed out (~5s), DNS lookup is broken until WiFi connection is reset (IDF V4 bug)
instance->_errorcount++;
}
}

public:
// note: passing the IP as a pointer to query() is not implemented because it is not thread-safe without mutexes
// with the current IDF bug external error handlig is required anyway or dns will just stay stuck

// non-blocking query function to start DNS lookup
DnsResult query(const char* hostname) {
if (_status == DnsResult::Busy) return DnsResult::Busy; // in progress, waiting for callback

_status = DnsResult::Busy;
err_t err = dns_gethostbyname(hostname, &_raw_addr, _dns_callback, this);

if (err == ERR_OK) {
_status = DnsResult::Success; // result already in cache
} else if (err != ERR_INPROGRESS) {
_status = DnsResult::Error;
_errorcount++;
}
return _status;
}

// get the IP once Success is returned
IPAddress getIP() {
if (_status != DnsResult::Success) return IPAddress(0,0,0,0);
#ifdef ARDUINO_ARCH_ESP32
return IPAddress(_raw_addr.u_addr.ip4.addr);
#else
return IPAddress(_raw_addr.addr);
#endif
}

void renew() { _status = DnsResult::Idle; } // reset status to allow re-query
void reset() { _status = DnsResult::Idle; _errorcount = 0; } // reset status and error count
DnsResult status() { return _status; }
uint16_t getErrorCount() { return _errorcount; }
};
#endif // ASYNC_DNS_H
46 changes: 40 additions & 6 deletions wled00/bus_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -672,6 +672,7 @@ BusNetwork::BusNetwork(const BusConfig &bc)
_hasCCT = false;
_UDPchannels = _hasWhite + 3;
_client = IPAddress(bc.pins[0],bc.pins[1],bc.pins[2],bc.pins[3]);
_DNSlookup = nullptr;
#ifdef ARDUINO_ARCH_ESP32
_hostname = bc.text;
resolveHostname(); // resolve hostname to IP address if needed
Expand Down Expand Up @@ -713,12 +714,45 @@ size_t BusNetwork::getPins(uint8_t* pinArray) const {
#ifdef ARDUINO_ARCH_ESP32
void BusNetwork::resolveHostname() {
static unsigned long nextResolve = 0;
if (Network.isConnected() && millis() > nextResolve && _hostname.length() > 0) {
nextResolve = millis() + 600000; // resolve only every 10 minutes
IPAddress clnt;
if (strlen(cmDNS) > 0) clnt = MDNS.queryHost(_hostname);
else WiFi.hostByName(_hostname.c_str(), clnt);
if (clnt != IPAddress()) _client = clnt;
if (Network.isConnected()) {
if (_DNSlookup != nullptr) {
if (_DNSlookup->status() == DnsResult::Success) {
_client = _DNSlookup->getIP();
delete _DNSlookup;
_DNSlookup = nullptr;
}
}
if (millis() > nextResolve && _hostname.length() > 0) {
nextResolve = millis() + 600000; // resolve only every 10 minutes
IPAddress clnt;
if (strlen(cmDNS) > 0) {
clnt = MDNS.queryHost(_hostname);
if (clnt != IPAddress()) _client = clnt; // update client IP if not null
}
else {
if (_DNSlookup == nullptr) {
_DNSlookup = new AsyncDNS;
_DNSlookup->query(_hostname.c_str()); // start async DNS query
} else {
// there is a previous, unresolved query (should be in error state)
// TODO: all this error handling is only required because of an IDF bug, otherwise we could just query again later, should be fixed in V5
if (_DNSlookup->status() == DnsResult::Error) {
if (_DNSlookup->getErrorCount() <= 2) {
_DNSlookup->renew(); // clear error state, keep error count
_DNSlookup->query(_hostname.c_str()); // restart async DNS query
} else {
forceReconnect = true; // reset network connection as dns is probably stuck
delete _DNSlookup;
_DNSlookup = nullptr;
}
}
else {
delete _DNSlookup; // cleanup if other error just in case (should not happen)
_DNSlookup = nullptr;
}
}
}
}
}
}
#endif
Expand Down
6 changes: 6 additions & 0 deletions wled00/bus_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,9 @@
#include "pin_manager.h"
#include <vector>
#include <memory>
#ifdef ARDUINO_ARCH_ESP32
#include "asyncDNS.h"
#endif

#if __cplusplus >= 201402L
using std::make_unique;
Expand Down Expand Up @@ -364,6 +367,9 @@ class BusNetwork : public Bus {

private:
IPAddress _client;
#ifdef ARDUINO_ARCH_ESP32
AsyncDNS* _DNSlookup;
#endif
uint8_t _UDPtype;
uint8_t _UDPchannels;
bool _broadcastLock;
Expand Down
2 changes: 1 addition & 1 deletion wled00/cfg.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ bool deserializeConfig(JsonObject doc, bool fromFS) {
getStringFromJson(apPass, ap["psk"] , 65); //normally not present due to security
//int ap_pskl = ap[F("pskl")];
CJSON(apChannel, ap[F("chan")]);
if (apChannel > 13 || apChannel < 1) apChannel = 1;
if (apChannel > 13 || apChannel < 1) apChannel = 6; // reset to default if invalid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seams unrelated to the PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is. I should have changed this line in #5115 but was unaware at the time.

CJSON(apHide, ap[F("hide")]);
if (apHide > 1) apHide = 1;
CJSON(apBehavior, ap[F("behav")]);
Expand Down
51 changes: 41 additions & 10 deletions wled00/ntp.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#include "src/dependencies/timezone/Timezone.h"
#include "wled.h"
#include "fcn_declare.h"
#include "asyncDNS.h"

// WARNING: may cause errors in sunset calculations on ESP8266, see #3400
// building with `-D WLED_USE_REAL_MATH` will prevent those errors at the expense of flash and RAM
Expand Down Expand Up @@ -180,6 +181,7 @@ void handleTime() {
}
}

AsyncDNS* ntpDNSlookup;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::unique_ptr instead of naked pointers. Also it should be static inside the function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::unique_ptr gives me errors as make_unique is not defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a shim in bus_manager.h. (This might be moot re fixing the safety issues though -- shared_ptr might allow a cleaner solution.)

void handleNetworkTime()
{
if (ntpEnabled && ntpConnected && millis() - ntpLastSyncTime > (1000*NTP_SYNC_INTERVAL) && WLED_CONNECTED)
Expand All @@ -189,26 +191,54 @@ void handleNetworkTime()
#ifdef ARDUINO_ARCH_ESP32 // I had problems using udp.flush() on 8266
while (ntpUdp.parsePacket() > 0) ntpUdp.flush(); // flush any existing packets
#endif
sendNTPPacket();
ntpPacketSentTime = millis();
if (!ntpServerIP.fromString(ntpServerName)) // check if server is IP or domain
{
if (ntpDNSlookup == nullptr) ntpDNSlookup = new AsyncDNS;
DnsResult res = ntpDNSlookup->status();
switch (res) {
case DnsResult::Idle:
//DEBUG_PRINTF_P(PSTR("Resolving NTP server name: %s\n"), ntpServerName);
Serial.printf_P(PSTR("Resolving NTP server name: %s\n"), ntpServerName);
ntpDNSlookup->query(ntpServerName); // start dnslookup asynchronously
return;

case DnsResult::Busy:
return; // still in progress

case DnsResult::Success:
DEBUG_PRINTF_P(PSTR("NTP IP resolved: %s\n"), ntpServerIP.toString().c_str());
Serial.printf_P(PSTR("NTP IP resolved: %s\n"), ntpServerIP.toString().c_str());
ntpServerIP = ntpDNSlookup->getIP();
sendNTPPacket();
break;

case DnsResult::Error:
DEBUG_PRINTLN(F("NTP DNS failed"));
ntpDNSlookup->renew(); // try a new lookup next time
if ((toki.getTimeSource() == TOKI_TS_NONE) || ntpDNSlookup->getErrorCount() >= 3) {
// no time source or after 3 failed attempts (30min), reset network connection as dns is probably stuck (TODO: IDF bug, should be fixed in V5)
if (!realtimeMode) forceReconnect = true; // do not disturb streaming clients
delete ntpDNSlookup;
ntpDNSlookup = nullptr;
}
ntpLastSyncTime = toki.getTimeSource() == TOKI_TS_NONE ? NTP_NEVER : millis() - (1000*NTP_SYNC_INTERVAL - 600000); // keep trying if no time source, otherwise pause for 10 minutes
break;
}
}
else
sendNTPPacket();
}
if (checkNTPResponse())
{
ntpLastSyncTime = millis();
delete ntpDNSlookup;
ntpDNSlookup = nullptr;
}
}
}

void sendNTPPacket()
{
if (!ntpServerIP.fromString(ntpServerName)) //see if server is IP or domain
{
#ifdef ESP8266
WiFi.hostByName(ntpServerName, ntpServerIP, 750);
#else
WiFi.hostByName(ntpServerName, ntpServerIP);
#endif
}

DEBUG_PRINTLN(F("send NTP"));
byte pbuf[NTP_PACKET_SIZE];
Expand All @@ -227,6 +257,7 @@ void sendNTPPacket()
ntpUdp.beginPacket(ntpServerIP, 123); //NTP requests are to port 123
ntpUdp.write(pbuf, NTP_PACKET_SIZE);
ntpUdp.endPacket();
ntpPacketSentTime = millis();
}

static bool isValidNtpResponse(const byte* ntpPacket) {
Expand Down
Loading