-
Notifications
You must be signed in to change notification settings - Fork 66
Introduce AsyncURIMatcher class to encapsulate the URI matching logic with and without regex support (-D ASYNCWEBSERVER_REGEX=1) #304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3bb0a05
to
596610d
Compare
ec1aed9
to
765b747
Compare
@willmmiles @me-no-dev : quick update. I have pushed again to remove URIMatcher support for static file handler because I saw when trying to write examples that supporting regex is useless unless we completely change the way this thing is implemented. it requires a non regex path to find the file on disk. That being say, this leaves us with 2 handlers supporting regex:
I do not think this is worth supporting regex for WebSocket and EventSource which currently match the exact URI. So for these 2 use cases, we can easily put in common the way to match requests because json handler is just a specialization of the default one handling automatically a specific content type. When not using regex, In main, json handler matches:
When not using regex, In main, the default handler matches:
So 1,2,3 are all common, no worry for them. 4 and 5 are cases applied (checked) before 3 in the default handler. 3 looks weird to me. I would have expected a handler to be set at CONCLUSION: I have updated the matcher: // empty URI matches everything
if (!_value.length()) {
return true;
}
String path = request->url();
if (_ignoreCase) {
path.toLowerCase();
}
// exact match (should be the most common case)
if (_value == path) {
return true;
}
// wildcard match with * at the end
if (_value.endsWith("*")) {
return path.startsWith(_value.substring(0, _value.length() - 1));
}
// prefix match with /*.ext
// matches any path ending with .ext
// e.g. /images/*.png will match /images/pic.png and /images/2023/pic.png but not /img/pic.png
if (_value.startsWith("/*.")) {
return path.endsWith(_value.substring(_value.lastIndexOf(".")));
}
// finally check for prefix match with / at the end
// e.g. /images will also match /images/pic.png and /images/2023/pic.png but not /img/pic.png
if (path.startsWith(_value + "/")) {
return true;
}
// we did not match
return false; I think this is backward compatible (to be tested of course). Matches the current behavior of default handler and adds supprot for * and /* matching in json one. I also added ignoreCase support. About rewrites: I initially had the idea to apply the logic to rewrites but we cannot since it won't be backward compatible for 3 things
About memory usage I know some users have A LOT of endpoints. With this new class, a handler will always have an instance of URIMatcher. So this is really important that we do not extend the memory footprint too much by addign too many fields. So if we want to support regex and ignore case, it means we have to use different classes, eventually overrides with a combination of templates, like the ones suggested by Will above. |
e14655a
to
9ef3718
Compare
On the one hand, changing the way However! I can also make an argument that any such path matching voodoo can be done today through TL;DR: I concur that we can leave
I was very much hoping to start moving towards explicit match behavioural flags in the new matcher, and away from magic URL string syntax. Particularly since some of those cases probably should be mutually exclusive: the folder-suffix case is particularly dangerous when I mean to require only an exact match. I was thinking something like: class AsyncURIMatcher {
public:
enum MatchFlags {
MatchAll = (1<<0),
MatchExact = (1<<1), // matches equivalent to regex: ^{_uri}$
MatchPrefixFolder = (1<<2), // matches equivalent to regex: ^{_uri}/.*
MatchPrefixAll = (1<<3), // matches equivalent to regex: ^{_uri}.*
MatchExtension = (1<<4), // matches equivalent to regex: \.{_uri}$
#ifdef ASYNCWEBSERVER_REGEX
MatchRegex = (1<<5), // matches _url as regex
#endif
MatchAuto = (1<<30), // parse _uri at construct time and infer match type(s)
// (_uri may be transformed to remove wildcards)
MatchCaseInsensitive = (1<<31)
};
AsyncURIMatcher(String uri, int32_t flags = MatchAuto);
// ...
bool matches(AsyncWebServerRequest* request) {
if (_flags & MatchAll) return true;
String path = request->url();
if (_flags & MatchCaseInsensitive) {
path.toLowerCase();
}
// exact match (should be the most common case)
if ((_flags & MatchExact) && (_uri == path)) {
return true;
}
// .. etc
}
} Does that make sense? Then users can construct exactly the match logic they want, state is kept to a minimum, and we can also keep "Auto" for backwards compatibility as "parse the uri like we always did". (Also we don't have to re-parse the uri for matching on every request!) |
I have some changes locally, not matching that but going in nearly the same direction I suppose. What I saw when continuing using one urimatcher class is that it will increase the memory used for apps having a lot of handlers. So I was searching for a way to avoid that and found no alternative but to use an interface with 3 implementations: case sensitive one, case incentive one and regex one. i overloaded the on() methods also. another alternative would be to keep the current behavior for const char* uri parans but allow some different Marc pattern depending on the parameter. I really try in my local changes to not have a uri object with more than 1 field because it would mean more memory usage. |
No matter how we slice it, offering any kind of match type options will require us to store what they are. Using overloading means we pay for storing what match logic to run with a vtable pointer instead of a flags member, but we still pay for it. Taken to the limit (ie. explicitly specified matching semantics) we might also end up paying for it with more code space instead as well. Plus we'd have to switch to using indirection to the AsyncURIMatcher instance in the Handler class, so we would also pay the cost for yet another pointer. If memory size is the biggest concern, I'd go with a flags word and re-constructing the Although vtables do also have the advantage that we might be able to arrange the regex implementation so as to avoid needing the Storing the constructed But! If we really want to get in to the realm of RAM optimization, one technique would be to overload the flags word as a |
I agree we have to decide on a tradeoff. For example, I was looking to introduce a case insensitive matcher. A lot of websites are case insensitive. And just that requires a boolean to be stored with the uri value if we do not use polymorphism. For example if we go without an interface, how users would be able to implement a case insensitive match without regexp ? Or even if we do not and one day we do want case insensitive match, then we have an object with 2 fields eventually where we try to now put a third one. I know we pay a cost in every solution, but to me it seems that if there is a cost to pay somewhere, then be it in a way that things can be extensible, not closed, and open for future evolution ? I will try to push this evening where I was with these changes regarding inheritance so that you can see the idea. I went there because I was not able within 1 UriMatcher object to support regex + standard match + case insensitive match and now like discussed above eventually several flavors of match (which is then another flag on top of that). At the end users could ask the moon, like we do now, that's why maybe providing an interface could be an option. |
With the sketch above, all the various options, including case sensitivity, were packed in to a single 32-bit flag member, with many bits to spare. That solution could cover all cases with a total cost of 1 DWORD per
Because we pay for that extensibility today, even if we're not using it yet. Using a vtable solution has a minimum additional cost of 4 DWORDs per
Sounds good. I'll see if I can sketch the solution I'm proposing above - it's really not that far from where you're at with 9ef3718. |
9ef3718
to
c27e1d0
Compare
Thanks! I have updated #310 and rebased this PR on top of it. I am happy with the PR as it is because it is simple and supports regex + url matching like before + case insensitive. The only thing I really don't like is this big AsyncURIMatcher object. I know users have more than a hundred of endpoints (don't ask me why), but it means they will be subject to some decrease of ram with so many AsyncURIMatcher objects in memory, especially if they are using regex for 1 or 2 endpoints they end up having 98 unused regex objects. So that's why I tried today a polymorphism version, that I've pushed in branch https://github.com/ESP32Async/ESPAsyncWebServer/tree/urimatcher-poly. In this branch pretty much only the URIMatcher classes change plus the I ended up with this hierarchy to limit the quantity of fiels, but like you say it also introduce an overhead for the polymorphism. But this is a quite constant overhead right ? For users having a LOT of handlers, I guess this is still better than having hundreds of unused regex or boolean objects ? class AsyncURIMatcher {
public:
AsyncURIMatcher() {}
AsyncURIMatcher(const AsyncURIMatcher &) = default;
AsyncURIMatcher(AsyncURIMatcher &&) = default;
virtual ~AsyncURIMatcher() = default;
AsyncURIMatcher &operator=(const AsyncURIMatcher &) = default;
AsyncURIMatcher &operator=(AsyncURIMatcher &&) = default;
virtual bool matches(AsyncWebServerRequest *request) const { return false; };
};
class AsyncCaseSensitiveURIMatcher : public AsyncURIMatcher {
public:
AsyncCaseSensitiveURIMatcher() {}
AsyncCaseSensitiveURIMatcher(const char *uri) : _value(uri) {}
AsyncCaseSensitiveURIMatcher(String uri) : _value(std::move(uri)) {}
AsyncCaseSensitiveURIMatcher(const AsyncCaseSensitiveURIMatcher &) = default;
AsyncCaseSensitiveURIMatcher(AsyncCaseSensitiveURIMatcher &&) = default;
virtual ~AsyncCaseSensitiveURIMatcher() override = default;
AsyncCaseSensitiveURIMatcher &operator=(const AsyncCaseSensitiveURIMatcher &) = default;
AsyncCaseSensitiveURIMatcher &operator=(AsyncCaseSensitiveURIMatcher &&) = default;
virtual bool matches(AsyncWebServerRequest *request) const override {
return pathMatches(request->url());
}
protected:
String _value;
bool pathMatches(const String &path) const {
// empty URI matches everything
if (!_value.length()) {
return true;
}
// exact match (should be the most common case)
if (_value == path) {
return true;
}
// wildcard match with * at the end
if (_value.endsWith("*")) {
return path.startsWith(_value.substring(0, _value.length() - 1));
}
// prefix match with /*.ext
// matches any path ending with .ext
// e.g. /images/*.png will match /images/pic.png and /images/2023/pic.png but not /img/pic.png
if (_value.startsWith("/*.")) {
return path.endsWith(_value.substring(_value.lastIndexOf(".")));
}
// finally check for prefix match with / at the end
// e.g. /images will also match /images/pic.png and /images/2023/pic.png but not /img/pic.png
if (path.startsWith(_value + "/")) {
return true;
}
// we did not match
return false;
}
};
class AsyncCaseInsensitiveURIMatcher : public AsyncCaseSensitiveURIMatcher {
public:
AsyncCaseInsensitiveURIMatcher() : AsyncCaseSensitiveURIMatcher() {}
AsyncCaseInsensitiveURIMatcher(const char *uri) : AsyncCaseSensitiveURIMatcher(uri) {
_value.toLowerCase();
}
AsyncCaseInsensitiveURIMatcher(String uri) : AsyncCaseSensitiveURIMatcher(std::move(uri)) {
_value.toLowerCase();
}
bool matches(AsyncWebServerRequest *request) const override {
String path = request->url();
path.toLowerCase();
return AsyncCaseSensitiveURIMatcher::pathMatches(path);
}
};
#ifdef ASYNCWEBSERVER_REGEX
class AsyncRegexURIMatcher : public AsyncURIMatcher {
public:
AsyncRegexURIMatcher() {}
AsyncRegexURIMatcher(std::regex pattern) : _pattern(std::move(pattern)) {}
AsyncRegexURIMatcher(const char *uri, bool ignoreCase = false) : _pattern(ignoreCase ? std::regex(uri, std::regex::icase) : std::regex(uri)) {}
AsyncRegexURIMatcher(String uri, bool ignoreCase = false) : _pattern(ignoreCase ? std::regex(uri.c_str(), std::regex::icase) : std::regex(uri.c_str())) {}
AsyncRegexURIMatcher(const AsyncRegexURIMatcher &) = default;
AsyncRegexURIMatcher(AsyncRegexURIMatcher &&) = default;
~AsyncRegexURIMatcher() = default;
AsyncRegexURIMatcher &operator=(const AsyncRegexURIMatcher &) = default;
AsyncRegexURIMatcher &operator=(AsyncRegexURIMatcher &&) = default;
bool matches(AsyncWebServerRequest *request) const {
std::smatch matches;
std::string s(request->url().c_str());
if (std::regex_search(s, matches, _pattern)) {
for (size_t i = 1; i < matches.size(); ++i) {
request->_pathParams.emplace_back(matches[i].str().c_str());
}
return true;
}
return false;
}
private:
std::regex _pattern;
};
#endif I'll wait and see what you can come up with :-) What I find nice with the polymorphism version is that we can easily provide a DSL: on(caseMatch("/foo"), [](...) {...});
on(iCaseMatch("/foo"), [](...) {...});
on(regexMatch("^/foo$"), [](...) {...});
on(starMatch("/foo*"), [](...) {...});
on(extMatch("/*.png"), [](...) {...});
on(subMatch("/dir/"), [](...) {...});
... |
Prototype here: https://github.com/willmmiles/ESPAsyncWebServer/tree/urimatcher-tiny I've included a variant of your DSL for completeness. ;) Except for basic matching, though, none of the examples really work this part of the library, so it's not thoroughly tested yet. I think the next step would be to write a good set of test cases, targeting the unmodified library first so we can validate there are no regressions. Also I am astounded that including -D ASYNCWEBSERVER_REGEX brings in a whopping 250kb of code, even if no regexes are ever created (!!!). |
c27e1d0
to
3df830d
Compare
That's clever!! String _value;
union {
intptr_t _flags;
#ifdef ASYNCWEBSERVER_REGEX
std::regex *pattern;
#endif So the cost of the solution with your approach would be only 4 bytes more per handler right ? |
Thanks -- the general concept of packing things in unused pointer bits is called "tagged pointers", and it's been getting more common these days with 64-bit architectures where there are often a lot of unused bits.
That's the goal, |
Cool! I've cherry-picked your 2 commits in this PR so that we can continue working on it together. I also added a 3rd commit to move the isRegex() function out of the public API. It was not there before and if we keep it, this is something more we'll have to maintain. I will try to make time to build an example with all that, but also a little test to verify how close we are against main. |
6b7f3ec
to
c30b292
Compare
a1ef047
to
c3d6f98
Compare
@willmmiles : PR rebased on top of #311 and fixed-up. We have some backward compatibility failures around extension matching. Looking at them right now. ![]() |
6ac1e12
to
c554794
Compare
Still a remaining bug regarding ~/.../examples/URIMatcherTest ( urimatcher → origin {4} ✓ )
❯ ./test_routes.sh
Testing URI Matcher at http://192.168.4.1:80
==================================
Testing routes that should work (200 OK):
Testing /status ... ✅ PASS (200)
Testing /exact ... ✅ PASS (200)
Testing /exact/ ... ✅ PASS (200)
Testing /exact/sub ... ✅ PASS (200)
Testing /api/users ... ✅ PASS (200)
Testing /api/data ... ✅ PASS (200)
Testing /api/v1/posts ... ✅ PASS (200)
Testing /files/document.pdf ... ✅ PASS (200)
Testing /files/images/photo.jpg ... ✅ PASS (200)
Testing /config.json ... ✅ PASS (200)
Testing /data/settings.json ... ✅ PASS (200)
Testing /style.css ... ✅ PASS (200)
Testing /assets/main.css ... ✅ PASS (200)
Testing AsyncURIMatcher factory methods:
Testing /factory/exact ... ✅ PASS (200)
Testing /factory/prefix ... ✅ PASS (200)
Testing /factory/prefix-test ... ✅ PASS (200)
Testing /factory/prefix/sub ... ✅ PASS (200)
Testing /factory/dir/users ... ✅ PASS (200)
Testing /factory/dir/sub/path ... ✅ PASS (200)
Testing /factory/files/doc.txt ... ✅ PASS (200)
Testing /factory/files/sub/readme.txt ... ✅ PASS (200)
Testing case insensitive matching:
Testing /case/exact ... ✅ PASS (200)
Testing /CASE/EXACT ... ✅ PASS (200)
Testing /Case/Exact ... ✅ PASS (200)
Testing /case/prefix ... ✅ PASS (200)
Testing /CASE/PREFIX-test ... ✅ PASS (200)
Testing /Case/Prefix/sub ... ✅ PASS (200)
Testing /case/dir/users ... ✅ PASS (200)
Testing /CASE/DIR/admin ... ✅ PASS (200)
Testing /Case/Dir/settings ... ✅ PASS (200)
Testing /case/files/doc.pdf ... ✅ PASS (200)
Testing /CASE/FILES/DOC.PDF ... ✅ PASS (200)
Testing /Case/Files/Doc.Pdf ... ✅ PASS (200)
Testing special matchers:
Testing POST /any/path (all matcher) ... ✅ PASS (200)
Checking for regex support...
Regex support detected - testing traditional regex routes:
Testing /user/123 ... ✅ PASS (200)
Testing /user/456 ... ✅ PASS (200)
Testing /blog/2023/10/15 ... ✅ PASS (200)
Testing /blog/2024/12/25 ... ✅ PASS (200)
Testing AsyncURIMatcher regex factory methods:
Testing /factory/user/123 ... ✅ PASS (200)
Testing /factory/user/789 ... ✅ PASS (200)
Testing /factory/blog/2023/10/15 ... ✅ PASS (200)
Testing /factory/blog/2024/12/31 ... ✅ PASS (200)
Testing /factory/search/hello ... ✅ PASS (200)
Testing /FACTORY/SEARCH/WORLD ... ✅ PASS (200)
Testing /Factory/Search/Test ... ✅ PASS (200)
Testing routes that should fail (404 Not Found):
Testing /nonexistent ... ✅ PASS (404)
Testing /factory/exact/sub ... ✅ PASS (404)
Testing /factory/dir ... ✅ PASS (404)
Testing /factory/files/doc.pdf ... ✅ PASS (404)
Testing /exact ... ✅ PASS (200)
Testing /EXACT ... ✅ PASS (404)
Testing /user/abc ... ✅ PASS (404)
Testing /blog/23/10/15 ... ✅ PASS (404)
Testing /factory/user/abc ... ✅ PASS (404)
==================================
Test Results:
✅ Passed: 54
❌ Failed: 0
Total: 54
🎉 All tests passed! URI matching is working correctly. Good to go! |
… with and without regex support (-D ASYNCWEBSERVER_REGEX=1)
ba11081
to
f9282e6
Compare
If I may offer one more suggestion: I'm not sure we really need the bitfield-nature of the type enum anymore. It might be simpler and clearer to convert it to a basic counting enum instead, use a switch-case for the handling. willmmiles@b873a7e What do you think? |
I will try tomorrow. During testing I saw that we cannot have any enum leading to That's why I've re-ordered the enum and set numbers starting at 1 and not 0. |
In willmmiles@b873a7e, I explicitly bit-shifted the enum type when regex mode is available to reserve the LSB for That said, there's some micro-benchmarking possible there to find out which is faster ... enum Type {
// Compiler assigns values starting at zero
Alpha,
Beta,
Gamma,
// etc.
};
#ifdef REGEX
Type type = static_cast<Type>(_matchData >> 1);
#else
Type type = static_cast<Type>(_matchData);
#endif
switch(type) {
case Alpha:
return ...;
case Beta:
return ...;
case Gamma:
return ...;
}; or #ifdef REGEX
#define ENUMVAL(i) ((i<<1) + 1) // have to shift the value so we never get an even number
#else
#define ENUMVAL(i) i
#endif
enum ExpandedType {
// Values include nonregex bit
Alpha = ENUMVAL(0),
Beta = ENUMVAL(1),
Gamma = ENUMVAL(2),
// etc.
};
switch(static_cast<ExpandedType>(_matchData)) {
case Alpha:
return ...;
case Beta:
return ...;
case Gamma:
return ...;
}; |
This PR wil replace #299
This pull request introduces two new example sketches and a comprehensive README for demonstrating and testing the advanced URI matching capabilities of the ESPAsyncWebServer library, specifically focusing on the
AsyncURIMatcher
class. The changes provide both a user-friendly demonstration (URIMatcher.ino
) and a thorough test suite (URIMatcherTest.ino
) covering all matching strategies, including factory methods, case-insensitive matching, and regex support. The new documentation inREADME.md
explains matching behavior, usage patterns, and real-world applications.New Example and Test Sketches
examples/URIMatcher/URIMatcher.ino
: A demonstration sketch showcasing all matching strategies supported byAsyncURIMatcher
, including exact, prefix, folder, extension, case-insensitive, regex, and combined flag matching. Includes a navigable HTML homepage and detailed Serial output for each route.examples/URIMatcherTest/URIMatcherTest.ino
: A test suite for validating all matching modes, including factory methods, case-insensitive matching, regex, and catch-all routes. Designed for automated testing with external scripts.Documentation and Usage Guide
examples/URIMatcher/README.md
: A comprehensive guide explaining theAsyncURIMatcher
class, auto-detection logic, matching strategies, usage patterns, available flags, performance notes, and real-world application examples.Demonstration Features
ASYNCWEBSERVER_REGEX
compilation flag. [1] [2]Testing Enhancements
References: [1] [2] [3]