-
Notifications
You must be signed in to change notification settings - Fork 265
Improve space efficiency for regex building by using a radix tree #9036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sources/Utils.php
Outdated
| $prefix = mb_substr($key, 0, $len, $encoding); | ||
| $key_remainder = mb_substr($key, $len, null, $encoding); | ||
| $remaining_remainder = mb_substr($remaining, $len, null, $encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some code which cuts these string based on byte boundaries, but that will make this function UTF-8 only. Would that be acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting into bytes rather than characters is indeed more space efficient, but sadly it's not an option for us. That's because splitting into bytes will introduce byte sequences that are not valid UTF-8 into the overall regex. Although the preg_* functions themselves can handle that, having invalid byte sequences will cause problems when trying to store the compiled regular expressions.
For example, we store the TLD regex in the database. If the regex contains invalid byte sequences, the database will reject it.
Never mind. I misread the proposed code.
I'm pretty sure I did see the gist before, but now I get a 404 at that link. Do you have the privacy setting backwards? |
Not relevant after all due to my own misreading of the proposed changes.
|
In my testing, these proposed changes are slower than the existing code. However, there are some good ideas in here for improving performance. |
My test of over 1000 cities shows a sizeable memory reduction. Sidebar: This gist was accidentally left private in the last update, yet no one mentioned it.