
live627 (Contributor) commented Nov 8, 2025

My test of over 1000 cities shows a sizeable memory reduction. Sidebar: This gist was accidentally left private in the last update, yet no one mentioned it.

Comment on lines 1103 to 1105

```php
$prefix = mb_substr($key, 0, $len, $encoding);
$key_remainder = mb_substr($key, $len, null, $encoding);
$remaining_remainder = mb_substr($remaining, $len, null, $encoding);
```
Contributor Author


I have some code that cuts these strings at byte boundaries, but that would make this function UTF-8 only. Would that be acceptable?

Sesquipedalian (Member) left a comment


Splitting into bytes rather than characters is indeed more space-efficient, but sadly it's not an option for us. Splitting into bytes will introduce byte sequences that are not valid UTF-8 into the overall regex. Although the preg_* functions themselves can handle that, invalid byte sequences will cause problems when we try to store the compiled regular expressions.

For example, we store the TLD regex in the database. If the regex contains invalid byte sequences, the database will reject it.

Never mind. I misread the proposed code.
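To illustrate the general concern from the review (a minimal sketch, assuming the mbstring extension is available; the keys and variable names are hypothetical and not taken from the PR): when two keys share only the first *byte* of a multibyte character, factoring on bytes hoists a lone UTF-8 lead byte out of the alternation. `preg_match()` still accepts that pattern as long as the `u` modifier is not used, but the pattern string itself is no longer valid UTF-8, which is what would trip up storage such as the database.

```php
<?php
// Hypothetical keys: both characters start with the UTF-8 lead byte 0xC3.
$a = "\u{00E9}"; // "é" = 0xC3 0xA9
$b = "\u{00FC}"; // "ü" = 0xC3 0xBC

// Byte-level factoring: hoist the shared lead byte out of the alternation.
$byte_regex = substr($a, 0, 1) . '(?:' . substr($a, 1) . '|' . substr($b, 1) . ')';

// Character-level factoring: no shared character, so keep whole characters.
$char_regex = '(?:' . $a . '|' . $b . ')';

// Without the u modifier, PCRE works byte-wise, so both patterns match "é".
var_dump(preg_match('/' . $byte_regex . '/', $a)); // int(1)
var_dump(preg_match('/' . $char_regex . '/', $a)); // int(1)

// Only the character-level pattern is a valid UTF-8 string that can be stored safely.
var_dump(mb_check_encoding($byte_regex, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($char_regex, 'UTF-8')); // bool(true)
```

Splitting with `mb_substr()` on character boundaries, as in the lines under review, avoids this because every fragment it produces is a whole character sequence.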

Sesquipedalian (Member) commented

> My test of over 1000 cities shows a sizeable memory reduction. Sidebar: This gist was accidentally left private in the last update, yet no one mentioned it.

I'm pretty sure I did see the gist before, but now I get a 404 at that link. Do you have the privacy setting backwards?

Sesquipedalian dismissed their stale review on November 23, 2025 at 19:54

Not relevant after all due to my own misreading of the proposed changes.

Sesquipedalian (Member) commented Nov 23, 2025

In my testing, these proposed changes are slower than the existing code. However, there are some good ideas in here for improving performance.


3 participants