-
Notifications
You must be signed in to change notification settings - Fork 8k
[TEMP][DRAFT] Re-bundle oniguruma #19258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Maintenance of Oniguruma has ended, so I'll temporarily bundle it
Considering that Niels already added thumbs down, it needs RFC. |
I created a Draft RFC. @alexdowad |
@youkidearitai Thanks very much for drawing attention to this issue! Many thanks to @kkos for his work on maintaining Oniguruma over the years. I'm sure it has been challenging. I think our only options are 1) fork Oniguruma and maintain it ourselves, or 2) drop the Since PHP already has a built-in extension which bundles PCRE2, dropping Oniguruma would mean that PHP only includes one regex engine, rather than two. From a maintenance perspective, that seems desirable. At the same time, it seems that Oniguruma does have some features which PCRE2 doesn't. From a quick, online search, it appears that OG:
Looking at the history of OG, it appears there were a number of CVEs in 2019, but since then there have been few. Code size: ~75,000 lines of C. About 25,000 are for the regexp engine; about 40,000 are Unicode data tables, and another 10,000 to support various text encodings. @youkidearitai Do you have any ideas about how to find how many users are relying on |
@alexdowad Thank you very much for investigation!
For example, One idea is search to GitHub: https://github.com/search?q=mb_ereg+language%253APHP&type=code&l=PHP |
Just using Claude to analyze the OG repository and figure out where the key parts are and how they work. Here's what it came up with: https://gist.github.com/alexdowad/b976a927aa908ca6f2bf76afe3d1a302 |
@youkidearitai As a suggestion, instead of re-bundling the Oniguruma source into php-src, what if we forked OG into the |
@alexdowad Indeed, Forks Oniguruma in PHP organization is one way. |
Oniguruma has strengths for example
It's not easy to get rid of Oniguruma. |
I know I don't hold any weight in this discussion, but has Onigmo been considered as an alternative to Oniguruma? It's the same engine that Ruby uses and has support for some newer RegEx things (it's an Oniguruma fork). (I'm the builder and maintainer of https://github.com/phikiphp/phiki which relies on |
@ryangjchandler I don't know why you think that your words don't hold any weight; really, we are always interested in hearing from users. Thanks for the suggestion about Onigmo. |
@ryangjchandler From what you know, would moving from Oniguruma to Onigmo be a breaking change in any way? |
@alexdowad To my knowledge it supports the same features as Oniguruma does, plus some extras to get it closer to more recent PCRE releases. The internals / ABI might be different though - that I can't confirm without looking into it. |
@ryangjchandler Thank you very much for your feedback! |
Onigumo is an interesting candidate for migration, but its transition stopped at Unicode 11.0 compared to Oniguruma. |
oniguruma also supports the much loved feature variable length lookbehind |
@RedCMD Oh that's a good one to note then as a breaking change and also a problem for my Phiki package 😅 |
I think the important part here is if there are some volunteers to do any work on this which means either migration to Onigma (and possibly trying to backport some features added from Ruby as it doesn't look like it's getting regularly updated) or maintaining the Oniguruma fork (if it's bundled or separate lib does not make huge difference IMHO as one still needs to understand the library to be able to fix the potential security issues)? The advantage of Onigma would be that it should be enough to track the Ruby version so a deeper understanding might not be necessary but as mentioned some features are missing so it might potentially break code. But it might be in some way like supporting multiple major version of any library where BC gets broken and it's up to the users (distros) to select what they want so maybe that's not such a big issue. |
@bukka I don't mind taking up the maintenance of a private Oniguruma fork, but with the condition that maintenance will include security fixes only, no new features whatsoever. |
@alexdowad Fair enough. If Alex said so, I agree and take up to maintenance Oniguruma. Because there are latent many demands. |
@alexdowad So you would basically prefer library fork in php org that would get its own releases? Just to be clear, this sort of thing might also require keeping build working and possibly also fixing some crashes (even though those might not be necessarily security issues). I would normally suggest not to use php org but considering the concerns in kkos/oniguruma#234 , it might show clearly that this is the version required for PHP. But we don't have any libraries in PHP org so it would be the first one and we would need to discuss it with admins internally. We have got actually already for of oniguruma for Windows packaging in https://github.com/winlibs/oniguruma but I guess it's probably better to keep this for Windows packaging only as none of those libs in this org is used for anything else. |
@bukka Whatever the other core devs prefer is fine. if forking a library into the |
FYI: I forked and updated Unicode 17.0 in Oniguruma https://github.com/youkidearitai/oniguruma |
Related: #18467
Maintenance of Oniguruma has ended, so I'll temporarily bundle it.
This pull request is one way of doing it, and this method may not always be adopted.