fix: catastrophic backtracking in Core.AggressivelyFixLt #440
Merged
ezyang merged 1 commit into ezyang:master on Jun 6, 2025
Contributor (author): Any thoughts, @ezyang?
Contributor (author): @ezyang, could you please review?
Owner: You caught me at an unlucky time, as I went on parental leave when you submitted this PR. It's on my queue.
ezyang approved these changes on Jun 6, 2025
github-actions bot pushed a commit that referenced this pull request on Oct 17, 2025
# [4.19.0](v4.18.0...v4.19.0) (2025-10-17)

### Bug Fixes

* add warning for misleading option ([#433](#433)) ([b21a591](b21a591))
* catastrophic backtracking in Core.AggressivelyFixLt ([#440](#440)) ([418eeb7](418eeb7))
* Deprecated: preg_replace(): Passing null to parameter #3 ($subject) o… ([#421](#421)) ([5d154a2](5d154a2))
* non-substantive typos ([#434](#434)) ([c2bc354](c2bc354))

### Features

* Add CSS direction support ([#429](#429)) ([63e631e](63e631e))
* Add option for safe iframe hosts using array lookup ([#423](#423)) ([b5cbf0c](b5cbf0c))
* Allow more image widths by default ([#430](#430)) ([00a0748](00a0748))
* Define option URI.AllowedSymbols ([#447](#447)) ([77ebd08](77ebd08))
* PHP 8.4 support ([#441](#441)) ([ff005f6](ff005f6))
* Support PHP 8.5 versions ([#453](#453)) ([1eb05d9](1eb05d9))
🎉 This PR is included in version 4.19.0 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀
When given a large HTML document (over a million characters), the `Core.AggressivelyFixLt` regex suffers catastrophic backtracking and `$html = null` is returned. TL;DR: HTMLPurifier gives you back a `null` document.

I tried many times to write a regex that did not suffer from catastrophic backtracking, but I think it ultimately comes back to the argument for why you should not use regex to parse HTML. The only alternatives I could come up with were to either:

- raise `pcre.backtrack_limit` to a higher value;
- disable `Core.AggressivelyFixLt`, but that's sub-optimal given the approach seems to work on documents of a reasonable size; or
- handle the `null` return value from `preg_replace_callback` and return `$html` (i.e. disable the armor logic if a regex error occurs).

The solution in this PR uses a small algorithm that relies only on standard string manipulation functions, so it runs very fast. The algorithm searches for HTML comments and lets a callback be run on each one.
I haven't changed the signatures of the `callbackUndoCommentSubst` and `callbackArmorCommentEntities` functions, because they're `public` and may be used by other libraries.
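The comment-scanning idea described above can be sketched roughly as follows. This is a minimal illustration written in Python rather than the project's PHP, and it is not the code from this PR's diff. It assumes the callback receives a `preg_replace_callback`-style match array `[full_match, body, terminator]` (so that existing public callbacks could keep their signatures) and that an unterminated `<!--` runs to the end of the input; `replace_comments` and `armor_comment_entities` are hypothetical names.

```python
from typing import Callable, List

# Assumed preg_replace_callback-style match array: [full_match, body, terminator]
Matches = List[str]


def replace_comments(html: str, callback: Callable[[Matches], str]) -> str:
    """Replace each <!-- ... --> comment in html with callback(matches).

    Only str.find and slicing are used, so the scan moves forward through
    the input and no regex engine is involved that could backtrack.
    """
    open_tag, close_tag = "<!--", "-->"
    out: List[str] = []
    pos = 0
    while True:
        start = html.find(open_tag, pos)
        if start == -1:
            out.append(html[pos:])  # no more comments: copy the tail unchanged
            break
        out.append(html[pos:start])  # text before the comment is left untouched
        body_start = start + len(open_tag)
        end = html.find(close_tag, body_start)
        if end == -1:
            # Assumption: an unterminated comment runs to the end of the input.
            out.append(callback([html[start:], html[body_start:], ""]))
            break
        full = html[start:end + len(close_tag)]
        out.append(callback([full, html[body_start:end], close_tag]))
        pos = end + len(close_tag)
    return "".join(out)


def armor_comment_entities(matches: Matches) -> str:
    # Hypothetical analogue of an "armor" callback: escape '&' inside comment bodies.
    return "<!--" + matches[1].replace("&", "&amp;") + matches[2]


if __name__ == "__main__":
    print(replace_comments("a<!-- x&y -->b<!-- open", armor_comment_entities))
    # prints: a<!-- x&amp;y -->b<!-- open
```

Because each `find` resumes from where the previous match ended, every character is examined a bounded number of times. The cost therefore grows roughly linearly with the input size, and there is no backtracking limit to exceed on multi-megabyte documents.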