-
Notifications
You must be signed in to change notification settings - Fork 60
Catastrofic backtracking fix #426
base: main
Are you sure you want to change the base?
Conversation
… a PREG_BACKTRACK_LIMIT_ERROR and check result of preg_replace to be sure we do not set content or description to null
I also tested the updated regex again with a modified version of the feed item, see: |
I see this problem with your link. |
$this->description = preg_replace('!(<*\s*[^>]*)(src=)(.?)(\/[^\/])!','\1 src=\3'.$host.'\4', $this->description ); | ||
} | ||
if (property_exists($this, 'content') && !is_null($this->content)){ | ||
$this->content = preg_replace('!(<\s*[^>]*)(href=|src=)(.?)(\/[^\/])!','\1\2\3'.$host.'\4', $this->content) ?? $this->content; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a lot of nested tags.
Change to:
$this->content = preg_replace('!(href=|src=)(.?)(\/[^\/])!','\1\2'.$host.'\3', $this->content) ?? $this->content;
Similarly for "$this->description"
This will fix error for preg_replace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll write the most correct regular expression a little later.
Wait a bit. I have a lot to do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already fixed the regex in my PR. The cycles went down from over 5.8 million to 10.000 by removing the *
after the first <
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also necessary to exclude the processing of "href" in tags <code>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, maybe the regex should only search inside <a>
and <img>
tags or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is most correct regular expression:
preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(\/[^\/])(?!(.(?!<code))*<\/code>)~','\1\2\3'.$host.'\4', $this->content)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the regex should only search inside
<a>
and<img>
tags
I'm not 100% sure about this, but I'll think about it. In addition, it is still necessary to replace relative links like href="xxx..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if analyze only <a>
and <img>
, don’t need to change them in <code>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the function to include all the other possible replacements for relative links.
Can you test my changes and make them in this PR?
All replacements can be moved to a separate function since there are more of them.
All links in https://go.dev/blog/feed.atom are replaced correctly.
protected function setHostInContent(string $host = null): void
{
if (is_null($host)) {
return;
}
// Replaced links like href="/aaa/bbb.xxx"
if (property_exists($this, 'content') && !is_null($this->content)) {
$this->content = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(\/[^\/])(?!(.(?!<code))*<\/code>)~','\1\2\3'.$host.'\4', $this->content) ?? $this->content;
}
if (property_exists($this, 'description') && !is_null($this->description)) {
$this->description = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(\/[^\/])(?!(.(?!<code))*<\/code>)~','\1\2\3'.$host.'\4', $this->description) ?? $this->description;
}
$itemFullLink = $this->getLink();
if (property_exists($this, 'link') && !is_null($itemFullLink)) {
$itemLink = implode("/", array_slice(explode("/", $itemFullLink), 0, -1))."/";
if (property_exists($this, 'content') && !is_null($this->content)){
// Replaced links like href="#aaa/bbb.xxx"
$this->content = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(#)(?!(.(?!<code))*<\/code>)~','\1\2\3'.$itemFullLink.'\4', $this->content) ?? $this->content;
// Replaced links like href="aaa/bbb.xxx"
$this->content = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(\w+\b)(?![:])(?!(.(?!<code))*<\/code>)~','\1\2\3'.$itemLink.'\4', $this->content) ?? $this->content;
}
if (property_exists($this, 'description') && !is_null($this->description)) {
// Replaced links like href="#aaa/bbb.xxx"
$this->description = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(#)(?!(.(?!<code))*<\/code>)~','\1\2\3'.$itemFullLink.'\4', $this->description) ?? $this->description;
// Replaced links like href="aaa/bbb.xxx"
$this->description = preg_replace('~(<\s*[^>]*)(href=|src=)(.?)(\w+\b)(?![:])(?!(.(?!<code))*<\/code>)~','\1\2\3'.$itemLink.'\4', $this->description) ?? $this->description;
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanest code:
protected function setHostInContent(string $host = null): void
{
if (is_null($host)) {
return;
}
// Replaced links like href="/aaa/bbb.xxx"
$pattern = '(<\s*[^>]*)(href=|src=)(.?)(\/[^\/])(?!(.(?!<code))*<\/code>)';
$this->pregReplaceInProperty('content', $pattern, '\1\2\3'.$host.'\4');
$this->pregReplaceInProperty('description', $pattern, '\1\2\3'.$host.'\4');
$itemFullLink = $this->getLink();
$itemLink = implode("/", array_slice(explode("/", $itemFullLink), 0, -1))."/";
// Replaced links like href="#aaa/bbb.xxx"
$pattern = '(<\s*[^>]*)(href=|src=)(.?)(#)(?!(.(?!<code))*<\/code>)';
$this->pregReplaceInProperty('content', $pattern, '\1\2\3'.$itemFullLink.'\4');
$this->pregReplaceInProperty('description', $pattern, '\1\2\3'.$itemFullLink.'\4');
// Replaced links like href="aaa/bbb.xxx"
$pattern = '(<\s*[^>]*)(href=|src=)(.?)(\w+\b)(?![:])(?!(.(?!<code))*<\/code>)';
$this->pregReplaceInProperty('content', $pattern, '\1\2\3'.$itemLink.'\4');
$this->pregReplaceInProperty('description', $pattern, '\1\2\3'.$itemLink.'\4');
}
public function pregReplaceInProperty(string $property, string $pattern, string $replacement): void
{
if (property_exists($this, $property) && !is_null($this->{$property})) {
$this->{$property} = preg_replace('~'.$pattern.'~', $replacement, $this->{$property}) ?? $this->{$property};
}
}
Fixes #425
Update
setHostInContent
regexes to reduce steps and hopefully prevent aPREG_BACKTRACK_LIMIT_ERROR
and check result ofpreg_replace
to be sure we do not set content or description tonull
.Also refactor
setHostInContent
andgetHostFromLink
a bit for readability.cc @IgorA100