@JulianVennen commented May 26, 2025

When the input license text has an invalid encoding, the HTML-to-Markdown converter throws a fatal error:

PHP Fatal error:  Uncaught TypeError: preg_replace(): Argument #3 ($subject) must be of type array|string, null given in /web/vendor/league/html-to-markdown/src/Converter/TextConverter.php:24
Stack trace:
#0 /web/vendor/league/html-to-markdown/src/Converter/TextConverter.php(24): preg_replace()
#1 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(222): League\HTMLToMarkdown\Converter\TextConverter->convert()
#2 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(193): League\HTMLToMarkdown\HtmlConverter->convertToMarkdown()
#3 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(188): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#4 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(188): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#5 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(188): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#6 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(188): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#7 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(188): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#8 /web/vendor/league/html-to-markdown/src/HtmlConverter.php(99): League\HTMLToMarkdown\HtmlConverter->convertChildren()
#9 /web/vendor/aternos/licensee/src/TextTransformer/HtmlTransformer.php(26): League\HTMLToMarkdown\HtmlConverter->convert()
#10 /web/vendor/aternos/licensee/src/License/Text/LicenseText.php(106): Aternos\Licensee\TextTransformer\HtmlTransformer->transform()
#11 /web/vendor/aternos/licensee/src/License/Text/LicenseText.php(131): Aternos\Licensee\License\Text\LicenseText->getNormalizedContent()
#12 /web/vendor/aternos/licensee/src/Matcher/ExactMatcher.php(15): Aternos\Licensee\License\Text\LicenseText->getWordSet()
#13 /web/vendor/aternos/licensee/src/Matcher/Matcher.php(45): Aternos\Licensee\Matcher\ExactMatcher->match()
#14 /web/vendor/aternos/licensee/src/Matcher/Matcher.php(62): Aternos\Licensee\Matcher\Matcher->getAllMatches()
#15 /web/vendor/aternos/licensee/src/Licensee.php(80): Aternos\Licensee\Matcher\Matcher->getMatch()
#16 /web/src/Processor/MetaDataExtractor/MetaFile/LicenseFile.php(34): Aternos\Licensee\Licensee->findLicenseByContent()
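For illustration, here is a minimal PHP reproduction of this failure mode. It assumes, without confirmation from the trace, that the text goes through a `preg_replace()` call with the `u` (UTF-8) modifier before line 24 of `TextConverter.php`. The sample string is made up. With invalid UTF-8, that call returns `null`, and passing `null` as the subject of the next `preg_replace()` under `strict_types` raises the same `TypeError`:

```php
<?php
declare(strict_types=1);

// Assumption: the converter runs a regex with the `u` (UTF-8) modifier.
// "\xE9" is "é" in ISO-8859-1/Windows-1252 but is not valid UTF-8 by itself.
$text = "Copyright (c) 2025 Ren\xE9";

// PCRE rejects the malformed subject, so preg_replace() returns null
// and records PREG_BAD_UTF8_ERROR instead of throwing.
$result = preg_replace('~\s+~u', ' ', $text);
var_dump($result);                                   // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)

// Feeding that null into the next preg_replace() call is what fails:
// with strict_types enabled, a null $subject throws a TypeError.
$caught = false;
try {
    preg_replace('~ +~', ' ', $result);
} catch (TypeError $e) {
    $caught = true;
    echo $e->getMessage(), PHP_EOL;
}
```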

This PR fixes that by converting the text to UTF-8 using `mb_convert_encoding` before applying the transformers. The `mbstring` extension has been added as a required dependency.
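Here is a minimal sketch of that normalization step. `toValidUtf8()` is a hypothetical helper, not the code merged in this PR. It assumes the input is meant to be UTF-8. The PR does not say which source encoding is passed to `mb_convert_encoding`, so this version converts UTF-8 to UTF-8, which replaces malformed byte sequences with mbstring's substitute character:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the licensee library's actual code): ensure a
 * license text is valid UTF-8 before it reaches regex-based transformers.
 */
function toValidUtf8(string $text): string
{
    // Already valid UTF-8: return it untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Converting UTF-8 -> UTF-8 replaces every malformed byte sequence with
    // the mbstring substitute character ('?' by default).
    return mb_convert_encoding($text, 'UTF-8', 'UTF-8');
}

$clean = toValidUtf8("Copyright (c) 2025 Ren\xE9");
var_dump(mb_check_encoding($clean, 'UTF-8'));           // bool(true)
var_dump(preg_replace('~\s+~u', ' ', $clean) !== null); // bool(true)
```

Scrubbing like this stops the converter from crashing. The cost is that characters valid in another encoding (such as a Latin-1 `é`) are replaced. If the source encoding is known, passing it as the third argument of `mb_convert_encoding` would preserve them instead.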

@JulianVennen requested a review from matthi4s on May 26, 2025 14:06
@JulianVennen changed the title from "Invalid encoding" to "Fix fatal error that is thrown when the input string has an invalid encoding" on May 26, 2025
@JulianVennen merged commit 183fcd7 into master on May 26, 2025
2 checks passed