Closed as not planned
Description
The following code:
<?php
$text = <<<TEXT
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
TEXT;
$dom = \Dom\HTMLDocument::createFromString($text, options: LIBXML_NOERROR);
var_dump($dom->saveHtml());
$dom = new \DOMDocument();
$dom->loadHTML($text, LIBXML_NOERROR);
var_dump($dom->saveHtml());
Resulted in this output:
string(137) "<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est � test
</body></html>"
string(241) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
"
But I expected this output instead:
string(137) "<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body></html>"
string(241) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
"
This HTML is taken from a real email I received. Outlook displays the à character correctly, and so does Firefox.
But for some reason, PHP 8.4's \Dom\HTMLDocument replaces à with �.
I am parsing received emails, so I can't control the correctness of the initial HTML.
The previous DOMDocument parsed it correctly.
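One workaround that might help here, as a rough sketch rather than a confirmed fix: transcode the raw bytes to UTF-8 based on the charset declared in the `<meta>` tag, then tell the parser explicitly that the input is UTF-8 through the `overrideEncoding` argument of `createFromString`. The helpers `declaredCharset()` and `htmlToUtf8()` are hypothetical names introduced for illustration, the regex is a simplification and not a full HTML encoding sniffer, and the sample assumes the email body really arrives as Windows-1252 bytes (where à is the single byte 0xE0).

```php
<?php
// Hypothetical helper (not part of PHP): read the charset declared in a <meta> tag.
// Simplified regex; it does not implement the full HTML encoding-sniffing algorithm.
function declaredCharset(string $html, string $fallback = 'UTF-8'): string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return $m[1];
    }
    return $fallback;
}

// Hypothetical helper: transcode the raw bytes to UTF-8 using the declared charset.
// mb_convert_encoding() throws a ValueError for encoding names mbstring does not know.
function htmlToUtf8(string $html): string
{
    $charset = declaredCharset($html);
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

// Assumed sample input: raw bytes as they might arrive in an email.
// 0xE0 is "à" in Windows-1252.
$raw = '<html><head><meta http-equiv="Content-Type" '
     . 'content="text/html; charset=Windows-1252"></head>'
     . "<body>C'est \xE0 test</body></html>";

$utf8 = htmlToUtf8($raw);

// \Dom\HTMLDocument only exists on PHP 8.4 and later.
if (class_exists(\Dom\HTMLDocument::class)) {
    // The third argument (overrideEncoding) states the input encoding explicitly,
    // so the parser should not rely on the now-stale <meta> declaration.
    $dom = \Dom\HTMLDocument::createFromString($utf8, LIBXML_NOERROR, 'UTF-8');
    var_dump($dom->saveHtml());
}
```

Note that the `<meta>` tag in the parsed document still declares Windows-1252 after transcoding, so it may need to be rewritten if the serialized HTML is stored or forwarded.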
PHP Version
8.4.4
Operating System
No response