Problems with special characters and php8.4 \Dom\HTMLDocument #17785

@momala454

Description

The following code:

<?php

$text = <<<TEXT
 <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
TEXT;

$dom = \Dom\HTMLDocument::createFromString($text, options: LIBXML_NOERROR);

var_dump($dom->saveHtml());

$dom = new \DOMDocument();
$dom->loadHTML($text, LIBXML_NOERROR);
var_dump($dom->saveHtml());

Resulted in this output:

string(137) "<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est �&nbsp; test

</body></html>"
string(241) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
"

But I expected this output instead:

string(137) "<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test

</body></html>"
string(241) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est à test
</body>
</html>
"

This HTML is taken from a real email I received. Outlook displays the à character correctly, and so does Firefox.
But for some reason, PHP 8.4's \Dom\HTMLDocument replaces à with �&nbsp;.
I am parsing received emails, so I can't control the correctness of the original HTML.
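
To make the symptom easier to inspect, here is a small byte-level sketch. It assumes the reproduction script is saved as UTF-8 (the report does not say), so that à is the two bytes C3 A0, and it assumes the mbstring extension is loaded. Read as Windows-1252, the charset declared in the meta tag, those two bytes are Ã followed by a no-break space, which would be consistent with the �&nbsp; seen in the output.

```php
<?php
// Assumption: the script file is UTF-8, so "à" is stored as the bytes C3 A0.
$utf8Bytes = "\xC3\xA0";
echo bin2hex($utf8Bytes), "\n"; // c3a0

// Reinterpret the same two bytes as Windows-1252 (the charset named in <meta>)
// and convert the result to UTF-8 so it can be printed. Requires ext-mbstring.
$misread = mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');

// U+00C3 (Ã) followed by U+00A0 (no-break space), shown as UTF-8 hex.
echo bin2hex($misread), "\n"; // c383c2a0
```

Running `bin2hex()` on the original email body would show whether the à there is really C3 A0 (UTF-8) or the single byte E0 (Windows-1252).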

The legacy DOMDocument parses it correctly.
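
A possible workaround, sketched under assumptions: `createFromString()` accepts a third parameter, `overrideEncoding`, which tells the parser which encoding to use instead of detecting one from the markup. The sketch assumes the real encoding of the email body is known from somewhere else (for example the MIME Content-Type header) and that it is UTF-8; neither is confirmed by this report. The à is written as explicit UTF-8 bytes so the example does not depend on how the file is saved.

```php
<?php
// Same markup as in the report; "\xC3\xA0" is "à" encoded as UTF-8.
$text = <<<TEXT
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
C'est \xC3\xA0 test
</body>
</html>
TEXT;

$html = null;
// \Dom\HTMLDocument only exists on PHP 8.4+, so guard the call.
if (class_exists(\Dom\HTMLDocument::class)) {
    // Assumption: the body is really UTF-8, so override the <meta> charset.
    $dom = \Dom\HTMLDocument::createFromString(
        $text,
        LIBXML_NOERROR,
        overrideEncoding: 'UTF-8'
    );
    $html = $dom->saveHtml();
    var_dump($html);
}
```

If the email's declared MIME charset is trustworthy, passing it here instead of the hard-coded 'UTF-8' would keep the parser from relying on the meta tag.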

PHP Version

8.4.4

Operating System

No response
