Commit 21b5234

Author: Andreu Botella
Make document's character encoding reflect byte order mark
This change fixes a bug where document's character encoding was set to the return value of the encoding sniffing algorithm rather than to the actual encoding used, which differed when the stream started with a byte order mark. This change incorporates BOM sniffing into the encoding sniffing algorithm, ensuring both encodings are identical.

Tests: web-platform-tests/wpt#22276.

Closes #1077.
1 parent 59ee34d commit 21b5234

1 file changed: +13 -0 lines changed

source

Lines changed: 13 additions & 0 deletions
@@ -2497,6 +2497,9 @@ a.setAttribute('href', 'https://example.com/'); // change the content attribute

 <li>The <dfn data-x-href="https://encoding.spec.whatwg.org/#utf-8-encode">UTF-8 encode</dfn>
 algorithm which takes a character stream and returns a byte stream</li>
+
+<li>The <dfn data-x-href="https://encoding.spec.whatwg.org/#bom-sniff">BOM sniff</dfn>
+algorithm which takes a byte stream and returns an encoding or null.</li>
 </ul>

 </dd>
@@ -104983,6 +104986,16 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

 <ol>

+<li>
+<p>If the result of <span data-x="BOM sniff">BOM sniffing</span> is an encoding, return that
+encoding with <span data-x="concept-encoding-confidence">confidence</span> <i>certain</i>.</p>
+
+<p class="note">Although the <span>decode</span> algorithm will itself change the encoding to
+use based on the presence of a byte order mark, this algorithm sniffs the BOM as well in order
+to set the correct <span>document's character encoding</span> and <span
+data-x="concept-encoding-confidence">confidence</span>.</p>
+</li>
+
 <li>

 <p>If the user has explicitly instructed the user agent to override the document's character
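
Illustrative sketch (not part of the commit): the new first step of the encoding sniffing algorithm can be approximated in Python as below. The function names `bom_sniff` and `sniff_encoding`, the `fallback` parameter, and its `windows-1252` default are assumptions made for this example. The byte sequences follow the Encoding Standard's BOM sniff definition, and every later sniffing step (user override, transport-layer charset, prescan, and so on) is collapsed into a single placeholder return.

```python
from typing import Optional, Tuple

# Byte order marks recognized by BOM sniffing, longest first so that the
# three-byte UTF-8 mark is tested before the two-byte UTF-16 marks.
_BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)


def bom_sniff(stream: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if stream.startswith(bom):
            return name
    return None


def sniff_encoding(stream: bytes, fallback: str = "windows-1252") -> Tuple[str, str]:
    """Hypothetical, heavily simplified encoding sniffing.

    Only the step added by this commit is modelled: a BOM wins outright
    and is reported with confidence "certain".
    """
    bom_encoding = bom_sniff(stream)
    if bom_encoding is not None:
        return bom_encoding, "certain"
    # Placeholder for all remaining steps of the real algorithm (assumption).
    return fallback, "tentative"
```

With this ordering, the encoding reported by sniffing and the encoding the decoder would pick after seeing the BOM agree, which is the mismatch the commit message describes.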
