Commit 68f9e52

Editorial: use leading and trailing surrogate
Also introduce an operation to obtain a scalar value from surrogates. Eventually the lead/trail byte stuff needs to be made consistent with this as well.
1 parent 2c3853e commit 68f9e52

1 file changed

+32
-28
lines changed

encoding.bs

Lines changed: 32 additions & 28 deletions
@@ -223,6 +223,12 @@ this restore operation is an internal detail of the algorithms in this specifica
223223
be used by other standards. Implementations are free to find alternative ways to implement such
224224
algorithms, as detailed in [[#implementation-considerations]].
225225

226+
<hr>
227+
228+
<p>To obtain a <dfn>scalar value from surrogates</dfn>, given a <a for=/>leading surrogate</a>
229+
<var>leading</var> and a <a for=/>trailing surrogate</a> <var>trailing</var>, return
230+
0x10000 + ((<var>leading</var> &minus; 0xD800) &lt;&lt; 10) + (<var>trailing</var> &minus; 0xDC00).
231+
226232

227233

228234
<h2 id=encodings>Encodings</h2>
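
The "scalar value from surrogates" operation added in the hunk above is plain arithmetic, so a small sketch may help when reviewing it. The function name and the range checks below are illustrative assumptions; the definition in the diff simply presupposes a leading and a trailing surrogate.

```python
def scalar_value_from_surrogates(leading: int, trailing: int) -> int:
    """Combine a surrogate pair into a scalar value, per the formula in the diff."""
    # Assumption: the definition takes valid surrogates as given; this sketch
    # rejects anything else instead of returning a meaningless number.
    if not 0xD800 <= leading <= 0xDBFF:
        raise ValueError("leading must be in the range U+D800 to U+DBFF")
    if not 0xDC00 <= trailing <= 0xDFFF:
        raise ValueError("trailing must be in the range U+DC00 to U+DFFF")
    # 0x10000 + ((leading - 0xD800) << 10) + (trailing - 0xDC00)
    return 0x10000 + ((leading - 0xD800) << 10) + (trailing - 0xDC00)


# U+D83D U+DE00 is the UTF-16 encoding of U+1F600.
print(hex(scalar_value_from_surrogates(0xD83D, 0xDE00)))
```

The lowest pair (U+D800, U+DC00) maps to U+10000 and the highest pair (U+DBFF, U+DFFF) maps to U+10FFFF, which matches the range of supplementary code points.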
@@ -1855,8 +1861,8 @@ TextEncoderStream includes GenericTransformStream;
18551861
<dt><dfn for=TextEncoderStream>encoder</dfn>
18561862
<dd>An <a for=/>encoder</a> instance.
18571863

1858-
<dt><dfn for=TextEncoderStream>pending high surrogate</dfn>
1859-
<dd>Null or a <a for=/>surrogate</a>, initially null.
1864+
<dt><dfn for=TextEncoderStream id=textencoderstream-pending-high-surrogate>leading surrogate</dfn>
1865+
<dd>Null or a <a for=/>leading surrogate</a>, initially null.
18601866
</dl>
18611867

18621868
<p class="note no-backref">A {{TextEncoderStream}} object offers no <var>label</var> argument as it
@@ -1974,26 +1980,26 @@ constructor steps are:
19741980

19751981
<ol>
19761982
<li>
1977-
<p>If <var>encoder</var>'s <a>pending high surrogate</a> is non-null, then:
1983+
<p>If <var>encoder</var>'s <a for=TextEncoderStream>leading surrogate</a> is non-null, then:
19781984

19791985
<ol>
1980-
<li><p>Let <var>high surrogate</var> be <var>encoder</var>'s <a>pending high surrogate</a>.
1986+
<li><p>Let <var>leadingSurrogate</var> be <var>encoder</var>'s
1987+
<a for=TextEncoderStream>leading surrogate</a>.
19811988

1982-
<li><p>Set <var>encoder</var>'s <a>pending high surrogate</a> to null.
1989+
<li><p>Set <var>encoder</var>'s <a for=TextEncoderStream>leading surrogate</a> to null.
19831990

1984-
<li><p>If <var>item</var> is in the range U+DC00 to U+DFFF, inclusive, then return a scalar value
1985-
whose value is 0x10000 + ((<var>high surrogate</var> &minus; 0xD800) &lt;&lt; 10) +
1986-
(<var>item</var> &minus; 0xDC00).
1991+
<li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return a
1992+
<a>scalar value from surrogates</a> given <var>leadingSurrogate</var> and <var>item</var>.
19871993

19881994
<li><p><a>Restore</a> <var>item</var> to <var>input</var>.
19891995

19901996
<li><p>Return U+FFFD.
19911997
</ol>
19921998

1993-
<li><p>If <var>item</var> is in the range U+D800 to U+DBFF, inclusive, then set <a>pending high
1994-
surrogate</a> to <var>item</var> and return <a>continue</a>.
1999+
<li><p>If <var>item</var> is a <a for=/>leading surrogate</a>, then set <var>encoder</var>'s
2000+
<a for=TextEncoderStream>leading surrogate</a> to <var>item</var> and return <a>continue</a>.
19952001

1996-
<li><p>If <var>item</var> is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.
2002+
<li><p>If <var>item</var> is a <a for=/>trailing surrogate</a>, then return U+FFFD.
19972003

19982004
<li><p>Return <var>item</var>.
19992005
</ol>
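
The renamed steps in the hunk above can be read as a small state machine over code units. The following is a minimal sketch of those steps only, assuming a deque stands in for the input queue and a sentinel object stands in for "continue"; `EncoderState`, `convert_code_unit`, and `convert_chunk` are hypothetical names, and the flush behaviour at end of stream is not modelled.

```python
from collections import deque

CONTINUE = object()   # hypothetical sentinel standing in for the spec's "continue"
REPLACEMENT = 0xFFFD  # U+FFFD


def is_leading_surrogate(code_unit: int) -> bool:
    return 0xD800 <= code_unit <= 0xDBFF


def is_trailing_surrogate(code_unit: int) -> bool:
    return 0xDC00 <= code_unit <= 0xDFFF


class EncoderState:
    """Holds the stream's 'leading surrogate' slot (null or a leading surrogate)."""

    def __init__(self) -> None:
        self.leading_surrogate = None


def convert_code_unit(state: EncoderState, item: int, queue: deque):
    """Mirror of the numbered steps in the diff for a single code unit."""
    if state.leading_surrogate is not None:
        leading = state.leading_surrogate
        state.leading_surrogate = None
        if is_trailing_surrogate(item):
            return 0x10000 + ((leading - 0xD800) << 10) + (item - 0xDC00)
        queue.appendleft(item)  # "Restore item to input"
        return REPLACEMENT
    if is_leading_surrogate(item):
        state.leading_surrogate = item
        return CONTINUE
    if is_trailing_surrogate(item):
        return REPLACEMENT
    return item


def convert_chunk(state: EncoderState, code_units) -> list:
    """Illustrative driver: run one chunk of code units through the steps."""
    queue = deque(code_units)
    output = []
    while queue:
        result = convert_code_unit(state, queue.popleft(), queue)
        if result is not CONTINUE:
            output.append(result)
    return output
```

Because the state object outlives a single call to `convert_chunk`, a pair split across two chunks is still combined: feeding `[0x41, 0xD83D]` and then `[0xDE00]` yields U+0041 from the first call and U+1F600 from the second.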
@@ -2007,7 +2013,7 @@ that are split between strings. [[!INFRA]]
20072013

20082014
<ol>
20092015
<li>
2010-
<p>If <var>encoder</var>'s <a>pending high surrogate</a> is non-null, then:
2016+
<p>If <var>encoder</var>'s <a for=TextEncoderStream>leading surrogate</a> is non-null, then:
20112017

20122018
<ol>
20132019
<li>
@@ -3322,20 +3328,20 @@ in deployed content. Therefore it is not part of the <a>shared UTF-16 decoder</a
33223328
rather the <a>decode</a> algorithm.
33233329

33243330
<p><a>shared UTF-16 decoder</a> has an associated <dfn>UTF-16 lead byte</dfn> and
3325-
<dfn>UTF-16 lead surrogate</dfn> (both initially null), and
3331+
<dfn id=utf-16-lead-surrogate>UTF-16 leading surrogate</dfn> (both initially null), and
33263332
<dfn id=utf-16be-decoder-flag>is UTF-16BE decoder</dfn> (initially false).
33273333

33283334
<p><a>shared UTF-16 decoder</a>'s <a>handler</a>, given <var>ioQueue</var> and
33293335
<var>byte</var>, runs these steps:
33303336

33313337
<ol>
33323338
<li><p>If <var>byte</var> is <a>end-of-queue</a> and either
3333-
<a>UTF-16 lead byte</a> or <a>UTF-16 lead surrogate</a> is non-null, set
3334-
<a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> to null, and return
3339+
<a>UTF-16 lead byte</a> or <a>UTF-16 leading surrogate</a> is non-null, set
3340+
<a>UTF-16 lead byte</a> and <a>UTF-16 leading surrogate</a> to null, and return
33353341
<a>error</a>.
33363342

33373343
<li><p>If <var>byte</var> is <a>end-of-queue</a> and
3338-
<a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> are null, return
3344+
<a>UTF-16 lead byte</a> and <a>UTF-16 leading surrogate</a> are null, return
33393345
<a>finished</a>.
33403346

33413347
<li><p>If <a>UTF-16 lead byte</a> is null, set <a>UTF-16 lead byte</a> to
@@ -3354,13 +3360,15 @@ rather the <a>decode</a> algorithm.
33543360
<p>Then set <a>UTF-16 lead byte</a> to null.
33553361

33563362
<li>
3357-
<p>If <a>UTF-16 lead surrogate</a> is non-null, let <var>lead surrogate</var> be
3358-
<a>UTF-16 lead surrogate</a>, set <a>UTF-16 lead surrogate</a> to null, and then:
3363+
<p>If <a>UTF-16 leading surrogate</a> is non-null:
33593364

33603365
<ol>
3361-
<li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
3362-
return a code point whose value is
3363-
0x10000 + ((<var>lead surrogate</var> &minus; 0xD800) &lt;&lt; 10) + (<var>code unit</var> &minus; 0xDC00).
3366+
<li><p>Let <var>leadingSurrogate</var> be <a>UTF-16 leading surrogate</a>.
3367+
3368+
<li><p>Set <a>UTF-16 leading surrogate</a> to null.
3369+
3370+
<li><p>If <var>code unit</var> is a <a for=/>trailing surrogate</a>, then return a
3371+
<a>scalar value from surrogates</a> given <var>leadingSurrogate</var> and <var>code unit</var>.
33643372

33653373
<li><p>Let <var>byte1</var> be <var>code unit</var> >> 8.
33663374

@@ -3371,16 +3379,12 @@ rather the <a>decode</a> algorithm.
33713379
<var>byte1</var>.
33723380

33733381
<li><p><a>Restore</a> <var>bytes</var> to <var>ioQueue</var> and return <a>error</a>.
3374-
<!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->
33753382
</ol>
33763383

3377-
<li><p>If <var>code unit</var> is in the range U+D800 to U+DBFF, inclusive, set
3378-
<a>UTF-16 lead surrogate</a> to <var>code unit</var> and return
3379-
<a>continue</a>.
3384+
<li><p>If <var>code unit</var> is a <a for=/>leading surrogate</a>, then set
3385+
<a>UTF-16 leading surrogate</a> to <var>code unit</var> and return <a>continue</a>.
33803386

3381-
<li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
3382-
return <a>error</a>.
3383-
<!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->
3387+
<li><p>If <var>code unit</var> is a <a for=/>trailing surrogate</a>, then return <a>error</a>.
33843388

33853389
<li><p>Return code point <var>code unit</var>.
33863390
</ol>
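
For reviewers who want to exercise the rewritten shared UTF-16 decoder steps, here is a rough sketch of the handler. Several parts are assumptions rather than text from this diff: the steps that store the lead byte and assemble the code unit are only partly visible in the hunks, as are the definition of the second byte and the byte order used when restoring; mapping "error" to U+FFFD in the driver assumes a replacement-style error mode. All class and function names are hypothetical.

```python
from collections import deque

END_OF_QUEUE = object()
CONTINUE = object()
FINISHED = object()
ERROR = object()


class SharedUtf16Decoder:
    def __init__(self, big_endian: bool = False) -> None:
        self.lead_byte = None          # "UTF-16 lead byte"
        self.leading_surrogate = None  # "UTF-16 leading surrogate"
        self.is_utf16be = big_endian   # "is UTF-16BE decoder"

    def handler(self, io_queue: deque, byte):
        if byte is END_OF_QUEUE:
            if self.lead_byte is not None or self.leading_surrogate is not None:
                self.lead_byte = None
                self.leading_surrogate = None
                return ERROR
            return FINISHED
        if self.lead_byte is None:
            # Assumption: this step is truncated in the diff context.
            self.lead_byte = byte
            return CONTINUE
        # Assumption: code unit assembly is not shown in the diff.
        if self.is_utf16be:
            code_unit = (self.lead_byte << 8) + byte
        else:
            code_unit = (byte << 8) + self.lead_byte
        self.lead_byte = None
        if self.leading_surrogate is not None:
            leading = self.leading_surrogate
            self.leading_surrogate = None
            if 0xDC00 <= code_unit <= 0xDFFF:
                return 0x10000 + ((leading - 0xD800) << 10) + (code_unit - 0xDC00)
            byte1 = code_unit >> 8
            byte2 = code_unit & 0xFF  # assumption: elided in the hunk
            restored = [byte1, byte2] if self.is_utf16be else [byte2, byte1]
            io_queue.extendleft(reversed(restored))  # put bytes back in order
            return ERROR
        if 0xD800 <= code_unit <= 0xDBFF:
            self.leading_surrogate = code_unit
            return CONTINUE
        if 0xDC00 <= code_unit <= 0xDFFF:
            return ERROR
        return code_unit


def decode_utf16(data: bytes, big_endian: bool = False) -> str:
    """Illustrative driver; assumes errors are replaced with U+FFFD."""
    decoder = SharedUtf16Decoder(big_endian)
    queue = deque(data)
    output = []
    while True:
        byte = queue.popleft() if queue else END_OF_QUEUE
        result = decoder.handler(queue, byte)
        if result is FINISHED:
            return "".join(output)
        if result is ERROR:
            output.append("\ufffd")
        elif result is not CONTINUE:
            output.append(chr(result))
```

Restoring the two bytes of a non-trailing code unit means that code unit is decoded again on its own after the error, so an unpaired leading surrogate followed by "A" yields U+FFFD and then "A" rather than swallowing the "A".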
