Commit 883146e

HTML API: Introduce full parsing mode in HTML Processor.
The HTML Processor has only supported a specific kind of parsing mode called _the fragment parsing mode_, where it behaves in the same way that `node.innerHTML = html` does in the DOM. This mode assumes a context node and doesn't support parsing an entire document.

As part of work to add more spec support to the HTML API, this patch introduces a full parsing mode, which can parse a full HTML document from start to end, including the doctype declaration and head tags.

Developed in WordPress#6977
Discussed in https://core.trac.wordpress.org/ticket/61576

Props: dmsnell, jonsurrell.
See #61576.

git-svn-id: https://develop.svn.wordpress.org/trunk@58836 602fd350-edb4-49c9-b593-d223f7449a82
1 parent 489b840 commit 883146e

3 files changed: +587 -63 lines changed

src/wp-includes/html-api/class-wp-html-processor-state.php

Lines changed: 32 additions & 0 deletions
@@ -428,6 +428,38 @@ class WP_HTML_Processor_State {
 	 */
 	public $context_node = null;
 
+	/**
+	 * The recognized encoding of the input byte stream.
+	 *
+	 * > The stream of code points that comprises the input to the tokenization
+	 * > stage will be initially seen by the user agent as a stream of bytes
+	 * > (typically coming over the network or from the local file system).
+	 * > The bytes encode the actual characters according to a particular character
+	 * > encoding, which the user agent uses to decode the bytes into characters.
+	 *
+	 * @since 6.7.0
+	 *
+	 * @var string|null
+	 */
+	public $encoding = null;
+
+	/**
+	 * The parser's confidence in the input encoding.
+	 *
+	 * > When the HTML parser is decoding an input byte stream, it uses a character
+	 * > encoding and a confidence. The confidence is either tentative, certain, or
+	 * > irrelevant. The encoding used, and whether the confidence in that encoding
+	 * > is tentative or certain, is used during the parsing to determine whether to
+	 * > change the encoding. If no encoding is necessary, e.g. because the parser is
+	 * > operating on a Unicode stream and doesn't have to use a character encoding
+	 * > at all, then the confidence is irrelevant.
+	 *
+	 * @since 6.7.0
+	 *
+	 * @var string
+	 */
+	public $encoding_confidence = 'tentative';
+
 	/**
 	 * HEAD element pointer.
 	 *
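
For readers unfamiliar with the encoding/confidence pair described in the docblocks above, here is a minimal, standalone sketch of how two such properties might be updated together. It is not code from this commit: `Example_Encoding_State` and `example_declare_encoding()` are hypothetical names, and the rule "a caller-declared encoding makes the confidence certain" is an assumption based loosely on the quoted specification text. Only the property names and their defaults (`null` and `'tentative'`) come from the hunk.

```php
<?php
/**
 * Hypothetical stand-in that mirrors only the two properties added in
 * this hunk. It is NOT the real WP_HTML_Processor_State class.
 */
final class Example_Encoding_State {
	/** @var string|null Recognized encoding of the input, if any. */
	public $encoding = null;

	/** @var string One of 'tentative', 'certain', or 'irrelevant'. */
	public $encoding_confidence = 'tentative';
}

/**
 * Hypothetical helper: records an encoding the caller already knows.
 *
 * Assumption for illustration: a caller-declared encoding is treated
 * as certain; with no declaration the defaults are left untouched.
 */
function example_declare_encoding( Example_Encoding_State $state, ?string $known_encoding ): void {
	if ( null === $known_encoding ) {
		return;
	}

	$state->encoding            = $known_encoding;
	$state->encoding_confidence = 'certain';
}

$state = new Example_Encoding_State();
example_declare_encoding( $state, 'UTF-8' );
echo $state->encoding, ' / ', $state->encoding_confidence, PHP_EOL;
```

Keeping the confidence as a separate field, rather than folding it into the encoding value, lets later parsing steps tell "no encoding known yet" apart from "an encoding was guessed and may still change".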
