Commit 385d3bf

Update comments for XPaths to explain disambiguating attribute predicates
1 parent 648eec6 commit 385d3bf

1 file changed: +15 -15 lines changed

plugins/optimization-detective/class-od-html-tag-processor.php

Lines changed: 15 additions & 15 deletions
@@ -101,22 +101,22 @@ final class OD_HTML_Tag_Processor extends WP_HTML_Tag_Processor {
  * The pattern for matching a tag /[a-zA-Z0-9:_-]+/ used here is informed by the characters found in tag names in
  * HTTP Archive as {@link https://docs.google.com/spreadsheets/d/1grkd2_1xSV3jvNK6ucRQ0OL1HmGTsScHuwA8GZuRLHU/edit?gid=2057119066#gid=2057119066 seen}
  * in Web Almanac 2022, with the only exception being the very malformed tag name `script="async"`. Note that XPaths
- * begin with `/HTML/BODY` followed by an index-free reference to an element which is a direct child of the BODY,
- * for example `/HTML/BODY/DIV`. Below this point, all tags must then have indices to disambiguate the XPaths among
- * siblings. For example: `/HTML/BODY/DIV/*[2][self::MAIN]/*[1][self::FIGURE]/*[2][self::IMG]`. There is little need
- * for there to be an index added to the `DIV` directly under the `BODY` because WordPress themes almost always use
- * a wrapper element: Block Themes always wrap the page in a `DIV.wp-site-blocks` element, and classic themes either
- * wrap the content in a `DIV#page` or else use `HEADER`, `MAIN`, and `FOOTER`.
- *
- * The benefit of omitting the node index from direct children of the BODY allows for variations in content output
+ * begin with `/HTML/BODY` followed by an index-free reference to an element which is a direct child of the BODY but
+ * with a disambiguating attribute predicate added, for example `/HTML/BODY/DIV[@id="page"]`. Below this point, all
+ * tags must then have indices to disambiguate the XPaths among siblings. For example:
+ * `/HTML/BODY/DIV[@id="page"]/*[2][self::MAIN]/*[1][self::FIGURE]/*[2][self::IMG]`.
+ *
+ * The benefit of omitting the node index from direct children of the BODY allows for variation in the content output
  * at `wp_body_open()` without impacting the computed XPaths for subsequent tags. Omitting the node index at this
- * level, however, does incur a slight risk of duplicate XPaths being computed. For example, if a theme has a
+ * level, however, does introduce the risk of duplicate XPaths being computed. For example, if a theme has a
  * `<div id="header" role="banner">` and a `<div id="footer" role="contentinfo">` which are both direct descendants
  * of `BODY`, then it is possible for an XPath like `/HTML/BODY/DIV/*[1][self::IMG]` to be duplicated if both of
- * these `DIV` elements has an `IMG` as the first child. The conflict could arise for any tags of the same name in
- * the same relative position, which does not seem likely. Additionally, as noted above, themes almost always wrap
- * the page content in an overall `DIV` or else they use semantic HTML tags like `HEADER` and `FOOTER`, so the risk
- * is low.
+ * these `DIV` elements has an `IMG` as the first child. This is also an issue in sites using the Image block
+ * because it outputs a `DIV.wp-lightbox-overlay.zoom` in `wp_footer`, resulting in there being a real possibility
+ * for XPaths to not be unique in the page. Therefore, en lieu of node index being added to children of `BODY`,
+ * a disambiguating attribute predicate is added for the element's `id`, `role`, or `class` attribute. These three
+ * attributes are the most stable across page loads, especially at the root of the document (where there is no Post
+ * Loop using `post_class()`).
  *
  * @since 0.4.0
  * @see self::get_xpath()
@@ -623,8 +623,8 @@ private function is_foreign_element(): bool {
  *
  * It would be nicer if this were like `.../DIV[1]/DIV[2]` but in XPath the position() here refers to the
  * index of the preceding node set. So it has to rather be written `.../*[1][self::DIV]/*[2][self::DIV]`.
- * Note that the first three levels lack any node index, for example `/HTML/BODY/DIV` for the reasons
- * explained in {@see self::XPATH_PATTERN}.
+ * Note that the first three levels lack any node index whereas the third level includes a disambiguating
+ * attribute predicate (e.g. `/HTML/BODY/DIV[@id="page"]`) for the reasons explained in {@see self::XPATH_PATTERN}.
  *
  * @since 0.4.0
  *
