Skip to content

Conversation

@sjorsdonkers
Copy link
Contributor

@sjorsdonkers sjorsdonkers commented Jul 8, 2025

https://developer.mozilla.org/en-US/docs/Web/API/Document/createNodeIterator
Does not reuse the TreeWalker implementation as there are several subtle differences

TreeWalker is likely not handling its edges cases correctly, those are not addressed in this PR except fixing, get_filter(), _previous_node() going beyond the root and functions returning *parser.Node instead of Node.Union

@sjorsdonkers
Copy link
Contributor Author

sjorsdonkers commented Jul 8, 2025

Edit: Fixed
Warnings when fetching: https://www.airbnb.com/
image

Error is raised when calling this filter function:

let e;
for (!r && document.body && (r = document.createNodeIterator(document.body, NodeFilter.SHOW_ELEMENT, (e => {
        return t = e, f.includes(t.tagName) || t.hasAttribute("data-lcp-candidate") ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
        var t
    }))); e = r?.nextNode();) h(e);
n(N)

TBD if this is related to the TreeWalker NodeIterator impl or something unrelated

@sjorsdonkers sjorsdonkers marked this pull request as draft July 8, 2025 14:09
@sjorsdonkers sjorsdonkers force-pushed the NodeIterator branch 2 times, most recently from fe1e3ef to cf5815e Compare July 9, 2025 08:32
@sjorsdonkers sjorsdonkers marked this pull request as ready for review July 9, 2025 09:29
@karlseguin
Copy link
Collaborator

karlseguin commented Jul 10, 2025

I'll look at the PR more completely in a bit, but I do think the issue is likely related to the implementation of the NodeIterator.

When I debug this, I see that the callback is called with a Node (as well as other elements), despite the code using the NodeFilter.SHOW_ELEMENT; which I assume means that it should only iterate element nodes. But, because an underlying Node is passed, the call to tagName returns undefined (which is ok), and the call to hasAttribute fails - which is where the code fails.

Without looking at the implementation, I assume it's either emitting a node when it shouldn't (because of the filter), or it's emitting an element as a node without going through the toInterface call.

const node_type = try parser.nodeType(node);

// Verify that we can show this node type.
if (!switch (node_type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Zig-ish alternative to this is @bitCast the u32 into a packed struct:

pub const WhatToShow = packed struct {
    element: bool,
    attribute: bool,
    text: bool,
    cdata: bool,
    entity_reference: bool,
    entity: bool,
    processing_instruction: bool,
    comment: bool,
    document: bool,
    document_type: bool,
    document_fragment: bool,
    notation: bool,
    _: u20,
};

pub fn verify(what_to_show: u32, filter: ?Env.Function, node: *parser.Node) !VerifyResult {
    const node_type = try parser.nodeType(node);
     // or WhatToShow.fromBitmask(what_to_show) if you want to encapsulate it
    const wts: WhatToShow = @bitCast(what_to_show);
   
   if (!switch (node_type) {
      .attribute => wts.atribute,
      , ....
     })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Good to know.
In this case I'm inclined to leave it as it was. Just fewer loc.
If you do feel strongly about I would also change the u32 stored in the NodeIterator and TreeWalker, and cast it back when that property is accessed. Otherwise I'll leave it as is.

@karlseguin
Copy link
Collaborator

LGTM

Base automatically changed from EventTarget-constructor to main July 11, 2025 07:55
@sjorsdonkers sjorsdonkers merged commit 49a97db into main Jul 11, 2025
10 checks passed
@sjorsdonkers sjorsdonkers deleted the NodeIterator branch July 11, 2025 08:05
@github-actions github-actions bot locked and limited conversation to collaborators Jul 11, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants