generate xpaths for namespaced elements #962

Merged
merged 3 commits into from
Aug 13, 2025

Conversation

seanmcguire12
Member

@seanmcguire12 seanmcguire12 commented Aug 12, 2025

why

  • some nodes have namespaced (prefixed) names; note the colon in nodeName and localName below:
{
    "nodeId": 19,
    "parentId": 17,
    "backendNodeId": 22,
    "nodeType": 1,
    "nodeName": "FOO:BAR",
    "localName": "foo:bar",
    "nodeValue": "",
    "childNodeCount": 1,
    "children": [],
    "attributes": []
}
  • currently, when we generate xpaths for namespaced elements, we emit the tag name as-is:
    • /html/body/foo:bar/div/select
  • the above path won't resolve, because XPath reads foo: as a namespace prefix and no binding for that prefix is supplied
  • to fix this, we match on the element's qualified name with a name() predicate instead:
    • /html/body/*[name()='foo:bar']/div/select

what changed

  • added a check for ":" in the tag name during the xpath building step. if the name contains one, we format the path segment as described above (/*[name()='foo:bar'])
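
The check described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper name formatXPathSegment and its signature are hypothetical, and it assumes tag names never contain a single quote.

```typescript
// Hypothetical helper; the name and signature are illustrative and are not
// taken from lib/a11y/utils.ts.
function formatXPathSegment(tagName: string, index?: number): string {
  // A colon would be parsed by XPath as prefix:local, which needs a namespace
  // binding, so match on the qualified name with a name() predicate instead.
  const step = tagName.includes(":") ? `*[name()='${tagName}']` : tagName;
  // Append a 1-based positional predicate only when an index is supplied.
  return index === undefined ? step : `${step}[${index}]`;
}

// Rebuild the example path from the description above.
const exampleSegments = ["html", "body", "foo:bar", "div", "select"];
const namespacedPath =
  "/" + exampleSegments.map((s) => formatXPathSegment(s)).join("/");
console.log(namespacedPath); // /html/body/*[name()='foo:bar']/div/select
```

Tag names without a colon pass through unchanged, so existing paths are unaffected.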

test plan

  • added a new eval to test this case (namespace_xpath)
  • run act evals
  • run extract evals
  • run observe evals

changeset-bot bot commented Aug 12, 2025

🦋 Changeset detected

Latest commit: 5c3ab26

The changes in this PR will be included in the next version bump.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR adds support for generating XPaths that properly handle namespaced XML/HTML elements. The core issue was that Stagehand's XPath generation was creating invalid paths for elements with namespaced tag names (containing colons, like foo:bar).

The change modifies the generateAccessibilityTreeXPath function in lib/a11y/utils.ts to detect when a tag name contains a colon and switch from the plain name-test syntax (foo:bar[index]) to a name() predicate (*[name()='foo:bar'][index]), which is valid XPath that needs no namespace binding. This ensures that the generated XPaths can actually resolve in browsers, since XPath evaluators cannot resolve a prefixed name unless the prefix is declared in the XPath context.
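
To illustrate how the positional index combines with the name() predicate, here is a minimal sketch that walks a small tree and numbers each element among its same-named siblings. The SketchNode shape and the collectXPaths function are hypothetical and far simpler than the real accessibility tree processing; the sketch assumes names contain no single quotes.

```typescript
// Hypothetical minimal node shape; real DOM nodes carry many more fields.
interface SketchNode {
  localName: string;
  children: SketchNode[];
}

// Collects an indexed XPath for every node, in document order.
function collectXPaths(root: SketchNode): string[] {
  const paths: string[] = [];
  const visit = (node: SketchNode, parentPath: string, index: number): void => {
    // Names containing a colon are matched through a name() predicate.
    const step = node.localName.includes(":")
      ? `*[name()='${node.localName}']`
      : node.localName;
    const path = `${parentPath}/${step}[${index}]`;
    paths.push(path);
    // Count siblings per name so each index is relative to same-named siblings.
    const seen = new Map<string, number>();
    for (const child of node.children) {
      const position = (seen.get(child.localName) || 0) + 1;
      seen.set(child.localName, position);
      visit(child, path, position);
    }
  };
  visit(root, "", 1);
  return paths;
}

const sketchTree: SketchNode = {
  localName: "html",
  children: [{
    localName: "body",
    children: [
      { localName: "foo:bar", children: [] },
      { localName: "div", children: [] },
      { localName: "foo:bar", children: [{ localName: "select", children: [] }] },
    ],
  }],
};
const sketchPaths = collectXPaths(sketchTree);
console.log(sketchPaths[sketchPaths.length - 1]);
// /html[1]/body[1]/*[name()='foo:bar'][2]/select[1]
```

Because XPath applies predicates left to right, *[name()='foo:bar'][2] first keeps only the children named foo:bar and then selects the second of those, which matches a per-name sibling count.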

This fix is crucial for Stagehand's DOM traversal functionality when working with web applications that use XML namespaces or custom namespaced elements. The change is isolated to a single utility function that's used throughout Stagehand's accessibility tree processing, which means it will improve XPath generation across all of Stagehand's core features (act, extract, observe).

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it addresses a specific XPath generation bug
  • Score reflects a targeted fix with clear logic and proper XPath specification compliance
  • Pay close attention to lib/a11y/utils.ts to ensure the colon detection logic is robust

1 file reviewed, no comments

@seanmcguire12 seanmcguire12 force-pushed the sean/stg-664-support-namespaced-xpaths branch from 61cb279 to 5c3ab26 August 13, 2025 00:10
@seanmcguire12 seanmcguire12 added the act, extract, observe, and targeted-extract labels Aug 13, 2025
@seanmcguire12 seanmcguire12 merged commit 72d2683 into main Aug 13, 2025
30 of 42 checks passed
2 participants