Conversation
WalkthroughAdds HTML and XML processing utilities: an HTML DOM adapter, XPath optimalizer, HTML minifier and locator map builder, XML XPath validator, and a raw-text plugin. Also updates go.mod with an indirect htmlquery dependency and adds an option to disable the reranker with corresponding conditional logic in modes and Locatr initialization. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant Plugin as rawTextPlugin
participant HTML as pkg/internal/html
participant XML as pkg/internal/xml
participant Types as types
rect rgba(230,245,255,0.6)
Note over Client,Plugin: GetMinifiedDOM (HTML or XML)
Client->>Plugin: GetMinifiedDOM(ctx)
alt pageType == HTML
Plugin->>HTML: MinifySource(content)
HTML-->>Plugin: *types.ElementSpec
Plugin->>HTML: CreateLocatorMap(content)
HTML-->>Plugin: map[id][]xpath
Plugin->>Types: Build types.DOM
else pageType == XML
Plugin->>XML: MinifySource(content)
XML-->>Plugin: *types.ElementSpec
Plugin->>XML: CreateLocatorMap(content)
XML-->>Plugin: map[id][]xpath
Plugin->>Types: Build types.DOM
end
Plugin-->>Client: *types.DOM
end
sequenceDiagram
autonumber
actor Client
participant Plugin as rawTextPlugin
participant HTML as pkg/internal/html
participant XML as pkg/internal/xml
rect rgba(240,255,240,0.6)
Note over Client,Plugin: IsLocatorValid
Client->>Plugin: IsLocatorValid(ctx, locator)
alt HTML page
Plugin->>HTML: IsValidXPath(locator, content)
HTML-->>Plugin: bool, err
else XML page
Plugin->>XML: IsValidXPath(locator, content)
XML-->>Plugin: bool, err
end
Plugin-->>Client: bool, err
end
sequenceDiagram
autonumber
participant Doc as HTMLDoc
participant Node as HTMLNode
participant X as GetOptimalXPath
rect rgba(255,248,230,0.6)
Note over Doc,X: High-level XPath selection flow
Node->>X: GetOptimalXPath(doc, node)
X->>Doc: doc.Find(xpathCandidate)
alt Unique match found
X-->>Node: return attribute-based XPath
else No unique candidate
X->>Node: GetParent()/Index()
X-->>Node: return hierarchical /.../tag[n]
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/internal/xml/adapter.go (1)
55-57: Nil-safety: prevent panics when parent is nil and when receiver wraps a nil node.Current value receivers and unconditional parent wrapping can panic (e.g., recursive calls that walk to a nil parent). Make receivers pointer-based with nil-guards and return nil when no parent.
Apply this diff:
-func (n XMLNode) TagName() string { - return n.node.Data -} +func (n *XMLNode) TagName() string { + if n == nil || n.node == nil { + return "" + } + return n.node.Data +} -func (n XMLNode) IsElement() bool { - return n.node.Type == xmlquery.ElementNode -} +func (n *XMLNode) IsElement() bool { + if n == nil || n.node == nil { + return false + } + return n.node.Type == xmlquery.ElementNode +} -func (n *XMLNode) GetParent() Node { - return NewXMLNode(n.node.Parent) -} +func (n *XMLNode) GetParent() Node { + if n == nil || n.node == nil || n.node.Parent == nil { + return nil + } + return NewXMLNode(n.node.Parent) +} -func (n *XMLNode) ChildNodes() []Node { +func (n *XMLNode) ChildNodes() []Node { + if n == nil || n.node == nil { + return nil + } var nodes []Node @@ -} +} -func (n *XMLNode) Equal(n1 Node) bool { +func (n *XMLNode) Equal(n1 Node) bool { + if n == nil { + _, ok := n1.(*XMLNode) + return !ok // nil receiver equal only to nil *XMLNode in interface + } xn1, ok := n1.(*XMLNode) if !ok { return false } return n.node == xn1.node } -func (n *XMLNode) Index() int { - if n.node.Parent == nil { +func (n *XMLNode) Index() int { + if n == nil || n.node == nil || n.node.Parent == nil { return 1 }Also applies to: 59-61, 76-78, 80-88, 90-97, 99-116
🧹 Nitpick comments (10)
pkg/internal/xml/adapter.go (1)
118-129: Reasonable helper; consider parsed‑doc overload to avoid repeated parsing in loops.
IsValidXPathreparses the DOM string on each call. If used in a loop, an overload that accepts a parsed*xmlquery.Node(or yourDocument) would reduce CPU/allocs.pkg/internal/html/adapter.go (1)
119-130:IsValidXPathis fine; align with XML helper and consider parsed‑doc overload.Parity looks good. Add an overload that accepts
*html.Node/Documentto avoid repeated parsing hot paths.pkg/internal/html/xpath.go (1)
35-39: Typos in comments.Use “hierarchical” and “prepend”.
-// than hierarchial query. +// than hierarchical query. @@ -// otherwise fall back to a purely hierarchial expression +// otherwise fall back to a purely hierarchical expression @@ -// Make a recursive call to this nodes parents and preprend it to this xpath +// Make a recursive call to this node's parent and prepend it to this xpathAlso applies to: 67-71, 90-93
pkg/internal/html/minifier.go (2)
83-90: Inline style visibility: match common “opacity:0” and collapsed visibility.Include
opacity:0(no space) andvisibility:collapseto catch more hidden cases.- if strings.Contains(s, "display:none") || - strings.Contains(s, "visibility:hidden") || - strings.Contains(s, "opacity: 0") { + if strings.Contains(s, "display:none") || + strings.Contains(s, "visibility:hidden") || + strings.Contains(s, "visibility:collapse") || + strings.Contains(s, "opacity:0") || + strings.Contains(s, "opacity: 0") { return false }
143-156: Typo: PrintLocatrs → PrintLocators (debug helper).Minor naming nit; helps grepping when needed.
-func PrintLocatrs(locatrs []string) { +func PrintLocators(locatrs []string) {pkg/plugins/raw-text.go (5)
13-15: Typo in error message + add explicit error kinds used by guards.Apply this diff:
var ( - ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evalutaion mode") + ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evaluation mode") + ErrInvalidPageType = errors.New("invalid page type") + ErrInvalidPlatform = errors.New("invalid platform") + ErrEmptyContent = errors.New("empty page source") )
66-86: Wrap errors with context and fixlocatrMaptypo.Apply this diff:
eSpec, err := html.MinifySource(pageSource) if err != nil { - return nil, err + return nil, fmt.Errorf("html minify: %w", err) } - locatrMap, err := html.CreateLocatorMap(pageSource) + locatorMap, err := html.CreateLocatorMap(pageSource) if err != nil { - return nil, err + return nil, fmt.Errorf("html locator map: %w", err) } dom := &types.DOM{ RootElement: eSpec, Metadata: &types.DOMMetadata{ LocatorType: types.XPathType, - LocatorMap: locatrMap, + LocatorMap: locatorMap, }, } return dom, nil
124-131: Validate empty locators and handle unknown PageType explicitly.Apply this diff:
func (plugin *rawTextPlugin) IsLocatorValid(ctx context.Context, locator string) (bool, error) { - pageSource := plugin.content - - if plugin.pageType == HTMLPageType { - return html.IsValidXPath(locator, pageSource) - } - return xml.IsValidXPath(locator, pageSource) + if strings.TrimSpace(locator) == "" { + return false, errors.New("empty locator") + } + pageSource := plugin.content + switch plugin.pageType { + case HTMLPageType: + return html.IsValidXPath(locator, pageSource) + case XMLPageType: + return xml.IsValidXPath(locator, pageSource) + default: + return false, ErrInvalidPageType + } }
39-50: Optional: implementString()for Platform and makeStr()a delegator.Idiomatic Go favors
Stringer. KeepStr()for compatibility and delegate toString().Add:
func (p Platform) String() string { switch p { case AndroidPlatform: return androidText case IosPlatform: return iosText case WebPlatform: return webText default: return invalidText } }Then:
-func (platform Platform) Str() string { - switch platform { - case AndroidPlatform: - return androidText - case IosPlatform: - return iosText - case WebPlatform: - return webText - default: - return invalidText - } -} +func (platform Platform) Str() string { return platform.String() }
117-122: API naming clarity: method name implies HTML-only but returns XML IDs whenPageTypeis XML.Consider renaming to
ExtractFirstUniqueIDand keeping this as a shim for compatibility, or at least add a doc comment clarifying the behavior.Would changing the interface to a neutral method name be acceptable across callers?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (6)
go.mod(1 hunks)pkg/internal/html/adapter.go(1 hunks)pkg/internal/html/minifier.go(1 hunks)pkg/internal/html/xpath.go(1 hunks)pkg/internal/xml/adapter.go(2 hunks)pkg/plugins/raw-text.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
pkg/internal/html/xpath.go (1)
pkg/internal/html/adapter.go (2)
Document(10-12)Node(14-23)
pkg/internal/html/minifier.go (4)
pkg/internal/html/adapter.go (3)
Node(14-23)NewHTMLDoc(29-31)NewHTMLNode(52-54)pkg/types/dom.go (1)
ElementSpec(14-20)pkg/internal/html/xpath.go (1)
GetOptimalXPath(18-93)pkg/internal/utils/utils.go (1)
GenerateUniqueId(324-327)
pkg/plugins/raw-text.go (2)
pkg/types/dom.go (3)
DOM(49-52)DOMMetadata(43-46)XPathType(10-10)pkg/internal/utils/utils.go (2)
ExtractFirstUniqueHTMLID(187-245)ExtractFirstUniqueXMLID(152-173)
| ) | ||
|
|
||
| require ( | ||
| github.com/antchfx/htmlquery v1.3.4 // indirect |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Make htmlquery a direct dependency (not indirect).
github.com/antchfx/htmlquery is imported directly in the codebase, so it should not be marked // indirect. This can cause drift and unexpected tidy rewrites.
Apply this diff:
- github.com/antchfx/htmlquery v1.3.4 // indirect
+ github.com/antchfx/htmlquery v1.3.4Run to confirm direct imports and tidy the module:
🏁 Script executed:
#!/bin/bash
# Verify direct imports exist
rg -nP --type=go '^\s*import\s*\(\s*(?s).*?github\.com/antchfx/htmlquery' -C2 || rg -nP --type=go '^\s*import\s*"github\.com/antchfx/htmlquery"'
# Show current go.mod line
rg -n '^(\s*)github\.com/antchfx/htmlquery\b' go.mod -n
# Suggest tidying (to be run locally)
echo "Run: go mod tidy"Length of output: 68
🏁 Script executed:
#!/bin/bash
# Search for direct imports of htmlquery package
rg -n "github.com/antchfx/htmlquery" --type=go
# Also search for any usage of htmlquery functions/types
rg -n "htmlquery\." --type=go -A1 -B1
# Check if htmlquery is used in any import statements (both single line and multi-line)
rg -nP --type=go 'import.*"github\.com/antchfx/htmlquery"' || rg -nP --type=go -U '(?s)import\s*\([^)]*github\.com/antchfx/htmlquery[^)]*\)'Length of output: 1776
Make github.com/antchfx/htmlquery a direct dependency (remove // indirect from go.mod)
github.com/antchfx/htmlquery is imported directly in pkg/internal/html/adapter.go and pkg/internal/html/minifier.go — remove the // indirect marker in go.mod and run go mod tidy. (go.mod: line ~23)
- github.com/antchfx/htmlquery v1.3.4 // indirect
+ github.com/antchfx/htmlquery v1.3.4📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| github.com/antchfx/htmlquery v1.3.4 // indirect | |
| github.com/antchfx/htmlquery v1.3.4 |
🤖 Prompt for AI Agents
In go.mod around line 23, the dependency github.com/antchfx/htmlquery is marked
as indirect though it is imported directly in pkg/internal/html/adapter.go and
pkg/internal/html/minifier.go; remove the "// indirect" marker from that require
line (making it a direct dependency) and then run `go mod tidy` to update module
metadata and ensure go.sum is correct.
| func (n HTMLNode) TagName() string { | ||
| return n.node.Data | ||
| } |
There was a problem hiding this comment.
Nil-safety: fix potential panics on root recursion and nil receivers.
Same issues as XML adapter. Use pointer receivers with guards and return nil parent explicitly to keep GetOptimalXPath recursion safe.
Apply this diff:
-func (n HTMLNode) TagName() string {
- return n.node.Data
-}
+func (n *HTMLNode) TagName() string {
+ if n == nil || n.node == nil {
+ return ""
+ }
+ return n.node.Data
+}
-func (n HTMLNode) IsElement() bool {
- return n.node.Type == html.ElementNode
-}
+func (n *HTMLNode) IsElement() bool {
+ if n == nil || n.node == nil {
+ return false
+ }
+ return n.node.Type == html.ElementNode
+}
-func (n *HTMLNode) GetParent() Node {
- return NewHTMLNode(n.node.Parent)
-}
+func (n *HTMLNode) GetParent() Node {
+ if n == nil || n.node == nil || n.node.Parent == nil {
+ return nil
+ }
+ return NewHTMLNode(n.node.Parent)
+}
-func (n *HTMLNode) ChildNodes() []Node {
+func (n *HTMLNode) ChildNodes() []Node {
+ if n == nil || n.node == nil {
+ return nil
+ }
var nodes []Node
@@
-}
+}
-func (n *HTMLNode) Equal(n1 Node) bool {
+func (n *HTMLNode) Equal(n1 Node) bool {
+ if n == nil {
+ _, ok := n1.(*HTMLNode)
+ return !ok
+ }
xn1, ok := n1.(*HTMLNode)
if !ok {
return false
}
return n.node == xn1.node
}
-func (n *HTMLNode) Index() int {
- if n.node.Parent == nil {
+func (n *HTMLNode) Index() int {
+ if n == nil || n.node == nil || n.node.Parent == nil {
return 1
}Also applies to: 60-62, 77-79, 81-89, 91-98, 100-117
🤖 Prompt for AI Agents
In pkg/internal/html/adapter.go around lines 56-58 (and similarly for 60-62,
77-79, 81-89, 91-98, 100-117), change the HTMLNode methods to use pointer
receivers and add nil-safety guards: if the receiver is nil or the underlying
n.node is nil return safe zero values (for TagName and similar string-returning
methods return empty string; for Parent/ParentNode-returning methods return nil
explicitly) so recursive GetOptimalXPath calls don't panic; ensure every method
checks receiver and n.node before accessing fields and adjust signatures to func
(n *HTMLNode) ... accordingly.
| func getVisibleText(element *html.Node) string { | ||
| txt := element.Data | ||
| return escapeString(strings.TrimSpace(txt)) | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Incorrect element “text” extraction and inconsistent document root for XPath generation.
getVisibleTextreturns the tag name, not the element’s textual content.MinifySourcebuilds XPaths with the first element as theDocumentroot, whileCreateLocatorMapuses the parsed root. This inconsistency can yield different XPaths/IDs for the same element.
Apply this diff:
-func getVisibleText(element *html.Node) string {
- txt := element.Data
- return escapeString(strings.TrimSpace(txt))
-}
+func getVisibleText(element *html.Node) string {
+ if element == nil {
+ return ""
+ }
+ var b strings.Builder
+ var walk func(*html.Node)
+ walk = func(n *html.Node) {
+ if n.Type == html.TextNode {
+ t := strings.TrimSpace(n.Data)
+ if t != "" {
+ if b.Len() > 0 {
+ b.WriteByte(' ')
+ }
+ b.WriteString(t)
+ }
+ return
+ }
+ for c := n.FirstChild; c != nil; c = c.NextSibling {
+ walk(c)
+ }
+ }
+ walk(element)
+ return escapeString(b.String())
+}
@@
- spec, err := createElementSpec(node, node)
+ spec, err := createElementSpec(node, root)Also applies to: 169-183, 193-199
| isNodeName := len(attrs) == 0 | ||
| if isNodeName { | ||
| xpath := fmt.Sprintf("//%s", domNode.TagName()) | ||
|
|
||
| isUnique, _ := determineXpathUniqueness(xpath, doc, domNode) | ||
| if isUnique { | ||
| if !domNode.HasParent() { | ||
| xpath = fmt.Sprintf("/%s", domNode.TagName()) | ||
| } | ||
| return true, true, xpath | ||
| } | ||
| return false, false, "" | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Fix uniqueness logic and avoid invalid XPaths; also escape attribute values.
determineXpathUniquenessreturns true for 0 or 1 mismatched results, which is incorrect.- Building a semi-unique XPath with idx 0 yields invalid
(...)[0]. - Attribute values are embedded unescaped; quotes inside values break XPath literals.
- Avoid
panicon malformed pairs; keep this resilient.
Apply this diff:
@@
- if isNodeName {
- xpath := fmt.Sprintf("//%s", domNode.TagName())
+ if isNodeName {
+ xpath := fmt.Sprintf("//%s", domNode.TagName())
@@
- isUnique, _ := determineXpathUniqueness(xpath, doc, domNode)
+ isUnique, _ := determineXpathUniqueness(xpath, doc, domNode)
if isUnique {
if !domNode.HasParent() {
xpath = fmt.Sprintf("/%s", domNode.TagName())
}
return true, true, xpath
}
return false, false, ""
}
@@
- isPair := len(strings.Fields(attrs[0])) > 1
+ isPair := len(strings.Fields(attrs[0])) > 1
@@
- if isPair {
- attr1, attr2, ok := strings.Cut(attr, " ")
- if !ok {
- panic("generateUniqueXPATH invalid state")
- }
+ if isPair {
+ attr1, attr2, ok := strings.Cut(attr, " ")
+ if !ok {
+ continue
+ }
@@
- xpath = fmt.Sprintf(
- "//%s[@%s=\"%s\" and @%s=\"%s\"]",
- tagForXpath,
- attr1, attr1Value,
- attr2, attr2Value,
- )
+ xpath = fmt.Sprintf("//%s[@%s=%s and @%s=%s]",
+ tagForXpath,
+ attr1, quoteXPathLiteral(attr1Value),
+ attr2, quoteXPathLiteral(attr2Value),
+ )
} else {
attrValue := domNode.GetAttribute(attr)
if attrValue == "" {
continue
}
- xpath = fmt.Sprintf(
- "//%s[@%s=\"%s\"]",
- tagForXpath,
- attr, attrValue,
- )
+ xpath = fmt.Sprintf("//%s[@%s=%s]",
+ tagForXpath,
+ attr, quoteXPathLiteral(attrValue),
+ )
}
@@
- isUnique, idx := determineXpathUniqueness(xpath, doc, domNode)
+ isUnique, idx := determineXpathUniqueness(xpath, doc, domNode)
if isUnique {
uniqueXPath = xpath
break
}
- if semiUniqueXPath == "" {
- semiUniqueXPath = fmt.Sprintf("(%s)[%d]", xpath, idx)
+ if semiUniqueXPath == "" && idx > 0 {
+ semiUniqueXPath = fmt.Sprintf("(%s)[%d]", xpath, idx)
}
}
@@
-func determineXpathUniqueness(xpath string, doc Document, domNode Node) (bool, int) {
- elems := doc.Find(xpath)
- if len(elems) > 1 {
- idx := slices.IndexFunc(elems, func(node Node) bool {
- return domNode.Equal(node)
- })
- return false, idx + 1
- }
- return true, 0
-}
+func determineXpathUniqueness(xpath string, doc Document, domNode Node) (bool, int) {
+ elems := doc.Find(xpath)
+ switch len(elems) {
+ case 0:
+ return false, 0
+ case 1:
+ if domNode.Equal(elems[0]) {
+ return true, 1
+ }
+ return false, 0
+ default:
+ idx := slices.IndexFunc(elems, func(node Node) bool {
+ return domNode.Equal(node)
+ })
+ if idx >= 0 {
+ return false, idx + 1
+ }
+ return false, 0
+ }
+}
+
+// quoteXPathLiteral quotes s as an XPath literal, handling embedded quotes via concat() when needed.
+func quoteXPathLiteral(s string) string {
+ switch {
+ case !strings.ContainsAny(s, `'"`):
+ return "'" + s + "'"
+ case !strings.Contains(s, `'`):
+ return "'" + s + "'"
+ case !strings.Contains(s, `"`):
+ return `"` + s + `"`
+ default:
+ parts := strings.Split(s, `"`)
+ var b strings.Builder
+ b.WriteString("concat(")
+ for i, p := range parts {
+ if i > 0 {
+ b.WriteString(`, '"', `)
+ }
+ b.WriteString("'" + p + "'")
+ }
+ b.WriteString(")")
+ return b.String()
+ }
+}Also applies to: 138-176, 187-196
| import ( | ||
| "context" | ||
| "errors" | ||
|
|
||
| "github.com/vertexcover-io/locatr/pkg/internal/html" | ||
| "github.com/vertexcover-io/locatr/pkg/internal/utils" | ||
| "github.com/vertexcover-io/locatr/pkg/internal/xml" | ||
| "github.com/vertexcover-io/locatr/pkg/types" | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add missing imports for error wrapping and content checks.
Apply this diff:
import (
"context"
"errors"
+ "fmt"
+ "strings"
"github.com/vertexcover-io/locatr/pkg/internal/html"
"github.com/vertexcover-io/locatr/pkg/internal/utils"
"github.com/vertexcover-io/locatr/pkg/internal/xml"
"github.com/vertexcover-io/locatr/pkg/types"
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import ( | |
| "context" | |
| "errors" | |
| "github.com/vertexcover-io/locatr/pkg/internal/html" | |
| "github.com/vertexcover-io/locatr/pkg/internal/utils" | |
| "github.com/vertexcover-io/locatr/pkg/internal/xml" | |
| "github.com/vertexcover-io/locatr/pkg/types" | |
| ) | |
| import ( | |
| "context" | |
| "errors" | |
| "fmt" | |
| "strings" | |
| "github.com/vertexcover-io/locatr/pkg/internal/html" | |
| "github.com/vertexcover-io/locatr/pkg/internal/utils" | |
| "github.com/vertexcover-io/locatr/pkg/internal/xml" | |
| "github.com/vertexcover-io/locatr/pkg/types" | |
| ) |
🤖 Prompt for AI Agents
In pkg/plugins/raw-text.go around lines 3 to 11, the file is missing imports
needed for wrapping errors and performing content checks; add the standard "fmt"
package to wrap/annotate errors with fmt.Errorf("%w", err) and add the "strings"
package to run content checks (e.g., TrimSpace/HasPrefix/Contains) where the
code validates or examines text content, then update the error construction
sites to use fmt.Errorf for wrapping and replace any ad-hoc content checks with
the appropriate strings helpers.
| func (plugin *rawTextPlugin) minifyXML() (*types.DOM, error) { | ||
| pageSource := plugin.content | ||
| platform := plugin.platform.Str() | ||
|
|
||
| eSpec, err := xml.MinifySource(pageSource, platform) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| locatrMap, err := xml.CreateLocatorMap(pageSource, platform) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| dom := &types.DOM{ | ||
| RootElement: eSpec, | ||
| Metadata: &types.DOMMetadata{ | ||
| LocatorType: types.XPathType, | ||
| LocatorMap: locatrMap, | ||
| }, | ||
| } | ||
| return dom, nil | ||
| } |
There was a problem hiding this comment.
XML path should reject unknown platforms; also wrap underlying errors and fix a variable name nit.
Apply this diff:
func (plugin *rawTextPlugin) minifyXML() (*types.DOM, error) {
pageSource := plugin.content
platform := plugin.platform.Str()
+ if platform == invalidText {
+ return nil, ErrInvalidPlatform
+ }
+
eSpec, err := xml.MinifySource(pageSource, platform)
if err != nil {
- return nil, err
+ return nil, fmt.Errorf("xml minify: %w", err)
}
- locatrMap, err := xml.CreateLocatorMap(pageSource, platform)
+ locatorMap, err := xml.CreateLocatorMap(pageSource, platform)
if err != nil {
- return nil, err
+ return nil, fmt.Errorf("xml locator map: %w", err)
}
dom := &types.DOM{
RootElement: eSpec,
Metadata: &types.DOMMetadata{
LocatorType: types.XPathType,
- LocatorMap: locatrMap,
+ LocatorMap: locatorMap,
},
}
return dom, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (plugin *rawTextPlugin) minifyXML() (*types.DOM, error) { | |
| pageSource := plugin.content | |
| platform := plugin.platform.Str() | |
| eSpec, err := xml.MinifySource(pageSource, platform) | |
| if err != nil { | |
| return nil, err | |
| } | |
| locatrMap, err := xml.CreateLocatorMap(pageSource, platform) | |
| if err != nil { | |
| return nil, err | |
| } | |
| dom := &types.DOM{ | |
| RootElement: eSpec, | |
| Metadata: &types.DOMMetadata{ | |
| LocatorType: types.XPathType, | |
| LocatorMap: locatrMap, | |
| }, | |
| } | |
| return dom, nil | |
| } | |
| func (plugin *rawTextPlugin) minifyXML() (*types.DOM, error) { | |
| pageSource := plugin.content | |
| platform := plugin.platform.Str() | |
| if platform == invalidText { | |
| return nil, ErrInvalidPlatform | |
| } | |
| eSpec, err := xml.MinifySource(pageSource, platform) | |
| if err != nil { | |
| return nil, fmt.Errorf("xml minify: %w", err) | |
| } | |
| locatorMap, err := xml.CreateLocatorMap(pageSource, platform) | |
| if err != nil { | |
| return nil, fmt.Errorf("xml locator map: %w", err) | |
| } | |
| dom := &types.DOM{ | |
| RootElement: eSpec, | |
| Metadata: &types.DOMMetadata{ | |
| LocatorType: types.XPathType, | |
| LocatorMap: locatorMap, | |
| }, | |
| } | |
| return dom, nil | |
| } |
🤖 Prompt for AI Agents
In pkg/plugins/raw-text.go around lines 88 to 108, the code should reject
unknown platforms, wrap underlying errors, and fix a variable-name nit: validate
plugin.platform.Str() against accepted platform strings (return an error like
fmt.Errorf("xml: unknown platform %q", platform) on default), rename locatrMap
to locatorMap, and when returning errors from xml.MinifySource and
xml.CreateLocatorMap wrap them with context using fmt.Errorf("minify xml: %w",
err) and fmt.Errorf("create locator map: %w", err) respectively so callers get
wrapped errors with clear context.
| func (plugin *rawTextPlugin) GetMinifiedDOM(ctx context.Context) (*types.DOM, error) { | ||
| if plugin.pageType == HTMLPageType { | ||
| return plugin.minifyHTML() | ||
| } | ||
| return plugin.minifyXML() | ||
| } |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Blocker: invalid PageType currently falls through to XML; also missing empty-content guard.
Ensure we don’t silently run XML logic for unknown values and fail fast on empty page source.
Apply this diff:
func (plugin *rawTextPlugin) GetMinifiedDOM(ctx context.Context) (*types.DOM, error) {
- if plugin.pageType == HTMLPageType {
- return plugin.minifyHTML()
- }
- return plugin.minifyXML()
+ if strings.TrimSpace(plugin.content) == "" {
+ return nil, ErrEmptyContent
+ }
+ switch plugin.pageType {
+ case HTMLPageType:
+ return plugin.minifyHTML()
+ case XMLPageType:
+ return plugin.minifyXML()
+ default:
+ return nil, ErrInvalidPageType
+ }
}Run to surface all call sites and double‑check correct PageType/Platform combos:
🏁 Script executed:
#!/bin/bash
rg -nP -C2 --type=go 'NewRawTextPlugin\('Length of output: 275
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== cat pkg/plugins/raw-text.go ==="
nl -ba pkg/plugins/raw-text.go | sed -n '1,240p' || true
echo
echo "=== search repo for symbols ==="
rg -n --hidden --no-ignore-vcs 'NewRawTextPlugin\(' || true
rg -n --hidden --no-ignore-vcs '\brawTextPlugin\b' || true
rg -n --hidden --no-ignore-vcs '\bGetMinifiedDOM\(' || true
rg -n --hidden --no-ignore-vcs 'minifyHTML\(|minifyXML\(' || true
rg -n --hidden --no-ignore-vcs '\btype\s+PageType\b|HTMLPageType|XMLPageType' -S || true
rg -n --hidden --no-ignore-vcs 'ErrEmptyContent|ErrInvalidPageType' -S || true
rg -n --hidden --no-ignore-vcs '\bcontent\b' -S pkg/plugins || trueLength of output: 4788
Blocker — fail fast on empty content & make PageType handling explicit in raw-text plugin (pkg/plugins/raw-text.go)
- Replace fall‑through XML path in GetMinifiedDOM (pkg/plugins/raw-text.go:110–115) with an empty‑content guard (return ErrEmptyContent when strings.TrimSpace(plugin.content) == "") and a switch on plugin.pageType with cases HTMLPageType and XMLPageType; default must return ErrInvalidPageType.
- Define ErrEmptyContent and ErrInvalidPageType (or reuse existing error values) and add the strings import if missing.
- Apply the same explicit PageType validation to other methods that branch on pageType (ExtractFirstUniqueHTMLID, IsLocatorValid, GetCurrentContext — ~lines 117–137) to avoid silent XML fall‑through.
- Confirm callers construct rawTextPlugin with valid PageType/Platform values (rg shows NewRawTextPlugin is only defined here; verify external usages if any).
🤖 Prompt for AI Agents
In pkg/plugins/raw-text.go around lines 110–115 (and extending through
~117–137), replace the current fall-through XML path in GetMinifiedDOM with an
early empty-content guard that returns ErrEmptyContent when
strings.TrimSpace(plugin.content) == "" and then use a switch on plugin.pageType
with explicit cases for HTMLPageType -> plugin.minifyHTML() and XMLPageType ->
plugin.minifyXML(), returning ErrInvalidPageType in the default; define
ErrEmptyContent and ErrInvalidPageType (or reuse existing error values) and add
the strings import if missing. Apply the same explicit pageType validation
(switch with HTMLPageType and XMLPageType and default ErrInvalidPageType) to
ExtractFirstUniqueHTMLID, IsLocatorValid, and GetCurrentContext to avoid silent
XML fall-through. Finally, verify callers construct rawTextPlugin with valid
PageType/Platform values (check NewRawTextPlugin and any external usages) and
update tests or callsites if they rely on the old implicit behavior.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/mode/visual-analysis.go (1)
83-85: Fail fast before expensive DOM workThanks for adding the guard. Could we move it right after
m.applyDefaults()so we short‑circuit before callingGetMinifiedDOMand splitting chunks? That avoids needless DOM/minifier work (and any plugin side effects) when the reranker is missing.@@ m.applyDefaults() + if rerankerClient == nil { + return fmt.Errorf("reranker client is required for visual analysis mode") + } + dom, err := plugin.GetMinifiedDOM(ctx) if err != nil { return err @@ - if rerankerClient == nil { - return fmt.Errorf("reranker client is required for visual analysis mode") - }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/locatr.go(4 hunks)pkg/mode/dom-analysis.go(1 hunks)pkg/mode/visual-analysis.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/locatr.go (4)
pkg/types/llm.go (1)
LLMClientInterface(9-19)pkg/types/reranker.go (1)
RerankerClientInterface(22-24)pkg/types/locatr.go (1)
LocatrMode(24-34)pkg/mode/visual-analysis.go (1)
VisualAnalysisMode(43-48)
pkg/mode/dom-analysis.go (2)
pkg/types/reranker.go (1)
RerankRequest(9-13)pkg/internal/utils/utils.go (1)
SortRerankChunks(118-138)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit (3.12, ubuntu-latest)
🔇 Additional comments (3)
pkg/mode/dom-analysis.go (1)
70-81: Reranker guard reads cleanThe conditional keeps the old behavior when a reranker is supplied and gracefully skips the call when it’s disabled. Looks good.
pkg/locatr.go (2)
37-62: Configuration toggle looks solidThe new
disableRerankerflag and option wire-up are straightforward and keep the config surface tidy. No concerns here.
124-145: Initialization & mode validation make senseNulling out the reranker when disabled and blocking the combination with
VisualAnalysisModekeeps the constructor’s invariants intact. Nicely scoped.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/plugins/raw-text.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/plugins/raw-text.go (3)
pkg/types/dom.go (3)
DOM(49-52)DOMMetadata(43-46)XPathType(10-10)pkg/internal/utils/utils.go (2)
ExtractFirstUniqueHTMLID(187-245)ExtractFirstUniqueXMLID(152-173)pkg/types/plugin.go (2)
Location(6-11)PluginInterface(14-40)
🔇 Additional comments (3)
pkg/plugins/raw-text.go (3)
88-108: Reject unknown platforms before minifying XML (and wrap failures).When
plugin.platformhas any value outside the enum,Str()returns"n/a", and we blindly hand that toxml.MinifySource/xml.CreateLocatorMap. Today this bubbles up as opaque parser failures and, depending on the implementation details, can even build an invalid locator map. Please fail fast on the unsupported platform and wrap the downstream errors with context. You’ll need to add afmtimport.func (plugin *rawTextPlugin) minifyXML() (*types.DOM, error) { pageSource := plugin.content platform := plugin.platform.Str() + if platform == invalidText { + return nil, fmt.Errorf("raw-text: unsupported platform %d", plugin.platform) + } + eSpec, err := xml.MinifySource(pageSource, platform) if err != nil { - return nil, err + return nil, fmt.Errorf("minify xml source: %w", err) } - locatrMap, err := xml.CreateLocatorMap(pageSource, platform) + locatorMap, err := xml.CreateLocatorMap(pageSource, platform) if err != nil { - return nil, err + return nil, fmt.Errorf("create xml locator map: %w", err) } dom := &types.DOM{ RootElement: eSpec, Metadata: &types.DOMMetadata{ LocatorType: types.XPathType, - LocatorMap: locatrMap, + LocatorMap: locatorMap, }, } return dom, nil }
110-115: Fail fast on empty content and invalid page types.Right now any non-HTML
PageTypesilently falls through to the XML path, so a bad enum value causes HTML payloads to be parsed as XML, producing confusing errors. We also try to minify empty/whitespace-only content, which just defers failure to the parser. Please guard on content first, then switch on the two supported page types. This needs astringsimport (and continues to rely onfmtfrom the previous change).func (plugin *rawTextPlugin) GetMinifiedDOM(ctx context.Context) (*types.DOM, error) { - if plugin.pageType == HTMLPageType { - return plugin.minifyHTML() - } - return plugin.minifyXML() + if strings.TrimSpace(plugin.content) == "" { + return nil, fmt.Errorf("raw-text: empty page source") + } + + switch plugin.pageType { + case HTMLPageType: + return plugin.minifyHTML() + case XMLPageType: + return plugin.minifyXML() + default: + return nil, fmt.Errorf("raw-text: unsupported page type %d", plugin.pageType) + } }
117-131: Apply the same explicit PageType handling to ID/locator helpers.
ExtractFirstUniqueIDandIsLocatorValidshare the same silent XML fall-through: an invalidPageType(or a future enum extension) routes HTML through XML utilities, yielding misleading errors. Mirror the guarded switch here so callers get a deterministic “unsupported page type” failure instead.func (plugin *rawTextPlugin) ExtractFirstUniqueID(ctx context.Context, fragment string) (string, error) { - if plugin.pageType == HTMLPageType { - return utils.ExtractFirstUniqueHTMLID(fragment) - } - return utils.ExtractFirstUniqueXMLID(fragment) + switch plugin.pageType { + case HTMLPageType: + return utils.ExtractFirstUniqueHTMLID(fragment) + case XMLPageType: + return utils.ExtractFirstUniqueXMLID(fragment) + default: + return "", fmt.Errorf("raw-text: unsupported page type %d", plugin.pageType) + } } func (plugin *rawTextPlugin) IsLocatorValid(ctx context.Context, locator string) (bool, error) { pageSource := plugin.content - if plugin.pageType == HTMLPageType { - return html.IsValidXPath(locator, pageSource) - } - return xml.IsValidXPath(locator, pageSource) + switch plugin.pageType { + case HTMLPageType: + return html.IsValidXPath(locator, pageSource) + case XMLPageType: + return xml.IsValidXPath(locator, pageSource) + default: + return false, fmt.Errorf("raw-text: unsupported page type %d", plugin.pageType) + } }
| var ( | ||
| ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evalutaion mode") | ||
| ) |
There was a problem hiding this comment.
Fix typo in exported error message.
The user-facing error string currently says “evalutaion”.
-var (
- ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evalutaion mode")
-)
+var (
+ ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evaluation mode")
+)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var ( | |
| ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evalutaion mode") | |
| ) | |
| var ( | |
| ErrModeNotSupported = errors.New("raw-text plugin only supports DOM evaluation mode") | |
| ) |
🤖 Prompt for AI Agents
In pkg/plugins/raw-text.go around lines 13 to 15, the exported error string
contains a typo ("evalutaion"); update the error message value to use the
correct spelling "evaluation" so it reads "raw-text plugin only supports DOM
evaluation mode".
String Only plugin
This PR adds an additional plugin for DOM text evaluation mode,
raw-text.This plugin allows users to add
DOMtext directly, without makingadditional calls to the corresponding automation platforms. Along with page source,
users also need to provide information on whether the page source is
htmlandxml,and if the platform is
android,ios, orweb. Depending on these values, thelogic for page source changes to fit the selected element.
Additions
raw-textplugin, that accepts page source when creation and provides workingimplementation for text only plugin that can be using with
DOMEvaluationmode.Limitations
For every page source, we tend to trim of elements that aren't visible. In case of
androidandiosall the style values that affect is provided in thexmlitself,however for
webplatform,htmlalone isn't the entire truth, there are caseswhere
cssmight have hidden the element. Since we don't have access to DOMAPI or the CSS to query, we evaluate just on the basis of HTML, which may not be
the entire truth
Summary by CodeRabbit
New Features
Chores