-
Notifications
You must be signed in to change notification settings - Fork 563
DQL README — DOM Query Language
DQL is a universal, expressive query language for structured and tree-shaped data, built for XGo. It provides a unified query interface across JSON, XML, HTML, ASTs, file systems, and any other domain that can be modeled as a tree.
- Overview
- Supported Domains
- Core Concepts
- Syntax Reference
- Query Examples
- Error Handling
- Performance and Caching
- Implementing a NodeSet
- Standard Errors
DQL brings a concise, chainable query syntax to XGo. Think of it as XPath or CSS selectors — but designed for XGo's type system, with first-class support for lazy evaluation, error propagation, and domain-specific adaptations.
// Find all hyperlinks in an HTML document
for a in doc.**.a {
if url := a.$href; url != "" {
echo url
}
}
// Filter JSON records
for animal in doc.animals.*@($class == "zebra") {
echo animal.$at
}
// Walk the file system
for e in fs`.`.**.file.match("*.xgo") {
echo e.path
}
DQL is not tied to any single format. Any data that can be represented as a tree can be queried with DQL:
- JSON — query fields, arrays, and nested objects
- YAML — same model as JSON
- HTML / XML — traverse elements, attributes, and text
- Go / XGo AST — inspect and filter syntax trees
- File System — walk directories, match files by name or extension
- Any custom tree structure — implement the NodeSet interface
Everything in DQL is a NodeSet. Queries consume a NodeSet and produce a new NodeSet. The entry point is typically the root document node, called doc.
doc NodeSet // root of the query
NodeSets are lazily evaluated — they describe a query plan that is executed only when you actually iterate or read data from them. This keeps memory usage low and allows complex query composition without overhead.
Navigate the tree with dot notation:
ns.name // direct child named "name"
ns."elem-name" // child with a hyphenated name
ns.* // all direct children (wildcard)
ns.**.name // all descendants named "name" (deep query)
ns.**.* // all descendants
ns[0] // first element of an array/list node (0-based)
Filter nodes using @:
ns@name // keep only nodes named "name"
ns@(condExpr) // keep nodes matching a boolean expression
ns@fn(args) // keep nodes where fn(...) returns true
Examples:
doc.users.*@($role == "admin") // users whose role attribute is "admin"
doc.users.*@($age.int?:100 < 18) // users under 18
doc.**.div@hasClass("widget") // all divs with a given CSS class
Read node-level data with $:
val := ns.$name // single-value: zero value on error
val, err := ns.$name // dual-value: explicit error handling
val := ns.$name! // panic on error
val := ns.$name?:"default" // use a fallback value on error
val := ns.$"attr-name" // hyphenated attribute names
Attribute access always operates on the first node in the NodeSet. To collect an attribute from every node, use a list comprehension:
names := [user.$name for user in doc.users.*]
Methods provide access to computed or typed data:
text := div.text // call Text() method (single-value)
text, err := div.text // dual-value
count := div.count?:0 // explicit default for numeric methods
age, err := node.$age.int // numeric methods require dual-value
Methods prefixed with _ map to XGo_-prefixed implementations:
node._text // → node.XGo_text()
node._count // → node.XGo_count()
node._first // → node.XGo_first()
Note on numeric methods: Methods that return numeric types (like
int,float,count) do not have a single-value form, to prevent silent bugs where a zero default masks a real error. Always use the dual-value form or an explicit?:default.
| Syntax | Meaning |
|---|---|
ns.name |
Direct child named name
|
ns."elem-name" |
Direct child with special characters |
ns.* |
All direct children |
ns[n] |
Child at index n (0-based) |
ns.**.name |
Deep search for name
|
ns.**.* |
All descendants |
ns@name |
Filter: keep nodes named name
|
ns@(expr) |
Filter: keep nodes matching expression |
ns.$name |
Attribute (single-value, zero on error) |
val, err := ns.$name |
Attribute (dual-value) |
ns.$name! |
Attribute, panic on error |
ns.$name?:def |
Attribute, custom default on error |
ns.method(args) |
Method call |
ns._method |
Method call via XGo_method()
|
ns.all / ns._all
|
Materialize and cache all results |
ns.one / ns._one
|
First match, early termination |
ns.single / ns._single
|
Exactly one match (validates uniqueness) |
doc := json`{
"animals": [
{"class": "gopher", "at": "Line 1"},
{"class": "armadillo", "at": "Line 2"},
{"class": "zebra", "at": "Line 3"},
{"class": "unknown", "at": "Line 4"},
{"class": "gopher", "at": "Line 5"},
{"class": "bee", "at": "Line 6"},
{"class": "gopher", "at": "Line 7"},
{"class": "zebra", "at": "Line 8"}
]
}`!
// Iterate filtered records
for animal in doc.animals.*@($class == "zebra") {
echo animal.$at
}
// Collect all names into a list
names := [a.$class for a in doc.animals.*]
// Access by index
first := doc.animals[0].$class
// Find unique admin (validate exactly one exists)
user := doc.users.*@($role == "admin")._single
name, err := user.$name
import "os"
import "github.com/goplus/xgo/dql/html"
doc := html.source(os.Args[1])
// Print all hyperlink URLs
for a in doc.**.a {
if url := a.$href; url != "" {
echo url
}
}
// Collect text from all paragraphs
texts := [p.text for p in doc.**.p]
// Find the first element with a specific class (early termination)
widget := doc.**.*@($class == "widget").one
id := widget.$id
doc := xgo`
x, y := "Hi", 123
echo x
print y
`!
// Find all expression statements
stmts := doc.shadowEntry.body.list.*@(self.class == "ExprStmt")
// Extract function names from call expressions
for fn in stmts.x@(self.class == "CallExpr").fun@(self.class == "Ident") {
echo fn.$name
}
// Walk current directory and print all .xgo files
for e in fs`.`.**.file.match("*.xgo") {
echo e.path
}
// Collect all Go source file names
names := [f.$name for f in root.**.file@match("*.go", $name)]
DQL integrates with XGo's error handling operators and follows a consistent two-version design for attribute and method access.
Most accessors come in two forms:
// Single-value: convenient, returns zero on error
name := node.$name
// Dual-value: explicit, lets you inspect the error
name, err := node.$name
Additionally, XGo's error operators work directly on DQL expressions:
name := node.$name! // panic if error
name := node.$name?:"N/A" // custom fallback
Some operations return a NodeSet that internally carries an error (e.g., _one when no match is found, _single when zero or multiple matches are found). This error propagates automatically to any subsequent attribute or method access:
admin := doc.users.*@($role == "admin")._one
// All of these reflect the internal ErrNotFound:
name := admin.$name // returns zero value
name, err := admin.$name // err == dql.ErrNotFound
name := admin.$name! // panics
An empty NodeSet is still a valid NodeSet — loops simply don't execute:
for user in admin {
// Not reached if admin has ErrNotFound
}
DQL uses lazy evaluation by default. Queries are not executed until you iterate or read from the NodeSet. This is memory-efficient and enables query composition, but means a NodeSet re-executes its query each time it is accessed.
Use cache control methods to avoid repeated execution:
Executes the query and caches all results. Use when you need to access the same NodeSet multiple times.
users := doc.users.*@($active == true)._all
names := [u.$name for u in users] // uses cache
emails := [u.$email for u in users] // uses cache, no re-query
Stops after finding the first matching node. Use when you know (or assume) there is at most one result.
admin := doc.users.*@($role == "admin")._one
name := admin.$name // ErrNotFound if no match
Validates that exactly one node matches. Returns ErrMultipleEntities if more than one is found.
user := doc.users.*@($id == 12345)._single
name, err := user.$name
// err may be ErrNotFound or ErrMultipleEntities
| Need | Use |
|---|---|
| Multiple accesses to same results | _all |
| Expect one result, want fast exit | _one |
| Require exactly one result | _single |
| Single-pass iteration | no cache (default lazy) |
To make a custom data source queryable with DQL, implement the following interface on your NodeSet type.
// Iteration
func (ns NodeSet) XGo_Enum() iter.Seq[NodeSet]
// Extract first node
func (ns NodeSet) XGo_first() (NodeType, error)
// Type conversion from raw sequence
func NodeSet_Cast(seq func(func(NodeType) bool)) NodeSet
func (ns NodeSet) XGo_Elem(name string) NodeSet // ns.name
func (ns NodeSet) XGo_Child() NodeSet // ns.*
func (ns NodeSet) XGo_Index(index int) NodeSet // ns[n]
func (ns NodeSet) XGo_Any(name string) NodeSet // ns.**.name ("" = all)
func (ns NodeSet) XGo_Select(name string) NodeSet // ns@name
// Conditional filters are compiler-generated using NodeSet_Cast
func (ns NodeSet) XGo_Attr__0(name string) ValueType // single-value
func (ns NodeSet) XGo_Attr__1(name string) (ValueType, error) // dual-value
Provide either general-form (XGo_all, XGo_one, XGo_single) or domain-specific form (All, One, Single) methods:
func (ns NodeSet) XGo_all() NodeSet // or All()
func (ns NodeSet) XGo_one() NodeSet // or One()
func (ns NodeSet) XGo_single() NodeSet // or Single()
Domain types (NodeType, ValueType, IndexType) are customized per implementation.
The dql package defines two standard sentinel errors:
package dql
var (
ErrNotFound = errors.New("node not found")
ErrMultipleEntities = errors.New("multiple entities found, expected single")
)
These are returned by _one (when no match exists) and _single (when zero or more than one match exists), and propagate through the NodeSet to any subsequent attribute or method call.