
Implement XQuery 4.0 parser, functions, and runtime support#6139

Open
joewiz wants to merge 18 commits into eXist-db:develop from joewiz:feature/xquery-4.0-parser


@joewiz joewiz commented Mar 14, 2026

Summary

Implements XQuery 4.0 parser and runtime support for eXist-db, covering the majority of the QT4CG specification draft syntax, 50+ new standard functions, enhanced existing functions, and W3C-compliant error codes. This brings eXist-db in line with the evolving XQuery 4.0 standard.

Based on the XQuery 4.0 Functions branch.

What Changed

1. Grammar — XQ4 syntax (XQuery.g + XQueryTree.g)

All major XQuery 4.0 syntax additions via ANTLR 2 grammar extensions:

| Feature | Status |
|---|---|
| Focus functions: fn { expr } | Complete |
| Keyword arguments: name := expr | Complete |
| Default parameter values | Complete |
| String templates: `Hello {$name}` | Complete |
| Pipeline => and mapping arrow =!> | Complete |
| for member, while clause, otherwise | Complete |
| Braced if, try/catch/finally, ternary ?? !! | Complete |
| QName literals, hex/binary literals, numeric underscores | Complete |
| Array/map filter ?[predicate] | Complete |
| Choice/union types, enumeration types | Complete |
| Record types: record(name as xs:string, age? as xs:integer, *) | Complete |
| Method call =?>, let destructuring | Complete |
| fn(...) type shorthand, gnode() type test | Complete |
| 4 new axes: *-or-self, *-sibling-or-self | Complete |
| declare context value, xquery version "4.0" | Complete |
| reservedKeywords sub-rules (merge-conflict reduction) | Complete |
| expr rule split (code-too-large fix for next builds) | Complete |

2. Expression classes (33 files)

New: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.

Modified: Function, FunctionFactory, FunctionSignature, UserDefinedFunction, TryCatchExpression, SwitchExpression, StringConstructor, XQueryContext, Constants, LocationStep, SequenceType, Type.

3. Error code alignment (29 files)

| Component | Change | XQTS Impact |
|---|---|---|
| convertTo() in 20 atomic types | FORG0001 → XPTY0004 | +510 |
| DoubleValue | NaN/INF casts → FOCA0002 | +48 |
| DynamicCardinalityCheck | ERROR → XPTY0004/XPDY0050 | +5 |
| DynamicTypeCheck | FOCH0002 → XPTY0004 | +1 |
| TreatAsExpression | → XPDY0050 | +17 |
| CastExpression | xs:anySimpleType → XPST0080 | +2 |
| FunctionFactory | unknown types → XPST0017 | +25 |
| StringValue | validation → FORG0001 | +15 |
| Base64BinaryValueType | → FORG0001 | +3 |
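
To illustrate the DoubleValue row, here is a minimal sketch of a double-to-integer cast that reports FOCA0002 for NaN and infinite inputs. It is not the code in this PR: the class `DoubleToIntegerCast` and the exception type `XPathError` are hypothetical stand-ins for eXist-db's own value and exception classes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class DoubleToIntegerCast {
    /** Hypothetical exception type carrying a W3C error code. */
    static final class XPathError extends RuntimeException {
        final String code;

        XPathError(String code, String message) {
            super(code + ": " + message);
            this.code = code;
        }
    }

    /** Casts a double to an integer, truncating toward zero. */
    static BigInteger toInteger(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            // NaN and +/-INF have no integer value, so report FOCA0002.
            throw new XPathError("FOCA0002", "cannot cast " + value + " to xs:integer");
        }
        // new BigDecimal(double) is exact; toBigInteger() discards the fraction.
        return new BigDecimal(value).toBigInteger();
    }

    public static void main(String[] args) {
        System.out.println(toInteger(-3.7));
        try {
            toInteger(Double.NaN);
        } catch (XPathError e) {
            System.out.println(e.code);
        }
    }
}
```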

4. fn:load-xquery-module content option

XQ4 content option for dynamic module compilation from strings. Required by misc-Subtyping XQTS tests.

5. fn:invisible-xml (Markup Blitz)

Parse invisible XML grammars using the Markup Blitz iXML library.

6. No-namespace function overriding (PR2200)

xquery version "4.0" allows declaring functions without namespace prefix, overriding fn: built-ins.

7. RangeSequence optimization

Range bounds are stored as primitive long values, so 1 to 10000000000 uses 24 bytes instead of running out of memory.
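
A minimal sketch of the idea, assuming a range is represented only by its bounds and a pre-computed size. The class name `LongRange` and its methods are hypothetical and do not mirror the actual RangeSequence API.

```java
final class LongRange {
    private final long start;
    private final long end;
    private final long size; // pre-computed; 0 when the range is empty

    LongRange(long start, long end) {
        this.start = start;
        this.end = end;
        // subtractExact/addExact throw ArithmeticException instead of silently wrapping.
        this.size = start > end ? 0 : Math.addExact(Math.subtractExact(end, start), 1);
    }

    long count() {
        return size;
    }

    boolean isEmpty() {
        return size == 0;
    }

    /** Constant-time membership test: no items are ever materialized. */
    boolean contains(long value) {
        return size > 0 && value >= start && value <= end;
    }

    /** Returns the item at a 0-based position, computed on demand. */
    long itemAt(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return start + index;
    }

    public static void main(String[] args) {
        LongRange range = new LongRange(1, 10_000_000_000L);
        System.out.println(range.count());
        System.out.println(range.contains(5_000_000_000L));
    }
}
```

Because items are derived from the bounds, memory use stays constant regardless of the range length.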

8. Parameter name alignment (59 files)

Parameter names across the fn: module now match the W3C XQ4 function catalog, which keyword arguments depend on.
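
Why the names matter: a keyword argument can only be bound if the declared parameter name matches. The following is a rough sketch of such a resolution step, combining positional arguments, keyword arguments, and default values. It is an illustration under assumptions, not the logic in Function or FunctionSignature; `KeywordArgumentSketch`, `Param`, and `resolve` are hypothetical names, and arguments are modeled as plain strings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeywordArgumentSketch {
    /** A declared parameter; a null defaultValue marks it as required. */
    static final class Param {
        final String name;
        final String defaultValue;

        Param(String name, String defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    static List<String> resolve(List<Param> params, List<String> positional,
                                Map<String, String> keywords) {
        if (positional.size() > params.size()) {
            throw new IllegalArgumentException("too many arguments");
        }
        String[] slots = new String[params.size()];
        // Positional arguments fill the leading slots in order.
        for (int i = 0; i < positional.size(); i++) {
            slots[i] = positional.get(i);
        }
        // Keyword arguments are matched against the declared parameter names.
        for (Map.Entry<String, String> kw : keywords.entrySet()) {
            int index = -1;
            for (int i = 0; i < params.size(); i++) {
                if (params.get(i).name.equals(kw.getKey())) {
                    index = i;
                }
            }
            if (index < 0) {
                throw new IllegalArgumentException("unknown keyword: " + kw.getKey());
            }
            if (slots[index] != null) {
                throw new IllegalArgumentException("already bound: " + kw.getKey());
            }
            slots[index] = kw.getValue();
        }
        // Remaining slots fall back to declared defaults, if any.
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                Param p = params.get(i);
                if (p.defaultValue == null) {
                    throw new IllegalArgumentException("missing argument: " + p.name);
                }
                slots[i] = p.defaultValue;
            }
        }
        return Arrays.asList(slots);
    }

    public static void main(String[] args) {
        // Hypothetical declaration: greet($name, $greeting := "Hello")
        List<Param> params = Arrays.asList(new Param("name", null), new Param("greeting", "Hello"));
        Map<String, String> keywords = new LinkedHashMap<>();
        keywords.put("greeting", "Hi");
        System.out.println(resolve(params, Arrays.asList("Ada"), keywords));
    }
}
```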

XQTS Results

QT4 XQTS results from run 22 (2026-03-16):

| Test Set | Tests | Passed | Failed | Pass % |
|---|---|---|---|---|
| prod-CastExpr | 2778 | 2618 | 141 | 94.2% |
| prod-TreatExpr | 73 | 72 | 1 | 98.6% |
| prod-FunctionDecl | 228 | 175 | 53 | 76.8% |
| prod-RecordTest | 19 | 10 | 9 | 52.6% |
| prod-FLWORExpr | 21 | 21 | 0 | 100% |
| prod-SwitchExpr | 38 | 38 | 0 | 100% |
| prod-IfExpr | 43 | 42 | 1 | 97.7% |
| prod-StringTemplate | 53 | 52 | 1 | 98.1% |
| prod-ArrowExpr | 70 | 67 | 3 | 95.7% |
| prod-TypeswitchExpr | 74 | 72 | 2 | 97.3% |
| prod-TryCatchExpr | 193 | 163 | 30 | 84.5% |
| prod-Lookup | 131 | 116 | 13 | 88.5% |
| prod-InstanceofExpr | 319 | 310 | 9 | 97.2% |
| prod-QuantifiedExpr | 215 | 204 | 11 | 94.9% |
| prod-NamedFunctionRef | 564 | 520 | 42 | 92.2% |
| prod-OrderByClause | 206 | 204 | 1 | 99.0% |
| misc-BuiltInKeywords | 293 | 227 | 63 | 77.5% |
| misc-Subtyping | 153 | 54 | 58 | 35.3% |
| Axes (*-or-self) | 130 | 116 | 14 | 89.2% |

XQSuite: 1341 tests, 0 failures (across all test suites: 1676 tests, 0 failures)

Spec References

Limitations

Features not implemented: JNode data model, union node test syntax in axis steps, method calls (parsed but limited dispatch), version gating (XQ4 features available regardless of version declaration), XML Schema revalidation.

Test Plan

  • XQSuite: 1341 tests, 0 failures
  • Full test suites: 1676 tests, 0 failures (DateTests, MapTests, RegexTests, XQSuiteTests)
  • Full mvn test on CI
  • XQTS comparison against develop baseline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@joewiz joewiz requested a review from a team as a code owner March 14, 2026 23:46
@joewiz joewiz added the enhancement (new features, suggestions, etc.) and xquery (issue is related to xquery implementation) labels Mar 14, 2026
@joewiz joewiz force-pushed the feature/xquery-4.0-parser branch from 99bab01 to 59cca34 Compare March 15, 2026 04:46

joewiz commented Mar 15, 2026

[This comment was co-authored with Claude Code. -Joe]

XQuery 4.0 Functions Status (updated 2026-03-16)

Implemented (20 of 27)

| Function | XQTS Tests | Score | Notes |
|---|---|---|---|
| fn:build-dateTime | 0 | | No XQTS tests but spec-required |
| fn:collation | 73 | 0/73 | Implemented, but the tests use keyword args; will pass once keyword arg support is in the merged build |
| fn:collation-available | 0 | | No XQTS tests |
| fn:html-doc | 10 | ~10 | Uses NekoHTML parser |
| fn:invisible-xml | 12 | 12/12 (100%) | Uses Markup Blitz |
| fn:parts-of-dateTime | 0 | | No XQTS tests |
| fn:unparsed-binary | 21 | 14/15 (93%) | |
| array:build | 6 | 6/6 (100%) | |
| array:empty | 6 | 6/6 (100%) | |
| array:foot | 9 | 9/9 (100%) | |
| array:index-of | 13 | 13/13 (100%) | |
| array:items | 8 | 8/8 (100%) | |
| array:members | 6 | 6/6 (100%) | |
| array:of-members | 6 | 6/6 (100%) | |
| array:split | 11 | 11/11 (100%) | |
| array:trunk | 6 | 6/6 (100%) | |
| math:cosh | 9 | 9/9 (100%) | |
| math:e | 5 | 4/5 (80%) | 1 parser-blocked |
| math:sinh | 9 | 9/9 (100%) | |
| math:tanh | 9 | 9/9 (100%) | |

Remaining unimplemented (7 of 27)

| Function | Notes | XQTS Tests | Implementable? |
|---|---|---|---|
| fn:element-to-map-plan | Element mapping plan | 21 | Partially (record types now exist) |
| fn:schema-type | Schema type information | 14 | No (needs XML Schema) |
| fn:xsd-validator | XSD validation | 79 | No (needs schema validation infrastructure) |
| fn:jtree | JNode tree navigation | 26 | No (needs JNode data model) |
| fn:jkey | JNode key access | 14 | No (needs JNode data model) |
| fn:jvalue | JNode value access | 14 | No (needs JNode data model) |
| fn:jposition | JNode position | 13 | No (needs JNode data model) |

Summary: 20 implemented (177 XQTS tests, many at 100%). 7 remaining: 1 partially unblocked, 2 schema-blocked, 4 JNode-blocked.

joewiz and others added 17 commits March 16, 2026 14:17
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal.
fn:min/fn:max: fn:compare-based mutual comparability. fn:round 3-arg.
fn:deep-equal: full XQ4 options engine, text node merging.
fn:every/fn:some, fn:all-equal/different, fn:atomic-equal,
fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right,
fn:contains/starts-with/ends-with-subsequence.

Fix: SequenceComparator o2Count typo, AtomicValueComparator cause
preservation, Collations instanceof for non-RuleBasedCollator,
BigInteger comparison via string (not truncating getLong()).

XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
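
To show why the BigInteger fix in the commit above matters, here is a small sketch of how narrowing to a 64-bit long can make two different integers compare as equal. It uses `BigInteger.longValue()` as a stand-in for the truncating accessor mentioned in the commit message; the class `BigIntegerCompare` and its methods are hypothetical and are not the comparator in this PR.

```java
import java.math.BigInteger;

final class BigIntegerCompare {
    /** Lossy: longValue() keeps only the low-order 64 bits. */
    static int compareViaLong(BigInteger a, BigInteger b) {
        return Long.compare(a.longValue(), b.longValue());
    }

    /** Exact: parse the decimal lexical forms and compare at full precision. */
    static int compareViaString(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        String one = "1";
        String big = "18446744073709551617"; // 2^64 + 1
        // The truncated comparison treats the two values as equal.
        System.out.println(compareViaLong(new BigInteger(one), new BigInteger(big)));
        // The full-precision comparison orders them correctly.
        System.out.println(compareViaString(one, big));
    }
}
```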
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri,
  fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName,
  fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of,
  fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings,
  fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op,
  fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime,
  fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest

XQTS: fn-graphemes 1086/1189, fn-characters 45/45,
  misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
array:slice (4 overloads), array:index-where, array:sort-with,
array:sort-by, array:empty, array:foot, array:trunk, array:items,
array:members, array:build, array:index-of, array:of-members,
array:split. Fix array:sort ClassCastException unwrap,
ArraySortBy key validation, ArraySortWith RuntimeException unwrap.

XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6,
  array-items 8/8, math-cosh/sinh/tanh 27/27

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh.
Euler's number constant via Math.E.

XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.

FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.

XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
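
The Unicode block fallback in the commit above can be sketched as follows, assuming the XSD-style regex is handed to java.util.regex.Pattern. Java spells block properties `\p{In<Block>}`, so an `\p{Is<Block>}` name that Java rejects is rewritten and retried. The class `UnicodeBlockFallback` is a hypothetical illustration, not the translation code in this PR.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class UnicodeBlockFallback {
    // Matches \p{IsName} or \P{IsName} and captures the p/P and the name.
    private static final Pattern IS_BLOCK =
            Pattern.compile("\\\\([pP])\\{Is([A-Za-z0-9_-]+)\\}");

    static Pattern compile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Retry with Java's block syntax: \p{IsBasicLatin} becomes \p{InBasicLatin}.
            String rewritten = IS_BLOCK.matcher(regex).replaceAll("\\\\$1{In$2}");
            if (rewritten.equals(regex)) {
                throw e; // nothing to rewrite, so the original error stands
            }
            return Pattern.compile(rewritten);
        }
    }

    public static void main(String[] args) {
        Pattern p = compile("\\p{IsBasicLatin}+");
        System.out.println(p.matcher("abc").matches());
    }
}
```

Trying the original pattern first means names that Java already accepts with the `Is` prefix, such as scripts and general categories, are left untouched.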
Fractional seconds: left-aligned digit semantics.
Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman.
Timezone: picture-driven rewrite with digit family support.
Era [E]/[C], calendar validation, grouping separators, optional digit
validation, ordinal suffix teens fix, whitespace stripping, military
TZ "J", name width truncation (max not min).

XQTS: format-time 46→77/92, format-date 79→111/133

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
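
The "ordinal suffix teens fix" in the commit above refers to a common pitfall: picking the English suffix from the last digit alone yields "11st", "12nd", and "13rd". A minimal sketch of the corrected rule, with hypothetical names, and not the ICU4J-based code in this PR:

```java
final class OrdinalSuffix {
    static String suffix(long n) {
        long lastTwo = Math.abs(n % 100);
        // 11, 12 and 13 (and 111, 212, ...) always take "th".
        if (lastTwo >= 11 && lastTwo <= 13) {
            return "th";
        }
        switch ((int) (lastTwo % 10)) {
            case 1:
                return "st";
            case 2:
                return "nd";
            case 3:
                return "rd";
            default:
                return "th";
        }
    }

    static String format(long n) {
        return n + suffix(n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 3, 4, 11, 12, 13, 21, 112}) {
            System.out.println(format(n));
        }
    }
}
```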
…d-text, fn:json-doc

Resolve relative URIs against file: base URI with direct file: handling.
Only allow direct file: access for URIs resolved from relative paths
(absolute file: URIs go through SourceFactory security checks).
Separate FOJS0001 from FOUT1170 in fn:json-doc.
Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.

XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
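
A sketch of the charset fallback described in the commit above, assuming the encoding name is resolved through java.nio.charset.Charset. The class `CharsetFallback` is hypothetical; only the iso-8859 to iso-8859-1 mapping comes from the commit message.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetFallback {
    static Charset resolve(String name) {
        try {
            return Charset.forName(name);
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            // "iso-8859" names the family, not a member; fall back to part 1.
            if ("iso-8859".equalsIgnoreCase(name)) {
                return StandardCharsets.ISO_8859_1;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        // Decodes the four bytes as Latin-1 text.
        System.out.println(new String(bytes, resolve("iso-8859")));
    }
}
```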
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json.
Custom streaming CSV parser with configurable delimiter, quote char,
header handling, and column naming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
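
For readers unfamiliar with what a CSV parser with a configurable delimiter and quote character has to handle, here is a compact sketch. It is deliberately simplified: it parses an in-memory string rather than streaming, and it ignores header handling and column naming. `CsvSketch` is a hypothetical class, not the parser added in the commit above.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvSketch {
    static List<List<String>> parse(String input, char delimiter, char quote) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current row has any content

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    field.append(c); // delimiters and newlines are literal here
                } else if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                    field.append(quote); // a doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;
                }
            } else if (c == quote) {
                inQuotes = true;
                pending = true;
            } else if (c == delimiter) {
                row.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as one line break
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) { // last row without a trailing line break
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("name;note\n'Doe; J';'it''s fine'\n", ';', '\''));
    }
}
```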
- fnXQuery40.xql: tests for 50+ new XQ4 functions
- deep-equal-options-test.xq: deep-equal options engine tests
- Re-enable arr:get-invalid-type (XPTY0004 now works)
- Update json-to-xml pending comments
- fn:replace test updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parser and tree walker extensions for XQ4: focus functions, keyword
args, string templates, pipeline, mapping arrow, for member,
otherwise, braced if, while, try/finally, ternary, QName/hex/binary
literals, array/map filter, choice/union/enum types, method call, let
destructure, fn() shorthand, record types, gnode(), 4 new axes,
reservedKeywords sub-rules, expr split for code-too-large fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes: FocusFunction, KeywordArgumentExpression,
MappingArrowOperator, MethodCallOperator, PipelineExpression,
OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr,
LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression,
EnumCastExpression, FunctionParameterFunctionSequenceType.

Modified: Function (keyword arg resolution), FunctionFactory (XQ4
no-namespace override, unknown type XPST0017), FunctionSignature
(default params), UserDefinedFunction (default param binding),
TryCatchExpression (finally), SwitchExpression (XQ4 version gating),
StringConstructor (atomization fixes), XQueryContext (version 4.0,
XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes),
LocationStep (or-self axis evaluation with document node guard).

Type infrastructure: Type.RECORD constant, SequenceType.RecordField,
record type structural checking, record(*) and record() support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
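
As a rough illustration of structural record checking for a type like record(name as xs:string, age? as xs:integer, *), the sketch below tests a map against a list of field declarations. It assumes Java classes stand in for XDM types and a java.util.Map stands in for an XDM map; `RecordTypeCheck` and `Field` are hypothetical and unrelated to the actual SequenceType.RecordField API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RecordTypeCheck {
    /** One declared field: a name, an expected type, and whether it may be absent. */
    static final class Field {
        final String name;
        final Class<?> type;
        final boolean optional;

        Field(String name, Class<?> type, boolean optional) {
            this.name = name;
            this.type = type;
            this.optional = optional;
        }
    }

    /** extensible corresponds to the trailing "*" in the record type. */
    static boolean matches(Map<String, Object> value, List<Field> fields, boolean extensible) {
        Set<String> declared = new HashSet<>();
        for (Field f : fields) {
            declared.add(f.name);
            if (!value.containsKey(f.name)) {
                if (!f.optional) {
                    return false; // a required field is missing
                }
                continue;
            }
            if (!f.type.isInstance(value.get(f.name))) {
                return false; // the field is present but has the wrong type
            }
        }
        // A non-extensible record must not carry undeclared keys.
        return extensible || declared.containsAll(value.keySet());
    }

    public static void main(String[] args) {
        List<Field> person = Arrays.asList(
                new Field("name", String.class, false),
                new Field("age", Long.class, true));
        Map<String, Object> ada = new HashMap<>();
        ada.put("name", "Ada");
        ada.put("email", "ada@example.org");
        System.out.println(matches(ada, person, true));
        System.out.println(matches(ada, person, false));
    }
}
```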
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor

XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compile modules from provided source strings instead of loading from
URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed
version compatibility check for content-loaded modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse invisible XML grammars using the Markup Blitz iXML library.
Two signatures: fn:invisible-xml(grammar) returns a parsing function,
and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml
with Markup Blitz dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Primitive long start/end instead of IntegerValue objects. Pre-computed
size with overflow protection. O(1) count/isEmpty/contains. Prevents
OOM on large ranges like 1 to 10000000000.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max
(comparison function), fn:deep-equal (options map), fn:matches/
fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace
(function replacement, ! flag), fn:round (3-arg mode). Collations:
supplementary codepoint fix, ASCII case-insensitive collator.
InspectModule: keyword arg introspection. DocUtils: URI resolution.

Parameter name alignment across 59 fn: module files to match W3C
XQuery 4.0 Functions and Operators catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fnXQuery40.xql with tests for all XQ4 features.
Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm,
InspectModuleTest.java. New deep-equal-options-test.xq and
fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql.
Updated map ordering test assertions for LinkedHashMap insertion order.

XQSuite: 1341 tests, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/xquery-4.0-parser branch from 4a80095 to 0549468 Compare March 16, 2026 18:51

joewiz commented Mar 17, 2026

[This comment was co-authored with Claude Code. -Joe]

CI Status Notes

SendEmailIT failure (macOS/ubuntu/windows integration):
sendTextAndHtmlEmailWithXmlAndBinaryAttachments[SMTP_DIRECT_CONNECTION NOT_AUTHENTICATED] fails with XPST0003 !(inElementContent || inAttributeContent) — a lexer state assertion. This test passes consistently on local runs (tested 3x). The assertion predicate exists on develop too (line 3051 of XQuery.g) and predates our changes. Appears to be a CI-environment-specific flaky failure, possibly timing-related.

W3C XQTS CI failure:
Expected — the XQTS runner times out on the full QT4 test suite within CI's time limits. Not a code issue.

Unit tests: All pass (ubuntu).

Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters:
  fn($name as xs:string, $age as xs:integer) as xs:boolean
  The names are parsed and discarded — only the sequence types matter
  for type checking. This matches the XQ4 spec.

CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the
  abstract type check or using XPST0051)

XQTS: misc-BuiltInKeywords 227→234 (+7 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joewiz joewiz marked this pull request as ready for review March 17, 2026 03:52
