Implement XQuery 4.0 parser, functions, and runtime support #6139
joewiz wants to merge 18 commits into eXist-db:develop
99bab01 to 59cca34
[This comment was co-authored with Claude Code. -Joe]
XQuery 4.0 Functions Status (updated 2026-03-16)
Implemented (19 of 27)
Remaining unimplemented (8 of 27)
Summary: 19 implemented (177 XQTS tests, many at 100%). 8 remaining: 1 partially unblocked, 2 schema-blocked, 4 JNode-blocked.
fn:compare: XQ4 numeric/duration/dateTime total order via BigDecimal.
fn:min/fn:max: fn:compare-based mutual comparability.
fn:round 3-arg.
fn:deep-equal: full XQ4 options engine, text node merging.
fn:every/fn:some, fn:all-equal/different, fn:atomic-equal, fn:duplicate-values, fn:highest/fn:lowest, fn:scan-left/right, fn:contains/starts-with/ends-with-subsequence.
Fix: SequenceComparator o2Count typo, AtomicValueComparator cause preservation, Collations instanceof for non-RuleBasedCollator, BigInteger comparison via string (not truncating getLong()).
XQTS: fn-min +73, fn-max +73, fn-deep-equal +20, fn-every/some +50
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
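The BigInteger fix listed above is easier to see in isolation. The following is a minimal sketch, not the PR's code: `NumericOrderSketch` and both method names are hypothetical, and it only illustrates why narrowing to `long` breaks ordering for large integers while a comparison routed through the decimal string form and `BigDecimal` stays exact.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration; class and method names are not from eXist-db.
final class NumericOrderSketch {

    // Lossy: longValue() keeps only the low-order 64 bits of the value.
    static int compareTruncating(BigInteger a, BigInteger b) {
        return Long.compare(a.longValue(), b.longValue());
    }

    // Exact: go through the decimal string form into BigDecimal, which has no range limit.
    static int compareExact(BigInteger a, BigInteger b) {
        return new BigDecimal(a.toString()).compareTo(new BigDecimal(b.toString()));
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.ONE.shiftLeft(64).add(BigInteger.ONE); // 2^64 + 1
        BigInteger two = BigInteger.valueOf(2);
        System.out.println("truncating: " + compareTruncating(big, two));
        System.out.println("exact:      " + compareExact(big, two));
    }
}
```

With 2^64 + 1 on the left and 2 on the right, the truncating variant sees 1 versus 2 and orders the operands the wrong way round; the exact variant does not.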
String: fn:characters, fn:graphemes (ICU4J), fn:char, fn:decode-from-uri, fn:insert-separator, fn:replicate
Parsing: fn:parse-html (NekoHTML+XHTML), fn:parse-integer, fn:parse-QName, fn:parse-uri, fn:build-uri, fn:html-doc, fn:collation/-available
Type: fn:atomic-type-annotation, fn:node-type-annotation, fn:type-of, fn:is-NaN, fn:identity, fn:void
Nav: fn:transitive-closure, fn:element-to-map, fn:siblings, fn:in-scope-namespaces, fn:distinct/ordered-nodes
Higher-order: fn:partition, fn:partial-apply, fn:sort-by, fn:op, fn:subsequence-where
Numeric: fn:seconds, fn:divide-decimals, fn:unix-dateTime, fn:civil-timezone, fn:hash, fn:expanded-QName, fn:unparsed-binary
Date: fn:build-dateTime, fn:parts-of-dateTime (record-compatible)
Data: fn:items-at, fn:slice, fn:message, fn:highest, fn:lowest
XQTS: fn-graphemes 1086/1189, fn-characters 45/45, misc-HtmlTestSuite 1105/1379, fn-unparsed-binary 14/15
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
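For readers unfamiliar with fn:characters, the core idea is splitting a string into one-character strings by Unicode code point rather than by UTF-16 code unit. A minimal sketch under that assumption follows; `CharactersSketch` is a hypothetical name, and this is not how the PR implements the function. fn:graphemes needs cluster segmentation (the commit uses ICU4J for that) and is not shown.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of code-point-wise splitting; not eXist-db code.
final class CharactersSketch {

    // Returns one string per Unicode code point, so a surrogate pair stays together.
    static List<String> characters(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "a", an emoji outside the Basic Multilingual Plane, then "b"
        System.out.println(characters("a\uD83D\uDE00b").size());
    }
}
```

Splitting with `String.charAt` instead would cut the emoji into two unpaired surrogates.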
array:slice (4 overloads), array:index-where, array:sort-with, array:sort-by, array:empty, array:foot, array:trunk, array:items, array:members, array:build, array:index-of, array:of-members, array:split.
Fix array:sort ClassCastException unwrap, ArraySortBy key validation, ArraySortWith RuntimeException unwrap.
XQTS: array-slice 71/71, array-foot 9/9, array-trunk 6/6, array-items 8/8, math-cosh/sinh/tanh 27/27
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hyperbolic trigonometric functions via Java Math.cosh/sinh/tanh. Euler's number constant via Math.E.
XQTS: math-cosh 9/9, math-sinh 9/9, math-tanh 9/9, math-e 4/5
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unicode block name fallback (\p{Is<Block>} → \p{In<Block>}).
XQ4 fn:replace: 'c' flag, empty match, function replacement.
XQ4 fn:matches and fn:tokenize enhancements.
FunAnalyzeString: use reflection proxy for RegexIterator.MatchHandler
to avoid NoClassDefFoundError when the inner class is stripped from
fat JARs. Falls back to text-only output when unavailable.
XQTS: fn-matches.re +45, fn-replace +12, fn-tokenize +8
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
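The reflection-proxy approach described for FunAnalyzeString can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `OptionalHandlerSketch` and `proxyOrNull` are hypothetical names, and `java.lang.Runnable` stands in for the handler interface so the example runs without any third-party library. The point is that the interface is looked up by name at runtime, so a missing class produces a `null` (and a fallback path) instead of a NoClassDefFoundError at link time.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical illustration; names are not from eXist-db or Saxon.
final class OptionalHandlerSketch {

    // Returns a proxy implementing the named interface, or null if it cannot be loaded.
    static Object proxyOrNull(String interfaceName, InvocationHandler handler) {
        try {
            ClassLoader loader = OptionalHandlerSketch.class.getClassLoader();
            Class<?> iface = Class.forName(interfaceName, false, loader);
            return Proxy.newProxyInstance(loader, new Class<?>[] { iface }, handler);
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError; the caller falls back to a simpler path.
            return null;
        }
    }

    public static void main(String[] args) {
        Object present = proxyOrNull("java.lang.Runnable", (p, m, a) -> null);
        Object missing = proxyOrNull("no.such.pkg.MatchHandler", (p, m, a) -> null);
        System.out.println("present: " + (present != null) + ", missing: " + (missing != null));
    }
}
```

A caller would test the result for `null` and, as the commit message puts it, fall back to text-only output when the handler type is unavailable.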
Fractional seconds: left-aligned digit semantics.
Word/Roman via ICU4J: W/w/Ww cardinal, Wo/wo/Wwo ordinal, I/i Roman.
Timezone: picture-driven rewrite with digit family support.
Era [E]/[C], calendar validation, grouping separators, optional digit validation, ordinal suffix teens fix, whitespace stripping, military TZ "J", name width truncation (max not min).
XQTS: format-time 46→77/92, format-date 79→111/133
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
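The "ordinal suffix teens fix" refers to a classic edge case: 11, 12, and 13 take "th" even though they end in 1, 2, and 3. A minimal English-only sketch of the rule is below. `OrdinalSuffixSketch` is a hypothetical name, non-negative input is assumed, and the PR's actual formatting code (which uses ICU4J for spelled-out words) may look quite different.

```java
// Hypothetical illustration of the English ordinal-suffix rule; not eXist-db code.
final class OrdinalSuffixSketch {

    // Assumes n >= 0, as for date and time components.
    static String ordinal(int n) {
        int lastTwo = n % 100;
        if (lastTwo >= 11 && lastTwo <= 13) {
            return n + "th"; // the "teens" exception: 11th, 12th, 13th, 111th, ...
        }
        switch (n % 10) {
            case 1:  return n + "st";
            case 2:  return n + "nd";
            case 3:  return n + "rd";
            default: return n + "th";
        }
    }

    public static void main(String[] args) {
        for (int day : new int[] { 1, 2, 3, 11, 12, 13, 21, 22, 23 }) {
            System.out.println(ordinal(day));
        }
    }
}
```

Checking the last two digits before the last digit is what keeps 11, 12, and 13 from being rendered with "st", "nd", or "rd".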
…d-text, fn:json-doc
Resolve relative URIs against file: base URI with direct file: handling. Only allow direct file: access for URIs resolved from relative paths (absolute file: URIs go through SourceFactory security checks).
Separate FOJS0001 from FOUT1170 in fn:json-doc.
Add iso-8859 → iso-8859-1 charset fallback in fn:unparsed-text.
XQTS: misc-HtmlTestSuite 0→1105/1379, misc-JsonTestSuite 0→299/318
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
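The two mechanisms in this commit, relative-URI resolution against a file: base and the charset fallback, can be sketched with the JDK alone. This is a rough illustration under stated assumptions, not the PR's code: `ResourceResolutionSketch` and its method names are hypothetical, and the `wasRelative` check merely models the stated rule that only hrefs that were relative before resolution get direct file: access.

```java
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names and structure are not from eXist-db.
final class ResourceResolutionSketch {

    // Standard RFC 3986 style resolution of an href against the static base URI.
    static URI resolve(URI base, String href) {
        return base.resolve(href);
    }

    // True when the href had no scheme of its own, i.e. it was relative before resolution.
    static boolean wasRelative(String href) {
        return !URI.create(href).isAbsolute();
    }

    // Maps the incomplete name "iso-8859" to ISO-8859-1; other names use the normal lookup.
    static Charset charsetFor(String name) {
        if ("iso-8859".equalsIgnoreCase(name)) {
            return StandardCharsets.ISO_8859_1;
        }
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/db/app/query.xq");
        System.out.println(resolve(base, "data/input.json"));
        System.out.println(wasRelative("data/input.json") + " " + wasRelative("file:/etc/passwd"));
        System.out.println(charsetFor("iso-8859"));
    }
}
```

In this model an absolute `file:` href reports `wasRelative` as false, which is where the commit says the SourceFactory security checks apply instead.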
fn:parse-csv, fn:csv-to-arrays, fn:csv-to-xml, fn:csv-to-json.
Custom streaming CSV parser with configurable delimiter, quote char, header handling, and column naming.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
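To make "configurable delimiter, quote char" concrete, here is a small character-by-character CSV reader. It is a sketch only: `CsvSketch` is a hypothetical name, it works on an in-memory string rather than a stream, it assumes a doubled quote character is the escape for a literal quote, and it leaves out header handling and column naming. It should not be read as the PR's parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of delimiter/quote handling; not the PR's parser.
final class CsvSketch {

    static List<List<String>> parse(String text, char delimiter, char quote) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current row has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    field.append(c); // delimiters and newlines are literal inside quotes
                } else if (i + 1 < text.length() && text.charAt(i + 1) == quote) {
                    field.append(quote); // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;
                }
            } else if (c == quote) {
                inQuotes = true;
                pending = true;
            } else if (c == delimiter) {
                row.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as one line ending
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) { // last row without a trailing newline
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(parse("name;qty\n'a;b';2\n", ';', '\''));
    }
}
```

Passing the delimiter and quote character as parameters is the simplest way to model the options the commit mentions; a streaming variant would read from a `Reader` and emit rows as they complete.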
- fnXQuery40.xql: tests for 50+ new XQ4 functions
- deep-equal-options-test.xq: deep-equal options engine tests
- Re-enable arr:get-invalid-type (XPTY0004 now works)
- Update json-to-xml pending comments
- fn:replace test updates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parser and tree walker extensions for XQ4: focus functions, keyword args, string templates, pipeline, mapping arrow, for member, otherwise, braced if, while, try/finally, ternary, QName/hex/binary literals, array/map filter, choice/union/enum types, method call, let destructure, fn() shorthand, record types, gnode(), 4 new axes, reservedKeywords sub-rules, expr split for code-too-large fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New expression classes: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.
Modified: Function (keyword arg resolution), FunctionFactory (XQ4 no-namespace override, unknown type XPST0017), FunctionSignature (default params), UserDefinedFunction (default param binding), TryCatchExpression (finally), SwitchExpression (XQ4 version gating), StringConstructor (atomization fixes), XQueryContext (version 4.0, XQST0060 relaxed, compileModuleFromSource), Constants (4 new axes), LocationStep (or-self axis evaluation with document node guard).
Type infrastructure: Type.RECORD constant, SequenceType.RecordField, record type structural checking, record(*) and record() support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- convertTo(): FORG0001→XPTY0004 for type-incompatible casts (20 files)
- DoubleValue: NaN/INF→integer/decimal throws FOCA0002
- DynamicCardinalityCheck: ERROR→XPTY0004 (or XPDY0050 for treat-as)
- DynamicTypeCheck: FOCH0002→XPTY0004 (overridable for treat-as)
- CastExpression: xs:anySimpleType→XPST0080 (was XPST0051)
- StringValue: validation errors→FORG0001 (was generic ERROR)
- Base64BinaryValueType: FORG0001 with proper ErrorCode
- ErrorCodes: added convenience constructor
XQTS impact: prod-CastExpr 745→141F, prod-TreatExpr 18→1F
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
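As an illustration of the DoubleValue item, the sketch below rejects NaN and the infinities with FOCA0002 before converting, and truncates toward zero otherwise. Everything named here is hypothetical: `DoubleCastSketch` and its nested `XPathError` are stand-ins invented for this example, not eXist-db's exception or value classes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration; XPathError is a stand-in, not eXist-db's exception type.
final class DoubleCastSketch {

    static final class XPathError extends RuntimeException {
        final String code;

        XPathError(String code, String message) {
            super(code + ": " + message);
            this.code = code;
        }
    }

    // Casts a double to an integer value, rejecting values that have no integer form.
    static BigInteger toInteger(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new XPathError("FOCA0002", "cannot cast " + d + " to an integer");
        }
        // BigDecimal(double) is exact; toBigInteger() drops the fraction (truncates toward zero).
        return new BigDecimal(d).toBigInteger();
    }

    public static void main(String[] args) {
        System.out.println(toInteger(3.9));
        try {
            toInteger(Double.NaN);
        } catch (XPathError e) {
            System.out.println(e.code);
        }
    }
}
```

Doing the NaN/INF check first matters because `new BigDecimal(double)` itself throws a NumberFormatException for those inputs, which would otherwise surface as a generic error.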
Compile modules from provided source strings instead of loading from URIs. Required by misc-Subtyping XQTS tests (146 tests). Relaxed version compatibility check for content-loaded modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse invisible XML grammars using the Markup Blitz iXML library. Two signatures: fn:invisible-xml(grammar) returns a parsing function, and fn:invisible-xml(grammar, input) parses directly. Updated pom.xml with Markup Blitz dependency.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Primitive long start/end instead of IntegerValue objects. Pre-computed size with overflow protection. O(1) count/isEmpty/contains. Prevents OOM on large ranges like 1 to 10000000000.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
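The idea behind the RangeSequence change can be sketched as a range that stores only its bounds and a pre-computed size. This is a minimal illustration, assuming overflow is detected with `Math.addExact`/`Math.subtractExact`; `LongRangeSketch` and its method names are hypothetical and are not the PR's class.

```java
// Hypothetical illustration of a bounds-only integer range; not eXist-db's RangeSequence.
final class LongRangeSketch {
    private final long start;
    private final long end;
    private final long size;

    LongRangeSketch(long start, long end) {
        this.start = start;
        this.end = end;
        // An inverted range is empty; otherwise size = end - start + 1,
        // throwing ArithmeticException instead of silently overflowing.
        this.size = end < start ? 0L : Math.addExact(Math.subtractExact(end, start), 1L);
    }

    long count() { return size; }                 // O(1): no items are materialized

    boolean isEmpty() { return size == 0L; }      // O(1)

    boolean contains(long value) {                // O(1): two comparisons
        return size > 0L && value >= start && value <= end;
    }

    long itemAt(long index) {                     // zero-based access, computed on demand
        if (index < 0L || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return start + index;
    }

    public static void main(String[] args) {
        LongRangeSketch range = new LongRangeSketch(1L, 10_000_000_000L);
        System.out.println(range.count() + " " + range.contains(5_000_000_000L));
    }
}
```

Because nothing is allocated per item, a range such as 1 to 10000000000 costs the same memory as 1 to 10; items are produced only when something iterates or indexes into the range.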
Enhanced: fn:compare (XQ4 anyAtomicType, total order), fn:min/max (comparison function), fn:deep-equal (options map), fn:matches/fn:tokenize (XQ4 regex flags, ! flag version-gating), fn:replace (function replacement, ! flag), fn:round (3-arg mode).
Collations: supplementary codepoint fix, ASCII case-insensitive collator.
InspectModule: keyword arg introspection.
DocUtils: URI resolution.
Parameter name alignment across 59 fn: module files to match W3C XQuery 4.0 Functions and Operators catalog.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fnXQuery40.xql with tests for all XQ4 features. Updated fnHigherOrderFunctions.xql, replace.xqm, fnLanguage.xqm, InspectModuleTest.java. New deep-equal-options-test.xq and fnInvisibleXml.xqm. Fixed stray backtick in Lucene facets.xql. Updated map ordering test assertions for LinkedHashMap insertion order.
XQSuite: 1341 tests, 0 failures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4a80095 to 0549468
[This comment was co-authored with Claude Code. -Joe]
CI Status Notes
SendEmailIT failure (macOS/ubuntu/windows integration):
W3C XQTS CI failure:
Unit tests: All pass (ubuntu).
Grammar (XQuery.g):
- fn() and function() type tests now accept named parameters:
  fn($name as xs:string, $age as xs:integer) as xs:boolean
  The names are parsed and discarded; only the sequence types matter for type checking. This matches the XQ4 spec.
CastExpression/CastableExpression:
- xs:anyType and xs:untyped now throw XPST0080 (was bypassing the abstract type check or using XPST0051)
XQTS: misc-BuiltInKeywords 227→234 (+7 tests)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements XQuery 4.0 parser and runtime support for eXist-db, covering the majority of the QT4CG specification draft syntax, 50+ new standard functions, enhanced existing functions, and W3C-compliant error codes. This brings eXist-db in line with the evolving XQuery 4.0 standard.
Based on the XQuery 4.0 Functions branch.
What Changed
1. Grammar — XQ4 syntax (XQuery.g + XQueryTree.g)
All major XQuery 4.0 syntax additions via ANTLR 2 grammar extensions:
- fn { expr }
- name := expr
- `Hello {$name}`
- => and mapping arrow =!>
- for member, while clause, otherwise
- ?? !!
- ?[predicate]
- record(name as xs:string, age? as xs:integer, *)
- =?>, let destructuring
- fn(...) type shorthand, gnode() type test
- *-or-self, *-sibling-or-self
- declare context value, xquery version "4.0"
- reservedKeywords sub-rules (merge-conflict reduction)
- expr rule split (code-too-large fix for next builds)
2. Expression classes (33 files)
New: FocusFunction, KeywordArgumentExpression, MappingArrowOperator, MethodCallOperator, PipelineExpression, OtherwiseExpression, WhileClause, ForMemberExpr, ForKeyValueExpr, LetDestructureExpr, FilterExprAM, ChoiceCast/CastableExpression, EnumCastExpression, FunctionParameterFunctionSequenceType.
Modified: Function, FunctionFactory, FunctionSignature, UserDefinedFunction, TryCatchExpression, SwitchExpression, StringConstructor, XQueryContext, Constants, LocationStep, SequenceType, Type.
3. Error code alignment (29 files)
- convertTo() in 20 atomic types
- DoubleValue NaN/INF casts
- DynamicCardinalityCheck
- DynamicTypeCheck
- TreatAsExpression
- CastExpression xs:anySimpleType
- FunctionFactory unknown types
- StringValue validation
- Base64BinaryValueType
4. fn:load-xquery-module content option
XQ4 content option for dynamic module compilation from strings. Required by misc-Subtyping XQTS tests.
5. fn:invisible-xml (Markup Blitz)
Parse invisible XML grammars using the Markup Blitz iXML library.
6. No-namespace function overriding (PR2200)
xquery version "4.0" allows declaring functions without a namespace prefix, overriding fn: built-ins.
7. RangeSequence optimization
Primitive long storage: 1 to 10000000000 uses 24 bytes instead of running out of memory.
8. Parameter name alignment (59 files)
Parameter names across the fn: module now match the W3C XQ4 catalog, which keyword argument support depends on.
XQTS Results
QT4 XQTS results from run 22 (2026-03-16):
XQSuite: 1341 tests, 0 failures (across all test suites: 1676 tests, 0 failures)
Spec References
Limitations
Features not implemented: JNode data model, union node test syntax in axis steps, method calls (parsed but limited dispatch), version gating (XQ4 features available regardless of version declaration), XML Schema revalidation.
Test Plan
mvn test on CI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>