Description
Parasol has a universal data structure system based on XML. A significant limitation is that XML has no native support for arrays, whereas data formats such as JSON do. This proposal seeks to address that in a way that remains largely compatible with the existing XML, XSLT and XQuery specifications.
Integration with XQuery Type System
XQuery already has:
- Sequences (flat, ordered collections)
- Arrays (from XQuery 3.1, for nested structures)
- Atomic types (xs:integer, xs:string, etc.)
- Node types (element, attribute, etc.)
So we need array types that:
- Work with existing type checking
- Integrate with sequence/array operations
- Don't break schema validation
- Serialize/deserialize cleanly
Proposed Architecture
<!-- Array Type Registry (external file or inline in schema) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:arr="http://www.parasol.ws/xml/array-types"
xmlns="http://example.com/data"
targetNamespace="http://example.com/data">
<!-- Define array types as schema types -->
<xs:simpleType name="intArray">
<xs:annotation>
<xs:appinfo>
<arr:arrayType>
<arr:itemType>xs:integer</arr:itemType>
<arr:delimiter>,</arr:delimiter>
<arr:allowEmpty>false</arr:allowEmpty>
</arr:arrayType>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string">
<!-- Pattern validates the serialized form -->
<xs:pattern value="-?\d+(,-?\d+)*"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="stringArray">
<xs:annotation>
<xs:appinfo>
<arr:arrayType>
<arr:itemType>xs:string</arr:itemType>
<arr:delimiter>|</arr:delimiter>
<arr:escapeChar>\</arr:escapeChar>
</arr:arrayType>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string"/>
</xs:simpleType>
<!-- Elements using these types -->
<xs:element name="scores" type="intArray"/>
<xs:element name="tags" type="stringArray"/>
</xs:schema>
XML Document Usage
<data xmlns="http://example.com/data"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<scores>95,87,92,88,100</scores>
<tags>xml|xquery|xpath|xslt</tags>
<weights>1.5,2.3,4.7,8.2</weights>
</data>
XQuery Integration - Automatic Conversion
The key is to make XQuery automatically recognize these array types and convert them:
(: Hypothetical XQuery 4.0 with array type support :)
(: The schema import binds the data prefix; declaring it a second time would be a duplicate-prefix error :)
import schema namespace data = "http://example.com/data";
(: When you access an element with an array type, it automatically
converts to an XDM array :)
let $doc := doc("data.xml")
let $scores := $doc//data:scores
(: $scores is now an array, not a text node! :)
return (
$scores(1), (: returns 95 as xs:integer :)
array:size($scores), (: returns 5 :)
$scores?*, (: returns sequence (95, 87, 92, 88, 100) :)
sum($scores?*) (: returns 462 :)
)
Type Checking Integration
(: Function expecting an array :)
declare function local:average($values as array(xs:integer)) as xs:double {
avg($values?*)
};
(: This works - automatic conversion from intArray type :)
let $scores := doc("data.xml")//data:scores
return local:average($scores)
(: Type checking at compile time :)
declare function local:process-tags($tags as array(xs:string)) {
for $tag in $tags?*
return upper-case($tag)
};
(: This also works :)
let $tags := doc("data.xml")//data:tags
return local:process-tags($tags)
Bidirectional Conversion
(: Reading - automatic deserialization :)
let $scores := doc("data.xml")//data:scores
return $scores instance of array(xs:integer) (: true :)
(: Writing - automatic serialization :)
let $newScores := array { 100, 95, 98, 92 }
return
<scores xsi:type="intArray">{
(: Serializes to "100,95,98,92" :)
$newScores
}</scores>
Extended Type System
(: The type system recognizes the relationship :)
(: intArray is a subtype of array(xs:integer) :)
$scores instance of array(xs:integer) (: true :)
(: But also retains its schema type :)
$scores instance of schema-element(data:scores) (: true :)
(: You can explicitly get the serialized form if needed :)
fn:string($scores) (: returns "95,87,92,88,100" :)
(: Or the array form :)
fn:data($scores) (: returns array {95, 87, 92, 88, 100} :)
Advanced Registry with Inheritance
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:arr="http://www.parasol.ws/xml/array-types">
<!-- Base array type -->
<xs:simpleType name="baseArray">
<xs:annotation>
<xs:appinfo>
<arr:arrayType>
<arr:itemType>xs:anyAtomicType</arr:itemType>
<arr:delimiter>,</arr:delimiter>
</arr:arrayType>
</xs:appinfo>
</xs:annotation>
<xs:restriction base="xs:string"/>
</xs:simpleType>
<!-- Derived types inherit delimiter -->
<xs:simpleType name="intArray">
<xs:restriction base="baseArray">
<xs:annotation>
<xs:appinfo>
<arr:arrayType>
<arr:itemType>xs:integer</arr:itemType>
<!-- inherits delimiter="," from base -->
</arr:arrayType>
</xs:appinfo>
</xs:annotation>
</xs:restriction>
</xs:simpleType>
<!-- Override delimiter for specific type -->
<xs:simpleType name="pipedStringArray">
<xs:restriction base="baseArray">
<xs:annotation>
<xs:appinfo>
<arr:arrayType>
<arr:itemType>xs:string</arr:itemType>
<arr:delimiter>|</arr:delimiter>
</arr:arrayType>
</xs:appinfo>
</xs:annotation>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Constructor Functions
(: XQuery would provide constructor functions :)
(: Parse from string using registered type :)
arr:intArray("10,20,30")
(: returns array {10, 20, 30} :)
(: Create with explicit delimiter :)
arr:parse("red|green|blue", "|", "xs:string")
(: returns array {"red", "green", "blue"} :)
(: Serialize to registered format :)
arr:serialize(array{1,2,3}, "intArray")
(: returns "1,2,3" :)
Validation Integration
(: Schema validation understands array types :)
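(: The operand of validate must be an element or document node, so the text is wrapped in an element :)
validate type intArray { <scores>10,20,30</scores> }
(: validates and returns array {10, 20, 30} :)
validate type intArray { <scores>10,abc,30</scores> }
(: validation error: invalid item type :)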
(: Strict mode validates each item :)
<scores xsi:type="intArray">95,87,92,abc,100</scores>
(: Schema validation error at parse time :)
Static Type Inference
declare function local:process($data as schema-element(data:scores)) {
(: Compiler knows data:scores is intArray type :)
(: So $data is inferred as array(xs:integer) :)
let $max := max($data?*) (: type-safe :)
return $max
};
Implementation Considerations
1. Processor Behavior
- Parser encounters element with array type annotation
- Checks array type registry
- Deserializes text content using specified delimiter/rules
- Creates XDM array instance
- Maintains schema type annotation
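The steps above can be sketched outside XQuery to make the data flow concrete. The following Python is only an illustration under stated assumptions: `ArrayType`, `TypedArray`, `REGISTRY`, `CASTS` and `deserialize` are hypothetical names, the registry is hard-coded rather than read from the schema's `appinfo`, and escape-character handling is left out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ArrayType:
    """One registry entry (hypothetical shape, mirroring arr:arrayType)."""
    item_type: str           # lexical name of the item type, e.g. "xs:integer"
    delimiter: str           # separator used in the serialized text
    allow_empty: bool = True


@dataclass(frozen=True)
class TypedArray:
    """Deserialized items plus the schema type name they came from."""
    type_name: str
    items: tuple


# Assumed registry contents, copied by hand from the schema in this proposal.
REGISTRY = {
    "intArray": ArrayType("xs:integer", ",", allow_empty=False),
    "stringArray": ArrayType("xs:string", "|"),
}

# Assumed mapping from schema item types to Python casts.
CASTS = {"xs:integer": int, "xs:string": str, "xs:decimal": Decimal}


def deserialize(type_name: str, text: str) -> TypedArray:
    """Look up the type, split the text content, cast each item."""
    array_type = REGISTRY[type_name]          # step: check the registry
    if text == "":
        if not array_type.allow_empty:
            raise ValueError(f"{type_name} does not allow an empty array")
        return TypedArray(type_name, ())
    cast = CASTS[array_type.item_type]
    # step: deserialize using the registered delimiter; a bad item raises ValueError
    items = tuple(cast(part) for part in text.split(array_type.delimiter))
    # step: keep the schema type name alongside the array value
    return TypedArray(type_name, items)


scores = deserialize("intArray", "95,87,92,88,100")
print(scores.type_name, scores.items)
```

Keeping the type name on the result mirrors the last step in the list: the value behaves as an array while still recording which schema type produced it.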
2. Backward Compatibility
- Processors without array type support see plain text
- Schema validation still works (validates string form)
- XQuery 3.1 code continues to work
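As a rough illustration of the second point, a processor with no array support can still check the serialized form against the `xs:pattern` facet from the `intArray` definition. The sketch below applies that same pattern with Python's `re` module; the function name is hypothetical, and `fullmatch` is used because XML Schema patterns implicitly match the whole value.

```python
import re

# The pattern facet from the intArray type in this proposal.
INT_ARRAY_PATTERN = re.compile(r"-?\d+(,-?\d+)*")


def is_valid_serialized_int_array(text: str) -> bool:
    """Validate the string form only; no array is ever constructed."""
    return INT_ARRAY_PATTERN.fullmatch(text) is not None


print(is_valid_serialized_int_array("95,87,92,88,100"))
print(is_valid_serialized_int_array("95,87,92,abc,100"))
```

The first call accepts the value and the second rejects it, which is the fallback behavior the list describes: the document still validates as text even when the array semantics are unavailable.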
3. Performance
- Lazy deserialization (only when accessed as array)
- Cache deserialized form
- Smart serialization (only when writing)
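Lazy deserialization with caching could look roughly like the following. This is a minimal sketch, not a statement of how any processor implements it: `LazyIntArray` and `parse_count` are hypothetical names, and the delimiter is fixed to a comma for brevity.

```python
from functools import cached_property


class LazyIntArray:
    """Holds the serialized text and parses it only on first array access."""

    def __init__(self, text: str):
        self.text = text
        self.parse_count = 0  # instrumentation for the example only

    @cached_property
    def items(self) -> tuple:
        # Runs once; cached_property stores the result on the instance.
        self.parse_count += 1
        return tuple(int(part) for part in self.text.split(","))


scores = LazyIntArray("95,87,92,88,100")
print(scores.parse_count)   # nothing parsed yet
first = scores.items[0]
total = sum(scores.items)
print(scores.parse_count)   # parsed exactly once despite two accesses
```

Code that only copies or re-emits the element never touches `items`, so it pays no parsing cost, and the original text stays available for writing back unchanged.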
4. Namespace Integration
(: Array types live in schema namespace :)
import schema namespace data = "http://example.com/data";
(: But conversion functions in standard namespace :)
import module namespace arr = "http://www.w3.org/2005/xpath-array-types";
(: Seamless usage :)
let $scores := doc("data.xml")//data:scores
return array:size($scores) (: works with standard array functions :)
Critical Design Principles
- Transparency: Array types behave like arrays in XQuery
- Type Safety: Schema validation ensures correctness
- Composability: Works with existing array/sequence operations
- Extensibility: Users can define custom array types
- Performance: Efficient serialization/deserialization
- Standards-Based: Builds on XML Schema and XDM
This approach treats array types as a bridge between serialized XML and XDM arrays, rather than a separate concept. The type system handles the conversion automatically, making it feel natural in XQuery while maintaining XML's text-based nature.
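The `stringArray` type declares `|` as its delimiter and `\` as its escape character, but the proposal does not spell out the escaping rules. The sketch below assumes the simplest one: the escape character makes the next character literal. The function names are hypothetical.

```python
def serialize_strings(items, delimiter="|", escape="\\"):
    """Join items, escaping the escape character first, then the delimiter."""
    escaped_items = []
    for item in items:
        escaped = item.replace(escape, escape + escape)
        escaped = escaped.replace(delimiter, escape + delimiter)
        escaped_items.append(escaped)
    return delimiter.join(escaped_items)


def deserialize_strings(text, delimiter="|", escape="\\"):
    """Split on unescaped delimiters; an escape makes the next char literal."""
    items, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i + 1])   # take the escaped character as-is
            i += 2
        elif ch == delimiter:
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items


tags = deserialize_strings("xml|xquery|xpath|xslt")
round_trip = deserialize_strings(serialize_strings(["a|b", "c\\d", ""]))
print(tags)
print(round_trip)
```

One ambiguity this exposes: empty text deserializes to a single empty string rather than an empty array, so a rule such as `arr:allowEmpty` would need to say which reading applies.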
Open topics for further discussion include error handling, multi-dimensional arrays, and integration with the XQuery Update Facility.