XML Array Proposal #660

@paul-manias

Parasol has a universal data structure system based on XML. A significant limitation is that XML has no native array type, whereas formats such as JSON do. This proposal addresses that gap in a way that remains largely compatible with the existing XML, XSLT and XQuery specifications.

Integration with XQuery Type System

XQuery already has:

  • Sequences (flat, ordered collections)
  • Arrays (from XQuery 3.1, for nested structures)
  • Atomic types (xs:integer, xs:string, etc.)
  • Node types (element, attribute, etc.)

So we need array types that:

  1. Work with existing type checking
  2. Integrate with sequence/array operations
  3. Don't break schema validation
  4. Serialize/deserialize cleanly

Proposed Architecture

<!-- Array Type Registry (external file or inline in schema) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:arr="http://www.parasol.ws/xml/array-types"
           xmlns="http://example.com/data"
           targetNamespace="http://example.com/data">

  <!-- Define array types as schema types -->
  <xs:simpleType name="intArray">
    <xs:annotation>
      <xs:appinfo>
        <arr:arrayType>
          <arr:itemType>xs:integer</arr:itemType>
          <arr:delimiter>,</arr:delimiter>
          <arr:allowEmpty>false</arr:allowEmpty>
        </arr:arrayType>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <!-- Pattern validates the serialized form -->
      <xs:pattern value="-?\d+(,-?\d+)*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="stringArray">
    <xs:annotation>
      <xs:appinfo>
        <arr:arrayType>
          <arr:itemType>xs:string</arr:itemType>
          <arr:delimiter>|</arr:delimiter>
          <arr:escapeChar>\</arr:escapeChar>
        </arr:arrayType>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <!-- Elements using these types -->
  <xs:element name="scores" type="intArray"/>
  <xs:element name="tags" type="stringArray"/>
  
</xs:schema>
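To make the registry settings concrete, here is a minimal Python sketch of how a processor might hold an `arr:arrayType` annotation (`itemType`, `delimiter`, `allowEmpty`, `escapeChar`) and use it to split and join text content. The `ArrayType` class, its field names and the escaping rule (the escape character makes the next character literal, including the delimiter or the escape character itself) are assumptions for illustration, not an existing Parasol API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical in-memory form of an <arr:arrayType> annotation."""
    item_type: str                      # e.g. "xs:integer"
    convert: Callable[[str], object]    # turns one lexical item into a value
    delimiter: str = ","
    allow_empty: bool = True
    escape_char: Optional[str] = None

    def parse(self, text: str) -> List[object]:
        """Split text on the delimiter, honouring the escape character."""
        if text == "":
            if not self.allow_empty:
                raise ValueError("empty array not allowed")
            return []
        items, current, i = [], [], 0
        while i < len(text):
            ch = text[i]
            if self.escape_char is not None and ch == self.escape_char:
                if i + 1 >= len(text):
                    raise ValueError("dangling escape character")
                current.append(text[i + 1])   # keep the escaped char literally
                i += 2
                continue
            if ch == self.delimiter:
                items.append("".join(current))
                current = []
            else:
                current.append(ch)
            i += 1
        items.append("".join(current))
        return [self.convert(item) for item in items]

    def serialize(self, values: List[object]) -> str:
        """Join values, escaping the escape char first, then the delimiter."""
        parts = []
        for v in values:
            s = str(v)
            if self.escape_char is not None:
                s = s.replace(self.escape_char, self.escape_char * 2)
                s = s.replace(self.delimiter, self.escape_char + self.delimiter)
            parts.append(s)
        return self.delimiter.join(parts)


# Hypothetical registry mirroring the two schema types above
REGISTRY = {
    "intArray": ArrayType("xs:integer", int, ",", allow_empty=False),
    "stringArray": ArrayType("xs:string", str, "|", escape_char="\\"),
}
```

With this rule, `REGISTRY["stringArray"].serialize(["a|b", "c"])` produces the text `a\|b|c`, which parses back to the original two members. Escaping the escape character before the delimiter is what keeps the round trip unambiguous.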

XML Document Usage

<data xmlns="http://example.com/data"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <scores>95,87,92,88,100</scores>
  <tags>xml|xquery|xpath|xslt</tags>
  <weights>1.5,2.3,4.7,8.2</weights>
</data>

XQuery Integration - Automatic Conversion

The key is for XQuery to recognize these array types and automatically convert such elements to XDM arrays:

(: Hypothetical XQuery 4.0 with array type support :)

import schema namespace data = "http://example.com/data";

(: Accessing an element whose schema type is an array type
   automatically converts it to an XDM array :)
let $doc := doc("data.xml")
let $scores := $doc//data:scores

(: $scores is now an array, not an element node! :)
return (
  $scores(1),             (: returns 95 as xs:integer :)
  array:size($scores),    (: returns 5 :)
  $scores?*,              (: returns the sequence (95, 87, 92, 88, 100) :)
  sum($scores?*)          (: returns 462 :)
)
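The accessors above follow XQuery 3.1 array semantics: positions are 1-based, an out-of-range position raises `err:FOAY0001`, `array:size` counts members, and `?*` returns the members as a sequence. The Python sketch below mirrors only those three operations so the expected results can be checked. The `XdmArray` class is hypothetical and models arrays of atomic values only.

```python
from typing import Iterator, List, Sequence


class XdmArray:
    """Hypothetical stand-in for an XDM array of atomic values."""

    def __init__(self, members: Sequence[object]) -> None:
        self._members: List[object] = list(members)

    def __call__(self, position: int) -> object:
        # $a(n) is 1-based; positions outside 1..size raise err:FOAY0001
        if not 1 <= position <= len(self._members):
            raise IndexError(f"FOAY0001: array index {position} out of bounds")
        return self._members[position - 1]

    def size(self) -> int:
        # mirrors array:size($a)
        return len(self._members)

    def members(self) -> Iterator[object]:
        # mirrors $a?* for an array whose members are single atomic values
        return iter(self._members)


scores = XdmArray([95, 87, 92, 88, 100])
print(scores(1), scores.size(), sum(scores.members()))  # 95 5 462
```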

Type Checking Integration

import schema namespace data = "http://example.com/data";

(: Function expecting an array; checked at compile time :)
declare function local:average($values as array(xs:integer)) as xs:double? {
  avg($values?*)
};

(: Function expecting an array of strings :)
declare function local:process-tags($tags as array(xs:string)) as xs:string* {
  for $tag in $tags?*
  return upper-case($tag)
};

(: Both calls work: the intArray and stringArray elements are
   converted automatically to the declared array types :)
let $doc := doc("data.xml")
return (
  local:average($doc//data:scores),
  local:process-tags($doc//data:tags)
)

Bidirectional Conversion

(: Reading - automatic deserialization :)
let $scores := doc("data.xml")//data:scores
return $scores instance of array(xs:integer)  (: true :)

(: Writing - automatic serialization :)
let $newScores := array { 100, 95, 98, 92 }
return 
  <data:scores xsi:type="data:intArray">{
    (: Serializes to "100,95,98,92" :)
    $newScores
  }</data:scores>

Extended Type System

(: The type system recognizes the relationship :)

(: intArray is a subtype of array(xs:integer) :)
$scores instance of array(xs:integer)     (: true :)

(: But also retains its schema type :)
$scores instance of schema-element(data:scores)  (: true :)

(: You can explicitly get the serialized form if needed :)
fn:string($scores)  (: returns "95,87,92,88,100" :)

(: Or the typed value; fn:data always returns atomic values,
   so this yields the members as a sequence rather than an array :)
fn:data($scores)    (: returns (95, 87, 92, 88, 100) :)

Advanced Registry with Inheritance

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:arr="http://www.parasol.ws/xml/array-types">

  <!-- Base array type -->
  <xs:simpleType name="baseArray">
    <xs:annotation>
      <xs:appinfo>
        <arr:arrayType>
          <arr:itemType>xs:anyAtomicType</arr:itemType>
          <arr:delimiter>,</arr:delimiter>
        </arr:arrayType>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <!-- Derived types inherit delimiter -->
  <xs:simpleType name="intArray">
    <xs:restriction base="baseArray">
      <xs:annotation>
        <xs:appinfo>
          <arr:arrayType>
            <arr:itemType>xs:integer</arr:itemType>
            <!-- inherits delimiter="," from base -->
          </arr:arrayType>
        </xs:appinfo>
      </xs:annotation>
    </xs:restriction>
  </xs:simpleType>

  <!-- Override delimiter for specific type -->
  <xs:simpleType name="pipedStringArray">
    <xs:restriction base="baseArray">
      <xs:annotation>
        <xs:appinfo>
          <arr:arrayType>
            <arr:itemType>xs:string</arr:itemType>
            <arr:delimiter>|</arr:delimiter>
          </arr:arrayType>
        </xs:appinfo>
      </xs:annotation>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
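One way a processor could implement the "inherits delimiter from base" rule is to walk the `xs:restriction` chain up to its root, then overlay each type's `arr:arrayType` settings from the base downwards so that a derived type's settings win. The sketch assumes the schema has already been reduced to the hypothetical `TYPES` table; real schema parsing is out of scope.

```python
from typing import Dict, List, Optional

# Hypothetical flattened view of the schema above: each simple type records
# its base type and only the arr:arrayType settings it declares itself.
TYPES: Dict[str, dict] = {
    "baseArray": {"base": None,
                  "arrayType": {"itemType": "xs:anyAtomicType", "delimiter": ","}},
    "intArray": {"base": "baseArray",
                 "arrayType": {"itemType": "xs:integer"}},
    "pipedStringArray": {"base": "baseArray",
                         "arrayType": {"itemType": "xs:string", "delimiter": "|"}},
}


def effective_array_type(name: str) -> dict:
    """Merge arr:arrayType settings from the root base type down to `name`."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        if current in chain:
            raise ValueError(f"circular derivation at {current}")
        chain.append(current)
        current = TYPES[current]["base"]
    merged: dict = {}
    for type_name in reversed(chain):  # base first, so derived settings override
        merged.update(TYPES[type_name]["arrayType"])
    return merged
```

Here `intArray` resolves to `{"itemType": "xs:integer", "delimiter": ","}`, while `pipedStringArray` keeps its own `"|"`.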

Constructor Functions

(: XQuery would provide constructor functions :)

(: Parse from string using registered type :)
arr:intArray("10,20,30")  
  (: returns array {10, 20, 30} :)

(: Create with explicit delimiter :)
arr:parse("red|green|blue", "|", "xs:string")
  (: returns array {"red", "green", "blue"} :)

(: Serialize to registered format :)
arr:serialize(array{1,2,3}, "intArray")
  (: returns "1,2,3" :)
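All three constructor functions could be thin wrappers around one split-and-convert routine. The Python sketch below mirrors `arr:intArray`, `arr:parse` and `arr:serialize` as `arr_construct`, `arr_parse` and `arr_serialize`. Those names, the `REGISTRY` table and the converter table are assumptions, and escape characters are ignored to keep the sketch short. The integer converter checks the `xs:integer` lexical form explicitly because Python's `int()` also accepts forms such as `1_000`.

```python
import re
from typing import Callable, Dict, List, Tuple

_INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical form of xs:integer


def _to_integer(lexical: str) -> int:
    # Stricter than int(): rejects "1_000" and surrounding whitespace
    if not _INTEGER.fullmatch(lexical):
        raise ValueError(f"not an xs:integer: {lexical!r}")
    return int(lexical)


# Hypothetical converter table keyed by item type name
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "xs:string": str,
    "xs:integer": _to_integer,
}

# Hypothetical registry: array type name -> (delimiter, item type)
REGISTRY: Dict[str, Tuple[str, str]] = {
    "intArray": (",", "xs:integer"),
    "stringArray": ("|", "xs:string"),
}


def arr_parse(text: str, delimiter: str, item_type: str) -> List[object]:
    """Sketch of arr:parse(): split on the delimiter and convert each item."""
    convert = CONVERTERS[item_type]
    return [convert(item) for item in text.split(delimiter)]


def arr_construct(type_name: str, text: str) -> List[object]:
    """Sketch of a per-type constructor such as arr:intArray("10,20,30")."""
    delimiter, item_type = REGISTRY[type_name]
    return arr_parse(text, delimiter, item_type)


def arr_serialize(values: List[object], type_name: str) -> str:
    """Sketch of arr:serialize(): join members with the type's delimiter."""
    delimiter, _ = REGISTRY[type_name]
    return delimiter.join(str(v) for v in values)
```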

Validation Integration

(: Schema validation understands array types :)

validate type intArray { <scores>10,20,30</scores> }
  (: validates the element; its content is then available as array {10, 20, 30} :)

validate type intArray { <scores>10,abc,30</scores> }
  (: validation error: "abc" is not a valid xs:integer :)

(: Strict mode validates each item :)
<scores xsi:type="intArray">95,87,92,abc,100</scores>
  (: Schema validation error at parse time :)
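Plain XSD validation only checks the serialized string against the `xs:pattern` facet. The "strict mode" described above would also check each item and could report which one failed. The Python sketch below contrasts the two checks for `intArray`. The function names are hypothetical. XSD patterns are implicitly anchored, which is why `fullmatch` is used.

```python
import re
from typing import List, Tuple

# The xs:pattern facet from the intArray declaration (implicitly anchored in XSD)
INT_ARRAY_PATTERN = re.compile(r"-?\d+(,-?\d+)*")

# Lexical space of xs:integer: optional sign, then ASCII digits
XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def pattern_valid(text: str) -> bool:
    """Whole-string check, as ordinary XSD validation performs it."""
    return INT_ARRAY_PATTERN.fullmatch(text) is not None


def invalid_items(text: str, delimiter: str = ",") -> List[Tuple[int, str]]:
    """Item-by-item check: (1-based position, item) for each non-integer."""
    return [(pos, item)
            for pos, item in enumerate(text.split(delimiter), start=1)
            if XS_INTEGER.fullmatch(item) is None]
```

For `95,87,92,abc,100` the pattern check only says "invalid", while `invalid_items` reports `[(4, "abc")]`. The pattern facet is also stricter than `xs:integer` itself: `10,+5` fails the pattern even though every item is a valid `xs:integer`. This suggests the pattern should allow an optional `+` sign if the two checks are meant to agree.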

Static Type Inference

declare function local:process($data as schema-element(data:scores)) {
  (: Compiler knows data:scores is intArray type :)
  (: So $data is inferred as array(xs:integer) :)
  let $max := max($data?*)  (: type-safe :)
  return $max
};

Implementation Considerations

1. Processor Behavior

  • Parser encounters element with array type annotation
  • Checks array type registry
  • Deserializes text content using specified delimiter/rules
  • Creates XDM array instance
  • Maintains schema type annotation
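The processor steps above can be sketched in Python with the standard `xml.etree.ElementTree` module. `ELEMENT_TYPES` and `ARRAY_TYPES` stand in for a loaded schema and array type registry (both hypothetical). `xsi:type` values are compared as plain strings, so QName prefix resolution is deliberately omitted. Conversion is eager here for simplicity, and elements without a registered array type stay plain text, as a processor without array support would see them.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://example.com/data}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical results of schema loading
ELEMENT_TYPES = {NS + "scores": "intArray", NS + "tags": "stringArray"}
ARRAY_TYPES = {"intArray": (",", int), "stringArray": ("|", str)}

DOC = """<data xmlns="http://example.com/data">
  <scores>95,87,92,88,100</scores>
  <tags>xml|xquery|xpath|xslt</tags>
  <weights>1.5,2.3,4.7,8.2</weights>
</data>"""


def deserialize(doc: str) -> Dict[str, object]:
    """Convert every registered array element; leave everything else as text."""
    root = ET.fromstring(doc)
    result: Dict[str, object] = {}
    for elem in root:
        # An xsi:type on the instance overrides the declared element type
        type_name = elem.get(XSI_TYPE, ELEMENT_TYPES.get(elem.tag))
        local_name = elem.tag.split("}")[-1]
        text = elem.text or ""
        if type_name in ARRAY_TYPES:
            delimiter, convert = ARRAY_TYPES[type_name]
            result[local_name] = {
                "type": type_name,  # the schema type annotation is kept
                "array": [convert(item) for item in text.split(delimiter)],
            }
        else:
            result[local_name] = text  # no array type: plain text
    return result
```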

2. Backward Compatibility

  • Processors without array type support see plain text
  • Schema validation still works (validates string form)
  • XQuery 3.1 code continues to work

3. Performance

  • Lazy deserialization (only when the element is accessed as an array)
  • Cache the deserialized form after the first access
  • Deferred serialization (only when a modified array is written out)
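These three points fit naturally in one node object. It keeps the lexical text, which also serves `fn:string()`. It builds the array only on first access and caches it. It re-serializes only after the array has been modified. The Python sketch below uses `functools.cached_property`. `ArrayElement`, `set_array` and the integer-only parsing are assumptions for illustration.

```python
from functools import cached_property
from typing import List


class ArrayElement:
    """Hypothetical node that keeps its lexical text and a lazily built array."""

    parse_count = 0  # counts deserializations; only here to make caching visible

    def __init__(self, text: str, delimiter: str = ",") -> None:
        self.text = text            # lexical form, as fn:string() would see it
        self.delimiter = delimiter
        self._dirty = False         # True once the array has been modified

    @cached_property
    def array(self) -> List[int]:
        # Runs on first access only; the result is stored on the instance.
        ArrayElement.parse_count += 1
        return [int(item) for item in self.text.split(self.delimiter)]

    def set_array(self, values: List[int]) -> None:
        # cached_property allows writes, so this replaces the cached array
        # without re-serializing the text yet.
        self.array = list(values)
        self._dirty = True

    def string(self) -> str:
        # Re-serialize only when the array changed since the last write.
        if self._dirty:
            self.text = self.delimiter.join(str(v) for v in self.array)
            self._dirty = False
        return self.text
```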

4. Namespace Integration

(: Array types live in schema namespace :)
import schema namespace data = "http://example.com/data";

(: But conversion functions in standard namespace :)
import module namespace arr = "http://www.w3.org/2005/xpath-array-types";

(: Seamless usage :)
let $scores := doc("data.xml")//data:scores
return array:size($scores)  (: works with standard array functions :)

Critical Design Principles

  1. Transparency: Array types behave like arrays in XQuery
  2. Type Safety: Schema validation ensures correctness
  3. Composability: Works with existing array/sequence operations
  4. Extensibility: Users can define custom array types
  5. Performance: Efficient serialization/deserialization
  6. Standards-Based: Builds on XML Schema and XDM

This approach treats array types as a bridge between serialized XML and XDM arrays, rather than a separate concept. The type system handles the conversion automatically, making it feel natural in XQuery while maintaining XML's text-based nature.

Which aspects should be explored next: error handling, multi-dimensional arrays, or integration with the XQuery Update Facility?
