
Add EXPath File Module 4.0 alongside native file module #6084

Open
joewiz wants to merge 4 commits into eXist-db:develop from joewiz:feature/expath-file-module

joewiz (Member) commented Mar 3, 2026

Summary

Add a spec-compliant EXPath File Module 4.0 implementation alongside eXist's original file module. Both modules are available simultaneously — the original is unchanged, and the EXPath module adds W3C-aligned file operations with interoperability across XQuery processors (Saxon, BaseX, etc.).

  • file:* — eXist's original file module (http://exist-db.org/xquery/file) — unchanged, fully backward compatible
  • exfile:* — EXPath File 4.0 module (http://expath.org/ns/file) — new, W3C-aligned, 96% XQTS compliance
  • file:sync remains in the original module; util:file-sync is also available as a convenience alias

Prefix convention and migration timeline

The EXPath module uses the default prefix exfile: to avoid conflicts with the existing file: prefix. Explicit imports always work regardless of default prefix conventions:

(: Use the EXPath module with any prefix you want :)
import module namespace file = "http://expath.org/ns/file";
file:read-text("/path/to/file.txt")

(: The original module also continues to work with explicit imports :)
import module namespace file = "http://exist-db.org/xquery/file";
file:read("/path/to/file.txt")

| Release | Strategy |
|---|---|
| 7.x (this PR) | Coexistence: original file:, EXPath exfile: |
| 8.0 | Deprecate original file module |
| 9.0 | Swap default prefixes: EXPath gets file:, original gets file-legacy: |

Details

New EXPath File Module (10 Java classes in extensions/expath/src/main/java/org/expath/exist/file/):

| Class | Functions (or responsibility) |
|---|---|
| FileProperties | exists, is-dir, is-file, is-absolute, last-modified, size |
| FileIO | read-text, read-text-lines, read-binary (with offset/length) |
| FileWrite | write, write-text, write-text-lines, write-binary (with offset) |
| FileAppend | append, append-binary, append-text, append-text-lines |
| FileManipulation | copy, move, delete (recursive), create-dir, create-temp-dir, create-temp-file, list (with glob), children, descendants, list-roots |
| FilePaths | name, parent, path-to-native, path-to-uri, resolve-path |
| FileSystemProperties | dir-separator, line-separator, path-separator, temp-dir, base-dir, current-dir |
| ExpathFileErrorCode | 9 spec error codes: not-found, invalid-path, exists, no-dir, is-dir, is-relative, unknown-encoding, out-of-range, io-error |
| ExpathFileModuleHelper | DBA check, path resolution (file: URIs, native paths, static base URI), encoding validation |
| ExpathFileModule | Module registration (48 function signatures) |
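The ExpathFileErrorCode row lists the nine error codes defined by the spec. As a minimal sketch of one way to model them (the enum name ExpathFileErrorSketch and its helper methods are hypothetical, not the PR's actual class), a Java enum can derive each spec local name and its exfile: QName from the constant name:

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the PR's ExpathFileErrorCode class): the nine
 * EXPath File error codes, all in the module's namespace.
 */
enum ExpathFileErrorSketch {
    NOT_FOUND, INVALID_PATH, EXISTS, NO_DIR, IS_DIR,
    IS_RELATIVE, UNKNOWN_ENCODING, OUT_OF_RANGE, IO_ERROR;

    /** Namespace URI of the EXPath File module and of its error codes. */
    static final String NAMESPACE = "http://expath.org/ns/file";

    /** Local name as written in the spec, e.g. NOT_FOUND becomes "not-found". */
    String localName() {
        return name().toLowerCase(Locale.ROOT).replace('_', '-');
    }

    /** Clark notation, e.g. "{http://expath.org/ns/file}not-found". */
    String clarkName() {
        return "{" + NAMESPACE + "}" + localName();
    }

    /** Lexical QName using the module's default "exfile" prefix. */
    String prefixedName() {
        return "exfile:" + localName();
    }
}
```

Deriving the names from the constants keeps the Java side and the spec spelling in one place, so a typo in a hand-written string cannot produce an error QName that other processors would not recognize.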

Security: all EXPath functions require the DBA role. Every function in the EXPath module, including the system-property functions (exfile:dir-separator, exfile:temp-dir, etc.), requires the calling user to have the DBA role, so non-DBA users, including unauthenticated (guest) users, cannot learn anything about the server's filesystem through this module.
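A minimal sketch of how one guard can enforce this for every function, including the system-property ones. The Caller record, the AccessDenied exception and the guarded helper are hypothetical stand-ins, not eXist-db's security API or the PR's ExpathFileModuleHelper code (assumes Java 17+ for records):

```java
import java.util.function.Supplier;

/** Hypothetical sketch: every EXPath file function body runs behind one DBA check. */
final class DbaGuardSketch {

    /** Minimal stand-in for the calling user. */
    record Caller(String name, boolean hasDbaRole) {}

    /** Raised instead of running the function body for non-DBA callers. */
    static final class AccessDenied extends RuntimeException {
        AccessDenied(String message) {
            super(message);
        }
    }

    private DbaGuardSketch() {}

    /** Runs the function body only if the caller has the DBA role. */
    static <T> T guarded(Caller caller, String function, Supplier<T> body) {
        if (!caller.hasDbaRole()) {
            throw new AccessDenied("Permission denied: " + function
                    + " requires the DBA role; user '" + caller.name() + "' does not have it");
        }
        return body.get();
    }

    /** Even a harmless-looking property such as dir-separator goes through the guard. */
    static String dirSeparator(Caller caller) {
        return guarded(caller, "exfile:dir-separator", () -> java.io.File.separator);
    }
}
```

Routing every function through a single helper makes it harder to forget the check when new functions are added, which matters here because even the system-property functions leak information about the host.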

Key capabilities beyond the original module:

  • Spec-compliant error codes (exfile:not-found, exfile:no-dir, etc.)
  • exfile:read-text normalizes newlines (CR/CRLF → LF) per spec
  • exfile:read-text/exfile:read-text-lines detect XML-illegal characters and raise exfile:io-error; the 3-arg $fallback=true() form replaces them with U+FFFD
  • exfile:read-binary supports $offset/$length parameters
  • exfile:delete supports $recursive parameter
  • Relative paths resolve against the XQuery static base URI when it is set as a file: URI; otherwise they resolve against the JVM working directory
  • New functions not in original: exfile:is-absolute, exfile:children, exfile:descendants, exfile:list-roots, exfile:create-temp-dir, exfile:create-temp-file, exfile:write (serialized), exfile:append, exfile:write-text-lines, exfile:append-text-lines, exfile:append-binary
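The newline and XML-character rules for exfile:read-text can be sketched as a post-processing step on the decoded string. This is a minimal sketch assuming UTF-8 input; ReadTextSketch and IoError are hypothetical names, and malformed byte sequences (which Files.readString rejects with an exception) are not handled here:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReadTextSketch {

    /** Hypothetical stand-in for the exfile:io-error error code. */
    static final class IoError extends IOException {
        IoError(String message) {
            super(message);
        }
    }

    private ReadTextSketch() {}

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Turns CRLF and lone CR into LF, then rejects or replaces XML-illegal characters. */
    static String normalize(String raw, boolean fallback) throws IoError {
        String text = raw.replace("\r\n", "\n").replace('\r', '\n');
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // unpaired surrogates come back as-is and fail isXmlChar
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else if (fallback) {
                out.append('\uFFFD'); // $fallback = true(): substitute U+FFFD
            } else {
                throw new IoError(String.format("Illegal XML character U+%04X at offset %d", cp, i));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Reads a UTF-8 file and applies the rules above. */
    static String readText(Path file, boolean fallback) throws IOException {
        return normalize(Files.readString(file, StandardCharsets.UTF_8), fallback);
    }
}
```

Checking code points rather than chars keeps supplementary characters (encoded as surrogate pairs) legal while still catching unpaired surrogates.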

XQTS Results

QT4 XQTS expath-file test set: 183/190 (96.3%), 0 errors, 0 hangs

| Metric | Value |
|---|---|
| Tests | 190 |
| Pass | 183 (96.3%) |
| Fail | 7 |
| Errors | 0 |
| Hangs | 0 |

Remaining 7 failures

| Test | Issue | Category |
|---|---|---|
| readTextLines1-002 | CR character (U+000D) in the XML test catalog is normalized to LF by the XML parser before the XQuery engine sees it | Runner/XML spec |
| readTextLines1-005 | Same CR normalization issue | Runner/XML spec |
| readTextLines1-007 | Same CR normalization issue, with mixed separators | Runner/XML spec |
| exists-002 | exfile:exists("../sandpit"): the runner copies the sandpit to a temp dir | Runner sandpit design |
| pathToNative-002 | exfile:path-to-native("//test.txt") is platform-specific (BaseX also fails identically) | Spec edge case |
| copy-005 | Runner doesn't propagate the static base URI to assertion evaluation | Runner |
| append2-001 | Same runner assertion-context issue | Runner |

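All 7 failures stem from the test runner, from XML parsing of the test catalog, or from a platform-specific edge case in the spec, not from the eXist-db implementation.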

readBinary implementation note

exfile:read-binary reads the file into a byte[] via Files.readAllBytes() (or RandomAccessFile.readFully() for partial reads), base64-encodes it, and wraps it in BinaryValueFromBinaryString — a lightweight value type with no open file handles and a no-op close().

The previous stream-backed approach (BinaryValueFromInputStream) used eXist-db's CachingFilterInputStream/FilterInputStreamCacheMonitor infrastructure, which prevented clean BrokerPool shutdown and caused deadlocks in the XQTS runner. The new approach costs roughly 2.4x the file size in memory (the raw bytes plus the base64 string) but eliminates the risk of leaked resources, a reasonable tradeoff for the typical use cases of the EXPath File Module. Applications processing very large binary files should use streaming APIs (e.g., the EXPath Binary Module) rather than load entire files into XDM values.
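A minimal sketch of the byte[]-plus-base64 approach described above, using only JDK APIs (Files.readAllBytes, RandomAccessFile.readFully, java.util.Base64). ReadBinarySketch and OutOfRange are hypothetical names; the real code wraps the result in BinaryValueFromBinaryString, which is omitted here. The range checks reflect the rule that a negative $length, or an $offset/$length outside the file, raises exfile:out-of-range:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class ReadBinarySketch {

    /** Hypothetical stand-in for the exfile:out-of-range error. */
    static final class OutOfRange extends IOException {
        OutOfRange(String message) {
            super(message);
        }
    }

    private ReadBinarySketch() {}

    /** Whole file: Files.readAllBytes closes its channel before returning. */
    static String readBinary(Path file) throws IOException {
        return Base64.getEncoder().encodeToString(Files.readAllBytes(file));
    }

    /** From $offset to the end of the file. */
    static String readBinary(Path file, long offset) throws IOException {
        long size = Files.size(file);
        checkOffset(offset, size);
        return readBinary(file, offset, size - offset);
    }

    /** Exactly $length bytes starting at $offset. */
    static String readBinary(Path file, long offset, long length) throws IOException {
        long size = Files.size(file);
        checkOffset(offset, size);
        if (length < 0 || offset + length > size) {
            throw new OutOfRange("length " + length + " invalid for offset " + offset + " and size " + size);
        }
        byte[] bytes = new byte[Math.toIntExact(length)];
        // try-with-resources closes the handle before the value is returned
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            raf.readFully(bytes);
        }
        return Base64.getEncoder().encodeToString(bytes);
    }

    private static void checkOffset(long offset, long size) throws OutOfRange {
        if (offset < 0 || offset > size) {
            throw new OutOfRange("offset " + offset + " outside 0.." + size);
        }
    }

    /** Raw bytes plus a base64 string of 4 * ceil(n / 3) chars, assuming one byte per (Latin-1) char. */
    static long estimatedFootprintBytes(long n) {
        return n + 4 * ((n + 2) / 3);
    }
}
```

For a 300-byte file, estimatedFootprintBytes returns 700, about 2.33x, which is in line with the ~2.4x figure above. The estimate assumes the JDK's default compact strings (base64 output is ASCII, so one byte per char). Peak usage while encoding can briefly be higher than this steady-state figure.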

Test plan

  • 64 XQSuite integration tests for the EXPath file module (exfile: prefix)
  • 44 sync tests (22 sync + 8 sync-serialize + 14 syncmod; 2 of them are marked %test:pending)
  • All 108 EXPath tests pass together via ExpathFileTests runner
  • Original file module: 63 tests, 0 failures (fully backward compatible, unchanged)
  • QT4 XQTS expath-file test set: 183/190 (96.3%), 0 hangs
  • Manual verification: import module namespace exfile="http://expath.org/ns/file"; exfile:exists("/tmp")

🤖 Generated with Claude Code

joewiz requested a review from a team as a code owner March 3, 2026 12:53
joewiz marked this pull request as draft March 3, 2026 12:53
joewiz marked this pull request as ready for review March 4, 2026 19:50
joewiz force-pushed the feature/expath-file-module branch 2 times, most recently from b97bb5a to 5c5bccc March 5, 2026 03:44
joewiz (Member, Author) commented Mar 5, 2026

Migration guide: eXist-db native file: module → EXPath File Module 4.0

This PR replaces the eXist-specific file module (http://exist-db.org/xquery/file) with the spec-compliant EXPath File Module 4.0 (http://expath.org/ns/file). The following crosswalk is intended to help users migrate existing code.

Key behavioral differences:

  • Namespace change: http://exist-db.org/xquery/file → http://expath.org/ns/file. Both use the file: prefix, so most call sites only need the import changed.
  • Return types: The old module returned xs:boolean for write/delete/move operations; EXPath returns empty-sequence() and raises typed error codes (e.g., file:not-found, file:is-dir, file:io-error) on failure.
  • Parameter types: The old module accepted item() for paths (strings or URIs); EXPath accepts xs:string.
  • file:sync relocated: The eXist-specific sync function is now util:file-sync in http://exist-db.org/xquery/util.
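To illustrate the change in error handling, here is a minimal sketch of a delete that returns nothing and signals every failure with a typed error code instead of returning false. DeleteSketch and FileError are hypothetical names. The codes follow the EXPath rules that a missing path raises not-found and that a non-empty directory without $recursive raises is-dir:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class DeleteSketch {

    /** Hypothetical carrier for an EXPath error code such as "not-found" or "is-dir". */
    static final class FileError extends IOException {
        final String code;

        FileError(String code, String message) {
            super("exfile:" + code + ": " + message);
            this.code = code;
        }
    }

    private DeleteSketch() {}

    /** Returns nothing on success; every failure surfaces as an exception. */
    static void delete(Path path, boolean recursive) throws IOException {
        if (!Files.exists(path, LinkOption.NOFOLLOW_LINKS)) {
            throw new FileError("not-found", path + " does not exist");
        }
        boolean isDir = Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS);
        if (isDir && !recursive) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                if (entries.iterator().hasNext()) {
                    throw new FileError("is-dir", path + " is a non-empty directory");
                }
            }
        }
        if (isDir && recursive) {
            List<Path> all;
            // Files.walk does not follow symlinks by default; a path sorts after its ancestors,
            // so reverse order deletes every entry before its parent directory
            try (Stream<Path> walk = Files.walk(path)) {
                all = walk.sorted(Comparator.reverseOrder()).toList();
            }
            for (Path p : all) {
                Files.delete(p);
            }
        } else {
            Files.delete(path); // regular file, symlink, or empty directory
        }
    }
}
```

Callers that relied on a boolean from the old file:delete would instead catch the error (for example with try/catch in XQuery) and branch on its code.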

Crosswalk: old functions → new equivalents

| Old file: function (http://exist-db.org/xquery/file) | New EXPath file: equivalent (http://expath.org/ns/file) | Notes |
|---|---|---|
| file:exists($path) as xs:boolean | file:exists($path) | Parameter changes from item() to xs:string |
| file:is-directory($path) as xs:boolean | file:is-dir($path) | Renamed |
| file:is-readable($path) as xs:boolean | (no equivalent) | Not in EXPath spec |
| file:is-writeable($path) as xs:boolean | (no equivalent) | Not in EXPath spec |
| file:read($path) as xs:string? | file:read-text($path) | Renamed; EXPath normalizes newlines (CR/CRLF → LF) per spec |
| file:read($path, $enc) as xs:string? | file:read-text($path, $enc) | Renamed |
| file:read-unicode($path) as xs:string? | file:read-text($path) | Old function stripped BOM; EXPath read-text handles encoding transparently |
| file:read-unicode($path, $enc) as xs:string? | file:read-text($path, $enc) | Same as above |
| file:read-binary($path) as xs:base64Binary? | file:read-binary($path) | Same name; EXPath adds optional $offset/$length overloads |
| file:serialize($nodes, $path, $params) as xs:boolean? | file:write($path, $nodes, $params) | Renamed; parameter order changed (path first); returns empty-sequence(), and errors raise exceptions |
| file:serialize($nodes, $path, $params, $append) as xs:boolean? | file:append($path, $nodes, $params) | Use file:append instead of the $append flag |
| file:serialize-binary($data, $path) as xs:boolean | file:write-binary($path, $data) | Renamed; parameter order changed; returns empty-sequence() |
| file:serialize-binary($data, $path, $append) as xs:boolean | file:append-binary($path, $data) | Use file:append-binary instead of the $append flag |
| file:delete($path) as xs:boolean | file:delete($path) | Returns empty-sequence(); EXPath adds optional $recursive overload |
| file:move($source, $dest) as xs:boolean | file:move($source, $target) | Returns empty-sequence() |
| file:mkdir($path) as xs:boolean | file:create-dir($path) | Renamed; EXPath always creates parent dirs (like old file:mkdirs) |
| file:mkdirs($path) as xs:boolean | file:create-dir($path) | Renamed; same behavior |
| file:list($path) as node()* | file:list($path) | Different return type: old returned XML elements, EXPath returns xs:string* relative paths with a trailing separator on dirs |
| file:directory-list($path, $pattern) as node()? | file:list($path, true(), $pattern) | Use 3-argument file:list with $recursive := true(); returns xs:string* instead of XML |
| file:sync($collection, $target, $options) as document-node() | util:file-sync($collection, $target, $options) | Moved to util module (http://exist-db.org/xquery/util); same signature and behavior |
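The new file:list return shape (relative paths, a trailing separator on directories, an optional glob) can be sketched with java.nio.file. This is a minimal sketch that assumes the glob is matched against each entry's file name and uses Java's glob syntax, which is richer than the EXPath pattern syntax. ListSketch is a hypothetical name, and the final sort exists only to make the output deterministic:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class ListSketch {

    private ListSketch() {}

    /** Relative paths under dir; directories get a trailing separator. glob may be null. */
    static List<String> list(Path dir, boolean recursive, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new NotDirectoryException(dir.toString()); // the module would raise exfile:no-dir
        }
        PathMatcher matcher = glob == null ? null : FileSystems.getDefault().getPathMatcher("glob:" + glob);
        String separator = dir.getFileSystem().getSeparator();
        List<String> result = new ArrayList<>();
        try (Stream<Path> entries = recursive ? Files.walk(dir) : Files.list(dir)) {
            entries.filter(p -> !p.equals(dir)) // Files.walk also yields the start directory
                   .filter(p -> matcher == null || matcher.matches(p.getFileName()))
                   .forEach(p -> {
                       String rel = dir.relativize(p).toString();
                       result.add(Files.isDirectory(p) ? rel + separator : rel);
                   });
        }
        result.sort(null); // natural order, for repeatable output
        return result;
    }
}
```

Returning plain strings instead of XML elements is what lets the results feed straight into exfile:read-text or exfile:resolve-path without any extra navigation.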

New functions with no old equivalent

| New EXPath file: function | Description |
|---|---|
| file:is-file($path as xs:string) as xs:boolean | Tests whether a path points to a regular file |
| file:is-absolute($path as xs:string) as xs:boolean | Tests whether a path is absolute |
| file:last-modified($path as xs:string) as xs:dateTime | Returns the last modification time of a file or directory |
| file:size($path as xs:string) as xs:integer | Returns the byte size of a file, or 0 for a directory |
| file:size($path, $recursive as xs:boolean?) as xs:integer | Returns recursive size if $recursive is true |
| file:read-text-lines($file as xs:string) as xs:string* | Reads file contents as a sequence of lines (default UTF-8) |
| file:read-text-lines($file, $encoding as xs:string?) as xs:string* | Reads file as lines with specified encoding |
| file:write-text($file as xs:string, $value as xs:string) as empty-sequence() | Writes a string to a file |
| file:write-text($file, $value, $encoding as xs:string?) as empty-sequence() | Writes a string with specified encoding |
| file:write-text-lines($file as xs:string, $values as xs:string*) as empty-sequence() | Writes strings as lines separated by platform line separator |
| file:write-text-lines($file, $values, $encoding as xs:string?) as empty-sequence() | Writes lines with specified encoding |
| file:append-text($file as xs:string, $value as xs:string) as empty-sequence() | Appends a string to a file |
| file:append-text($file, $value, $encoding as xs:string?) as empty-sequence() | Appends a string with specified encoding |
| file:append-text-lines($file as xs:string, $lines as xs:string*) as empty-sequence() | Appends strings as lines |
| file:append-text-lines($file, $lines, $encoding as xs:string?) as empty-sequence() | Appends lines with specified encoding |
| file:copy($source as xs:string, $target as xs:string) as empty-sequence() | Copies a file or directory (overwrites target if it exists) |
| file:create-temp-dir($prefix as xs:string?, $suffix as xs:string?, $dir as xs:string?) as xs:string | Creates a temporary directory |
| file:create-temp-file($prefix as xs:string?, $suffix as xs:string?, $dir as xs:string?) as xs:string | Creates a temporary file |
| file:children($path as xs:string) as xs:string* | Returns absolute paths of immediate children of a directory |
| file:descendants($path as xs:string) as xs:string* | Returns absolute paths of all descendants recursively |
| file:list-roots() as xs:string* | Returns the root directories of the file system |
| file:name($path as xs:string) as xs:string | Returns the name (last segment) of a path |
| file:parent($path as xs:string) as xs:string? | Returns the parent directory of a path |
| file:path-to-native($path as xs:string) as xs:string | Returns the native, canonical path |
| file:path-to-uri($path as xs:string) as xs:anyURI | Returns the path as a file:// URI |
| file:resolve-path($path as xs:string) as xs:string | Resolves a relative path against the current working directory |
| file:resolve-path($path, $base as xs:string?) as xs:string | Resolves against a base directory |
| file:dir-separator() as xs:string | Returns the OS directory separator (e.g., / or \) |
| file:line-separator() as xs:string | Returns the OS line separator (e.g., \n or \r\n) |
| file:path-separator() as xs:string | Returns the OS path separator (e.g., : or ;) |
| file:temp-dir() as xs:string | Returns the system temporary directory path |
| file:base-dir() as xs:string? | Returns the base directory of the current query |
| file:current-dir() as xs:string | Returns the current working directory |
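The module's rule for relative path arguments (resolve against the static base URI when it is a file: URI, otherwise against the JVM working directory) can be sketched as follows. ResolvePathSketch is a hypothetical name. Resolving a base URI that does not end in / against its parent directory (as for a query module's own URI) is an assumption of this sketch, not confirmed behavior of the PR:

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ResolvePathSketch {

    private ResolvePathSketch() {}

    /** file: URIs are converted directly; relative native paths get a base directory first. */
    static Path resolve(String path, String staticBaseUri) {
        if (path.startsWith("file:")) {
            return Paths.get(URI.create(path)).normalize();
        }
        Path p = Paths.get(path);
        if (p.isAbsolute()) {
            return p.normalize();
        }
        Path baseDir;
        if (staticBaseUri != null && staticBaseUri.startsWith("file:")) {
            Path base = Paths.get(URI.create(staticBaseUri));
            // Assumption: a trailing "/" names a directory; otherwise the URI names a file
            // (e.g. the query module itself), so its parent directory is used
            baseDir = staticBaseUri.endsWith("/") ? base : base.getParent();
        } else {
            baseDir = Paths.get("").toAbsolutePath(); // JVM working directory
        }
        return baseDir.resolve(p).normalize();
    }
}
```

With a base URI of file:///tmp/q/main.xq, the path ../a.txt resolves to /tmp/a.txt on a POSIX system. A base URI with any other scheme falls back to the working directory.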

🤖 Co-authored with Claude Code

joewiz force-pushed the feature/expath-file-module branch from 03608f9 to 6f58982 March 11, 2026 23:02
joewiz and others added 2 commits March 16, 2026 16:02
… 4.0

Replace the custom file module (http://exist-db.org/xquery/file) with a
spec-compliant EXPath File Module 4.0 (http://expath.org/ns/file)
implementation, improving interoperability with other XQuery processors.

The new module implements all 35+ functions from the EXPath File Module
4.0 specification including file properties, I/O, manipulation, path
utilities, and system properties. The old module's eXist-specific
file:sync function is relocated to util:file-sync.

Includes 64 XQuery integration tests covering all major functions
and error conditions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add missing function overloads:
- file:read-text/read-text-lines 3-arg $fallback form
- file:create-temp-dir/create-temp-file 2-arg form

Fix error codes per EXPath File 4.0 spec:
- file:copy/move raise file:no-dir when target parent missing
- file:create-dir raises file:exists when path component is a file
- file:read-binary rejects negative $length with file:out-of-range
- file:write-binary validates $offset against file size

Fix readBinary hang: replace BinaryValueFromInputStream (which uses
CachingFilterInputStream/FilterInputStreamCacheMonitor infrastructure
that prevents clean BrokerPool shutdown) with BinaryValueFromBinaryString.
Reads file into byte[], base64-encodes, wraps in lightweight value type
with no open handles and no-op close(). Tradeoff: ~2.4x memory for file
content, acceptable for typical file module use cases.

Resolve relative paths against XQuery static base URI when set as a
file: URI, falling back to JVM working directory.

Detect XML-illegal characters in read-text/read-text-lines: raise
file:io-error by default, or replace with U+FFFD when $fallback=true.

QT4 XQTS expath-file: 183/190 (96.3%), 0 hangs, 0 errors.
joewiz force-pushed the feature/expath-file-module branch from b2ce4d5 to 31bd3e9 March 16, 2026 20:05
joewiz and others added 2 commits March 20, 2026 03:20
Remove unused ENTRY_PARAM field, replace raw RuntimeException with
IllegalStateException, fix parameter reassignment in
noOtherNCNameAttribute(), and suppress NPathComplexity on list().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change the EXPath File module to coexist alongside eXist's original
file module rather than replacing it. This is a non-breaking, additive
change suitable for a feature release.

- Change default prefix from "file" to "exfile" in ExpathFileModule.java
  (namespace URI unchanged: http://expath.org/ns/file)
- Restore original file module (extensions/modules/file/) from develop
- Register BOTH modules in conf.xml:
  - http://exist-db.org/xquery/file (original, unchanged)
  - http://expath.org/ns/file (EXPath File 4.0, new)
- Keep file:sync in original FileModule; util:file-sync also available
- Update all EXPath test files to use exfile: prefix
- Restore exist-distribution pom.xml dependency on exist-file
- Restore image module's original dependency on exist-file

Both module test suites pass:
- EXPath File: 108 tests, 0 failures
- Original File: 63 tests, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joewiz changed the title "Replace native file module with EXPath File Module 4.0" to "Add EXPath File Module 4.0 alongside native file module" Mar 21, 2026