feat(xml/unstable): add XML parsing and serialization module #6942
Open: tomas-zijdemans wants to merge 3 commits into denoland:main from tomas-zijdemans:xml
+18,733
−2
9b1fe20 feat(xml): add XML module with streaming parser, DOM-style parser, an… (tomas-zijdemans)
81445b5 perf(xml): native TransformStream for 20% faster streaming (tomas-zijdemans)
a5ed5bf refactor(xml): remove deprecated async generator APIs, sync all tests (tomas-zijdemans)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,4 +76,5 @@ jobs: | |
| ulid(/unstable)? | ||
| uuid(/unstable)? | ||
| webgpu(/unstable)? | ||
| xml(/unstable)? | ||
| yaml(/unstable)? | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -48,6 +48,7 @@ | |
| "./ulid", | ||
| "./uuid", | ||
| "./webgpu", | ||
| "./xml", | ||
| "./yaml" | ||
| ] | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,7 +5,6 @@ | |
| "npm:/typescript": "npm:[email protected]", | ||
| "automation/": "https://raw.githubusercontent.com/denoland/automation/0.10.0/", | ||
| "graphviz": "npm:node-graphviz@^0.1.1", | ||
|
|
||
| "@std/assert": "jsr:@std/assert@^1.0.16", | ||
| "@std/async": "jsr:@std/async@^1.0.16", | ||
| "@std/bytes": "jsr:@std/bytes@^1.0.6", | ||
|
|
@@ -46,6 +45,7 @@ | |
| "@std/ulid": "jsr:@std/ulid@^1.0.0", | ||
| "@std/uuid": "jsr:@std/uuid@^1.1.0", | ||
| "@std/webgpu": "jsr:@std/webgpu@^0.224.9", | ||
| "@std/xml": "jsr:@std/xml@^0.0.1", | ||
| "@std/yaml": "jsr:@std/yaml@^1.0.10" | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| // Copyright 2018-2026 the Deno authors. MIT license. | ||
| // This module is browser compatible. | ||
|
|
||
| /** | ||
| * Internal shared utilities for the XML module. | ||
| * | ||
| * @module | ||
| */ | ||
|
|
||
| import type { XmlName } from "./types.ts"; | ||
|
|
||
| /** | ||
| * Line ending normalization pattern per XML 1.0 §2.11. | ||
| * Converts \r\n and standalone \r to \n. | ||
| */ | ||
| export const LINE_ENDING_RE = /\r\n?/g; | ||
|
|
||
| /** | ||
| * Whitespace-only test per XML 1.0 §2.3. | ||
| * Uses explicit [ \t\r\n] instead of \s to match XML spec exactly: | ||
| * S ::= (#x20 | #x9 | #xD | #xA)+ | ||
| */ | ||
| export const WHITESPACE_ONLY_RE = /^[ \t\r\n]*$/; | ||
|
|
||
| /** | ||
| * XML declaration version attribute pattern. | ||
| * Matches both single and double quoted values. | ||
| */ | ||
| export const VERSION_RE = /version\s*=\s*(?:"([^"]+)"|'([^']+)')/; | ||
|
|
||
| /** | ||
| * XML declaration encoding attribute pattern. | ||
| * Matches both single and double quoted values. | ||
| */ | ||
| export const ENCODING_RE = /encoding\s*=\s*(?:"([^"]+)"|'([^']+)')/; | ||
|
|
||
| /** | ||
| * XML declaration standalone attribute pattern. | ||
| * Matches both single and double quoted values, restricted to "yes" or "no". | ||
| */ | ||
| export const STANDALONE_RE = /standalone\s*=\s*(?:"(yes|no)"|'(yes|no)')/; | ||
|
|
||
| /** | ||
| * Parses a qualified XML name into its prefix and local parts. | ||
| * | ||
| * @example Usage | ||
| * ```ts | ||
| * import { parseName } from "./_common.ts"; | ||
| * | ||
| * parseName("ns:element"); // { prefix: "ns", local: "element" } | ||
| * parseName("element"); // { local: "element" } | ||
| * ``` | ||
| * | ||
| * @param name The raw name string (e.g., "ns:element" or "element") | ||
| * @returns An XmlName object with local and optional prefix | ||
| */ | ||
| export function parseName(name: string): XmlName { | ||
| const colonIndex = name.indexOf(":"); | ||
| if (colonIndex === -1) { | ||
| return { local: name }; | ||
| } | ||
| return { | ||
| prefix: name.slice(0, colonIndex), | ||
| local: name.slice(colonIndex + 1), | ||
| }; | ||
| } |
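For anyone reviewing this file, a minimal stand-alone sketch of how these helpers behave. The patterns and `parseName` are copied from the diff above. The `XmlName` interface is an assumed shape (the real type lives in `./types.ts`, which is not shown here), and `quotedValue` is a hypothetical helper added only for illustration; it is not part of the PR.

```typescript
// Assumed shape of XmlName; the real type is defined in ./types.ts (not shown here).
interface XmlName {
  prefix?: string;
  local: string;
}

// Patterns copied from the diff above.
const LINE_ENDING_RE = /\r\n?/g;
const WHITESPACE_ONLY_RE = /^[ \t\r\n]*$/;
const VERSION_RE = /version\s*=\s*(?:"([^"]+)"|'([^']+)')/;
const ENCODING_RE = /encoding\s*=\s*(?:"([^"]+)"|'([^']+)')/;
const STANDALONE_RE = /standalone\s*=\s*(?:"(yes|no)"|'(yes|no)')/;

function parseName(name: string): XmlName {
  const colonIndex = name.indexOf(":");
  if (colonIndex === -1) return { local: name };
  return {
    prefix: name.slice(0, colonIndex),
    local: name.slice(colonIndex + 1),
  };
}

// Hypothetical helper (not in the PR): each pattern has one capture group per
// quote style, so the value is in whichever of the two groups matched.
function quotedValue(re: RegExp, input: string): string | undefined {
  const m = re.exec(input);
  return m ? (m[1] ?? m[2]) : undefined;
}

const decl = `<?xml version='1.0' encoding="UTF-8" standalone="yes"?>`;
const version = quotedValue(VERSION_RE, decl);
const encoding = quotedValue(ENCODING_RE, decl);
const standalone = quotedValue(STANDALONE_RE, decl);

// CRLF and lone CR both collapse to LF.
const normalized = "a\r\nb\rc\n".replace(LINE_ENDING_RE, "\n");

// A non-breaking space is matched by \s but is not XML whitespace.
const nbspIsWhitespace = WHITESPACE_ONLY_RE.test("\u00A0");
const blankIsWhitespace = WHITESPACE_ONLY_RE.test(" \t\r\n");

const qualified = parseName("ns:element");
const unqualified = parseName("element");

console.log({ version, encoding, standalone, normalized, qualified, unqualified });
```

The `nbspIsWhitespace` check illustrates why the explicit character class is used instead of `\s`: JavaScript's `\s` also matches characters such as U+00A0 that the XML `S` production does not include.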
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,178 @@ | ||
| // Copyright 2018-2026 the Deno authors. MIT license. | ||
| // This module is browser compatible. | ||
|
|
||
| /** | ||
| * Internal module for XML entity encoding and decoding. | ||
| * | ||
| * @module | ||
| */ | ||
|
|
||
| /** | ||
| * The five predefined XML entities per XML 1.0 §4.6. | ||
| * Using const assertion for precise typing. | ||
| */ | ||
| const NAMED_ENTITIES = { | ||
| lt: "<", | ||
| gt: ">", | ||
| amp: "&", | ||
| apos: "'", | ||
| quot: '"', | ||
| } as const; | ||
|
|
||
| /** | ||
| * Reverse mapping for encoding special characters. | ||
| */ | ||
| const CHAR_TO_ENTITY = { | ||
| "<": "&lt;", | ||
| ">": "&gt;", | ||
| "&": "&amp;", | ||
| "'": "&apos;", | ||
| '"': "&quot;", | ||
| } as const; | ||
|
|
||
| /** | ||
| * Extended mapping for attribute value encoding (includes whitespace). | ||
| */ | ||
| const ATTR_CHAR_MAP: Record<string, string> = { | ||
| "<": "&lt;", | ||
| ">": "&gt;", | ||
| "&": "&amp;", | ||
| "'": "&apos;", | ||
| '"': "&quot;", | ||
| "\t": "&#9;", | ||
| "\n": "&#10;", | ||
| "\r": "&#13;", | ||
| }; | ||
|
|
||
| // Hoisted regex patterns for performance | ||
| const ENTITY_RE = /&([a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);/g; | ||
| const SPECIAL_CHARS_RE = /[<>&'"]/g; | ||
| const ATTR_ENCODE_RE = /[<>&'"\t\n\r]/g; | ||
|
|
||
| /** | ||
| * Pattern to detect bare `&` not followed by a valid reference. | ||
| * Valid references are: &name; or &#digits; or &#xhexdigits; | ||
| */ | ||
| const BARE_AMPERSAND_RE = /&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#x[0-9a-fA-F]+;)/; | ||
|
|
||
| /** | ||
| * Checks if a code point is a valid XML 1.0 Char per §2.2. | ||
| * | ||
| * Per the specification: | ||
| * Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] | ||
| * | ||
| * This excludes: | ||
| * - NULL (#x0) | ||
| * - Control characters #x1-#x8, #xB-#xC, #xE-#x1F | ||
| * - Surrogate code points #xD800-#xDFFF (handled separately) | ||
| * - Non-characters #xFFFE-#xFFFF | ||
| * | ||
| * @see {@link https://www.w3.org/TR/xml/#charsets | XML 1.0 §2.2 Characters} | ||
| */ | ||
| function isValidXmlChar(codePoint: number): boolean { | ||
| return ( | ||
| codePoint === 0x9 || | ||
| codePoint === 0xA || | ||
| codePoint === 0xD || | ||
| (codePoint >= 0x20 && codePoint <= 0xD7FF) || | ||
| (codePoint >= 0xE000 && codePoint <= 0xFFFD) || | ||
| (codePoint >= 0x10000 && codePoint <= 0x10FFFF) | ||
| ); | ||
| } | ||
|
|
||
| /** | ||
| * Options for entity decoding. | ||
| */ | ||
| export interface DecodeEntityOptions { | ||
| /** | ||
| * If true, throws an error on invalid bare `&` characters. | ||
| * Per XML 1.0 §3.1, `&` must be escaped as `&amp;` unless it starts | ||
| * a valid entity or character reference. | ||
| * | ||
| * @default false | ||
| */ | ||
| readonly strict?: boolean; | ||
| } | ||
|
|
||
| /** | ||
| * Decodes XML entities in a string. | ||
| * | ||
| * Handles the five predefined entities (§4.6) and numeric character | ||
| * references (§4.1) per the XML 1.0 specification. | ||
| * | ||
| * @param text The text containing XML entities to decode. | ||
| * @param options Decoding options. | ||
| * @returns The text with entities decoded. | ||
| */ | ||
| export function decodeEntities( | ||
| text: string, | ||
| options?: DecodeEntityOptions, | ||
| ): string { | ||
| // Fast path: no ampersand means no entities to decode | ||
| if (!text.includes("&")) return text; | ||
|
|
||
| if (options?.strict) { | ||
| const match = BARE_AMPERSAND_RE.exec(text); | ||
| if (match) { | ||
| throw new Error( | ||
| `Invalid bare '&' at position ${match.index}: ` + | ||
| `entity references must be &name; or &#num; or &#xHex;`, | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| return text.replace(ENTITY_RE, (match, entity: string) => { | ||
| if (entity.startsWith("#x")) { | ||
| // Hexadecimal character reference | ||
| const codePoint = parseInt(entity.slice(2), 16); | ||
| // Invalid per XML 1.0 §4.1 WFC: Legal Character - must match Char production | ||
| if (!isValidXmlChar(codePoint)) { | ||
| return match; | ||
| } | ||
| return String.fromCodePoint(codePoint); | ||
| } | ||
| if (entity.startsWith("#")) { | ||
| // Decimal character reference | ||
| const codePoint = parseInt(entity.slice(1), 10); | ||
| // Invalid per XML 1.0 §4.1 WFC: Legal Character - must match Char production | ||
| if (!isValidXmlChar(codePoint)) { | ||
| return match; | ||
| } | ||
| return String.fromCodePoint(codePoint); | ||
| } | ||
| // Named entity | ||
| if (Object.hasOwn(NAMED_ENTITIES, entity)) { | ||
| return NAMED_ENTITIES[entity as keyof typeof NAMED_ENTITIES]; | ||
| } | ||
| // Unknown entity - return as-is | ||
| return match; | ||
| }); | ||
| } | ||
|
|
||
| /** | ||
| * Encodes special characters as XML entities. | ||
| * | ||
| * @param text The text to encode. | ||
| * @returns The text with special characters encoded as entities. | ||
| */ | ||
| export function encodeEntities(text: string): string { | ||
| // Fast path: no special characters means nothing to encode | ||
| if (!/[<>&'"]/.test(text)) return text; | ||
| return text.replace( | ||
| SPECIAL_CHARS_RE, | ||
| (char) => CHAR_TO_ENTITY[char as keyof typeof CHAR_TO_ENTITY], | ||
| ); | ||
| } | ||
|
|
||
| /** | ||
| * Encodes special characters for use in XML attribute values. | ||
| * Encodes whitespace characters that would be normalized per XML 1.0 §3.3.3. | ||
| * | ||
| * @param value The attribute value to encode. | ||
| * @returns The encoded attribute value. | ||
| */ | ||
| export function encodeAttributeValue(value: string): string { | ||
| // Fast path: no special characters means nothing to encode | ||
| if (!/[<>&'"\t\n\r]/.test(value)) return value; | ||
| return value.replace(ATTR_ENCODE_RE, (c) => ATTR_CHAR_MAP[c]!); | ||
| } |
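A condensed stand-alone sketch of the round-trip behavior these functions appear intended to have. It is not the PR's code: `decode`, `encode`, and `encodeAttr` are hypothetical names, strict mode is omitted, and the decimal references used for attribute whitespace (`&#9;`, `&#10;`, `&#13;`) are an assumption, since hexadecimal references would be equally valid XML.

```typescript
const NAMED: Record<string, string> = {
  lt: "<",
  gt: ">",
  amp: "&",
  apos: "'",
  quot: '"',
};

const TO_ENTITY: Record<string, string> = {
  "<": "&lt;",
  ">": "&gt;",
  "&": "&amp;",
  "'": "&apos;",
  '"': "&quot;",
};

// Attribute values additionally escape whitespace that would otherwise be
// normalized away (decimal form is an assumption, see the note above).
const ATTR_MAP: Record<string, string> = {
  ...TO_ENTITY,
  "\t": "&#9;",
  "\n": "&#10;",
  "\r": "&#13;",
};

// XML 1.0 Char production.
function isValidXmlChar(cp: number): boolean {
  return cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);
}

function decode(text: string): string {
  return text.replace(
    /&([a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);/g,
    (match: string, entity: string): string => {
      if (entity.startsWith("#")) {
        const isHex = entity.startsWith("#x");
        const cp = parseInt(entity.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Illegal character references are left untouched.
        return isValidXmlChar(cp) ? String.fromCodePoint(cp) : match;
      }
      // Own-property check so inherited keys such as "constructor" are not
      // mistaken for entity names; unknown names are left untouched.
      return Object.prototype.hasOwnProperty.call(NAMED, entity)
        ? NAMED[entity]!
        : match;
    },
  );
}

function encode(text: string): string {
  return text.replace(/[<>&'"]/g, (c) => TO_ENTITY[c]!);
}

function encodeAttr(value: string): string {
  return value.replace(/[<>&'"\t\n\r]/g, (c) => ATTR_MAP[c]!);
}

const raw = `a < b && "c"`;
const encoded = encode(raw);
const roundTrip = decode(encoded);
const numeric = decode("&#x41;&#66;");
const illegal = decode("&#0;");
const unknown = decode("&unknown; &constructor;");
const attr = encodeAttr("a\tb\n");

console.log({ encoded, roundTrip, numeric, illegal, unknown, attr });
```

Leaving illegal or unknown references as literal text mirrors the non-strict path in the diff; a caller that needs well-formedness errors would have to check for them separately.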
Sorry, I have no idea why this formatting is happening 😅