parseXml unexpectedly creates xnText nodes after encountering an xml comment #25462

@doublespiral

Nim Version

Nim Compiler Version 2.2.4 [Linux: amd64]
Compiled at 2025-04-22
Copyright (c) 2006-2025 by Andreas Rumpf

git hash: f7145dd26efeeeb6eeae6fff649db244d81b212d
active boot switches: -d:release

Description

Example that reproduces the issue:

import std/[xmltree, xmlparser, parsexml]

const xml_text = """
<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <bar>text bar</bar>
    <!-- xml comment -->
    <baz>text baz</baz>
</foo>
"""

when isMainModule:
    let root_node = xml_text.parseXml()

    for child in root_node:
        echo "child.kind() = ", child.kind()

Current Output

child.kind() = xnElement
child.kind() = xnComment
child.kind() = xnText
child.kind() = xnElement

Expected Output

child.kind() = xnElement
child.kind() = xnComment
child.kind() = xnElement

Known Workarounds

Removing the whitespace (the newline and indentation) between the end of the comment and the start of the next element.
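
When editing the input is not an option, another possible workaround is to skip whitespace-only text nodes while iterating. The following is a minimal sketch of that idea, not a fix for the parser itself. `significantChildren` is a hypothetical helper name introduced here for illustration, and the sketch assumes whitespace-only text is never meaningful in the document being parsed.

```nim
import std/[xmltree, xmlparser, strutils]

const xmlText = """
<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <bar>text bar</bar>
    <!-- xml comment -->
    <baz>text baz</baz>
</foo>
"""

# Hypothetical helper (not part of the standard library): returns the
# children of `node`, skipping text nodes that contain only whitespace.
proc significantChildren(node: XmlNode): seq[XmlNode] =
  for child in node:
    if child.kind == xnText and child.text.isEmptyOrWhitespace:
      continue
    result.add child

when isMainModule:
  let rootNode = parseXml(xmlText)
  for child in significantChildren(rootNode):
    echo "child.kind() = ", child.kind
```

This keeps the comment node and both elements, and it behaves the same whether or not the parser emits the extra text node.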

Additional Information

The proc causing this issue appears to be rawGetTok in lib/pure/parsexml.nim. It assumes that whenever we are not starting a new element we are creating a text node, which is not the expected behaviour right after a comment ends.
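
To check where the extra node originates, it may help to look at the raw event stream that std/parsexml reports for the same input. The sketch below is only a diagnostic aid. `eventKinds` is a hypothetical helper name, and the default option set `{reportComments}` is an assumption chosen to mirror what parseXml appears to use.

```nim
import std/[parsexml, streams]

const xmlText = """
<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <bar>text bar</bar>
    <!-- xml comment -->
    <baz>text baz</baz>
</foo>
"""

# Hypothetical helper: collects the kind of every event the low-level
# parser reports for `s` under the given options.
proc eventKinds(s: string;
                options: set[XmlParseOption] = {reportComments}): seq[XmlEventKind] =
  var x: XmlParser
  open(x, newStringStream(s), "inline.xml", options)
  defer: close(x)
  while true:
    next(x)
    if x.kind == xmlEof: break
    result.add x.kind

when isMainModule:
  # Print one event kind per line so the tokens around the comment are easy to spot.
  for kind in eventKinds(xmlText):
    echo kind
```

Comparing the events that follow xmlComment with and without `reportWhitespace` in the option set may help narrow down which proc lets the whitespace token through.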
