
XMLParser moves #text into the unpairedTag node #785

@levensta

  • Are you running the latest version?
  • Have you included sample input, output, error, and expected output?
  • Have you checked if you are using correct configuration?
  • Did you try the online tool?
  • Have you checked the docs for helpful APIs and examples?

Description

If the next sibling of a tag declared as unpaired (via the unpairedTags option) is plain #text, the parser moves that text inside the unpaired tag as its child. When the resulting object is passed to XMLBuilder to restore the original HTML, that text is dropped from the output. For example, with br declared unpaired:

<p>hello<br>world</p>

This does NOT reproduce if the node following the unpaired tag is an element rather than text. For example:

<p>hello<br><b>world</b></p>

It also does NOT reproduce if the unpaired tag is written self-closing, with a trailing slash:

<p>hello<br/>world</p>

Code

import { XMLParser, XMLBuilder } from 'fast-xml-parser'

const html = `
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Fast XML Parser</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <p>
            hello
            <br>
            world
        </p>
    </body>
</html>`

const parsingOptions = {
    ignoreAttributes: false,
    preserveOrder: true,
    unpairedTags: ["hr", "br", "link", "meta"],
};
const parser = new XMLParser(parsingOptions);
let result = parser.parse(html);
console.log(JSON.stringify(result, null, 2))

const builderOptions = {
    ignoreAttributes: false,
    preserveOrder: true,
    suppressEmptyNode: true,
    unpairedTags: ["hr", "br", "link", "meta"],
}
const builder = new XMLBuilder(builderOptions);
const output = builder.build(result);
console.log(output)

Output

jObj
[
  {
    "html": [
      {
        "head": [
          {
            "title": [
              {
                "#text": "Fast XML Parser"
              }
            ]
          },
          {
            "meta": [],
            ":@": {
              "@_charset": "UTF-8"
            }
          }
        ]
      },
      {
        "body": [
          {
            "p": [
              {
                "#text": "hello"
              },
              {
                "br": [
                  {
                    "#text": "world"
                  }
                ]
              }
            ]
          }
        ]
      }
    ],
    ":@": {
      "@_lang": "en"
    }
  }
]

xmlOutput

<html lang="en"><head><title>Fast XML Parser</title><meta charset="UTF-8"></head><body><p>hello<br></p></body></html>
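To spot this symptom in a parsed result without reading through the JSON, one option is to walk the preserveOrder array and report every unpaired tag that ended up with children. This is a minimal sketch, assuming the preserveOrder shape shown above (each element node is an object whose tag key holds a child array, with attributes under ":@" and text under "#text"); findAdoptedChildren is a hypothetical helper, not part of fast-xml-parser.

```javascript
// Hypothetical helper (not part of fast-xml-parser): walks a preserveOrder
// result and returns the path of every unpaired tag that has children.
function findAdoptedChildren(nodes, unpairedTags, path = []) {
  const hits = [];
  for (const node of nodes) {
    for (const [key, value] of Object.entries(node)) {
      // ":@" holds attributes and "#text" holds a string; only element keys hold arrays.
      if (key === ":@" || !Array.isArray(value)) continue;
      const here = [...path, key];
      if (unpairedTags.includes(key) && value.length > 0) {
        hits.push(here.join(" > "));
      }
      // Recurse so unpaired tags nested at any depth are found.
      hits.push(...findAdoptedChildren(value, unpairedTags, here));
    }
  }
  return hits;
}

// Trimmed copy of the "body" part of the actual parser output above.
const parsed = [
  { body: [{ p: [{ "#text": "hello" }, { br: [{ "#text": "world" }] }] }] },
];
console.log(findAdoptedChildren(parsed, ["hr", "br", "link", "meta"]));
// prints [ 'body > p > br' ]
```

An empty unpaired node such as "meta": [] is not flagged, since only unpaired tags with at least one child count as a hit.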

Expected output

jObj
[
  {
    "html": [
      {
        "head": [
          {
            "title": [
              {
                "#text": "Fast XML Parser"
              }
            ]
          },
          {
            "meta": [],
            ":@": {
              "@_charset": "UTF-8"
            }
          }
        ]
      },
      {
        "body": [
          {
            "p": [
              {
                "#text": "hello"
              },
              {
                "br": []
              },
              {
                "#text": "world"
              }
            ]
          }
        ]
      }
    ],
    ":@": {
      "@_lang": "en"
    }
  }
]

xmlOutput

<html lang="en"><head><title>Fast XML Parser</title><meta charset="UTF-8"></head><body><p>hello<br>world</p></body></html>
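Until the parser itself handles this, one possible stopgap is to post-process the parsed result: for every unpaired tag that has children, empty it and re-insert those children as its following siblings. This sketch assumes the same preserveOrder shape as above and that an unpaired tag never legitimately has content, which holds for HTML void elements such as br, hr, meta and link; hoistUnpairedChildren is a hypothetical name, not a fast-xml-parser API.

```javascript
// Hypothetical workaround (not a fast-xml-parser API): move any children an
// unpaired tag has adopted back out as its following siblings, recursively.
function hoistUnpairedChildren(nodes, unpairedTags) {
  const out = [];
  for (const node of nodes) {
    // The element's tag is its one key other than the ":@" attribute bag.
    const tag = Object.keys(node).find((k) => k !== ":@");
    const children = node[tag];
    if (!Array.isArray(children)) {
      out.push(node); // "#text" leaf: keep as-is
      continue;
    }
    const fixedChildren = hoistUnpairedChildren(children, unpairedTags);
    if (unpairedTags.includes(tag) && fixedChildren.length > 0) {
      // Keep the unpaired tag (and its attributes) but empty it, then
      // re-emit the adopted nodes right after it in their original order.
      out.push({ ...node, [tag]: [] });
      out.push(...fixedChildren);
    } else {
      out.push({ ...node, [tag]: fixedChildren });
    }
  }
  return out;
}

// "body" fragment of the actual parser output shown earlier.
const actual = [
  { body: [{ p: [{ "#text": "hello" }, { br: [{ "#text": "world" }] }] }] },
];
const fixed = hoistUnpairedChildren(actual, ["hr", "br", "link", "meta"]);
console.log(JSON.stringify(fixed));
// prints [{"body":[{"p":[{"#text":"hello"},{"br":[]},{"#text":"world"}]}]}]
```

Applied to the full parse result from the repro code, this yields the expected structure shown above, which can then be passed to builder.build() with the same builderOptions; the builder should then emit <p>hello<br>world</p>.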
