
Namespace prefix inheritance produces inefficient output #72

@danieldulaney


It looks like prefix handling is slightly off when children use a prefix mapping inherited from an ancestor. For example, take the XML document:

<?xml version='1.0'?>
<root xmlns:x='urn:very-long-urn'>
  <x:child/>
  <x:child/>
  <x:child/>
</root>

If you parse and output the document with default writer settings, you end up with something like this:

<?xml version='1.0'?>
<root>
  <x:child xmlns:x='urn:very-long-urn'/>
  <x:child xmlns:x='urn:very-long-urn'/>
  <x:child xmlns:x='urn:very-long-urn'/>
</root>

This output seems suboptimal. First, it is longer than the input. Second, it is harder to edit later: instead of one declaration shared by every child, each child now carries its own copy of the mapping, so changing the URN means updating every declaration. It would be better to keep the single mapping on the element where it was originally defined.

It is also hard to see what is happening, because Element provides no way to inspect the prefix-to-namespace mappings that already exist on it.
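To make the missing piece concrete, here is a minimal, std-only sketch of the lookup such a method would need to perform. The `Elem` type and `in_scope_mappings` function are hypothetical illustrations, not part of sxd_document: walking from the root down to an element, each declaration is added to a map, so a declaration on an ancestor stays visible to every descendant unless a closer element rebinds the same prefix.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified element (NOT sxd_document's `Element`):
/// it records only the prefix declarations written on this element itself.
struct Elem {
    name: &'static str,
    declared: Vec<(&'static str, &'static str)>, // (prefix, namespace URI)
}

/// Collects the prefix-to-namespace mappings in scope at the last element
/// of `path`, where `path` runs from the root down to that element.
/// Declarations closer to the element overwrite inherited ones.
fn in_scope_mappings(path: &[&Elem]) -> HashMap<&'static str, &'static str> {
    let mut scope = HashMap::new();
    for elem in path {
        for &(prefix, uri) in &elem.declared {
            scope.insert(prefix, uri);
        }
    }
    scope
}

fn main() {
    let root = Elem { name: "root", declared: vec![("x", "urn:very-long-urn")] };
    let child = Elem { name: "x:child", declared: vec![] };

    // The child declares nothing itself, but `x` is still in scope via `root`.
    let scope = in_scope_mappings(&[&root, &child]);
    println!("{} sees x -> {:?}", child.name, scope.get("x"));
}
```

A real method on Element would presumably walk parent links rather than take an explicit path, but the scoping rule is the same.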

I propose two separate changes:

  • Add a method on Element to inspect which prefixes have already been mapped on it
  • Always put a prefix declaration on the element where it was originally defined, even if the prefix is only used by descendant nodes
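The second change could roughly amount to threading the in-scope mappings through serialization: keep each declaration on the element that originally had it, and add a new one only when an element uses a prefix that is not already bound to the right URI. The sketch below is hypothetical and std-only; `Node`, `write_node`, and their fields are illustrative names, not sxd_document's writer, and it ignores default namespaces, attributes, text, and escaping.

```rust
use std::collections::HashMap;

/// Hypothetical element model (NOT sxd_document's API).
struct Node {
    prefix: Option<&'static str>,                // prefix used in the element name
    namespace: Option<&'static str>,             // namespace URI of the element name
    local: &'static str,                         // local part of the name
    declared: Vec<(&'static str, &'static str)>, // declarations originally on this element
    children: Vec<Node>,
}

/// Writes `node`, keeping its original declarations and adding one only
/// when its own prefix is not already bound to the right URI in `scope`.
fn write_node(node: &Node, scope: &HashMap<&'static str, &'static str>, out: &mut String) {
    let mut scope = scope.clone();
    let mut decls = node.declared.clone();
    for &(p, uri) in &node.declared {
        scope.insert(p, uri);
    }
    // Only declare the element's prefix if no ancestor already maps it correctly.
    if let (Some(p), Some(uri)) = (node.prefix, node.namespace) {
        if scope.get(p) != Some(&uri) {
            decls.push((p, uri));
            scope.insert(p, uri);
        }
    }
    let name = match node.prefix {
        Some(p) => format!("{}:{}", p, node.local),
        None => node.local.to_string(),
    };
    out.push('<');
    out.push_str(&name);
    for (p, uri) in &decls {
        out.push_str(&format!(" xmlns:{}='{}'", p, uri));
    }
    if node.children.is_empty() {
        out.push_str("/>");
    } else {
        out.push('>');
        for child in &node.children {
            write_node(child, &scope, out);
        }
        out.push_str(&format!("</{}>", name));
    }
}

fn main() {
    let urn = "urn:very-long-urn";
    let child = || Node { prefix: Some("x"), namespace: Some(urn), local: "child", declared: vec![], children: vec![] };
    let root = Node {
        prefix: None,
        namespace: None,
        local: "root",
        declared: vec![("x", urn)],
        children: vec![child(), child(), child()],
    };
    let mut out = String::new();
    write_node(&root, &HashMap::new(), &mut out);
    println!("{}", out);
}
```

Run on the three-child example, this produces `<root xmlns:x='urn:very-long-urn'><x:child/><x:child/><x:child/></root>`, i.e. the input body unchanged, while a lone `x:child` with no declaring ancestor still gets its own declaration.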

If there's buy-in, I'm happy to work on PRs that address both issues.

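Example code (the assertion fails):

// Reproduction: parse a document, write it back with the default Writer
// settings, and compare against the input. The assertion fails because the
// `x` declaration is written on each child instead of on `root`.
fn main() {
    let xml = "<?xml version='1.0'?><root xmlns:x='urn:very-long-urn'><x:child/><x:child/><x:child/></root>";

    let package = sxd_document::parser::parse(xml).unwrap();
    let doc = package.as_document();

    let mut output = Vec::new();
    sxd_document::writer::Writer::new().format_document(&doc, &mut output).unwrap();
    let output_str = String::from_utf8(output).unwrap();

    assert_eq!(output_str, xml);
}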
