
QNames and Whitespace #5

@JonathanRowell


Try parsing the following XML:

```xml
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
<marc:leader>00000cam a2200000 4500</marc:leader>
<marc:controlfield tag="001">25</marc:controlfield>
<marc:controlfield tag="005">20150304</marc:controlfield>
<marc:controlfield tag="008">950620p19821982||||||||||||||||||||nor|||</marc:controlfield>
<marc:datafield tag="015" ind1="" ind2="">
<marc:subfield code="a">82,A49,0102</marc:subfield>
</marc:datafield>
<marc:datafield tag="020" ind1="" ind2="">
<marc:subfield code="9">3-7678-0565-0</marc:subfield>
<marc:subfield code="c">Pp. : DM 9.80</marc:subfield>
</marc:datafield>
</marc:record>
```

  1. There is a CR/LF sequence after the processing instruction, and it turns the entire XML into a single text node.

  2. CR/LF seems to make the parser stop parsing, so you have to remove every CR/LF globally.

  3. QNames are not recognised (`marc:record` is a QName), so you have to remove the namespace prefix.
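For comparison, here is a minimal sketch of what a conforming, namespace-aware parser does with the same kind of input. It uses Python's standard-library `xml.etree.ElementTree`, chosen only as a reference point; it is not the parser this issue is filed against. The XML spec's end-of-line handling turns CR/LF into LF before the application sees any text, and a prefixed QName such as `marc:record` is resolved to `{namespace-uri}local-name`. The names `SAMPLE` and `NS` are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record above, with an XML declaration and
# Windows CR/LF line endings, to mimic the input described in the issue.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\r\n'
    '<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">\r\n'
    '<marc:controlfield tag="001">25</marc:controlfield>\r\n'
    '<marc:controlfield tag="005">20150304</marc:controlfield>\r\n'
    '</marc:record>\r\n'
).encode("utf-8")

# The prefix in this mapping is our own choice; only the URI has to match.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

root = ET.fromstring(SAMPLE)

# The prefixed QName has been resolved to "{namespace-uri}local-name".
print(root.tag)

# End-of-line handling: the CR/LF after the start tag arrives as plain LF.
print(repr(root.text))

# Query by namespace instead of stripping prefixes from the input.
controlfields = {
    field.get("tag"): field.text
    for field in root.findall("marc:controlfield", NS)
}
print(controlfields)
```

Nothing has to be removed from the input: neither the line endings nor the prefixes need preprocessing.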

Then it works, and it wasn't too slow. But how on earth can one handle the result? Make one small change to the input and you get the same sort of tree, without any indication that something went wrong.
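As a point of reference, a strict parser refuses malformed input and says where the problem is instead of returning a plausible-looking tree. The sketch below again uses Python's `xml.etree.ElementTree` as a stand-in (not the library from this issue): `ET.ParseError` carries a `position` attribute with the line and column. The helper `try_parse` and the misspelled closing tag are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# "One small change": the closing tag of the control field is misspelled.
BROKEN = (
    '<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">\n'
    '<marc:controlfield tag="001">25</marc:controlfeld>\n'
    '</marc:record>\n'
)


def try_parse(text):
    """Hypothetical helper: return (root, None) on success,
    or (None, (line, column, message)) if the input is not well-formed."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        line, column = err.position
        return None, (line, column, str(err))


root, error = try_parse(BROKEN)
if error is not None:
    line, column, message = error
    print(f"not well-formed at line {line}, column {column}: {message}")
```

Returning the error alongside a `None` tree is one design choice; letting the exception propagate works just as well, as long as the failure is not silent.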

And there ought to be an option to ignore whitespace.
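Until such an option exists, whitespace-only text nodes can be dropped after parsing. The sketch below is written against Python's `xml.etree.ElementTree` for illustration, and `strip_whitespace` is a hypothetical helper, not part of any library. (lxml offers a similar built-in, `etree.XMLParser(remove_blank_text=True)`.) It removes only text that is entirely whitespace and does not trim anything else. For MARC data this distinction matters, because a leader such as `00000cam a2200000 4500` contains significant spaces.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(elem: ET.Element) -> ET.Element:
    """Hypothetical helper: set whitespace-only .text and .tail to None
    on elem and all of its descendants, leaving other text untouched."""
    for node in elem.iter():  # iter() includes elem itself
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return elem


record = ET.fromstring(
    '<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">\n'
    '  <marc:datafield tag="020">\n'
    '    <marc:subfield code="9">3-7678-0565-0</marc:subfield>\n'
    '  </marc:datafield>\n'
    '</marc:record>'
)
strip_whitespace(record)

# Keep the original prefix when serialising (otherwise ElementTree uses ns0).
ET.register_namespace("marc", "http://www.loc.gov/MARC21/slim")
print(ET.tostring(record, encoding="unicode"))
```

Stripping after parsing keeps the parser itself spec-conforming and makes the whitespace policy an explicit choice by the caller.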
