Given this somewhat unusual FoLiA file:
<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="bugxx" generator="libfolia-v1.11" version="2.5">
<metadata type="native">
<annotations>
<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
<division-annotation/>
<paragraph-annotation/>
<sentence-annotation/>
<hyphenation-annotation/>
<string-annotation/>
</annotations>
</metadata>
<text xml:id="bug">
<div xml:id="bug.div">
<p xml:id="bug.div.p">
<s xml:id="bug.div.p.s.2">
<t>appel<t-hbr>-</t-hbr>taart</t>
<str xml:id="bug.div.p.s.2.str.1">
<t offset="0">appel</t>
</str>
<str xml:id="bug.div.p.s.2.str.2">
<t offset="5"><t-hbr>-</t-hbr></t>
</str>
<str xml:id="bug.div.p.s.2.str.3">
<t offset="5">taart</t>
</str>
</s>
</p>
</div>
</text>
</FoLiA>
This is accepted by folialint (latest Git version), but rejected by foliavalidator.
The latter states:
TEXT VALIDATION ERROR: Text for String, ID bug.div.p.s.2.str.2, textclass current, has incorrect offset 5 or invalid reference: Reference (ID bug.div.p.s.2, class=current) found but no text match at specified offset (5)! Expected '', got 't', full text: 'appeltaart"
(also checked against older rules prior to FoLiA v2.4.1)
VALIDATION ERROR on full parse by library (stage 2/3), in tests/bug52-3.xml
UnresolvableTextContent: Reference (ID bug.div.p.s.2, class=current) found but no text match at specified offset (5)! Expected '', got 't', full text: 'appeltaart"
The problem is with the offset of the <t-hbr> element in the second <str>.
IMHO this should be 5, which folialint accepts. Since the <t-hbr> has a size of 0, the next <str> ALSO has that same offset, 5.
This is a BUG.
Neither program handles this very well, though, as can be shown by replacing the offset with an out-of-range value
like -1, 10 or 2894234.
In that case both programs will validate the FoLiA.
SOLUTION:
I suppose that FoLiA elements with the IMPLICITSPACE property should be defined to add 0 to the offset,
AND: when an offset attribute is present, it should have a meaningful, correct value.
This might prove difficult, as the offset should equal that of the NEXT non-TextMarkup element, and
there is no obligation to have an offset attribute there (or even for that element to exist).
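
To make the proposed rule concrete, here is a minimal sketch in plain Python. It is not the code of folialint or foliavalidator, and the function name offset_matches is made up for illustration. It assumes that a zero-width element such as <t-hbr> contributes the empty string to the text, and that its offset must point at the start of existing following text, so an offset equal to the text length is treated as out of range.

```python
def offset_matches(full_text: str, text: str, offset: int) -> bool:
    """Return True if `text` occurs in `full_text` exactly at `offset`."""
    if offset < 0:
        # Reject explicitly: Python slicing would count a negative index from the end.
        return False
    if text == "":
        # Assumption: a zero-width element must sit directly before existing text,
        # so its offset has to be a valid character position.
        return offset < len(full_text)
    return full_text.startswith(text, offset)


# Text of the sentence; the <t-hbr> itself adds nothing to it.
full = "appeltaart"

# (text, offset) pairs: the three <str> elements from the example,
# followed by the out-of-range offsets mentioned above.
checks = [("appel", 0), ("", 5), ("taart", 5), ("", -1), ("", 10), ("", 2894234)]
results = [offset_matches(full, t, o) for t, o in checks]
print(results)
```

Under these assumptions the three offsets from the example file are accepted and -1, 10 and 2894234 are rejected. This does not address the harder part, namely deciding what the offset should be when there is no following element to compare against.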