This came up after issue #45.
When resolving a HEMP, FoLiA-correct just adds the resolved text to one of the string/word nodes.
I assume using a real Correction would be better.
For example:
<p xml:id="mwsel.p.1">
<t class="OCR">•c c•</t>
<str xml:id="mwsel.p.1.str.1">
<t class="OCR">•c</t>
</str>
<str xml:id="mwsel.p.1.str.2">
<t class="OCR">c•</t>
</str>
</p>
Assuming •c c• is in the PUNCT file as •c c• cc, this HEMP is currently resolved as:
<p xml:id="mwsel.p.1">
<t>cc</t>
<t class="OCR">•c c•</t>
<str xml:id="mwsel.p.1.str.1">
<t class="OCR">•c</t>
</str>
<str xml:id="mwsel.p.1.str.2">
<t offset="0">cc</t>
<t class="OCR">c•</t>
</str>
</p>
IMHO a much better solution would be:
<p xml:id="mwsel.p.1">
<t>cc</t>
<t class="OCR">•c c•</t>
<correction xml:id="mwsel.p.1.correction.1">
<new>
<str xml:id="mwsel.p.1.str.edit.1">
<t>cc</t>
</str>
</new>
<original>
<str xml:id="mwsel.p.1.str.1">
<t class="OCR">•c</t>
</str>
<str xml:id="mwsel.p.1.str.2">
<t class="OCR">c•</t>
</str>
</original>
</correction>
</p>
Interesting point: HEMP resolution is done before the other corrections, so I assume that a real correction using the resolved cc will not be performed afterwards.
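To make the proposed structure a bit more concrete, here is a rough Python sketch of the transformation (this is not how FoLiA-correct is implemented, just an illustration using the standard library). The helper name `wrap_in_correction` and the id suffixes `.correction.1` and `.str.edit.1` are assumptions taken from the example above, and the fragment is treated as un-namespaced XML, which a real FoLiA document is not.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def wrap_in_correction(parent, originals, resolved_text):
    """Hypothetical helper: move `originals` (direct children of `parent`)
    into <correction>/<original> and put the resolved text in <new>."""
    pid = parent.get(XML_ID)
    # Remember where the first original node sat, so the correction
    # ends up in the same position.
    index = list(parent).index(originals[0])
    for node in originals:
        parent.remove(node)

    correction = ET.Element("correction", {XML_ID: f"{pid}.correction.1"})
    new = ET.SubElement(correction, "new")
    new_str = ET.SubElement(new, "str", {XML_ID: f"{pid}.str.edit.1"})
    ET.SubElement(new_str, "t").text = resolved_text

    # The OCR nodes are kept untouched inside <original>.
    original = ET.SubElement(correction, "original")
    original.extend(originals)

    parent.insert(index, correction)
    return correction


# The unresolved paragraph from the first example.
doc = ET.fromstring(
    '<p xml:id="mwsel.p.1">'
    '<t class="OCR">•c c•</t>'
    '<str xml:id="mwsel.p.1.str.1"><t class="OCR">•c</t></str>'
    '<str xml:id="mwsel.p.1.str.2"><t class="OCR">c•</t></str>'
    '</p>'
)
wrap_in_correction(doc, doc.findall("str"), "cc")
print(ET.tostring(doc, encoding="unicode"))
```

The sketch only builds the correction; adding the paragraph-level `<t>cc</t>` is left out, since the current HEMP resolution already does that.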