We should have heuristics to check for:
- polygon containment (overlapping regions, word outside line etc.)
- artifacts from annotation like point or line-like regions
- lines with (way) too much whitespace (bad cropping, or bad segmentation)
- probably even: missing `@orientation`
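
A rough sketch of what the two geometric checks could look like, just to make the idea concrete. It assumes polygons are already available as lists of `(x, y)` tuples; all function names and the `min_area` threshold are made up for illustration and are not part of any existing OCR-D API. The whitespace and `@orientation` checks are not covered here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def polygon_area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula (0 for < 3 points)."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def is_degenerate(poly: Polygon, min_area: float = 1.0) -> bool:
    """Flag point-like or line-like regions.

    min_area is a hypothetical threshold in square pixels.
    """
    return polygon_area(poly) < min_area


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test. Points exactly on the boundary may count as outside."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal ray at height y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertices_outside(child: Polygon, parent: Polygon) -> List[Point]:
    """Return the child vertices that are not inside the parent polygon."""
    return [pt for pt in child if not point_in_polygon(pt, parent)]


line = [(0, 0), (100, 0), (100, 20), (0, 20)]
word_ok = [(10, 5), (30, 5), (30, 15), (10, 15)]
word_bad = [(90, 5), (120, 5), (120, 15), (90, 15)]  # sticks out of the line

print(vertices_outside(word_ok, line))
print(vertices_outside(word_bad, line))
print(is_degenerate([(0, 0), (10, 0), (20, 0)]))
```

Checking only vertices is a simplification: it would miss a child whose edges cross a concave parent, and it treats vertices lying exactly on the parent outline as potentially outside, so a real check would probably want a small tolerance.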
Originally posted by @kba in OCR-D/assets#28 (comment)