I would like to use your tool to investigate data noise in RVL_CDIP (https://huggingface.co/datasets/aharley/rvl_cdip) and the ICDAR 2023 DocLayNet data (https://ds4sd.github.io/icdar23-doclaynet/).
The literature already reports plenty of noise in RVL_CDIP, but your tool could provide more quantitative insight.