I noticed that the new model OCRs tables and text accurately, but it performs worse at detecting layout elements. For instance:
It detects normal text paragraphs as tables if they are neatly formatted and aligned.
It often detects normal text paragraphs (with a little left padding) as display math equations.
Sometimes it does not include the y-axis label as part of the image; instead it puts it on a separate line or span.
My suggestion:
Add an option to rely on the old pipeline for layout detection, then use the new VLM model to OCR the content of each detected layout element, with a restricted configuration that matches the detected layout. For instance, if the pipeline detects a normal text span, we send it to OCR with Table = false and math = false.
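To make the suggestion concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: `Region`, `OcrConfig`, `config_for`, `run_hybrid`, and the `detect_layout` / `ocr_region` callables are placeholder names I made up for illustration, not part of the project's actual API. The old pipeline would be plugged in as `detect_layout` and the new VLM model as `ocr_region`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed pixel coordinates


@dataclass(frozen=True)
class Region:
    """A layout element as reported by the old pipeline (hypothetical shape)."""
    label: str  # e.g. "text", "table", "equation", "image"
    bbox: BBox


@dataclass(frozen=True)
class OcrConfig:
    """Restrictions passed to the VLM OCR step (hypothetical flags)."""
    allow_table: bool
    allow_math: bool


# Restrict the VLM output to what the detected layout type permits.
CONFIG_BY_LABEL: Dict[str, OcrConfig] = {
    "text": OcrConfig(allow_table=False, allow_math=False),
    "table": OcrConfig(allow_table=True, allow_math=False),
    "equation": OcrConfig(allow_table=False, allow_math=True),
}

# Unknown labels fall back to an unrestricted configuration.
DEFAULT_CONFIG = OcrConfig(allow_table=True, allow_math=True)


def config_for(label: str) -> OcrConfig:
    """Return the OCR restrictions that match a detected layout label."""
    return CONFIG_BY_LABEL.get(label, DEFAULT_CONFIG)


def run_hybrid(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    ocr_region: Callable[[Any, BBox, OcrConfig], str],
) -> List[Tuple[Region, str]]:
    """Detect layout with the old pipeline, then OCR each region with the VLM."""
    results = []
    for region in detect_layout(page):
        cfg = config_for(region.label)
        results.append((region, ocr_region(page, region.bbox, cfg)))
    return results
```

With this shape, a paragraph detected as "text" could never come back as a table or a display equation, because the OCR step is told up front that neither is allowed for that region.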
Also:
Is there a way to batch process multiple images with the new VLM model?