Using Pipeline with the new VLM model #2709

zakir0101 · 2025-06-17T19:25:46Z


I noticed that the new model OCRs tables and text very accurately, but it performs worse at detecting layout elements. For instance:

It detects normal text paragraphs as tables if they are neatly formatted and aligned.
It often detects normal text paragraphs (with a little left padding) as display math equations.
Sometimes it does not include the y-label description as part of the image; instead it puts it on a separate line or span.

My suggestion:
Add an option to rely on the old pipeline for layout detection, then use the new VLM model to OCR the content of each detected layout element, with a restricted configuration that matches the detected layout. For instance, if the pipeline detects a normal text span, we send it to OCR configured with table = false and math = false.
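
To illustrate the idea, here is a minimal Python sketch. Everything in it is hypothetical and not part of the existing API: `Region`, `OcrOptions`, `OPTIONS_BY_KIND`, and `hybrid_ocr` are names made up for this example, and `detect_layout` and `vlm_ocr` are placeholder callables standing in for the old pipeline's layout detector and the new VLM model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class Region:
    kind: str  # e.g. "text", "table", "equation", "figure"
    bbox: BBox


@dataclass
class OcrOptions:
    table: bool = False  # allow the VLM to emit table markup
    math: bool = False   # allow the VLM to emit display math


# Hypothetical mapping: detected layout type -> features the VLM may use.
OPTIONS_BY_KIND: Dict[str, OcrOptions] = {
    "text": OcrOptions(table=False, math=False),
    "table": OcrOptions(table=True, math=False),
    "equation": OcrOptions(table=False, math=True),
}


def hybrid_ocr(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    vlm_ocr: Callable[[Any, BBox, OcrOptions], str],
) -> List[Tuple[Region, str]]:
    """Detect layout with the old pipeline, then OCR each region with the VLM."""
    results = []
    for region in detect_layout(page):
        # Unknown region kinds fall back to the most restrictive options.
        options = OPTIONS_BY_KIND.get(region.kind, OcrOptions())
        results.append((region, vlm_ocr(page, region.bbox, options)))
    return results
```

With this split, a paragraph that the layout detector labels as "text" could never come back from the VLM as a table or an equation, because those output modes would be disabled for that region.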

Also, is there a way to batch process multiple images with the new VLM model?


Replies: 0 comments
