You can't extend PdfPipelineOptions directly to run a VLM for OCR on detected text regions. Instead, you can create a custom OCR engine that runs VLM inference on the cropped regions, register it with Docling's OCR factory, and select it via ocr_options. The standard pipeline is designed to let you swap in custom OCR logic this way (reference, reference).
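To illustrate the "register, then select via options" idea, here is a minimal, library-free sketch of a factory keyed by an options field. Everything in it is hypothetical: OcrOptions, OcrEngine, register_engine, create_engine, VlmOcrEngine and the `kind` field are illustrative names, not Docling's API. Docling's real factory, plugin mechanism and option classes differ, so treat this only as a picture of the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Type


@dataclass
class OcrOptions:
    """Hypothetical options object; `kind` names the engine to use."""
    kind: str


class OcrEngine:
    """Hypothetical base class standing in for Docling's BaseOcrModel."""

    def __init__(self, options: OcrOptions) -> None:
        self.options = options

    def recognize(self, image) -> str:
        raise NotImplementedError


# Hypothetical registry mapping an options `kind` to an engine class.
_REGISTRY: Dict[str, Type[OcrEngine]] = {}


def register_engine(kind: str) -> Callable[[Type[OcrEngine]], Type[OcrEngine]]:
    """Class decorator that records an engine class under `kind`."""
    def decorator(cls: Type[OcrEngine]) -> Type[OcrEngine]:
        _REGISTRY[kind] = cls
        return cls
    return decorator


def create_engine(options: OcrOptions) -> OcrEngine:
    """Instantiate whichever engine is registered for `options.kind`."""
    try:
        cls = _REGISTRY[options.kind]
    except KeyError:
        raise ValueError(f"no OCR engine registered for kind={options.kind!r}") from None
    return cls(options)


@register_engine("vlm")
class VlmOcrEngine(OcrEngine):
    def recognize(self, image) -> str:
        # Placeholder: a real engine would run VLM inference on `image` here.
        return f"<vlm text for {image}>"
```

Usage: `create_engine(OcrOptions(kind="vlm"))` returns a `VlmOcrEngine`, so the pipeline code never names the concrete class; swapping engines is just a matter of changing the options value.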

Here's a high-level code pattern:

  1. Subclass BaseOcrModel (see EasyOcrModel for reference):
    For each detected text region (bounding box), crop the region from the page image, run your VLM on the crop, and assemble the results as TextCell objects.
from docling.models.base_ocr_model import BaseOcrModel
from docling…
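The crop-and-recognize loop from step 1 can be sketched without Docling as follows. BBox, OcrCell, crop and vlm_ocr_regions are hypothetical names: OcrCell only stands in for Docling's TextCell, the page image is a toy list of pixel rows, and `vlm` is any callable you supply that maps a cropped image to text. The sketch assumes the page image was rendered at `scale` times page resolution, so box coordinates are multiplied by `scale` before cropping, while the returned cells keep page coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Toy page image: a list of pixel rows (hypothetical stand-in for a PIL image).
Image = List[List[int]]


@dataclass(frozen=True)
class BBox:
    """Hypothetical box in page coordinates (left, top, right, bottom)."""
    l: float
    t: float
    r: float
    b: float


@dataclass
class OcrCell:
    """Hypothetical stand-in for Docling's TextCell."""
    index: int
    text: str
    bbox: BBox  # kept in page coordinates, not pixel coordinates


def crop(image: Image, box: BBox, scale: float) -> Image:
    """Crop a page-coordinate box from an image rendered at `scale`."""
    left, top = int(box.l * scale), int(box.t * scale)
    right, bottom = int(box.r * scale), int(box.b * scale)
    return [row[left:right] for row in image[top:bottom]]


def vlm_ocr_regions(
    image: Image,
    regions: Sequence[BBox],
    vlm: Callable[[Image], str],
    scale: float = 1.0,
) -> List[OcrCell]:
    """Run `vlm` on each cropped region and collect non-empty results as cells."""
    cells: List[OcrCell] = []
    for box in regions:
        text = vlm(crop(image, box, scale)).strip()
        if not text:
            continue  # skip regions where the model returned nothing
        cells.append(OcrCell(index=len(cells), text=text, bbox=box))
    return cells
```

In a real engine the `vlm` callable would wrap your model's inference call, and the cells would be converted into whatever cell type the pipeline expects.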

@dosubot
Answer selected by AaronJan
@noamkap
@dosubot
Category
Q&A
Labels
None yet
2 participants