Hi @MichaelSzczepaniak! I'm Dosu, and I'm helping the Docling team.

This usually happens because Docling's default PDF backend (DoclingParseV4DocumentBackend) can fail silently on certain PDFs, especially those from financial institutions. Because your PDF has selectable text, you can often fix this by explicitly switching to PyPdfiumDocumentBackend, which tends to be more robust for direct text extraction.

Here's how you can do it:

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
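
A minimal sketch of how these four imports could be wired together, assuming the `DocumentConverter(format_options=...)` and `PdfFormatOption(pipeline_options=..., backend=...)` signatures from recent Docling releases. The helper names `build_pdfium_converter` and `pdf_to_markdown`, and the choice to disable OCR for a PDF that already has selectable text, are illustrative assumptions and not part of the original answer.

```python
def build_pdfium_converter(do_ocr: bool = False):
    """Return a DocumentConverter that parses PDFs with the pypdfium2 backend.

    Hypothetical helper: the name and the do_ocr default are assumptions.
    """
    # Imported lazily so this module still loads where docling is not installed.
    from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    # The PDF already has a text layer, so OCR is switched off by default here.
    pipeline_options.do_ocr = do_ocr

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
                backend=PyPdfiumDocumentBackend,  # replaces the default backend
            )
        }
    )


def pdf_to_markdown(path: str) -> str:
    """Convert one PDF and return its content as Markdown (hypothetical helper)."""
    result = build_pdfium_converter().convert(path)
    return result.document.export_to_markdown()
```

Calling `pdf_to_markdown("statement.pdf")` (the file name is a placeholder) would then run the conversion with the pypdfium2 backend. If the output is still empty, re-enabling OCR with `build_pdfium_converter(do_ocr=True)` may be worth trying.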

Answer selected by MichaelSzczepaniak