
Certain text layers are not parsed by docling_parse #162

@HE-HIVE

docling_parse extracts no text from the following PDF even though it has a clear text layer. PyPdfium extracts the text from the same file without problems:

https://www.has-sante.fr/upload/docs/application/pdf/ct031458.pdf
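For anyone reproducing this, here is a minimal sketch of how the PyPdfium side of the comparison could be checked. It assumes the `pypdfium2` package is installed and that the PDF above has been saved locally as `ct031458.pdf`; the file name and both helper functions are illustrative only and not part of this project. The docling_parse side is left out of the sketch.

```python
from pathlib import Path


def extract_page_texts(pdf_path):
    """Return one string per page, extracted with pypdfium2."""
    # Imported lazily so the helper below works without the package installed.
    import pypdfium2 as pdfium  # third-party: pip install pypdfium2

    pdf = pdfium.PdfDocument(str(pdf_path))
    texts = []
    try:
        for index in range(len(pdf)):
            page = pdf[index]
            textpage = page.get_textpage()
            # With default arguments this returns the text of the whole page.
            texts.append(textpage.get_text_range())
            textpage.close()
            page.close()
    finally:
        pdf.close()
    return texts


def pages_without_text(page_texts):
    """Return the 0-based indices of pages whose text is empty or whitespace."""
    return [i for i, text in enumerate(page_texts) if not text.strip()]


if __name__ == "__main__":
    # Hypothetical local copy of the PDF linked above.
    local_copy = Path("ct031458.pdf")
    if local_copy.exists():
        texts = extract_page_texts(local_copy)
        print(f"{len(texts)} pages, pages without text: {pages_without_text(texts)}")
```

Running the same `pages_without_text` check on the per-page output of another backend would make it easy to see which pages lose their text layer.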

I have seen a few other examples where the text layer comes out garbled when using this project as a backend, so I thought this report might be helpful.

Thanks,

Herman
