docling_parse_v2 split/connect words #952

@InbarShapira

Bug

There are cases where docling_parse_v2 splits words into individual characters or joins adjacent words together.

Example 1:
Original text: products that were recently iroduced
markdown: products that were re c e n t l y i roduced

Example 2:
Original text: Tables 2–5 show the results of partitioning the graphs in our test suite on
markdown: Tables 2-5 sho w theresultsfpartitioningegraphsinourtest suite on
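
To help spot affected lines in a larger export, here is a minimal sketch of a heuristic check for the two symptoms shown above: runs of single-letter tokens (split words) and unusually long tokens (joined words). It is plain Python and not part of Docling; the function names and both thresholds (`min_run`, `max_token_len`) are assumptions chosen for illustration and will need tuning for real documents.

```python
def has_single_char_run(line: str, min_run: int = 4) -> bool:
    """Return True if the line contains at least `min_run` consecutive
    one-letter alphabetic tokens (a hint that a word was split)."""
    run = 0
    for token in line.split():
        if len(token) == 1 and token.isalpha():
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def longest_token_length(line: str) -> int:
    """Return the length of the longest whitespace-separated token."""
    return max((len(token) for token in line.split()), default=0)


def looks_garbled(line: str, min_run: int = 4, max_token_len: int = 25) -> bool:
    """Flag a line showing either symptom. Both thresholds are
    arbitrary assumptions, not values taken from Docling."""
    return (
        has_single_char_run(line, min_run)
        or longest_token_length(line) > max_token_len
    )


if __name__ == "__main__":
    samples = [
        "products that were re c e n t l y i roduced",
        "Tables 2-5 sho w theresultsfpartitioningegraphsinourtest suite on",
    ]
    for sample in samples:
        print(looks_garbled(sample), "|", sample)
```

This is only a screening aid: legitimate long tokens such as URLs or identifiers would also be flagged, and short split fragments such as "sho w" would be missed.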

Steps to reproduce

...

Docling version

Docling version: 2.21.0
Docling Core version: 2.18.0
Docling IBM Models version: 3.3.0
Docling Parse version: 3.3.0
Python: cpython-311 (3.11.4)
Platform: macOS-14.6.1-arm64-arm-64bit

Python version

Python 3.11.4

Labels

bug (Something isn't working), pdf parsing (PDF issue related to docling-parse)
