Commit 1fb1039

Update API, naming, and tests. Split data models to be independent.
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
1 parent c14ec54 commit 1fb1039

45 files changed

+1793636
-1689377
lines changed

README.md

Lines changed: 5 additions & 5 deletions
@@ -65,7 +65,7 @@ pip install docling-parse
 Convert a PDF (look in the [visualize.py](docling_parse/visualize.py) for a more detailed information)
 
 ```python
-from docling_parse.document import SegmentedPdfPageLabel
+from docling_parse.document import TextCellUnit
 from docling_parse.pdf_parser import DoclingPdfParser, PdfDocument
 
 parser = DoclingPdfParser()
@@ -78,11 +78,11 @@ pdf_doc: PdfDocument = parser.load(
 for page_no, pred_page in pdf_doc.iterate_pages():
 
     # iterate over the word-cells
-    for word in pred_page.yield_cells(label=SegmentedPdfPageLabel.WORD):
-        print(word.rect, ": ", word.text)
+    for word in pred_page.iterate_cells(unit_type=TextCellUnit.WORD):
+        print(word.rect, ": ", word.text)
 
-    # create a PIL image with the char cells
-    img = pred_page.render(label=SegmentedPdfPageLabel.CHAR)
+    # create a PIL image with the char cells
+    img = pred_page.render_as_image(label=TextCellUnit.CHAR)
     img.show()
 ```
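
The README hunks rename `yield_cells(label=...)` to `iterate_cells(unit_type=...)` and `SegmentedPdfPageLabel` to `TextCellUnit`. For code that may run against either side of this commit, one possible approach is a small dispatch helper, sketched below. The helper name `iter_cells` and the two stand-in page classes are hypothetical and exist only so the sketch runs without docling-parse installed. Only the method and keyword names are taken from the diff.

```python
from types import SimpleNamespace


def iter_cells(page, unit):
    """Return the page's cells via whichever method name the page exposes.

    `unit` is the enum member matching the installed version:
    TextCellUnit.* after this commit, SegmentedPdfPageLabel.* before it.
    """
    if hasattr(page, "iterate_cells"):
        # Method name and keyword as they appear on the "+" side of the diff.
        return page.iterate_cells(unit_type=unit)
    # Method name and keyword as they appear on the "-" side of the diff.
    return page.yield_cells(label=unit)


class NewStylePage:
    """Stand-in for a post-commit page object; not a docling-parse class."""

    def iterate_cells(self, unit_type):
        yield SimpleNamespace(rect=(0, 0, 10, 5), text=f"new:{unit_type}")


class OldStylePage:
    """Stand-in for a pre-commit page object; not a docling-parse class."""

    def yield_cells(self, label):
        yield SimpleNamespace(rect=(0, 0, 10, 5), text=f"old:{label}")


# Plain strings stand in for the enum members in this sketch.
new_texts = [cell.text for cell in iter_cells(NewStylePage(), "WORD")]
old_texts = [cell.text for cell in iter_cells(OldStylePage(), "WORD")]
```

With a real `pred_page` from `pdf_doc.iterate_pages()`, the call would be `iter_cells(pred_page, TextCellUnit.WORD)` after this commit. The `render` to `render_as_image` rename could be handled with the same `hasattr` check.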
