
Commit 624ea5f

update and fix docs and readme

1 parent cbe7ae2

3 files changed (+78, -4 lines)


README.md

Lines changed: 28 additions & 1 deletion
@@ -87,8 +87,35 @@ result = converter.convert(source)
 # the postprocessor modifies the result.document in place.
 ResultPostprocessor(result).process()

-# enjoy the reordered document
+# enjoy the reordered document - for example convert it to markdown
 result.document.export_to_markdown()
+
+# or use a chunker on it...
+```
+
+or for the VLM-pipeline:
+
+```python
+from docling.datamodel.base_models import InputFormat
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+source = "my_scanned.pdf" # document per local path or URL
+
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+        ),
+    }
+)
+result = converter.convert(source=source)
+ResultPostprocessor(result).process()
+
+# enjoy the reordered document - for example convert it to markdown
+result.document.export_to_markdown()
+
+# or use a chunker on it...
 ```

 ## Citation
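The README example ends with `# or use a chunker on it...`. Docling provides its own chunkers (for example `HierarchicalChunker`), and the sketch below is not one of them: it is a dependency-free toy, with the hypothetical names `chunk_markdown` and `Chunk`, that assumes markdown input such as the string returned by `result.document.export_to_markdown()`. It illustrates why heading levels matter downstream: each chunk carries the path of headings above it, and that path is only correct if the levels are.

```python
import re
from dataclasses import dataclass

# ATX-style markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    headings: list[str]  # enclosing headings, outermost first
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Hypothetical toy chunker: split at headings, record the heading path of each chunk."""
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open headings
    chunks: list[Chunk] = []
    body: list[str] = []

    def flush() -> None:
        # Emit the collected body lines as one chunk, tagged with the open headings.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(headings=[title for _, title in stack], text=text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Close headings at the same or a deeper level before opening this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2).strip()))
    flush()
    return chunks


for chunk in chunk_markdown("# Intro\nHello.\n## Scope\nDetails.\n# Methods\nSteps."):
    print(chunk.headings, chunk.text)
# prints:
# ['Intro'] Hello.
# ['Intro', 'Scope'] Details.
# ['Methods'] Steps.
```

Lines starting with `#` inside fenced code blocks are treated as headings by this toy; a real chunker would need to skip them.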

docs/index.md

Lines changed: 49 additions & 2 deletions
@@ -1,8 +1,55 @@
 # docling-hierarchical-pdf

-[![Release](https://img.shields.io/github/v/release/krrome/docling-hierarchical-pdf)](https://img.shields.io/github/v/release/krrome/docling-hierarchical-pdf)
-[![Build status](https://img.shields.io/github/actions/workflow/status/krrome/docling-hierarchical-pdf/main.yml?branch=main)](https://github.com/krrome/docling-hierarchical-pdf/actions/workflows/main.yml?query=branch%3Amain)
 [![Commit activity](https://img.shields.io/github/commit-activity/m/krrome/docling-hierarchical-pdf)](https://img.shields.io/github/commit-activity/m/krrome/docling-hierarchical-pdf)
 [![License](https://img.shields.io/github/license/krrome/docling-hierarchical-pdf)](https://img.shields.io/github/license/krrome/docling-hierarchical-pdf)

 This package enables inference of header hierarchy in the docling PDF parsing pipeline.
+
+The docs are still in the making, but as a user all you need is:
+
+Install it:
+```bash
+pip install docling-hierarchical-pdf
+```
+
+Use it:
+```python
+from docling.document_converter import DocumentConverter
+from hierarchical.postprocessor import ResultPostprocessor
+
+source = "my_file.pdf" # document per local path or URL
+converter = DocumentConverter()
+result = converter.convert(source)
+# the postprocessor modifies the result.document in place.
+ResultPostprocessor(result).process()
+
+# enjoy the reordered document - for example convert it to markdown
+result.document.export_to_markdown()
+
+# or use a chunker on it...
+```
+
+or for the VLM-pipeline
+
+```python
+from docling.datamodel.base_models import InputFormat
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+source = "my_scanned.pdf" # document per local path or URL
+
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+        ),
+    }
+)
+result = converter.convert(source=source)
+ResultPostprocessor(result).process()
+
+# enjoy the reordered document - for example convert it to markdown
+result.document.export_to_markdown()
+
+# or use a chunker on it...
+```
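docs/index.md describes the package as inferring header hierarchy in the docling PDF pipeline. As a rough illustration of what such inference produces, and explicitly not this package's algorithm (the commit does not describe it), the sketch below assigns levels from section numbering such as `2.1.3` and renders them as markdown headings, one `#` per level. The names `infer_level` and `to_markdown_headings` are hypothetical.

```python
import re

# Hypothetical heuristic for illustration only: "2.1.3 Results" -> level 3.
# Unnumbered headings such as "Abstract" fall back to `default`.
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+")


def infer_level(title: str, default: int = 1) -> int:
    """Infer a heading level from a leading section number."""
    match = NUMBERED.match(title)
    if match is None:
        return default
    # One level per dot-separated component ("2" -> 1, "2.1" -> 2), capped at markdown's 6.
    return min(match.group(1).count(".") + 1, 6)


def to_markdown_headings(titles: list[str]) -> str:
    """Render titles as markdown headings, one '#' per inferred level."""
    return "\n".join("#" * infer_level(title) + " " + title for title in titles)


print(to_markdown_headings(["Abstract", "1 Introduction", "1.1 Scope", "2.1.3 Results"]))
# prints:
# # Abstract
# # 1 Introduction
# ## 1.1 Scope
# ### 2.1.3 Results
```

Numbering alone misfires on titles that merely start with a number (for example "2024 in review"), which is one reason a numbering heuristic by itself is not enough.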

docs/modules.md

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-::: hierarchical.foo
+::: hierarchical
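docs/modules.md now points mkdocstrings at `::: hierarchical` instead of `::: hierarchical.foo`. An identifier that no longer exists makes mkdocstrings fail to collect the object when the docs are built. The sketch below is a hypothetical pre-build check (`resolves` and `unresolvable_identifiers` are not part of mkdocstrings) that approximates resolution by importing each dotted path; mkdocstrings' Python handler analyzes source through Griffe rather than importing, so treat this as a rough stand-in that also executes the imported modules.

```python
import importlib
import re

# mkdocstrings renders API docs wherever a line reads "::: dotted.path".
IDENTIFIER_LINE = re.compile(
    r"^:::[ \t]+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)[ \t]*$", re.MULTILINE
)


def resolves(identifier: str) -> bool:
    """Return True if `identifier` is an importable module or an attribute path inside one."""
    parts = identifier.split(".")
    # Import the longest module prefix that works, then walk the remaining attributes.
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


def unresolvable_identifiers(markdown: str) -> list[str]:
    """List each '::: identifier' in `markdown` that cannot be resolved by importing."""
    return [ident for ident in IDENTIFIER_LINE.findall(markdown) if not resolves(ident)]


print(unresolvable_identifiers("::: json\n\n::: json.foo\n"))  # prints ['json.foo']
```

Run against the old docs/modules.md in an environment where `hierarchical.foo` is gone, a check like this would have reported `hierarchical.foo`.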
