Merge pull request #6 from krrome/bug-support-DocumentStream-as-input
implement possibility to pass source to ResultPostprocessor for processing with pymupdf + add error handling so that ResultPostprocessor falls back to style based inference in case pymupdf can't read the file.
### Working with DocumentStream sources / PDFFileNotFoundException:
If you run into the `PDFFileNotFoundException`, the `source` you passed to `DocumentConverter().convert(source=source)` was of type `str` or of type `DocumentStream`, so the Docling conversion result unfortunately does *not* hold a valid reference to the source file anymore. The postprocessor therefore needs your help: if `source` was a string, pass `source=source` when instantiating `ResultPostprocessor`. Full example:
```python
from docling.document_converter import DocumentConverter
from hierarchical.postprocessor import ResultPostprocessor

source = "my_file.pdf"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
# the postprocessor modifies the result.document in place.
ResultPostprocessor(result, source=source).process()
```
If you used a `DocumentStream` object as the source, you unfortunately have to pass one of the following as the `source` argument to `ResultPostprocessor`: a valid path to the PDF, a new, open `BytesIO` stream, or a new `DocumentStream` object. The reason is that docling *closes* the source stream when it is finished, so no further reading from that stream is possible.
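The closed-stream behaviour can be illustrated with the standard library alone. The following is a minimal sketch, not part of this package: `pdf_bytes` is placeholder data, and the explicit `close()` call only simulates what docling does to the stream after conversion. It shows why keeping the raw bytes around lets you build a fresh stream for the postprocessor.

```python
from io import BytesIO

# Placeholder bytes standing in for the contents of a real PDF file (assumption).
pdf_bytes = b"%PDF-1.7 placeholder"

# The stream that would be handed to the converter.
consumed = BytesIO(pdf_bytes)
consumed.close()  # simulates docling closing the stream once conversion is done

try:
    consumed.read()
    reusable = True
except ValueError:
    # Reading from a closed BytesIO raises ValueError.
    reusable = False

# Because the raw bytes were kept, a new, open stream can be created at any time.
fresh = BytesIO(pdf_bytes)

# Hypothetical follow-up (not executed here), assuming `result` is the
# conversion result obtained earlier:
# ResultPostprocessor(result, source=fresh).process()
```

If the PDF also exists on disk, passing its path as `source` avoids the need to keep the bytes in memory.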
### Exception handling for ToC extraction from metadata:
If you want to handle exceptions regarding file I/O or streams yourself, set `raise_on_error` to `True` when instantiating `ResultPostprocessor`.
## Citation
If you use this software for your project, please cite Docling as well as the following: