fix: Fix pdfminer error when using process_data_with_model (#178)

awalker4 · web-flow · commit ae73cf8c8d2d · 2023-08-16T16:13:15.000-05:00
When a PDF page doesn't contain much data, the write to the temp file may stay in the write buffer instead of reaching disk. If that happens, reading the file back by name fails with a pdfminer error. This fixes the error by flushing the temp file's buffer before the file is read.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,7 @@
+## 0.5.12
+
+* Fix a pdfminer error when using `process_data_with_model`
+
 ## 0.5.11
 
 * Add warning when chipper is used with < 300 DPI
diff --git a/unstructured_inference/__version__.py b/unstructured_inference/__version__.py
@@ -1 +1 @@
-__version__ = "0.5.11"  # pragma: no cover
+__version__ = "0.5.12"  # pragma: no cover
diff --git a/unstructured_inference/inference/layout.py b/unstructured_inference/inference/layout.py
@@ -373,6 +373,7 @@ def process_data_with_model(
     DocumentLayout by using a model identified by model_name."""
     with tempfile.NamedTemporaryFile() as tmp_file:
         tmp_file.write(data.read())
+        tmp_file.flush()  # Make sure the file is written out
         layout = process_file_with_model(
             tmp_file.name,
             model_name,
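
The buffering behavior behind this fix can be reproduced in isolation. The following is a minimal sketch, not code from this repository: `sizes_around_flush` is a hypothetical helper, and the payload is an arbitrary short byte string standing in for a small PDF. It assumes CPython's default buffered file objects and a platform where the on-disk size of an open temp file can be queried by name.

```python
import os
import tempfile


def sizes_around_flush(data: bytes) -> tuple:
    """Return the on-disk size of a temp file before and after flush().

    Hypothetical helper for illustration only.
    """
    with tempfile.NamedTemporaryFile() as tmp_file:
        tmp_file.write(data)
        # A small write typically sits in the Python-level buffer,
        # so anything opening the file by name may see fewer bytes.
        before = os.path.getsize(tmp_file.name)
        tmp_file.flush()  # Push the buffered bytes out to the OS
        after = os.path.getsize(tmp_file.name)
    return before, after


payload = b"%PDF-1.4 tiny payload"  # arbitrary stand-in for a small PDF
before, after = sizes_around_flush(payload)
print(f"before flush: {before} bytes, after flush: {after} bytes")
```

Any reader that opens the file by `tmp_file.name` instead of going through the `tmp_file` object only sees what has actually been written out, which is why the flush has to happen before `process_file_with_model` is called.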
