This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Commit a512485

ollynowellbosd authored and committed
Sort the PDFMiner text objects along the x axis before applying the grouping algorithm, to avoid missing columns
1 parent 75d94c4 commit a512485

1 file changed: +1 -0 lines changed

camelot/parsers/stream.py

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ def _group_rows(text, row_tol=2):
         rows = []
         temp = []
 
+        text.sort(key=lambda x: (-x.y0, x.x0))
         for t in text:
             # is checking for upright necessary?
             # if t.get_text().strip() and all([obj.upright for obj in t._objs if
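
The effect of the added sort can be illustrated with a small, self-contained sketch. This is not the body of `_group_rows` from `camelot/parsers/stream.py`; the `Box` tuple, the `group_rows_sketch` function, and its `presort` flag are hypothetical stand-ins introduced only for illustration, and the grouping rule (start a new row whenever `y0` moves by more than `row_tol`) is an assumed simplification. The sort key is the one from the diff: descending `y0` first (top of the page first), then ascending `x0` (left to right). The sketch uses `sorted()` on a copy, whereas the commit sorts `text` in place.

```python
from collections import namedtuple

# Hypothetical stand-in for a PDFMiner text object: only the fields the sort key reads.
Box = namedtuple("Box", ["text", "x0", "y0"])


def group_rows_sketch(boxes, row_tol=2, presort=True):
    """Group boxes into rows; a new row starts when y0 changes by more than row_tol.

    Simplified illustration only, not camelot's actual implementation.
    """
    if presort:
        # Same key as the commit: top-to-bottom, then left-to-right.
        boxes = sorted(boxes, key=lambda b: (-b.y0, b.x0))
    rows = []
    current = []
    row_y = None
    for b in boxes:
        if row_y is None or abs(row_y - b.y0) > row_tol:
            if current:
                rows.append(current)
            current = []
            row_y = b.y0
        current.append(b)
    if current:
        rows.append(current)
    return rows


# Two rows of two cells each, deliberately delivered out of reading order.
boxes = [
    Box("b1", 120, 700),
    Box("a2", 20, 680),
    Box("a1", 20, 700),
    Box("b2", 120, 680),
]

sorted_rows = [[b.text for b in row] for row in group_rows_sketch(boxes)]
unsorted_rows = [[b.text for b in row] for row in group_rows_sketch(boxes, presort=False)]
print(sorted_rows)
print(unsorted_rows)
```

With the pre-sort, the four boxes collapse into two rows of two cells. Without it, each jump in `y0` opens a new row, so the same input is split into four single-cell rows in this sketch, which is one way text from a column can end up separated from the rest of its row.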
