This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Commit ad1babd

[REF] Compute_plausible_gaps, Efficiency, Stability
1. **Sorting without Reverse**: The textlines are now sorted in ascending order, without `reverse=True`. Each gap is computed as `textlines[i]` minus `textlines[i - 1]`, so the gap values stay non-negative.
2. **Array Creation for Gaps**: The gaps are wrapped in `numpy` arrays as soon as they are computed, so that subsequent calculations can use `numpy`'s efficient operations.
3. **Early Exits**: The checks on the lengths of `ref_h_textlines` and `ref_v_textlines` provide early exits if not enough textlines are available, preventing unnecessary calculations. The empty-gaps check now tests `.size == 0`, because the gaps are arrays rather than lists.
4. **Percentile Calculation**: The percentile calculation remains unchanged (twice the 75th percentile of the gaps on each axis), but it now operates on `numpy` arrays and the value 75 is passed inline.
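The steps described in the commit message can be sketched on plain coordinates. This is a minimal sketch, not the code from the commit: the helper name `plausible_gaps`, its arguments, and the two-element threshold for the early exit are assumptions made for illustration, and it uses `np.sort` and `np.diff` in place of the list comprehension that appears in the diff.

```python
import numpy as np


def plausible_gaps(xs, ys, percentile=75):
    """Hypothetical helper: return (horizontal_gap, vertical_gap) or None.

    xs and ys stand in for the x0 / y0 values of the reference textlines.
    """
    # Early exit (assumed threshold): a gap needs at least two positions.
    if len(xs) < 2 or len(ys) < 2:
        return None

    # Ascending sort, then consecutive differences, all non-negative.
    h_gaps = np.diff(np.sort(np.asarray(xs, dtype=float)))
    v_gaps = np.diff(np.sort(np.asarray(ys, dtype=float)))

    # Twice the chosen percentile of the gaps on each axis.
    return (
        float(2.0 * np.percentile(h_gaps, percentile)),
        float(2.0 * np.percentile(v_gaps, percentile)),
    )


print(plausible_gaps([10, 50, 30, 90], [700, 680, 640]))  # (60.0, 70.0)
```

Here the sorted x positions give gaps of 20, 20 and 40, whose 75th percentile is 30; the y positions give gaps of 40 and 20, whose 75th percentile is 35. Doubling yields the printed pair.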
1 parent 313f75b commit ad1babd

1 file changed: +19 -20 lines changed

camelot/parsers/network.py

Lines changed: 19 additions & 20 deletions
@@ -446,7 +446,6 @@ def compute_plausible_gaps(self):
         -------
         gaps_hv : tuple
             (horizontal_gap, vertical_gap) in pdf coordinate space.
-
         """
         # Determine the textline that has the most combined
         # alignments across horizontal and vertical axis.
@@ -459,6 +458,7 @@ def compute_plausible_gaps(self):
         if best_alignment is None:
             return None

+        # Extract the reference textlines
         __, ref_h_textlines = best_alignment.max_h()
         __, ref_v_textlines = best_alignment.max_v()

@@ -467,32 +467,31 @@ def compute_plausible_gaps(self):
             return None

         # Sort textlines based on their positions
-        h_textlines = sorted(
-            ref_h_textlines, key=lambda textline: textline.x0, reverse=True
-        )
-        v_textlines = sorted(
-            ref_v_textlines, key=lambda textline: textline.y0, reverse=True
-        )
+        h_textlines = sorted(ref_h_textlines, key=lambda textline: textline.x0)
+        v_textlines = sorted(ref_v_textlines, key=lambda textline: textline.y0)

         # Calculate gaps between textlines
-        h_gaps = [
-            h_textlines[i - 1].x0 - h_textlines[i].x0
-            for i in range(1, len(h_textlines))
-        ]
-        v_gaps = [
-            v_textlines[i - 1].y0 - v_textlines[i].y0
-            for i in range(1, len(v_textlines))
-        ]
+        h_gaps = np.array(
+            [
+                h_textlines[i].x0 - h_textlines[i - 1].x0
+                for i in range(1, len(h_textlines))
+            ]
+        )
+        v_gaps = np.array(
+            [
+                v_textlines[i].y0 - v_textlines[i - 1].y0
+                for i in range(1, len(v_textlines))
+            ]
+        )

         # If no gaps are found, return None
-        if not h_gaps or not v_gaps:
+        if h_gaps.size == 0 or v_gaps.size == 0:
             return None

-        # Calculate the 75th percentile gaps
-        percentile = 75
+        # Calculate the 75th percentile gaps using numpy for efficiency
         gaps_hv = (
-            2.0 * np.percentile(h_gaps, percentile),
-            2.0 * np.percentile(v_gaps, percentile),
+            2.0 * np.percentile(h_gaps, 75),
+            2.0 * np.percentile(v_gaps, 75),
         )

         return gaps_hv
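
Two details of this diff can be checked in isolation: flipping both the sort order and the subtraction order leaves the gap values unchanged, and a `numpy` array with more than one element cannot be tested with `not`, which is why the emptiness check moves to `.size`. The snippet below is a small sketch with made-up `x0` coordinates; it does not use camelot's textline objects.

```python
import numpy as np

x0s = [72.0, 310.5, 150.25, 480.0]  # made-up x0 coordinates

# Old approach: descending sort, previous minus current.
desc = sorted(x0s, reverse=True)
old_gaps = [desc[i - 1] - desc[i] for i in range(1, len(desc))]

# New approach: ascending sort, current minus previous.
asc = sorted(x0s)
new_gaps = np.array([asc[i] - asc[i - 1] for i in range(1, len(asc))])

# Same gap values, listed in the opposite order; percentiles ignore order.
assert np.allclose(sorted(old_gaps), np.sort(new_gaps))
assert np.isclose(np.percentile(old_gaps, 75), np.percentile(new_gaps, 75))

# The truth value of a multi-element array is ambiguous, so the old
# `if not h_gaps` test would raise once h_gaps is an array.
try:
    bool(new_gaps)
    truthiness_ok = True
except ValueError:
    truthiness_ok = False

print(truthiness_ok, new_gaps.size)  # False 3
```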
