This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Commit 313f75b

[REF] Compute_plausible_gaps, Efficiency, Stability
1. **Use of `get` Method**: When retrieving the best alignment, we use `self._textline_to_alignments.get(most_aligned_tl)` instead of direct indexing. This prevents a potential `KeyError` if `most_aligned_tl` is not in the dictionary, which could lead to unexpected behavior.
2. **Early Exit Conditions**: We explicitly check if `best_alignment` is `None` after attempting to retrieve it. This ensures that we do not proceed with calculations if the alignment data is missing.
3. **Sorting and Gap Calculation**: I retained the logic to sort the text lines and calculate gaps. This part of the code is straightforward and unlikely to lead to an infinite loop as long as the input lists are correctly managed.
4. **Returning `None` for Insufficient Data**: The checks for the lengths of the text line lists ensure that we only proceed if there are enough lines to compute meaningful gaps. If there are not enough lines, we return `None` to avoid further computation.
5. **List Comprehensions for Gap Calculation**: The gap calculations for horizontal and vertical gaps are done using list comprehensions, which are more concise and Pythonic, making the code cleaner.
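
The first two points of the message can be illustrated in isolation. The following is a minimal sketch, not code from the repository: the dictionary `alignments`, its string keys, and the helper `count_aligned` are hypothetical stand-ins for `self._textline_to_alignments` and the surrounding method.

```python
# Hypothetical stand-in for self._textline_to_alignments.
# Real keys are textline objects; plain strings are used here for illustration.
alignments = {"tl_a": ["tl_b", "tl_c"]}


def count_aligned(mapping, textline_key):
    """Return how many textlines are aligned with the key, or None if unknown."""
    # dict.get returns None for a missing key instead of raising KeyError.
    best_alignment = mapping.get(textline_key)
    if best_alignment is None:
        return None  # early exit: no alignment data to work with
    return len(best_alignment)


print(count_aligned(alignments, "tl_a"))     # key present
print(count_aligned(alignments, "missing"))  # key absent, no exception

# Direct indexing, as in the pre-commit code, raises on a missing key.
try:
    alignments["missing"]
except KeyError:
    print("direct indexing raised KeyError")
```

Note that with `get`, a key that is missing and a key explicitly mapped to `None` look the same to the caller; in this sketch both are treated as "no data".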
1 parent fe41058 commit 313f75b

1 file changed: +22 -11 lines changed

camelot/parsers/network.py

Lines changed: 22 additions & 11 deletions
@@ -445,45 +445,56 @@ def compute_plausible_gaps(self):
         Returns
         -------
         gaps_hv : tuple
-            (horizontal_gap, horizontal_gap) in pdf coordinate space.
+            (horizontal_gap, vertical_gap) in pdf coordinate space.
 
         """
         # Determine the textline that has the most combined
         # alignments across horizontal and vertical axis.
-        # It will serve as a reference axis along which to collect the average
-        # spacing between rows/cols.
         most_aligned_tl = self.most_connected_textline()
         if most_aligned_tl is None:
             return None
 
-        # Retrieve the list of textlines it's aligned with, across both
-        # axis
-        best_alignment = self._textline_to_alignments[most_aligned_tl]
+        # Retrieve the list of textlines it's aligned with, across both axes
+        best_alignment = self._textline_to_alignments.get(most_aligned_tl)
+        if best_alignment is None:
+            return None
+
         __, ref_h_textlines = best_alignment.max_h()
         __, ref_v_textlines = best_alignment.max_v()
+
+        # Ensure we have enough textlines for calculations
         if len(ref_v_textlines) <= 1 or len(ref_h_textlines) <= 1:
             return None
 
+        # Sort textlines based on their positions
         h_textlines = sorted(
             ref_h_textlines, key=lambda textline: textline.x0, reverse=True
         )
         v_textlines = sorted(
             ref_v_textlines, key=lambda textline: textline.y0, reverse=True
         )
 
-        h_gaps, v_gaps = [], []
-        for i in range(1, len(v_textlines)):
-            v_gaps.append(v_textlines[i - 1].y0 - v_textlines[i].y0)
-        for i in range(1, len(h_textlines)):
-            h_gaps.append(h_textlines[i - 1].x0 - h_textlines[i].x0)
+        # Calculate gaps between textlines
+        h_gaps = [
+            h_textlines[i - 1].x0 - h_textlines[i].x0
+            for i in range(1, len(h_textlines))
+        ]
+        v_gaps = [
+            v_textlines[i - 1].y0 - v_textlines[i].y0
+            for i in range(1, len(v_textlines))
+        ]
 
+        # If no gaps are found, return None
         if not h_gaps or not v_gaps:
             return None
+
+        # Calculate the 75th percentile gaps
         percentile = 75
         gaps_hv = (
             2.0 * np.percentile(h_gaps, percentile),
             2.0 * np.percentile(v_gaps, percentile),
         )
+
         return gaps_hv
 
     def search_table_body(
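
The gap computation in this hunk can be tried outside the parser. The following is a rough sketch under stated assumptions, not the camelot implementation: the function name `plausible_gaps` is hypothetical, and it takes plain lists of `x0` and `y0` coordinates instead of textline objects. Only the sort, pairwise-difference, and doubled 75th-percentile steps mirror the diff.

```python
import numpy as np


def plausible_gaps(x_positions, y_positions, percentile=75):
    """Return (horizontal_gap, vertical_gap), or None with too few positions.

    Hypothetical helper: inputs are bare coordinates, not textline objects.
    """
    # At least two positions per axis are needed to form a single gap.
    if len(x_positions) <= 1 or len(y_positions) <= 1:
        return None

    # Sort in descending order so consecutive differences are non-negative.
    xs = sorted(x_positions, reverse=True)
    ys = sorted(y_positions, reverse=True)

    # Pairwise differences between neighbouring positions.
    h_gaps = [xs[i - 1] - xs[i] for i in range(1, len(xs))]
    v_gaps = [ys[i - 1] - ys[i] for i in range(1, len(ys))]

    # Twice the chosen percentile of the observed gaps, as in the diff.
    return (
        2.0 * np.percentile(h_gaps, percentile),
        2.0 * np.percentile(v_gaps, percentile),
    )


# Horizontal gaps are [40, 30, 20]; their 75th percentile is 35, doubled to 70.
# Vertical gaps are [20, 20]; their 75th percentile is 20, doubled to 40.
gaps = plausible_gaps([10, 30, 60, 100], [700, 680, 660])
print(gaps)

# A single position on either axis yields no gap, so the result is None.
print(plausible_gaps([10], [700, 680]))
```

Because the length check guarantees at least one gap per axis, this sketch does not need the separate empty-list check that the method keeps as an extra guard.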
