release: prepare v0.7.0 - Split Table Merging Feature enhancement #72

Merged
AdemBoukhris457 merged 1 commit into main from release/v0.7.0 on Nov 1, 2025

Conversation

@AdemBoukhris457
Owner

Summary

This PR prepares the v0.7.0 release, which adds automatic detection and merging of tables split across page boundaries, along with documentation enhancements and visualizations.

Release Highlights

Split table merging

  • Two-phase detection (proximity + structural validation)
  • LSD-based column detection for accurate table matching
  • Configurable thresholds and confidence scoring
  • Fallback mechanisms for borderless tables
  • Automatic composite image creation

Documentation

  • Mermaid flowcharts replacing ASCII art
  • Step-by-step narrative explanations
  • Complete API reference
  • Troubleshooting guides

Changes Included in v0.7.0

Code changes

  • Added SplitTableDetector class with detection logic
  • Extended StructuredPDFParser with split table merging parameters:
    • merge_split_tables (default: False)
    • bottom_threshold_ratio, top_threshold_ratio
    • max_gap_ratio, column_alignment_tolerance
    • min_merge_confidence
  • Implemented two-phase detection algorithm:
    • Phase 1: Proximity checks (position, overlap, gap, width)
    • Phase 2: Structural validation (column detection, alignment, confidence)
  • Added fallback mechanisms for edge cases
  • Integrated merge processing into parsing pipeline
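
The Phase 1 proximity checks listed above could look roughly like the sketch below. It is not the `SplitTableDetector` implementation. The `TableBox` type, the `passes_proximity` function, the `min_overlap_ratio` and `max_width_diff_ratio` parameters, all numeric defaults, and the exact meaning given to each ratio are assumptions made for illustration. Only the names `bottom_threshold_ratio`, `top_threshold_ratio`, and `max_gap_ratio` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class TableBox:
    """Hypothetical bounding box of a detected table, in page pixels."""
    page: int
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge
    page_height: float


def passes_proximity(upper: TableBox, lower: TableBox,
                     bottom_threshold_ratio: float = 0.2,    # assumed default
                     top_threshold_ratio: float = 0.2,       # assumed default
                     max_gap_ratio: float = 0.25,            # assumed default
                     min_overlap_ratio: float = 0.8,         # hypothetical parameter
                     max_width_diff_ratio: float = 0.1) -> bool:  # hypothetical parameter
    """Cheap geometric filter for a candidate (upper, lower) table pair."""
    # The pair must sit on consecutive pages.
    if lower.page != upper.page + 1:
        return False
    # Position: the upper table ends in the bottom band of its page,
    # and the lower table starts in the top band of the next page.
    if upper.y1 < upper.page_height * (1 - bottom_threshold_ratio):
        return False
    if lower.y0 > lower.page_height * top_threshold_ratio:
        return False
    # Gap: whitespace below the upper table plus whitespace above the lower one.
    gap = (upper.page_height - upper.y1) + lower.y0
    if gap > max_gap_ratio * upper.page_height:
        return False
    # Overlap: the horizontal extents must largely coincide.
    upper_width = upper.x1 - upper.x0
    lower_width = lower.x1 - lower.x0
    overlap = min(upper.x1, lower.x1) - max(upper.x0, lower.x0)
    if overlap < min_overlap_ratio * min(upper_width, lower_width):
        return False
    # Width: both fragments should be about equally wide.
    if abs(upper_width - lower_width) > max_width_diff_ratio * max(upper_width, lower_width):
        return False
    return True


upper = TableBox(page=1, x0=50, y0=600, x1=550, y1=950, page_height=1000)
lower = TableBox(page=2, x0=52, y0=40, x1=548, y1=400, page_height=1000)
candidate = passes_proximity(upper, lower)
```

A pair that clears this filter would then move on to the more expensive structural validation in Phase 2.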

Documentation

  • Added split table merging guide (split-table-merging.md)
  • Converted 8 ASCII diagrams to Mermaid flowcharts
  • Added "Process Overview" narrative section
  • Updated API reference with new parameters
  • Fixed documentation links and navigation

Recent Merge Requests Included

This release incorporates the following merge requests:

Breaking Changes

None. This release is backward compatible. The split table merging feature is opt-in (default merge_split_tables=False) and does not affect existing functionality.

Release Notes

v0.7.0 adds automatic split table detection and merging. The parser can detect tables split across pages and merge them into single images for improved extraction.
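
As an illustration of what merging two fragments into a single image involves, here is a minimal pure-Python sketch. It assumes grayscale images stored as lists of pixel rows and pads the narrower fragment with white. The function name `stack_vertically` and this representation are hypothetical; the PR does not show how the parser builds its composite images.

```python
def stack_vertically(top, bottom, fill=255):
    """Stack two grayscale images (lists of pixel rows) into one image.

    Rows narrower than the widest row are padded on the right with `fill`.
    """
    rows = list(top) + list(bottom)
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


top_fragment = [[0, 0, 0], [0, 0, 0]]   # 2 rows, 3 pixels wide
bottom_fragment = [[10, 10]]            # 1 row, 2 pixels wide
composite = stack_vertically(top_fragment, bottom_fragment)
```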

The feature uses a two-phase approach: proximity detection filters candidate pairs, and structural validation confirms matches using computer vision. It includes configurable thresholds, confidence scoring, and fallbacks for borderless tables.
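
The structural validation step can be pictured as comparing column boundaries between the two fragments and turning the agreement into a score. The sketch below takes the column x-positions as given (the PR says they are found with LSD-based detection, which is not reproduced here). The functions `alignment_confidence` and `should_merge`, the scoring formula, and the default values are assumptions. Only the parameter names `column_alignment_tolerance` and `min_merge_confidence` come from this PR.

```python
def alignment_confidence(upper_cols, lower_cols, column_alignment_tolerance=10.0):
    """Score in [0, 1]: share of upper column boundaries that have a lower
    boundary within the tolerance, normalized by the larger column count."""
    if not upper_cols or not lower_cols:
        # No detected columns; the fallbacks the PR mentions are not modeled here.
        return 0.0
    matched = sum(
        1 for x in upper_cols
        if any(abs(x - y) <= column_alignment_tolerance for y in lower_cols)
    )
    return matched / max(len(upper_cols), len(lower_cols))


def should_merge(upper_cols, lower_cols,
                 column_alignment_tolerance=10.0,   # assumed default, in pixels
                 min_merge_confidence=0.65):        # assumed default
    """Accept the pair only when the alignment score reaches the threshold."""
    score = alignment_confidence(upper_cols, lower_cols, column_alignment_tolerance)
    return score >= min_merge_confidence


aligned = should_merge([100, 250, 400], [102, 248, 405])
misaligned = should_merge([100, 250, 400], [100, 330])
```

Normalizing by the larger column count penalizes pairs where one fragment has extra columns, which is one plausible way to keep unrelated tables from merging.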

Documentation includes interactive Mermaid diagrams and a step-by-step narrative explaining the process. The feature is disabled by default and can be enabled via merge_split_tables=True.
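
Enabling the feature might look like the sketch below. The parameter names and the `False` default of `merge_split_tables` come from this PR. Every numeric value is a placeholder rather than a documented default, and the constructor call shape is an assumption, so it is left as a comment.

```python
# Keyword arguments named in this PR; numeric values are illustrative placeholders.
split_table_options = {
    "merge_split_tables": True,          # opt-in; the PR states the default is False
    "bottom_threshold_ratio": 0.2,       # placeholder value
    "top_threshold_ratio": 0.2,          # placeholder value
    "max_gap_ratio": 0.25,               # placeholder value
    "column_alignment_tolerance": 10.0,  # placeholder value
    "min_merge_confidence": 0.65,        # placeholder value
}

# Assumed usage; the import path and call shape are not shown in this PR:
# parser = StructuredPDFParser(**split_table_options)
```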


Version bump: 0.6.2 → 0.7.0

Add split table merging feature for automatic detection and merging of tables spanning multiple pages.
@AdemBoukhris457 self-assigned this on Nov 1, 2025
@AdemBoukhris457 added the documentation, enhancement, and release labels on Nov 1, 2025
@AdemBoukhris457 merged commit 7a7f8f9 into main on Nov 1, 2025
@AdemBoukhris457 deleted the release/v0.7.0 branch on November 1, 2025 18:05