This repository was archived by the owner on Apr 2, 2025. It is now read-only.


@bosd
Owner

@bosd bosd commented Sep 21, 2024

No description provided.

Frh added 30 commits April 18, 2020 17:25
Drop EOL Python 2 support. Resolve unit test discrepancies.
Update unit tests to pass in Travis across all supported Py.
Linting.
Move common code to base class to reduce duplication
Stream plots display pdf background for better context
Refactor parsers by moving common code to the base class
Maintain Python 3.5 compatibility by removing f"{}"
Move common parse error stats computation to base parser
Move copy_spanning_text logic to the table
* plot info passed through debug_info
* display each text edge
* Display regions and areas rectangles
Accept cells if they're at least 50% within the table's bounds.
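A rough illustration of the 50% rule from the last commit above. This is only a sketch under assumptions: the commit message does not say how "within" is measured, so area overlap is assumed here, and `fraction_within`, `accept_cell`, and the `(x0, y0, x1, y1)` box layout are hypothetical names, not the project's actual code.

```python
def fraction_within(cell, table):
    """Return the fraction of the cell's area that lies inside the table bbox.

    Both boxes are (x0, y0, x1, y1) tuples with x0 <= x1 and y0 <= y1.
    Assumption: "within" is measured by area, which the commit does not state.
    """
    cx0, cy0, cx1, cy1 = cell
    tx0, ty0, tx1, ty1 = table
    # Width and height of the intersection rectangle, clamped at zero.
    overlap_w = max(0.0, min(cx1, tx1) - max(cx0, tx0))
    overlap_h = max(0.0, min(cy1, ty1) - max(cy0, ty0))
    cell_area = (cx1 - cx0) * (cy1 - cy0)
    if cell_area <= 0:
        return 0.0  # degenerate cell: treat as entirely outside
    return (overlap_w * overlap_h) / cell_area


def accept_cell(cell, table, threshold=0.5):
    """Accept the cell when at least `threshold` of its area is inside the table."""
    return fraction_within(cell, table) >= threshold
```

With these definitions, a cell exactly half inside the table is accepted, and one with only 40% inside is rejected.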
Frh added 29 commits June 11, 2020 17:20
Plot takes an optional axes parameter, allowing notebooks more
flexibility.
Header heuristic in hybrid won't include headers which span the
entire table.
Added unit test for issue camelot-dev#132

Fixes camelot-dev#132
f-strings fail unit tests in Python <3.7; replaced them with .format.
Made download_url simulate Mozilla/5.0 to restore unit tests, since
the targeted server was returning 403s.
plot.text shows vertical text in red
_generate_columns_and_rows split between hybrid and stream
Plot vertical col anchors found by hybrid parser
Include vertical text in col/row generation
While searching for table body boundaries, exclude rows that include
cells crossing previously discovered rows.
No longer rely on the mode but on the parsing analysis during network
detection.
Added unit test for complex table with vertical header and mixed
horizontal / vertical text.
Enforce order of textline plotting for unit test consistency in 3.6.
Create wrapper around camelot plot that enforces backwards consistency
with older versions of matplotlib.
Create hybrid parser leveraging both lattice and network techniques.
Simplify plotting of pdf in lattice case.
Rename "parser.table_bbox" to "parser.table_bbox_parses", since it
represents not a bbox but a dict of bbox to corresponding parsing data.

Still missing: more unit tests, plotting of steps.
Fix first split merge issue
Improve parser comparison notebook to flag identical parses and display
multiple tables correctly
Fix tolerance parameter inclusion for hybrid.
* If Travis uses pytest-cov >= 2.10, it also needs pytest >= 4.6
* Clean up the parser comparison notebook
* Address issue where hybrid didn't honor the columns parameter
* Fix dropping of empty rows/columns in hybrid
* Hybrid learns table y-dimensions from lattice
* Improve explanations of network, hybrid, and lattice parsers
* Remove dead code from parser comparison notebook
* Clean up notebook variables to reduce size and make diffs cleaner
* Revert changes that were peripheral to the core changes
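For context on the download_url commit in the list above, which got past the 403 by presenting a browser-like client: a minimal sketch of sending a `Mozilla/5.0` User-Agent with the standard library. `build_request` and `download` are hypothetical helper names used for illustration only; they are not the project's actual `download_url` implementation.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that identifies itself with a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def download(url, timeout=30):
    """Fetch the URL and return the response body as bytes (needs network access)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Some servers reject the default `Python-urllib` User-Agent, so overriding the header is often enough to get test fixtures downloading again.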
@bosd bosd marked this pull request as ready for review September 21, 2024 20:03