Releases: huridocs/pdf-document-layout-analysis
v0.0.32
Full Changelog: v0.0.31...v0.0.32
v0.0.31
v0.0.30
Upgrade pdf-features version to fix common text height issue if there is no text & list type
v0.0.29
Upgrade pdf-features version to fix hyperlink styled content markdown
v0.0.28
Full Changelog: v0.0.27...v0.0.28
v0.0.27
Full Changelog: v0.0.26...v0.0.27
v0.0.26
Full Changelog: v0.0.25...v0.0.26
v0.0.25
Full Changelog: v0.0.24...v0.0.25
v0.0.24
What's Changed
- OCR multi lang support by @santiagotri in #106
- Clean architecture by @ali6parmak in #119
Support for PDF-to-markdown and PDF-to-HTML:
- Different sizes of titles
- Superscripts/Subscripts
- Bold/Italic text
- Tables in HTML format
- Formulas in LaTeX format
- List items with different indentations
- Hyperlinks
- In-document references
- Pictures
- Table of contents information (optional with `extract_toc` parameter)
- Restructured & refactored all the project to clean architecture.
- Updated formula extraction model to a better one
- Updated table extraction model to a better & much faster one
New Contributors
- @santiagotri made their first contribution in #106
Full Changelog: v0.0.23...v0.0.24