v0.0.24
What's Changed
- OCR multi lang support by @santiagotri in #106
- Clean architecture by @ali6parmak in #119
- Support for PDF-to-markdown and PDF-to-HTML:
  - Different sizes of titles
  - Superscripts/Subscripts
  - Bold/Italic text
  - Tables in HTML format
  - Formulas in LaTeX format
  - List items with different indentations
  - Hyperlinks
  - In-document references
  - Pictures
  - Table of contents information (optional, with the `extract_toc` parameter)
- Restructured & refactored the entire project to a clean architecture
- Updated the formula extraction model to a better one
- Updated the table extraction model to a better & much faster one
New Contributors
- @santiagotri made their first contribution in #106
Full Changelog: v0.0.23...v0.0.24