Releases: Unstructured-IO/unstructured-inference
0.7.2
0.7.1
0.7.0
0.6.6
- Stop passing the `ocr_languages` parameter to paddle to avoid an invalid paddle language code error. This is a temporary workaround until we have a mapping from standard language codes to paddle language codes.
0.6.5
0.6.4
0.6.3
What's Changed
- feat: make table transformer parameters configurable by @badGarnet in #224
- feat: add pre commit hook by @badGarnet in #220
Bug fixes
- fix: padded boxes are not rescaled/shifted correctly by @badGarnet in #229
Full Changelog: 0.6.1...0.6.3
0.6.1
What's Changed
- feat: add config class by @badGarnet in #218. This change allows a user to specify inference parameters via environment variables.
- Fix/overlapping of bboxes by @benjats07 in #201. This change makes `yolox` the default model for element detection and removes duplicated or near-duplicated bounding boxes from the results to reduce noise in the final elements.
Full Changelog: 0.5.31...0.6.1
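The near-duplicate bounding box removal described for #201 can be illustrated with a minimal sketch. This is not the library's implementation: the function names, the `(x1, y1, x2, y2)` box layout, and the 0.9 overlap threshold are all assumptions made for illustration, and the overlap measure used here is plain intersection over union.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_near_duplicates(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep a box only if it does not overlap an already kept box
    by at least `threshold` (an assumed value, not the library's)."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, other) < threshold for other in kept):
            kept.append(box)
    return kept


boxes = [(0, 0, 10, 10), (0, 0, 10, 10.5), (20, 20, 30, 30)]
deduplicated = drop_near_duplicates(boxes)
```

In this example the second box overlaps the first almost entirely, so only the first and third boxes are kept. A greedy pass like this keeps whichever box comes first; a real detector would more likely order boxes by confidence score before filtering.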
0.5.31
- Add functionality to extract and save images from the page
- Add functionality to get only "true" embedded images when extracting elements from PDF pages
- Update the layout visualization script to be able to show only image elements if needed
- add an evaluation metric for table comparison based on token similarity
- fix paddle unit tests where `make test` fails since paddle doesn't work on M1/M2 chips locally
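To give a rough idea of what a token-similarity metric for table comparison can look like, here is a minimal sketch. It is an assumption-laden illustration, not the metric shipped in this release: the function names are hypothetical, tables are assumed to be lists of rows of cell strings, tokens are whitespace-separated, and similarity is computed with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical table representation: a list of rows, each a list of cell strings.
Table = List[List[str]]


def table_tokens(table: Table) -> List[str]:
    """Flatten a table into whitespace-separated tokens, row by row."""
    return [token for row in table for cell in row for token in cell.split()]


def token_similarity(predicted: Table, expected: Table) -> float:
    """Similarity in [0, 1] between the token sequences of two tables."""
    matcher = SequenceMatcher(
        None, table_tokens(predicted), table_tokens(expected), autojunk=False
    )
    return matcher.ratio()


expected = [["Name", "Unit price"], ["Widget", "4.00"]]
predicted = [["Name", "Unit price"], ["Widget", "4.50"]]
score = token_similarity(predicted, expected)
```

Because the comparison works on the flattened token sequence, this sketch rewards correct text in the correct reading order but does not check that tokens landed in the right cells.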
0.5.28
- add env variable `ENTIRE_PAGE_OCR` to specify using paddle or tesseract on entire page OCR
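As a sketch of how an application might read this setting, the snippet below resolves the OCR engine from `ENTIRE_PAGE_OCR`. The helper name, the accepted spellings, and the `tesseract` fallback are assumptions for illustration; the release note only states that the variable selects between paddle and tesseract.

```python
import os

# The two engines named in the release note.
SUPPORTED_ENGINES = ("tesseract", "paddle")


def entire_page_ocr_engine(default: str = "tesseract") -> str:
    """Return the OCR engine named by ENTIRE_PAGE_OCR.

    The default value is an assumption of this sketch, not documented behavior.
    """
    value = os.environ.get("ENTIRE_PAGE_OCR", default).strip().lower()
    if value not in SUPPORTED_ENGINES:
        raise ValueError(
            f"ENTIRE_PAGE_OCR must be one of {SUPPORTED_ENGINES}, got {value!r}"
        )
    return value


os.environ["ENTIRE_PAGE_OCR"] = "paddle"
engine = entire_page_ocr_engine()
```

Setting the variable in the shell before starting the process (for example `ENTIRE_PAGE_OCR=paddle`) would have the same effect as the assignment shown here.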