Releases: Unstructured-IO/unstructured

0.18.7

15 Jul 20:59
344202f

Enhancements

Features

  • Add language detection for PDFs: add document-level and element-level language detection to PDFs.

Fixes

0.18.6

15 Jul 19:08
2ffaf6f

Enhancements

Features

Fixes

  • Improved EPUB partition errors: EPUB partitioning now produces a new type of error on unprocessable files.
  • Fix type for serialized TableChunks: use TableChunk for the string value of the type field when serializing elements of type TableChunk, rather than the value Table.
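To make the TableChunk fix concrete, here is a minimal sketch of the behavior described above. The Element, Table, and TableChunk classes below are hypothetical stand-ins written for illustration, not the library's own classes, and deriving the type string from the class name is an assumption about one way to get this result.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the library's class)."""

    text: str

    def to_dict(self) -> dict:
        # Report the concrete class name, so a subclass never inherits
        # the type string of its parent.
        return {"type": type(self).__name__, "text": self.text}


@dataclass
class Table(Element):
    """Hypothetical table element."""


@dataclass
class TableChunk(Table):
    """Hypothetical chunk of a table; subclasses Table."""


chunk = TableChunk(text="a | b")
print(chunk.to_dict()["type"])
```

With this approach a consumer can tell a whole table from a chunk of one by reading the serialized type field alone.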

0.18.4

08 Jul 08:13
f078cd9

What's Changed

Full Changelog: 0.18.3...0.18.4

0.18.3

05 Jul 19:34
8a9abdd

What's Changed

Full Changelog: 0.18.2...0.18.3

0.18.2

01 Jul 23:42
d7dfda9

What's Changed

Full Changelog: 0.18.1...0.18.2

0.18.1

24 Jun 23:52
3f87946

Enhancements

Features

  • Add DocumentData element type: this is helpful in scenarios where there is large data that does not make sense to represent across each element in the document.

Fixes

  • The encoding property of the _CsvPartitioningContext is now properly used.
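As an illustration of why honoring the encoding matters, the sketch below decodes raw CSV bytes with a caller-supplied encoding before parsing. The read_csv_rows helper is hypothetical and uses only the Python standard library; it does not reproduce _CsvPartitioningContext.

```python
import csv
import io


def read_csv_rows(data: bytes, encoding: str = "utf-8") -> list[list[str]]:
    """Decode raw CSV bytes with the given encoding, then parse the rows."""
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text))


# These bytes are not valid UTF-8, so the declared encoding must be used.
raw = "name,city\nZoë,Köln\n".encode("latin-1")
print(read_csv_rows(raw, encoding="latin-1"))
```

If the encoding argument were ignored and UTF-8 assumed, the same input would raise a UnicodeDecodeError.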

0.17.11-dev1

13 Jun 02:43
5e43e36

Pre-release

What's Changed

New Contributors

Full Changelog: 0.17.2...0.17.11-dev1

0.17.2

20 Mar 16:52
0fa5174

Enhancements

  • Add image_url of images in HTML partitioner: <img> tags with non-data content include a new image_url metadata field with the content of the src attribute.

  • Use lxml instead of bs4 to parse hOCR data: lxml is much faster than bs4, given that the hOCR data format is regular (guaranteed because it is programmatically generated).

  • Bump numpy to >2, and upgrade paddlepaddle, unstructured-paddleocr, and onnx so they are compatible with numpy>2.
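The image_url enhancement above can be pictured with a small standard-library sketch: collect the src of every <img> tag unless it is an inline data: URI. The ImageUrlCollector class and collect_image_urls function are hypothetical names used for illustration, and treating "non-data content" as "src does not start with data:" is an assumption; the HTML partitioner itself is not shown here.

```python
from html.parser import HTMLParser


class ImageUrlCollector(HTMLParser):
    """Collect src values of <img> tags whose src is not an inline data: URI."""

    def __init__(self) -> None:
        super().__init__()
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip images embedded directly in the page as data: URIs.
        if src and not src.startswith("data:"):
            self.image_urls.append(src)


def collect_image_urls(html: str) -> list[str]:
    parser = ImageUrlCollector()
    parser.feed(html)
    parser.close()
    return parser.image_urls


page = '<p><img src="https://example.com/a.png"><img src="data:image/png;base64,AAAA"/></p>'
print(collect_image_urls(page))
```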

Fixes

  • Fix Image in a tag is "UncategorizedText" with no .text

What's Changed

Full Changelog: 0.17.0...0.17.2

0.17.0

12 Mar 15:57
2dceac3

What's Changed

Full Changelog: 0.16.25...0.17.0

0.16.25

07 Mar 11:17
74b0647

Enhancements

Features

Fixes

  • Fix filetype detection for JSON passed as byte streams: magic mimetype prediction now takes priority over the file extension when detecting filetypes.
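The priority order described in this fix can be sketched as follows: inspect the content first, and fall back to the file extension only when the content is inconclusive. Everything here (sniff_mimetype, detect_mimetype, EXTENSION_MIMETYPES) is a hypothetical, standard-library-only illustration; a real detector relies on a much richer magic-number database.

```python
import json
import os
from typing import Optional

# Hypothetical extension table used only as the fallback.
EXTENSION_MIMETYPES = {
    ".json": "application/json",
    ".txt": "text/plain",
    ".pdf": "application/pdf",
}


def sniff_mimetype(data: bytes) -> Optional[str]:
    """Tiny content sniffer covering only a few formats."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    stripped = data.lstrip()
    # Only treat objects and arrays as JSON, so bare text like "123" is not.
    if stripped[:1] in (b"{", b"["):
        try:
            json.loads(stripped.decode("utf-8"))
            return "application/json"
        except ValueError:  # covers decode errors and invalid JSON
            return None
    return None


def detect_mimetype(data: bytes, filename: Optional[str] = None) -> Optional[str]:
    """Prefer the content-based guess; use the extension only as a fallback."""
    sniffed = sniff_mimetype(data)
    if sniffed is not None:
        return sniffed
    if filename:
        suffix = os.path.splitext(filename)[1].lower()
        return EXTENSION_MIMETYPES.get(suffix)
    return None


# JSON bytes with a misleading extension are still recognized as JSON.
print(detect_mimetype(b'{"a": 1}', filename="payload.txt"))
```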