Releases: Unstructured-IO/unstructured

0.3.1

14 Dec 18:00
1700d4d

  • Added __init__.py to partition

0.3.0

14 Dec 16:39
151732c

  • Implement staging brick for Argilla. Converts lists of Text elements to argilla dataset classes.
  • Removes the local PDF parsing code along with its dependencies and tests.
  • Reorganizes the staging bricks in the unstructured.partition module
  • Allow entities to be passed into the Datasaur staging brick
  • Added HTML escapes to the replace_unicode_quotes brick
  • Fix bad responses in partition_pdf to raise ValueError
  • Adds partition_html for partitioning HTML documents.

0.2.4

11 Nov 00:31
4f539dd

  • Add an alternative way of importing Final to support Google Colab
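
A fallback import of this kind usually follows the pattern sketched below. This illustrates the general technique and is not copied from the project's code: it tries the standard library first and falls back to the typing_extensions backport, which is assumed to be installed on interpreters that lack typing.Final. MAX_RETRIES is a hypothetical name used only for illustration.

```python
try:
    # Python 3.8+ ships Final in the standard library.
    from typing import Final
except ImportError:
    # Older interpreters need the backport package (assumed to be installed).
    from typing_extensions import Final

# Hypothetical constant, annotated so type checkers flag any reassignment.
MAX_RETRIES: Final = 3
```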

0.2.3

10 Nov 21:37
300c564

  • Add cleaning bricks for removing prefixes and postfixes
  • Add cleaning bricks for extracting text before and after a pattern
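
As a rough illustration of what these cleaning bricks do, the sketch below implements the four operations with regular expressions. The function names (strip_prefix, strip_postfix, text_before, text_after) and their signatures are hypothetical and chosen for this example. They are not the library's API, and the library's handling of whitespace and missing matches may differ.

```python
import re


def strip_prefix(text: str, pattern: str) -> str:
    """Remove a regex match anchored at the start of the text."""
    return re.sub(rf"^(?:{pattern})", "", text).lstrip()


def strip_postfix(text: str, pattern: str) -> str:
    """Remove a regex match anchored at the end of the text."""
    return re.sub(rf"(?:{pattern})$", "", text).rstrip()


def text_before(text: str, pattern: str) -> str:
    """Return everything before the first match, or the text unchanged if none."""
    match = re.search(pattern, text)
    return text[: match.start()].rstrip() if match else text


def text_after(text: str, pattern: str) -> str:
    """Return everything after the first match, or the text unchanged if none."""
    match = re.search(pattern, text)
    return text[match.end():].lstrip() if match else text


raw = "SUMMARY: The report covers Q3 results. END"
print(strip_prefix(raw, r"SUMMARY:"))   # The report covers Q3 results. END
print(strip_postfix(raw, r"END"))       # SUMMARY: The report covers Q3 results.
print(text_before(raw, r"covers"))      # SUMMARY: The report
print(text_after(raw, r"covers"))       # Q3 results. END
```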

0.2.2

08 Nov 22:07
2715950

  • Add staging brick for Datasaur

0.2.1

21 Oct 18:53
de31df5

  • Added brick to convert an ISD dictionary to a list of elements
  • Update PDFDocument to use the from_file method
  • Added staging brick for converting ISD (Initial Structured Data) to CSV format.
  • Added staging brick for separating text into attention window size chunks for transformers.
  • Added staging brick for LabelBox.
  • Added ability to upload LabelStudio predictions
  • Added utility function for JSONL reading and writing
  • Added staging brick for CSV format for Prodigy
  • Added staging brick for Prodigy
  • Added ability to upload LabelStudio annotations
  • Added text_field and id_field to stage_for_label_studio signature
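
For readers unfamiliar with JSONL, the sketch below shows what a reading and writing utility of this kind might look like: one JSON object per line, written and read back with the standard library. The names write_jsonl and read_jsonl, and the sample records, are assumptions made for this example and may not match the library's own helpers.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def write_jsonl(path: str, records: List[Dict[str, Any]]) -> None:
    """Write each record as a single JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical sample data, round-tripped through a temporary file.
records = [
    {"text": "A title", "type": "Title"},
    {"text": "Some narrative text.", "type": "NarrativeText"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "elements.jsonl")
    write_jsonl(path, records)
    restored = read_jsonl(path)

print(restored == records)  # True
```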