Releases: Unstructured-IO/unstructured
0.3.1
- Added __init__.py to partition
0.3.0
- Implement staging brick for Argilla. Converts lists of Text elements to argilla dataset classes.
- Removing the local PDF parsing code and any dependencies and tests.
- Reorganizes the staging bricks in the unstructured.partition module
- Allow entities to be passed into the Datasaur staging brick
- Added HTML escapes to the replace_unicode_quotes brick
- Fix bad responses in partition_pdf to raise ValueError
- Adds partition_html for partitioning HTML documents.
0.2.4
- Add an alternative way of importing Final to support Google Colab
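
A fallback import of this kind usually takes the shape below. This is a minimal sketch, not the library's code: it assumes the alternative source is the typing_extensions package, which the release note does not state, and MAX_CHUNK_SIZE is a hypothetical constant used only to show the annotation in use.

```python
# Sketch of a fallback import for Final.
try:
    # Final is part of the standard library typing module on Python 3.8+.
    from typing import Final
except ImportError:
    # Assumption: typing_extensions is installed on older interpreters.
    from typing_extensions import Final

# Hypothetical constant annotated as Final (not taken from the library).
MAX_CHUNK_SIZE: Final = 512
```

Type checkers treat a name annotated this way as a constant that should not be reassigned; at runtime the annotation has no effect.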
0.2.3
- Add cleaning bricks for removing prefixes and postfixes
- Add cleaning bricks for extracting text before and after a pattern
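
The kind of cleaning these two entries describe can be sketched with regular expressions. The function names and signatures below are hypothetical and written only for illustration; they are not the library's API, and the library's bricks may handle whitespace and missing matches differently.

```python
import re


def strip_prefix(text: str, pattern: str) -> str:
    """Remove a regex pattern from the start of text (hypothetical helper)."""
    return re.sub(rf"^(?:{pattern})", "", text).lstrip()


def strip_postfix(text: str, pattern: str) -> str:
    """Remove a regex pattern from the end of text (hypothetical helper)."""
    return re.sub(rf"(?:{pattern})$", "", text).rstrip()


def text_before(text: str, pattern: str) -> str:
    """Return the text that precedes the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        # Assumption: a missing pattern is treated as an error.
        raise ValueError(f"Pattern {pattern!r} not found in text.")
    return text[: match.start()].rstrip()


def text_after(text: str, pattern: str) -> str:
    """Return the text that follows the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"Pattern {pattern!r} not found in text.")
    return text[match.end():].lstrip()
```

For example, strip_prefix("SUMMARY: The report is final.", r"SUMMARY:") yields "The report is final.", and text_after("Teacher: BLAH BLAH", r"Teacher:") yields "BLAH BLAH".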
0.2.2
- Add staging brick for Datasaur
0.2.1
- Added brick to convert an ISD dictionary to a list of elements
- Update PDFDocument to use the from_file method
- Added staging brick for CSV format for ISD (Initial Structured Data) format.
- Added staging brick for separating text into attention window size chunks for transformers.
- Added staging brick for LabelBox.
- Added ability to upload LabelStudio predictions
- Added utility function for JSONL reading and writing
- Added staging brick for CSV format for Prodigy
- Added staging brick for Prodigy
- Added ability to upload LabelStudio annotations
- Added text_field and id_field to stage_for_label_studio signature
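
The attention-window chunking entry in this release can be illustrated as follows. This is a rough sketch under stated assumptions, not the library's implementation: chunk_by_window is a hypothetical name, and tokens are approximated by whitespace splitting, whereas a real transformer tokenizer counts subword tokens.

```python
from typing import Callable, List


def chunk_by_window(
    text: str,
    window_size: int,
    tokenizer: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most window_size tokens (hypothetical helper)."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1.")
    # Assumption: whitespace tokens stand in for model tokens.
    tokens = tokenizer(text)
    return [
        " ".join(tokens[start : start + window_size])
        for start in range(0, len(tokens), window_size)
    ]
```

Passing a model-specific tokenizer as the tokenizer argument would bring the chunk sizes closer to what the model actually sees, though rejoining subword tokens with spaces would then need adjusting.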