Releases: Unstructured-IO/unstructured
0.5.0
Enhancements
- Add `requires_dependencies` Python decorator to check dependencies are installed before instantiating a class or running a function
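A dependency-checking decorator of this kind can be sketched as follows. This is a minimal illustration using only the standard library, not the library's actual implementation; the signature, the error message, and the example functions `dump` and `unavailable` are assumptions made for the sketch.

```python
import importlib.util
from functools import wraps
from typing import Callable, List, Union


def requires_dependencies(dependencies: Union[str, List[str]]) -> Callable:
    """Sketch: raise ImportError at call time if a top-level package is missing."""
    if isinstance(dependencies, str):
        dependencies = [dependencies]

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module cannot be located
            missing = [
                name for name in dependencies
                if importlib.util.find_spec(name) is None
            ]
            if missing:
                raise ImportError(
                    f"Missing required dependencies: {', '.join(missing)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_dependencies("json")
def dump(value):
    # Hypothetical function whose dependency is always available
    import json
    return json.dumps(value)


@requires_dependencies(["a_package_that_does_not_exist_xyz"])
def unavailable():
    # Hypothetical function whose dependency is never installed
    return "unreachable"
```

Calling `dump([1, 2])` runs normally, while calling `unavailable()` raises `ImportError` before the function body executes.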
Features
- Added Wikipedia connector for ingest cli.
Fixes
- Fix `process_document` file cleaning on failure
- Fixes an error introduced in the metadata tracking commit that caused `NarrativeText` and `FigureCaption` elements to be represented as `Text` in HTML documents.
0.4.16
Enhancements
- Fall back to using file extensions for filetype detection if `libmagic` is not present
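The fallback pattern can be sketched roughly as below. This is an illustration under assumptions, not the library's code: the extension table, `guess_from_extension`, and `detect_mime_type` are hypothetical names, and the sketch assumes the `python-magic` bindings, which raise `ImportError` when `libmagic` cannot be loaded.

```python
import os
from typing import Optional

try:
    import magic  # python-magic; needs the libmagic shared library
    LIBMAGIC_AVAILABLE = True
except ImportError:
    LIBMAGIC_AVAILABLE = False

# Hypothetical, deliberately small extension table for the sketch
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".md": "text/markdown",
    ".txt": "text/plain",
}


def guess_from_extension(filename: str) -> Optional[str]:
    """Map a filename to a MIME type using only its extension."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(extension.lower())


def detect_mime_type(filename: str) -> Optional[str]:
    """Prefer content sniffing via libmagic; otherwise use the extension."""
    if LIBMAGIC_AVAILABLE:
        return magic.from_file(filename, mime=True)
    return guess_from_extension(filename)
```

Extension-based detection is less reliable than content sniffing, since a misnamed file is classified by its name alone, but it keeps detection working on systems without `libmagic`.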
Features
- Added setup script for Ubuntu
- Added GitHub connector for ingest cli.
- Added `partition_md` partitioner.
- Added Reddit connector for ingest cli.
Fixes
- Initializes connector properly in ingest.main::MainProcess
- Restricts version of unstructured-inference to avoid multithreading issue
0.4.15
Enhancements
- Added `elements_to_json` and `elements_from_json` for easier serialization/deserialization
- `convert_to_dict`, `dict_to_elements` and `convert_to_csv` are now aliases for functions that use the ISD terminology.
Fixes
- Update to ensure all elements are preserved during serialization/deserialization
0.4.14
- Automatically install `nltk` models in the `tokenize` module.
0.4.13
0.4.12
0.4.11
- Adds `partition_doc` for partitioning Word documents in `.doc` format. Requires `libreoffice`.
- Adds `partition_ppt` for partitioning PowerPoint documents in `.ppt` format. Requires `libreoffice`.
0.4.10
- Fixes `ElementMetadata` so that it's JSON serializable when the filename is a `Path` object.
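The underlying issue is that `json.dumps` cannot encode `pathlib.Path` values. A minimal sketch of one way to handle it is shown below; `SketchMetadata` is a hypothetical stand-in for illustration, not the library's `ElementMetadata` class, and converting the path to a string in `to_dict` is an assumed approach.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for a metadata class that stores a filename."""

    filename: Optional[Union[str, Path]] = None

    def to_dict(self) -> dict:
        # Convert Path objects to plain strings so json.dumps can encode them
        name = None if self.filename is None else str(self.filename)
        return {"filename": name}


metadata = SketchMetadata(filename=Path("example.txt"))
serialized = json.dumps(metadata.to_dict())
```

Passing the raw `Path` to `json.dumps` raises `TypeError`, which is why the conversion happens before serialization.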
0.4.9
- Added ingest modules and s3 connector
- Default to `url=None` for `partition_pdf` and `partition_image`
- Add ability to skip English-specific check by setting the `UNSTRUCTURED_LANGUAGE` env var to `""`.
- Document `Element` objects now track metadata
0.4.8
- Modified XML and HTML parsers not to load comments.