Releases: Unstructured-IO/unstructured-ingest
0.3.8
Fixes
- Prevent Pinecone deletes from overloading the database
0.3.7
Fixes
- Correct date metadata field types in fsspec connectors (SFTP, Azure, Box, and GCS)
- Fix Kafka source connection problems
- Fix Azure AI Search session handling
- Fixes issue with SingleStore Source Connector not being available
- Fixes issue with SQLite Source Connector using the wrong Indexer, which caused an indexer config parameter error when trying to use the SQLite source
- Fixes issue with Snowflake Destination Connector nan values: nan values were not properly replaced with None
- Fixes Snowflake source error: 'SnowflakeCursor' object has no attribute 'mogrify'
- Box source connector can now use raw JSON as the access token instead of a file path to JSON
- Fix fsspec upload paths to be OS independent
- Properly log Elasticsearch upload errors
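The Snowflake nan fix above comes down to mapping float NaN to None before rows are written, so the database driver stores SQL NULL rather than a NaN. A minimal stdlib sketch, assuming rows arrive as plain dicts; nan_to_none and clean_row are hypothetical helper names, not the connector's code.

```python
import math
from typing import Any, Dict


def nan_to_none(value: Any) -> Any:
    """Return None for float NaN values; pass every other value through unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def clean_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace NaN values in a single row (column -> value) with None."""
    return {column: nan_to_none(value) for column, value in row.items()}


row = {"id": "abc", "page_number": float("nan"), "text": "hello"}
cleaned = clean_row(row)
print(cleaned)  # {'id': 'abc', 'page_number': None, 'text': 'hello'}
```

Because numpy.float64 subclasses Python's float, the isinstance check also catches NaN values produced by NumPy or pandas.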
Enhancements
- Kafka source connector has new field: group_id
- Support personal access tokens for Confluence authentication
- Use deterministic IDs for uploaded content
- Makes multiple SQL connectors (Snowflake, SingleStore, SQLite) more robust against SQL injection.
- Optimizes memory usage of Snowflake Destination Connector.
- Added Qdrant Cloud integration test
- Add DuckDB destination connector: adds support for storing artifacts in a local DuckDB database.
- Add MotherDuck destination connector: adds support for storing artifacts in a MotherDuck database.
- Update Weaviate v2 example
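A deterministic ID is computed from the content's identity rather than generated randomly, so re-uploading the same content yields the same ID instead of a new one. A minimal sketch using the stdlib uuid5, assuming the ID is derived from a source record identifier plus the chunk's position; the inputs the connectors actually hash are not stated in these notes, and deterministic_id is a hypothetical name.

```python
import uuid

# Assumption for this sketch: any fixed namespace UUID works, as long as it never changes.
NAMESPACE = uuid.NAMESPACE_URL


def deterministic_id(record_id: str, index: int) -> str:
    """Derive a stable UUID for the index-th chunk of a source record.

    uuid5 hashes (namespace, name) with SHA-1, so the same inputs always
    produce the same UUID across runs and machines.
    """
    return str(uuid.uuid5(NAMESPACE, f"{record_id}:{index}"))


first_run = [deterministic_id("s3://bucket/report.pdf", i) for i in range(3)]
second_run = [deterministic_id("s3://bucket/report.pdf", i) for i in range(3)]
print(first_run == second_run)  # True
```

With stable IDs, a destination that upserts by ID overwrites the earlier copy of a chunk rather than storing a duplicate.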
0.3.6
0.3.5
Enhancements
- Persist record id in dedicated LanceDB column, use it to delete previous content to prevent duplicates.
Fixes
- Remove client.ping() from the Elasticsearch precheck.
- Pinecone metadata fixes: fix the CLI's --metadata-fields default, and always preserve record ID tracking metadata.
- Add check to prevent querying for more records than Pinecone's limit allows when deleting records
- Unregister Weaviate base classes: they are abstract and cannot be instantiated as a configuration, so they shouldn't be registered
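The Pinecone delete check can be illustrated as a query-then-delete loop that never asks the backend for more than `limit` matches per round. A sketch under stated assumptions: query and delete are hypothetical callables standing in for a real client, the in-memory store is purely illustrative, and `limit` is a parameter rather than Pinecone's documented figure.

```python
from typing import Callable, List


def delete_matching(
    query: Callable[[int], List[str]],
    delete: Callable[[List[str]], None],
    limit: int,
) -> int:
    """Repeatedly query for at most `limit` matching ids and delete them.

    Stops when a query returns no ids. Assumes deleted ids stop appearing
    in later queries; otherwise the loop would never terminate.
    Returns the total number of ids deleted.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    total = 0
    while True:
        ids = query(limit)  # never request more than the backend allows
        if not ids:
            return total
        delete(ids)
        total += len(ids)


# In-memory stand-in for a vector index (illustrative only).
store = {f"vec-{i}": {"record_id": "doc-1"} for i in range(25)}
store["other"] = {"record_id": "doc-2"}


def query(top_k: int) -> List[str]:
    return [k for k, meta in store.items() if meta["record_id"] == "doc-1"][:top_k]


def delete(ids: List[str]) -> None:
    for k in ids:
        del store[k]


deleted = delete_matching(query, delete, limit=10)
print(deleted, sorted(store))  # 25 ['other']
```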
0.3.4
0.3.3
Enhancements
- Add precheck to Milvus connector
Fixes
- Make AstraDB uploader truncate text and text_as_html content to a maximum of 8000 bytes
- Add missing LanceDB extra
- Weaviate cloud auth detection fixed
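Truncating to 8000 bytes means measuring the encoded length rather than the character count, and cutting without leaving half of a multi-byte character behind. A minimal sketch, assuming UTF-8 is the encoding being limited; truncate_utf8 is a hypothetical helper, not the AstraDB uploader's actual code.

```python
MAX_BYTES = 8000  # limit quoted in the changelog entry above


def truncate_utf8(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in max_bytes.

    Slicing the encoded bytes can split a multi-byte character; decoding with
    errors="ignore" drops that trailing partial character instead of raising.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


element = {"text": "é" * 5000, "text_as_html": "<p>short</p>"}  # "é" is 2 bytes in UTF-8
for field in ("text", "text_as_html"):
    element[field] = truncate_utf8(element[field])
print(len(element["text"].encode("utf-8")))  # 8000
```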
0.3.2
Enhancements
- Persist record id in MongoDB data, use it to delete previous content to prevent duplicates.
Fixes
- Remove forward slash from Google Drive relative path field
- Create LanceDB test databases in unique remote locations to avoid conflicts
- Add Weaviate to destination registry
0.3.1
Enhancements
- Add LanceDB V2 destination connector
- Persist record id in Milvus, use it to delete previous content to prevent duplicates.
- Persist record id in Weaviate metadata, use it to delete previous content to prevent duplicates.
- Persist record id in SQL metadata, use it to delete previous content to prevent duplicates.
- Persist record id in Elasticsearch/OpenSearch metadata, use it to delete previous content to prevent duplicates.
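The record-id entries above share one pattern: tag every written row with the source record's ID, and before writing a new batch, delete whatever rows already carry that ID. A minimal sketch using the stdlib sqlite3 module as a stand-in for any of these destinations; the elements table, its columns, and the upload function are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id TEXT PRIMARY KEY, record_id TEXT, text TEXT)")


def upload(conn: sqlite3.Connection, record_id: str, rows: Iterable[Tuple[str, str]]) -> None:
    """Replace all rows previously written for record_id with `rows`.

    Deleting by record_id first means re-processing a changed document
    does not leave stale elements from the earlier run behind.
    """
    with conn:  # one transaction: delete and insert commit (or roll back) together
        conn.execute("DELETE FROM elements WHERE record_id = ?", (record_id,))
        conn.executemany(
            "INSERT INTO elements (id, record_id, text) VALUES (?, ?, ?)",
            [(element_id, record_id, text) for element_id, text in rows],
        )


upload(conn, "doc-1", [("e1", "old intro"), ("e2", "old body")])
upload(conn, "doc-1", [("e3", "new intro")])  # second run of a changed document
print(conn.execute("SELECT id, text FROM elements").fetchall())  # [('e3', 'new intro')]
```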
Fixes
- Make AstraDB precheck fail on non-existent collections
- Respect Pinecone's metadata size limits: crop metadata sent to Pinecone so it fits inside those limits, avoiding error responses
- Propagate exceptions raised by delta table connector during write
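These notes do not say how the Pinecone metadata is cropped. One plausible strategy, assumed here, is to drop the largest fields first while protecting identifying fields such as the record ID. Size is measured as UTF-8 JSON, which may not match how Pinecone itself counts bytes; crop_metadata and metadata_size are hypothetical names.

```python
import json
from typing import Any, Dict, FrozenSet


def metadata_size(metadata: Dict[str, Any]) -> int:
    """Size in bytes of the metadata serialized as UTF-8 JSON."""
    return len(json.dumps(metadata).encode("utf-8"))


def crop_metadata(
    metadata: Dict[str, Any],
    max_bytes: int,
    keep: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Drop the largest non-protected fields until the JSON size fits in max_bytes.

    Works on a copy; the input dict is not modified.
    """
    cropped = dict(metadata)
    while metadata_size(cropped) > max_bytes:
        candidates = [k for k in cropped if k not in keep]
        if not candidates:
            raise ValueError("protected fields alone exceed max_bytes")
        largest = max(candidates, key=lambda k: len(json.dumps(cropped[k])))
        del cropped[largest]
    return cropped


meta = {"record_id": "doc-1", "filename": "report.pdf", "text": "x" * 500}
small = crop_metadata(meta, max_bytes=100, keep=frozenset({"record_id"}))
print(small)  # {'record_id': 'doc-1', 'filename': 'report.pdf'}
```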
0.3.0
Enhancements
- Add V2 Kafka destination connector
- Persist record id in Pinecone metadata, use it to delete previous content to prevent duplicates.
- Persist record id in Azure AI Search, use it to delete previous content to prevent duplicates.
- Persist record id in AstraDB, use it to delete previous content to prevent duplicates.
- Update Azure Cognitive Search to Azure AI Search
Fixes
- Fix Delta Table destination precheck: validate AWS region in precheck.
- Add missing batch label to FileData where applicable
- Handle fsspec downloading a file into a directory: when filenames contain unusual characters, files are downloaded into a directory, so code was added to relocate the file to match the expected behavior.
- Fix Postgres connector query causing a syntax error when the ID column contains strings
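A common cause of this kind of syntax error is splicing string IDs into the SQL text unquoted, producing something like WHERE id IN (abc, def). The usual remedy, sketched here as a general technique rather than the connector's actual fix, is one bound placeholder per ID so the driver quotes each value. sqlite3 stands in for the Postgres driver so the sketch runs without a server; select_by_ids and the elements table are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple


def select_by_ids(conn: sqlite3.Connection, ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Fetch rows whose id is in `ids`, passing every id as a bound parameter.

    Building the query as f"... IN ({', '.join(ids)})" would emit bare words,
    which the database parses as column names or rejects as a syntax error.
    """
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # sqlite3 uses "?"; psycopg2 uses "%s"
    query = f"SELECT id, text FROM elements WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, list(ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO elements VALUES (?, ?)",
    [("a-1", "first"), ("b-2", "second"), ("c-3", "third")],
)
print(select_by_ids(conn, ["c-3", "a-1"]))  # [('a-1', 'first'), ('c-3', 'third')]
```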