Releases: Unstructured-IO/unstructured-ingest

0.3.8

11 Dec 03:03
cb77b13

Fixes

  • Prevent Pinecone delete operations from hammering the database

0.3.7

09 Dec 15:15
e041b86

Fixes

  • Correct date metadata field types in fsspec connectors (sftp, azure, box and gcs)
  • Fix Kafka source connection problems
  • Fix Azure AI Search session handling
  • Fix SingleStore Source Connector not being available
  • Fix SQLite Source Connector using the wrong Indexer, which caused an indexer config parameter error when trying to use the SQLite source
  • Fix Snowflake Destination Connector handling of NaN values, which were not properly replaced with None
  • Fix Snowflake source error: 'SnowflakeCursor' object has no attribute 'mogrify'
  • Box source connector can now use raw JSON as access token instead of file path to JSON
  • Fix fsspec upload paths to be OS independent
  • Properly log elasticsearch upload errors
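
The NaN fix listed above for the Snowflake destination can be illustrated with a minimal sketch. This is not the connector's code: the helper name `replace_nan_with_none` and the sample row are hypothetical, and the sketch only shows the general idea of mapping float NaN to None before rows are written to a database.

```python
import math
from typing import Any


def replace_nan_with_none(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `row` in which float NaN values become None.

    Hypothetical helper for illustration only; not part of unstructured-ingest.
    """
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in row.items()
    }


# Sample row (made up): one numeric field is NaN.
row = {"id": "abc", "page_number": float("nan"), "text": "hello"}
cleaned = replace_nan_with_none(row)
```

Database drivers generally map None to SQL NULL, while NaN may be rejected or stored as a literal float, which is why the replacement matters before an insert.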

Enhancements

  • Kafka source connector has new field: group_id
  • Support personal access token for confluence auth
  • Leverage deterministic id for uploaded content
  • Makes multiple SQL connectors (Snowflake, SingleStore, SQLite) more robust against SQL injection.
  • Optimizes memory usage of Snowflake Destination Connector.
  • Added Qdrant Cloud integration test
  • Add DuckDB destination connector: adds support for storing artifacts in a local DuckDB database.
  • Add MotherDuck destination connector: adds support for storing artifacts in a MotherDuck database.
  • Update weaviate v2 example
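
For readers unfamiliar with the SQL injection hardening mentioned above, the usual technique is to pass values as bound parameters rather than formatting them into the SQL string. The sketch below uses the standard-library `sqlite3` module with a made-up `elements` table; it illustrates the general technique and is not taken from the connectors themselves.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id TEXT, text TEXT)")
conn.execute("INSERT INTO elements VALUES (?, ?)", ("a1", "hello"))


def fetch_by_id(conn: sqlite3.Connection, element_id: str) -> list:
    # The value is bound via the "?" placeholder, so it is treated as data
    # and never interpreted as part of the SQL statement.
    cur = conn.execute(
        "SELECT id, text FROM elements WHERE id = ?", (element_id,)
    )
    return cur.fetchall()


rows_ok = fetch_by_id(conn, "a1")
# A classic injection payload matches nothing when it is bound as a parameter.
rows_injected = fetch_by_id(conn, "a1' OR '1'='1")
```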

0.3.6

30 Nov 03:46
d88ca87

What's Changed

Full Changelog: 0.3.5...0.3.6

0.3.5

26 Nov 20:24
507372c

Enhancements

  • Persist record id in dedicated LanceDB column, use it to delete previous content to prevent duplicates.

Fixes

  • Remove client.ping() from the Elasticsearch precheck.
  • Pinecone metadata fixes - Fix CLI's --metadata-fields default. Always preserve record ID tracking metadata.
  • Add check to prevent querying for more than pinecone limit when deleting records
  • Unregister Weaviate base classes - Weaviate base classes shouldn't be registered as they are abstract and cannot be instantiated as a configuration
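
One common way to stay under a per-request limit when deleting records, as in the Pinecone fix above, is to split the ID list into bounded batches. The sketch below is an assumption about the general approach, not the connector's implementation; the `batched` helper and the limit of 1000 are hypothetical values chosen for illustration.

```python
from typing import Iterator, Sequence


def batched(ids: Sequence[str], limit: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `ids`, each with at most `limit` items."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    for start in range(0, len(ids), limit):
        yield ids[start:start + limit]


# Hypothetical limit; check the service's documentation for the real value.
ids = [f"record-{i}" for i in range(2500)]
batches = list(batched(ids, limit=1000))
sizes = [len(batch) for batch in batches]
```

Each batch would then be sent as its own delete request, so no single call exceeds the service limit.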

0.3.4

25 Nov 16:00
dc00929

Enhancements

  • Add azure openai embedder
  • Add collection_id field to Couchbase downloader_config

0.3.3

23 Nov 00:44
b1f0974

Enhancements

  • Add precheck to Milvus connector

Fixes

  • Make AstraDB uploader truncate text and text_as_html content to max 8000 bytes
  • Add missing LanceDB extra
  • Fix Weaviate cloud auth detection
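
Truncating to a byte limit, as in the AstraDB fix above, differs from truncating to a character count because UTF-8 characters can span several bytes. The following is a minimal sketch of one way to do it safely; the function name `truncate_utf8` is hypothetical and the code is not taken from the uploader.

```python
def truncate_utf8(text: str, max_bytes: int = 8000) -> str:
    """Return `text` cut so its UTF-8 encoding is at most `max_bytes` long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


short = truncate_utf8("hello")
long_ascii = truncate_utf8("a" * 9000)
# "é" takes 2 bytes in UTF-8, so a 7-byte limit keeps only 3 whole characters.
multibyte = truncate_utf8("é" * 5, max_bytes=7)
```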

0.3.2

21 Nov 22:27
ecf22d0

Enhancements

  • Persist record id in mongodb data, use it to delete previous content to prevent duplicates.

Fixes

  • Remove forward slash from Google Drive relative path field
  • Create LanceDB test databases in unique remote locations to avoid conflicts
  • Add weaviate to destination registry

0.3.1

20 Nov 19:44
b19102a

Enhancements

  • LanceDB V2 Destination Connector
  • Persist record id in milvus, use it to delete previous content to prevent duplicates.
  • Persist record id in weaviate metadata, use it to delete previous content to prevent duplicates.
  • Persist record id in sql metadata, use it to delete previous content to prevent duplicates.
  • Persist record id in elasticsearch/opensearch metadata, use it to delete previous content to prevent duplicates.
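
The "persist record id, delete previous content" pattern in the entries above can be sketched as a delete-then-insert keyed on the record ID. This is an illustrative assumption using the standard-library `sqlite3` module; the table layout, the `record_id` column name, and the `upload` function are hypothetical and do not reflect any connector's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (record_id TEXT, text TEXT)")


def upload(conn: sqlite3.Connection, record_id: str, texts: list) -> None:
    # Run both steps in one transaction: remove rows left over from a
    # previous upload of the same record, then insert the new content.
    with conn:
        conn.execute("DELETE FROM elements WHERE record_id = ?", (record_id,))
        conn.executemany(
            "INSERT INTO elements (record_id, text) VALUES (?, ?)",
            [(record_id, text) for text in texts],
        )


upload(conn, "doc-1", ["first chunk", "second chunk"])
# Re-uploading the same record replaces its rows instead of duplicating them.
upload(conn, "doc-1", ["first chunk (edited)"])
rows = conn.execute("SELECT record_id, text FROM elements").fetchall()
```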

Fixes

  • Make AstraDB precheck fail on nonexistent collections
  • Respect Pinecone's metadata size limits: crop metadata sent to Pinecone to fit inside its limits and avoid error responses
  • Propagate exceptions raised by delta table connector during write

0.3.0

15 Nov 20:13
c2296d2

Enhancements

  • Added V2 kafka destination connector
  • Persist record id in pinecone metadata, use it to delete previous content to prevent duplicates.
  • Persist record id in azure ai search, use it to delete previous content to prevent duplicates.
  • Persist record id in astradb, use it to delete previous content to prevent duplicates.
  • Update Azure Cognitive Search to Azure AI Search

Fixes

  • Fix Delta Table destination precheck: validate AWS Region in precheck.
  • Add missing batch label to FileData where applicable
  • Handle fsspec downloading a file into a directory: when filenames have odd characters, files are downloaded into a directory. Code was added to move the file so the result matches the expected behavior.
  • Fix Postgres connector query causing a syntax error when the ID column contains strings

0.2.2

08 Nov 20:11
373568c

Enhancements

  • Remove overwrite field from fsspec and databricks connectors
  • Added migration for GitLab Source V2
  • Added V2 confluence source connector
  • Added OneDrive destination connector
  • Migrate Qdrant destination to V2
  • Migrate Kafka Source Connector to V2