The Delta Tables in Amazon S3 destination connector generates the following output within the specified path to the S3 bucket (or the specified folder within the bucket):

- Initially, one Parquet (`.parquet`) file per file in the source location. For example, for a file in the source location named `my-file.pdf`, an associated file with the extension `.parquet` is generated. Various kinds of file transactions can result in additional Parquet files being generated. These Parquet filenames are automatically generated by the Delta Lake engine and are not meant to be manually modified.
- A folder named `_delta_log` that contains metadata and change history about the `.parquet` files. As Parquet files are added to, changed, or removed from the specified bucket or folder path, the `_delta_log` folder is updated with any related metadata and change history details.

Together, this set of Parquet files and their associated `_delta_log` folder (and its contents) describe a single, versioned Delta table. Because of this, Unstructured recommends the following usage best practices:

- In the source location, each set of source files that is to be considered as a unit for change management purposes should be controlled by a unique, dedicated Delta Tables in S3 destination connector. This connector should reference a unique, dedicated output folder within the bucket. Having multiple workflows refer to different sets of source files, yet all share the same Delta table, could result in data loss or table corruption.
- Avoid directly modifying, adding, or deleting Parquet data files or the `_delta_log` folder within a Delta table's directory. This can lead to data loss or table corruption.
- If you need to copy or move a Delta table to a different location, you must move or copy its entire set of Parquet files and its associated `_delta_log` folder (and its contents) together as a unit. Note that the copied or moved Delta table will no longer be controlled by the original Delta Tables in S3 destination connector.
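
The "copy as a unit" guidance can be illustrated with a minimal Python sketch. It is not part of the connector. It assumes you have already listed the object keys in the bucket by some other means, and it only plans the copy as a mapping from source keys to destination keys. The function name `plan_delta_table_copy` and the example prefixes and keys are hypothetical. Performing the actual copy with an S3 client is left out.

```python
from typing import Dict, Iterable


def plan_delta_table_copy(
    keys: Iterable[str], src_prefix: str, dst_prefix: str
) -> Dict[str, str]:
    """Map every object key under src_prefix to its key under dst_prefix.

    Hypothetical helper: it plans a copy but does not perform it.
    Raises ValueError when the listing has no `_delta_log/` entries,
    because data files copied without the log are not a usable table.
    """
    src = src_prefix.rstrip("/") + "/"
    dst = dst_prefix.rstrip("/") + "/"

    # Keep only the objects that belong to this table's folder.
    table_keys = [k for k in keys if k.startswith(src)]

    # The transaction log must travel with the Parquet data files.
    has_log = any(k[len(src):].startswith("_delta_log/") for k in table_keys)
    if not has_log:
        raise ValueError(f"no _delta_log folder found under {src!r}")

    # Preserve each key's path relative to the table folder.
    return {k: dst + k[len(src):] for k in table_keys}


# Illustrative keys only; real filenames are generated by the Delta Lake engine.
example_keys = [
    "tables/docs/part-0000.parquet",
    "tables/docs/_delta_log/00000000000000000000.json",
    "tables/other/part-0000.parquet",  # belongs to a different table
]
copy_plan = plan_delta_table_copy(example_keys, "tables/docs", "archive/docs")
```

Because the plan is built from the whole folder listing, the Parquet files and the `_delta_log` contents keep their relative paths, and objects from other tables are left out.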