Description
Bug description
Writer rotation can leave behind an empty file: for Parquet this is a 0-byte file, and for DWRF it is a non-empty file that contains only a header. The 0-byte Parquet file causes an error when it is read.
#16389 provides a fix that prevents 0-byte Parquet files from being generated.
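To illustrate why a 0-byte file cannot be read as Parquet: an unencrypted Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the footer metadata. The sketch below is a standalone structural check, not code from this project; the function name `looksLikeParquet` is hypothetical, and passing the check is necessary but not sufficient for a file to be valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: structural sanity check on the raw bytes of a file.
// It only inspects the leading magic, the trailing magic, and the footer
// length field; it does not parse the footer metadata itself.
bool looksLikeParquet(const std::string& bytes) {
  static const std::string kMagic = "PAR1";
  // Leading magic (4) + footer length (4) + trailing magic (4).
  constexpr std::size_t kMinSize = 12;

  if (bytes.size() < kMinSize) {
    return false; // A 0-byte file is rejected here.
  }
  if (bytes.compare(0, 4, kMagic) != 0 ||
      bytes.compare(bytes.size() - 4, 4, kMagic) != 0) {
    return false;
  }

  // Decode the 4-byte little-endian footer length that precedes the
  // trailing magic.
  const std::size_t pos = bytes.size() - 8;
  std::uint32_t footerLength = 0;
  for (int i = 0; i < 4; ++i) {
    footerLength |=
        static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[pos + i]))
        << (8 * i);
  }

  // The footer must fit between the leading magic and the length field.
  return footerLength <= bytes.size() - kMinSize;
}
```

An empty input fails the size test immediately, which matches the read error described above: a reader has no magic bytes and no footer to locate.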
However, there is also an ensureFiles option, and per its comment:
// When this option is set the HiveDataSink will always write a file even
// if there's no data. This is useful when the table is bucketed, but the
// engine handles ensuring a 1 to 1 mapping from task to bucket.
the intention is to always create a file.
For the Parquet writer, when this option is set, should we align the behavior with DWRF? For example, create an empty data file that contains only empty metadata.
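The proposed behavior can be sketched as a small decision function. This is only an illustration of the question being asked, not the project's actual implementation: `CloseAction` and `actionOnClose` are hypothetical names, and the assumption is that a writer with no rows skips the file unless ensureFiles is set.

```cpp
#include <cstdint>

// Hypothetical outcomes when a writer is closed or rotated.
enum class CloseAction {
  kKeepDataFile,          // Rows were written; keep the file as is.
  kWriteMetadataOnlyFile, // No rows; emit a valid file with empty metadata.
  kSkipFile               // No rows; do not leave any file behind.
};

// Hypothetical decision rule for the behavior proposed in this issue.
CloseAction actionOnClose(std::uint64_t rowsWritten, bool ensureFiles) {
  if (rowsWritten > 0) {
    return CloseAction::kKeepDataFile;
  }
  // With ensureFiles set, a file must exist even without data, so it has to
  // be a readable metadata-only file rather than a 0-byte file.
  return ensureFiles ? CloseAction::kWriteMetadataOnlyFile
                     : CloseAction::kSkipFile;
}
```

Under this rule the Parquet and DWRF writers would behave the same way when ensureFiles is set: both leave a file that a reader can open and that reports zero rows.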
System information
er Version: 17.0.0.17000013
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 17.0.0.17000013
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX15.5.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/Applications/CMake.app/Contents;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local