[Parquet] Empty file generated during writer rotation #16527

@PingLiuPing

Bug description

Writer rotation can leave behind an empty file: for Parquet it is a 0B file, and for DWRF it is a non-empty file that contains only the header. The 0B Parquet file causes an error when it is read.
#16389 provides a fix that prevents the 0B Parquet file from being generated.

However, there is an option, ensureFiles, whose comment reads:

  // When this option is set the HiveDataSink will always write a file even
  // if there's no data. This is useful when the table is bucketed, but the
  // engine handles ensuring a 1 to 1 mapping from task to bucket.

So the intent of this option is to always create a file.

When this option is set, should the Parquet writer align its behaviour with DWRF, for example by creating an empty data file that contains only empty metadata?
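The question above can be sketched as a small decision function. This is only an illustration of the proposed behaviour, not Velox code: `WriterState`, `CloseAction`, and `actionOnClose` are hypothetical names, and the real HiveDataSink and Parquet writer logic may be structured differently.

```cpp
#include <cstdint>

// Hypothetical types for illustration only; these are not Velox APIs.
enum class CloseAction {
  kSkipFile,          // Leave no file behind (avoids the 0B file).
  kWriteMetadataOnly, // Leave a valid file that holds only empty metadata.
  kWriteData,         // Normal case: the file holds rows.
};

struct WriterState {
  uint64_t rowsWritten; // Rows written to the current file before close.
  bool ensureFiles;     // Mirrors the ensureFiles option quoted above.
};

// Decides what a writer should do when the current file is closed,
// for example during rotation.
inline CloseAction actionOnClose(const WriterState& state) {
  if (state.rowsWritten > 0) {
    return CloseAction::kWriteData;
  }
  // No rows: only keep a file when the caller asked for one, and make it
  // a readable metadata-only file rather than a 0B file.
  if (state.ensureFiles) {
    return CloseAction::kWriteMetadataOnly;
  }
  return CloseAction::kSkipFile;
}
```

Under this sketch, a writer with no rows never produces a 0B file: it either skips the file or, when ensureFiles is set, writes a file that a reader can still open.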

System information

er Version: 17.0.0.17000013
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 17.0.0.17000013
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX15.5.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/Applications/CMake.app/Contents;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local
